# Performance & Rate Limiting

Optimize your VBAPI integration with rate limiting strategies, performance monitoring, and scaling best practices for production deployments.

> **Performance Disclaimer:** All code examples on this page are illustrative pseudo-code and are not intended for direct production use.

## Understanding Rate Limits

VBAPI enforces throttling using a **token bucket–style algorithm** to ensure fair usage and system stability.

### Default Limits

- **25 requests per second** sustained rate
- **50 requests per second** burst rate
- **Additional bursts** allowed within internal smoothing windows
- **Per-client limits** based on API key

When limits are exceeded, VBAPI returns `429 Too Many Requests`.

## Performance Optimization Strategies

### Use Batch Operations

```python
# Instead of multiple individual creates:
for county in counties:
    create_county(county)          # 10 requests for 10 counties

# Use batch create:
create_counties_batch(counties)    # 1 request
```

### Implement Efficient Pagination

```python
import time

def get_all_members_efficiently(page_size=500):
    """Efficiently retrieve all members with an optimal page size."""
    members = []
    page = 1

    while True:
        # Assumes an authenticated api_client is already configured
        response = api_client.list_members(
            page=page,
            pageSize=page_size,
        )

        batch = response['data']
        if not batch:
            break

        members.extend(batch)
        page += 1

        # Respect rate limits
        time.sleep(0.1)  # at most 10 requests per second

    return members
```

## Rate Limiting Strategies

### Adaptive Rate Limiting

```python
import time
from collections import deque

import requests

class RateLimitExceeded(Exception):
    pass

class AdaptiveRateLimiter:
    """Rate limiter that adapts its rate to API responses."""

    def __init__(self, initial_rate=20):
        self.current_rate = initial_rate  # requests per second
        self.request_times = deque()
        self.last_429 = None
        self.consecutive_success = 0

    def wait_if_needed(self):
        """Wait if necessary to respect the current rate limit."""
        now = time.time()

        # Drop request timestamps older than 1 second
        while self.request_times and self.request_times[0] <= now - 1:
            self.request_times.popleft()

        # If we're at the limit, wait until the oldest request ages out
        if len(self.request_times) >= self.current_rate:
            sleep_time = 1 - (now - self.request_times[0])
            if sleep_time > 0:
                time.sleep(sleep_time)

        self.request_times.append(time.time())

    def handle_429_response(self):
        """Halve the rate when we get a 429."""
        self.last_429 = time.time()
        self.current_rate = max(1, int(self.current_rate * 0.5))
        self.consecutive_success = 0
        print(f"Rate limited! Reducing to {self.current_rate} req/sec")

    def handle_success_response(self):
        """Gradually increase the rate on sustained success."""
        self.consecutive_success += 1

        if (self.last_429 and
                time.time() - self.last_429 > 60 and
                self.consecutive_success >= 20):
            # Creep the rate back up after 1 minute with no 429s
            self.current_rate = min(25, self.current_rate + 1)
            self.last_429 = None
            self.consecutive_success = 0

# Usage
rate_limiter = AdaptiveRateLimiter()

def make_api_call(url, headers, data):
    rate_limiter.wait_if_needed()

    response = requests.post(url, headers=headers, json=data)

    if response.status_code == 429:
        rate_limiter.handle_429_response()
        raise RateLimitExceeded()
    else:
        rate_limiter.handle_success_response()

    return response
```

### Parallel Processing with Rate Limiting

```python
import asyncio
from asyncio import Semaphore

import aiohttp

class AsyncVBAPIClient:
    def __init__(self, max_concurrent=10, requests_per_second=20):
        self.semaphore = Semaphore(max_concurrent)
        self.request_queue = asyncio.Queue()
        self.rate_limiter = asyncio.create_task(
            self._rate_limiter(requests_per_second)
        )

    async def _rate_limiter(self, rps):
        """Release one request slot at the specified rate."""
        while True:
            await self.request_queue.put(None)
            await asyncio.sleep(1 / rps)

    async def make_request(self, method, url, **kwargs):
        async with self.semaphore:
            # Wait for a slot from the rate limiter
            await self.request_queue.get()

            async with aiohttp.ClientSession() as session:
                async with session.request(method, url, **kwargs) as response:
                    return await response.json()

# Process many requests in parallel while respecting rate limits
async def process_claims_parallel(claim_ids):
    client = AsyncVBAPIClient(max_concurrent=5, requests_per_second=20)

    tasks = [
        client.make_request('GET', f'/claims/{claim_id}')
        for claim_id in claim_ids
    ]

    results = await asyncio.gather(*tasks, return_exceptions=True)
    return results
```

### Token Bucket Rate Limiter

```python
import threading
import time

import requests

class TokenBucketRateLimiter:
    """Token bucket rate limiter implementation."""

    def __init__(self, rate, capacity=None):
        self.rate = rate                      # tokens added per second
        self.capacity = capacity or rate * 2  # bucket size
        self.tokens = self.capacity
        self.last_update = time.time()
        self.lock = threading.Lock()

    def acquire(self, tokens=1):
        """Try to take tokens; return True on success,
        otherwise the time in seconds to wait."""
        with self.lock:
            now = time.time()

            # Refill tokens based on time elapsed
            elapsed = now - self.last_update
            self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
            self.last_update = now

            if self.tokens >= tokens:
                self.tokens -= tokens
                return True
            else:
                # Not enough tokens: report how long to wait
                wait_time = (tokens - self.tokens) / self.rate
                return wait_time

    def wait_and_acquire(self, tokens=1):
        """Acquire tokens, sleeping first if necessary."""
        result = self.acquire(tokens)
        if result is not True:
            time.sleep(result)
            return self.acquire(tokens)
        return True

# Usage
bucket = TokenBucketRateLimiter(rate=25, capacity=50)

def rate_limited_request(url, headers, data):
    bucket.wait_and_acquire()  # Blocks until a token is available
    return requests.post(url, headers=headers, json=data)
```

## Increasing Rate Limits

Clients may request a Service Level Review via VBA Support to obtain increased rate limits.

### Approval Factors

- **Expected production load** - Documented traffic projections
- **Integration design** - Efficient use of batch operations and pagination
- **Test results** - Performance testing in lower environments
- **Business justification** - Critical business processes requiring higher throughput

### Request Process

1. **Document current usage patterns** and bottlenecks
2. **Provide load testing results** from development/test environments
3. **Detail optimization efforts** (batching, caching, etc.)
4. **Submit request to VBA Support** with business justification

## Performance Checklist

### Pre-Production

- [ ] Load testing completed with realistic data volumes
- [ ] Rate limiting strategies implemented and tested
- [ ] Caching strategy implemented for frequently accessed data
- [ ] Error handling and retry logic tested

### Production Deployment

- [ ] Performance baselines established
- [ ] Resource limits appropriate for expected load

### Ongoing Optimization

- [ ] Regular performance reviews scheduled
- [ ] Rate limit utilization tracked
- [ ] API usage patterns analyzed for improvements

This guide should help your VBAPI integration scale efficiently and maintain consistent performance under production load.
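The pre-production checklist calls for tested error handling and retry logic. One common way to recover from `429 Too Many Requests` is exponential backoff with jitter, preferring a server-supplied `Retry-After` hint when one is available. The sketch below is illustrative pseudo-code in the spirit of the disclaimer above: `retry_with_backoff` and the `retry_after` attribute on `RateLimitExceeded` are hypothetical names for this example, not part of VBAPI.

```python
import random
import time

class RateLimitExceeded(Exception):
    """Raised on HTTP 429; retry_after carries the server's
    Retry-After hint in seconds, if one was provided."""
    def __init__(self, retry_after=None):
        super().__init__("429 Too Many Requests")
        self.retry_after = retry_after

def retry_with_backoff(call, max_retries=5, base_delay=1.0, max_delay=30.0):
    """Invoke call(); on RateLimitExceeded, sleep and retry with
    exponential backoff plus jitter, honoring the server's hint."""
    for attempt in range(max_retries + 1):
        try:
            return call()
        except RateLimitExceeded as exc:
            if attempt == max_retries:
                raise  # out of retries; surface the 429 to the caller
            # Prefer Retry-After when the server provides one,
            # otherwise back off exponentially, capped at max_delay
            delay = exc.retry_after or min(max_delay, base_delay * 2 ** attempt)
            # Add up to 10% jitter so clients don't retry in lockstep
            time.sleep(delay + random.uniform(0, 0.1 * delay))
```

A caller would wrap each request in a closure, e.g. `retry_with_backoff(lambda: make_api_call(url, headers, data))`, so the same policy applies uniformly across endpoints.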