Rate Limits#
WatchLLM enforces rate limits at multiple layers to ensure fair usage and protect infrastructure.
Plan-Based Rate Limits#
Each plan has specific rate limits and monthly quotas:
| Plan | Price | Requests/Minute | Monthly Quota | Data Retention |
|---|---|---|---|---|
| Free | $0/mo | 10 rpm | 50,000 requests | 7 days |
| Starter | $29/mo | 50 rpm | 250,000 requests | 30 days |
| Pro | $49/mo | 200 rpm | 1,000,000 requests | 90 days |
How Rate Limiting Works#
Per-Key Rate Limiting#
Rate limits are applied per API key using a sliding window algorithm:
- Each API key has a Redis counter: `ratelimit:{keyId}:minute`
- The counter increments with each request
- The window resets every 60 seconds
- When the limit is reached, requests return `429 Too Many Requests`
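
For illustration, here is a minimal sketch of the counter behavior described above, assuming the ioredis client. This is not the proxy's actual implementation; the plan limit is supplied by the caller:

```javascript
import Redis from 'ioredis';

const redis = new Redis(); // assumes a local Redis instance for the sketch

// Minimal sketch of the per-key counter described above.
// `limit` is the key's plan limit, e.g. 50 rpm on Starter.
async function isAllowed(keyId, limit) {
  const key = `ratelimit:${keyId}:minute`;
  const count = await redis.incr(key); // increments on every request
  if (count === 1) {
    await redis.expire(key, 60); // start the 60-second window on the first hit
  }
  return count <= limit; // false -> caller responds with 429
}
```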
IP-Based Rate Limiting#
An additional defense-in-depth layer limits requests per IP address:
- 120 requests/minute per IP
- 30 requests/10 seconds burst limit
- IPs are hashed (SHA-256) for privacy
- After 5 violations in 5 minutes → 5-minute IP block
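
For illustration only, the violation tracking could look roughly like the sketch below (in-memory and single-process; the proxy's real logic is not published in these docs):

```javascript
import { createHash } from 'node:crypto';

const violations = new Map();   // hashed IP -> { count, firstAt }
const blockedUntil = new Map(); // hashed IP -> block expiry (ms)

// IPs are hashed before being stored, so raw addresses never persist
const hashIp = (ip) => createHash('sha256').update(ip).digest('hex');

function recordViolation(ip) {
  const key = hashIp(ip);
  const now = Date.now();
  const entry = violations.get(key);
  if (!entry || now - entry.firstAt > 5 * 60_000) {
    violations.set(key, { count: 1, firstAt: now }); // start a new 5-minute window
    return;
  }
  entry.count += 1;
  if (entry.count >= 5) {
    blockedUntil.set(key, now + 5 * 60_000); // 5 violations -> 5-minute block
    violations.delete(key);
  }
}

const isBlocked = (ip) => (blockedUntil.get(hashIp(ip)) ?? 0) > Date.now();
```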
Rate Limit Response Headers#
Every response includes rate limit information:
```http
HTTP/1.1 200 OK
X-RateLimit-Limit: 50
X-RateLimit-Remaining: 47
X-RateLimit-Reset: 1706054460
```

When rate limited:
```http
HTTP/1.1 429 Too Many Requests
Retry-After: 60
X-RateLimit-Limit: 50
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1706054460
```

```json
{
  "error": {
    "message": "Rate limit exceeded. Please retry after 60 seconds.",
    "type": "rate_limit_error",
    "code": "rate_limit_exceeded"
  }
}
```
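
Clients can use these headers to throttle proactively instead of waiting for a 429. A rough sketch, assuming the OpenAI-compatible `/chat/completions` path on the proxy and treating `X-RateLimit-Reset` as a Unix timestamp in seconds (as the example value suggests):

```javascript
// Proactive throttling sketch: read the headers and pause when headroom runs out.
// The path and API key below are placeholders.
async function throttledRequest(body) {
  const res = await fetch('https://proxy.watchllm.dev/v1/chat/completions', {
    method: 'POST',
    headers: {
      Authorization: 'Bearer lgw_proj_your_key',
      'Content-Type': 'application/json',
    },
    body: JSON.stringify(body),
  });

  const remaining = Number(res.headers.get('X-RateLimit-Remaining') ?? '1');
  const reset = Number(res.headers.get('X-RateLimit-Reset') ?? '0'); // Unix seconds

  if (remaining === 0) {
    const waitMs = Math.max(0, reset * 1000 - Date.now());
    await new Promise((r) => setTimeout(r, waitMs)); // wait for the window to reset
  }
  return res;
}
```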
Monthly Quota#

In addition to per-minute rate limits, each plan has a monthly request quota:
- Counted requests: All requests that hit the proxy (including cache hits)
- Quota reset: First day of each calendar month at 00:00 UTC
- Overage behavior (a rough cost sketch follows this list):
- Free plan: Requests are blocked after quota is reached
- Starter plan: $0.50 per 1,000 additional requests (up to 200k overage)
- Pro plan: $0.40 per 1,000 additional requests (up to 750k overage)
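
To make the overage pricing concrete, here is a back-of-the-envelope estimate. The helper below is hypothetical (it is not a billing API; actual charges are computed by WatchLLM):

```javascript
// Hypothetical helper for estimating overage charges from the pricing above.
const plans = {
  free:    { quota: 50_000,    perThousand: null, maxOverage: 0 },
  starter: { quota: 250_000,   perThousand: 0.50, maxOverage: 200_000 },
  pro:     { quota: 1_000_000, perThousand: 0.40, maxOverage: 750_000 },
};

function estimateOverageCost(plan, requests) {
  const { quota, perThousand, maxOverage } = plans[plan];
  if (perThousand === null) return 0; // Free plan: requests are blocked, not billed
  const overage = Math.min(Math.max(requests - quota, 0), maxOverage);
  return (overage / 1000) * perThousand;
}

console.log(estimateOverageCost('starter', 300_000)); // 50,000 over quota -> $25
```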
Handling Rate Limits#
Recommended Client-Side Strategy#
```javascript
import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: 'lgw_proj_your_key',
  baseURL: 'https://proxy.watchllm.dev/v1',
  maxRetries: 3, // Built-in retry with exponential backoff
});

// The SDK automatically retries on 429 responses
const response = await client.chat.completions.create({
  model: 'gpt-4o',
  messages: [{ role: 'user', content: 'Hello!' }],
});
```

Custom Retry Logic#
```javascript
// Wrap any call that may return 429 and retry after the server-suggested delay
async function withRetry(fn, maxRetries = 3) {
  for (let i = 0; i < maxRetries; i++) {
    try {
      return await fn();
    } catch (error) {
      // Only retry on 429, and only while attempts remain
      if (error.status === 429 && i < maxRetries - 1) {
        const retryAfter = error.headers?.['retry-after'] || 60; // seconds
        console.log(`Rate limited. Retrying in ${retryAfter}s...`);
        await new Promise(r => setTimeout(r, retryAfter * 1000));
      } else {
        throw error;
      }
    }
  }
}
```
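
For example, wrapping a completion call (reusing the `client` from the SDK snippet above):

```javascript
// Pass the call as a thunk so it can be re-invoked on each retry
const completion = await withRetry(() =>
  client.chat.completions.create({
    model: 'gpt-4o',
    messages: [{ role: 'user', content: 'Hello!' }],
  })
);
```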
Python Retry Example#

```python
import time
from openai import OpenAI, RateLimitError

client = OpenAI(
    api_key="lgw_proj_your_key",
    base_url="https://proxy.watchllm.dev/v1",
    max_retries=3,  # Built-in retry
)

# Or handle manually:
def call_with_retry(messages, max_retries=3):
    for attempt in range(max_retries):
        try:
            return client.chat.completions.create(
                model="gpt-4o",
                messages=messages,
            )
        except RateLimitError as e:
            if attempt < max_retries - 1:
                wait = int(e.response.headers.get("retry-after", 60))
                print(f"Rate limited. Waiting {wait}s...")
                time.sleep(wait)
            else:
                raise
```

Best Practices#
- Use exponential backoff — Don't retry immediately after a 429 (see the sketch after this list)
- Cache client-side too — Avoid sending duplicate requests
- Monitor your usage — Check the dashboard analytics for usage trends
- Batch requests — Group multiple prompts when possible
- Upgrade your plan — If you consistently hit limits, consider upgrading
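
On the first point: when the `Retry-After` header is unavailable, or you want to avoid synchronized retries across many clients, a common pattern is exponential backoff with jitter. A minimal sketch:

```javascript
// Exponential backoff with jitter: wait roughly 1s, 2s, 4s, ... (capped), randomized.
async function backoffRetry(fn, maxRetries = 5) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (error) {
      if (error.status !== 429 || attempt >= maxRetries - 1) throw error;
      const base = Math.min(1000 * 2 ** attempt, 30_000); // cap the delay at 30 seconds
      const delay = base / 2 + Math.random() * (base / 2); // jitter spreads out retries
      await new Promise((r) => setTimeout(r, delay));
    }
  }
}
```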
Need Higher Limits?#
If your use case requires higher rate limits or quotas beyond the Pro plan, contact us to discuss Enterprise options with custom limits.