Rate Limits#

WatchLLM enforces rate limits at multiple layers to ensure fair usage and protect infrastructure.

Plan-Based Rate Limits#

Each plan has specific rate limits and monthly quotas:

| Plan    | Price  | Requests/Minute | Monthly Quota      | Data Retention |
| ------- | ------ | --------------- | ------------------ | -------------- |
| Free    | $0/mo  | 10 rpm          | 50,000 requests    | 7 days         |
| Starter | $29/mo | 50 rpm          | 250,000 requests   | 30 days        |
| Pro     | $49/mo | 200 rpm         | 1,000,000 requests | 90 days        |

How Rate Limiting Works#

Per-Key Rate Limiting#

Rate limits are applied per API key using a fixed 60-second counter window:

  1. Each API key has a Redis counter: ratelimit:{keyId}:minute
  2. The counter increments with each request
  3. The window resets every 60 seconds
  4. When the limit is reached, requests return 429 Too Many Requests
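A minimal sketch of this counter in TypeScript, assuming the ioredis client; only the key format comes from the docs, the rest is illustrative:

import Redis from 'ioredis';

const redis = new Redis();

// Returns true if the request is allowed, false if it should get a 429.
async function checkRateLimit(keyId: string, limit: number): Promise<boolean> {
  const key = `ratelimit:${keyId}:minute`;
  const count = await redis.incr(key);   // step 2: increment on each request
  if (count === 1) {
    await redis.expire(key, 60);         // step 3: window resets after 60 seconds
  }
  return count <= limit;                 // step 4: over the limit → respond 429
}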

IP-Based Rate Limiting#

An additional defense-in-depth layer limits requests per IP address:

  • 120 requests/minute per IP
  • 30 requests/10 seconds burst limit
  • IPs are hashed (SHA-256) for privacy
  • After 5 violations in 5 minutes → 5-minute IP block
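For illustration, here is one way the SHA-256 hashing could look in TypeScript using Node's built-in crypto module; the function name and key format are hypothetical:

import { createHash } from 'node:crypto';

// Hash the client IP so raw addresses are never stored (privacy bullet above).
function hashIp(ip: string): string {
  return createHash('sha256').update(ip).digest('hex');
}

// The hash would then key the per-IP counters, e.g. a key like
// ratelimit:ip:<hash> (illustrative, not a documented key format).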

Rate Limit Response Headers#

Every response includes rate limit information:

HTTP/1.1 200 OK
X-RateLimit-Limit: 50
X-RateLimit-Remaining: 47
X-RateLimit-Reset: 1706054460

When rate limited:

HTTP/1.1 429 Too Many Requests
Retry-After: 60
X-RateLimit-Limit: 50
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1706054460
 
{
  "error": {
    "message": "Rate limit exceeded. Please retry after 60 seconds.",
    "type": "rate_limit_error",
    "code": "rate_limit_exceeded"
  }
}
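Clients can use these headers directly. For example, a small TypeScript helper (not part of any SDK) that computes how long to wait from a fetch Response:

// X-RateLimit-Reset is a Unix timestamp in seconds; convert it to a wait in ms.
function msUntilReset(res: Response): number {
  const reset = Number(res.headers.get('X-RateLimit-Reset') ?? '0');
  return Math.max(0, reset * 1000 - Date.now());
}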

Monthly Quota#

In addition to per-minute rate limits, each plan has a monthly request quota:

  • Counted requests: All requests that hit the proxy (including cache hits)
  • Quota reset: First day of each calendar month at 00:00 UTC
  • Overage behavior:
    • Free plan: Requests are blocked after quota is reached
    • Starter plan: $0.50 per 1,000 additional requests (up to 200k overage)
    • Pro plan: $0.40 per 1,000 additional requests (up to 750k overage)
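As a quick sanity check on the overage pricing above, a small helper (hypothetical, not an official API) that computes the extra charge:

function overageCost(requestsUsed: number, quota: number, ratePer1k: number): number {
  const over = Math.max(0, requestsUsed - quota);
  return (over / 1000) * ratePer1k;
}

// Starter: 300,000 requests against a 250,000 quota at $0.50 per 1,000
overageCost(300_000, 250_000, 0.5); // → $25.00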

Handling Rate Limits#

import OpenAI from 'openai';
 
const client = new OpenAI({
  apiKey: 'lgw_proj_your_key',
  baseURL: 'https://proxy.watchllm.dev/v1',
  maxRetries: 3, // Built-in retry with exponential backoff
});
 
// The SDK automatically retries on 429 responses
const response = await client.chat.completions.create({
  model: 'gpt-4o',
  messages: [{ role: 'user', content: 'Hello!' }],
});

Custom Retry Logic#

// Generic retry wrapper: on a 429, wait for the Retry-After interval, then retry.
async function withRetry(fn, maxRetries = 3) {
  for (let i = 0; i < maxRetries; i++) {
    try {
      return await fn();
    } catch (error) {
      if (error.status === 429 && i < maxRetries - 1) {
        // Retry-After arrives as a string; coerce it to a number of seconds.
        const retryAfter = Number(error.headers?.['retry-after'] ?? 60);
        console.log(`Rate limited. Retrying in ${retryAfter}s...`);
        await new Promise(r => setTimeout(r, retryAfter * 1000));
      } else {
        throw error;
      }
    }
  }
}
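For example, wrapping the SDK call from earlier (in practice you would set maxRetries: 0 on the client so the SDK and the wrapper don't both retry):

const response = await withRetry(() =>
  client.chat.completions.create({
    model: 'gpt-4o',
    messages: [{ role: 'user', content: 'Hello!' }],
  })
);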

Python Retry Example#

import time
from openai import OpenAI, RateLimitError
 
client = OpenAI(
    api_key="lgw_proj_your_key",
    base_url="https://proxy.watchllm.dev/v1",
    max_retries=3,  # Built-in retry
)
 
# Or handle manually:
def call_with_retry(messages, max_retries=3):
    for attempt in range(max_retries):
        try:
            return client.chat.completions.create(
                model="gpt-4o",
                messages=messages,
            )
        except RateLimitError as e:
            if attempt < max_retries - 1:
                wait = int(e.response.headers.get("retry-after", 60))
                print(f"Rate limited. Waiting {wait}s...")
                time.sleep(wait)
            else:
                raise

Best Practices#

  1. Use exponential backoff — Don't retry immediately after a 429
  2. Cache client-side too — Avoid sending duplicate requests
  3. Monitor your usage — Check the dashboard analytics for usage trends
  4. Batch requests — Group multiple prompts when possible
  5. Upgrade your plan — If you consistently hit limits, consider upgrading
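A hedged sketch of practice #1, exponential backoff with full jitter; the base delay and cap are illustrative values, not documented ones:

// Delay doubles each attempt from a base, is capped, and gets random jitter
// so many clients retrying at once don't all hit the proxy simultaneously.
function backoffDelay(attempt: number, baseMs = 500, capMs = 30_000): number {
  const ceiling = Math.min(capMs, baseMs * 2 ** attempt);
  return Math.random() * ceiling; // "full jitter": uniform in [0, ceiling)
}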

Need Higher Limits?#

If your use case requires higher rate limits or quotas beyond the Pro plan, contact us to discuss Enterprise options with custom limits.
