Developer Cheat Sheet#

Quick reference for integrating and managing WatchLLM.

Base URLs#

Environment   Base URL
Production    https://proxy.watchllm.dev/v1
Local Dev     http://localhost:8787/v1
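
A common pattern is to pick the base URL per environment. A minimal sketch; WATCHLLM_API_KEY and WATCHLLM_BASE_URL are assumed variable names, not an official convention:

import OpenAI from 'openai';
 
const client = new OpenAI({
  apiKey: process.env.WATCHLLM_API_KEY,
  baseURL: process.env.WATCHLLM_BASE_URL ?? 'https://proxy.watchllm.dev/v1', // fall back to production
});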

Authentication#

All requests require a Bearer token in the Authorization header.

Authorization: Bearer lgw_proj_...
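
For raw HTTP calls, set the header yourself. A minimal fetch sketch (the /chat/completions path matches the cURL examples below; the key is a placeholder):

const res = await fetch('https://proxy.watchllm.dev/v1/chat/completions', {
  method: 'POST',
  headers: {
    Authorization: 'Bearer lgw_proj_your_key_here', // project API key
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    model: 'gpt-4o',
    messages: [{ role: 'user', content: 'Hello' }],
  }),
});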

Model Names#

BYOK (Bring Your Own Key) Models#

When using your own API keys, use native provider model names:

Provider    Model Examples
OpenAI      gpt-4o, gpt-4-turbo, gpt-3.5-turbo
Anthropic   claude-3-opus-20240229, claude-3-sonnet-20240229, claude-3-haiku-20240307
Groq        llama2-70b-4096, mixtral-8x7b-32768, gemma-7b-it

OpenRouter Models#

For broader access without BYOK setup:

Provider    Model Examples
OpenAI      openai/gpt-4o, openai/gpt-3.5-turbo
Anthropic   anthropic/claude-3-opus, anthropic/claude-3-sonnet
Others      meta-llama/llama-2-70b, google/gemini-pro

SDK Integration#

Node.js (OpenAI SDK)#

import OpenAI from 'openai';
 
const client = new OpenAI({
  apiKey: 'lgw_proj_your_key_here',
  baseURL: 'https://proxy.watchllm.dev/v1'
});
 
// BYOK models (requires provider keys configured)
const response = await client.chat.completions.create({
  model: 'gpt-4o', // Direct OpenAI
  messages: [{ role: 'user', content: 'Hello' }],
  temperature: 0.7
});
 
// OpenRouter models (works without BYOK)
const orResponse = await client.chat.completions.create({
  model: 'anthropic/claude-3-sonnet',
  messages: [{ role: 'user', content: 'Hello' }]
});
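
The same client can stream token deltas. A sketch assuming WatchLLM forwards OpenAI-style server-sent events unchanged:

// Streaming response (reuses the client defined above)
const stream = await client.chat.completions.create({
  model: 'gpt-4o',
  messages: [{ role: 'user', content: 'Hello' }],
  stream: true,
});
 
for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content ?? '');
}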

Python (OpenAI SDK)#

from openai import OpenAI
 
client = OpenAI(
    api_key="lgw_proj_your_key_here",
    base_url="https://proxy.watchllm.dev/v1"
)
 
# BYOK models
response = client.chat.completions.create(
    model="claude-3-sonnet-20240229",  # Direct Anthropic
    messages=[{"role": "user", "content": "Hello"}]
)
 
# OpenRouter models
or_response = client.chat.completions.create(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "Hello"}]
)

Response Headers#

WatchLLM attaches metadata to every proxy response; the snippet after the table shows how to read it from the Node SDK.

Header                    Description
X-WatchLLM-Cache          HIT, HIT-SEMANTIC, or MISS.
X-WatchLLM-Cost-USD       Estimated cost of the request in USD.
X-WatchLLM-Latency-Ms     Total processing time in milliseconds.
X-WatchLLM-Provider       The upstream provider (openai, anthropic, groq).
X-WatchLLM-Tokens-Saved   Number of tokens served from cache.
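
From Node you can read these headers via the SDK's .withResponse() helper (available in openai-node v4). A sketch reusing the client from SDK Integration above:

const { data, response } = await client.chat.completions
  .create({ model: 'gpt-4o', messages: [{ role: 'user', content: 'Hello' }] })
  .withResponse();
 
console.log(response.headers.get('X-WatchLLM-Cache'));    // HIT, HIT-SEMANTIC, or MISS
console.log(response.headers.get('X-WatchLLM-Cost-USD')); // estimated request cost
console.log(data.choices[0].message.content);             // the completion itself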

Semantic Caching Configuration#

Threshold Settings#

Adjust similarity thresholds in project settings (a sketch of how the threshold gates a match follows the table):

Use Case     Threshold   Description
Strict       0.95        Only near-identical prompts match
Balanced     0.85        Good balance of hit rate and accuracy
Permissive   0.75        Catches more phrasing variations
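
Conceptually, the threshold gates a semantic-cache hit on embedding similarity. An illustrative sketch; the cosine metric and embedding inputs are assumptions, not documented WatchLLM internals:

// Cosine similarity between two embedding vectors
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
 
const THRESHOLD = 0.85; // "Balanced" setting from the table above
 
function isCacheHit(query: number[], cached: number[]): boolean {
  return cosineSimilarity(query, cached) >= THRESHOLD;
}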

Prompt Normalization Features#

WatchLLM automatically normalizes prompts as follows (see the sketch after this list):

  • Case: HELLO → hello
  • Whitespace: Multiple spaces → single space
  • Punctuation: Smart removal of filler punctuation
  • Questions: What is X? → what is x
  • Math: 2 + 2 = ? → 2+2=?
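
A minimal sketch of these rules (assumed logic for illustration; the real pipeline may differ in ordering and edge cases, and the math-spacing rule is omitted for brevity):

function normalizePrompt(prompt: string): string {
  return prompt
    .trim()
    .toLowerCase()               // Case: HELLO → hello
    .replace(/\s+/g, ' ')        // Whitespace: multiple spaces → single space
    .replace(/[.,!?;:]+$/, '');  // Punctuation/Questions: drop trailing filler
}
 
console.log(normalizePrompt('What   is X?')); // "what is x"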

Common Error Codes#

Code                  Status   Description
invalid_api_key       401      The API key is missing, invalid, or revoked.
rate_limit_exceeded   429      Your project or IP has reached its rate limit.
insufficient_quota    403      Your monthly usage limit has been reached.
provider_error        502      Upstream AI provider returned an error.
model_not_found       404      Model name not recognized (check BYOK setup).
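
In the Node SDK these surface as OpenAI.APIError with the HTTP status attached. A sketch of handling the most common cases; the error body is assumed to follow the OpenAI format the SDK expects:

import OpenAI from 'openai';
 
const client = new OpenAI({
  apiKey: 'lgw_proj_your_key_here',
  baseURL: 'https://proxy.watchllm.dev/v1',
});
 
try {
  await client.chat.completions.create({
    model: 'gpt-4o',
    messages: [{ role: 'user', content: 'Hello' }],
  });
} catch (err) {
  if (err instanceof OpenAI.APIError) {
    if (err.status === 429) {
      // rate_limit_exceeded: back off and retry
    } else if (err.status === 401) {
      // invalid_api_key: check the lgw_proj_... token
    }
    console.error(err.status, err.code, err.message);
  } else {
    throw err; // network or programming error, not an API response
  }
}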

CLI / cURL Examples#

BYOK Request#

curl https://proxy.watchllm.dev/v1/chat/completions \
  -H "Authorization: Bearer lgw_proj_..." \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Hello from BYOK"}]
  }'

OpenRouter Request#

curl https://proxy.watchllm.dev/v1/chat/completions \
  -H "Authorization: Bearer lgw_proj_..." \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic/claude-3-sonnet",
    "messages": [{"role": "user", content": "Hello from OpenRouter"}]
  }'

Check Cache Status#

curl -s -D - -o /dev/null https://proxy.watchllm.dev/v1/chat/completions \
  -H "Authorization: Bearer lgw_proj_..." \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt-4o", "messages": [{"role": "user", "content": "Test"}]}'
# Look for X-WatchLLM-Cache header in response
