Developer Cheat Sheet#
Quick reference for integrating and managing WatchLLM.
Base URLs#
| Environment | Base URL |
|---|---|
| Production | https://proxy.watchllm.dev/v1 |
| Local Dev | http://localhost:8787/v1 |
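A common pattern is to select the base URL from the environment so the same code runs against local dev and production. A minimal sketch (the `WATCHLLM_BASE_URL` and `WATCHLLM_API_KEY` variable names are illustrative assumptions, not WatchLLM conventions):

```python
import os
from openai import OpenAI

# Fall back to production if no override is set (variable names are illustrative)
base_url = os.getenv("WATCHLLM_BASE_URL", "https://proxy.watchllm.dev/v1")

client = OpenAI(api_key=os.environ["WATCHLLM_API_KEY"], base_url=base_url)
```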
Authentication#
All requests require a Bearer token in the Authorization header.
```
Authorization: Bearer lgw_proj_...
```
Model Names#
BYOK (Bring Your Own Key) Models#
When using your own API keys, use native provider model names:
| Provider | Model Examples |
|---|---|
| OpenAI | gpt-4o, gpt-4-turbo, gpt-3.5-turbo |
| Anthropic | claude-3-opus-20240229, claude-3-sonnet-20240229, claude-3-haiku-20240307 |
| Groq | llama2-70b-4096, mixtral-8x7b-32768, gemma-7b-it |
OpenRouter Models#
For broader access without BYOK setup:
| Provider | Model Examples |
|---|---|
| OpenAI | openai/gpt-4o, openai/gpt-3.5-turbo |
| Anthropic | anthropic/claude-3-opus, anthropic/claude-3-sonnet |
| Others | meta-llama/llama-2-70b, google/gemini-pro |
SDK Integration#
Node.js (OpenAI SDK)#
```javascript
import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: 'lgw_proj_your_key_here',
  baseURL: 'https://proxy.watchllm.dev/v1'
});

// BYOK models (requires provider keys configured)
const response = await client.chat.completions.create({
  model: 'gpt-4o', // Direct OpenAI
  messages: [{ role: 'user', content: 'Hello' }],
  temperature: 0.7
});

// OpenRouter models (works without BYOK)
const orResponse = await client.chat.completions.create({
  model: 'anthropic/claude-3-sonnet',
  messages: [{ role: 'user', content: 'Hello' }]
});
```
Python (OpenAI SDK)#
```python
from openai import OpenAI

client = OpenAI(
    api_key="lgw_proj_your_key_here",
    base_url="https://proxy.watchllm.dev/v1"
)

# BYOK models
response = client.chat.completions.create(
    model="claude-3-sonnet-20240229",  # Direct Anthropic
    messages=[{"role": "user", "content": "Hello"}]
)

# OpenRouter models
or_response = client.chat.completions.create(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "Hello"}]
)
```
Response Headers#
WatchLLM attaches metadata to every proxy response.
| Header | Description |
|---|---|
| `X-WatchLLM-Cache` | `HIT`, `HIT-SEMANTIC`, or `MISS`. |
| `X-WatchLLM-Cost-USD` | Estimated cost of the request. |
| `X-WatchLLM-Latency-Ms` | Total processing time in milliseconds. |
| `X-WatchLLM-Provider` | The upstream provider (`openai`, `anthropic`, `groq`). |
| `X-WatchLLM-Tokens-Saved` | Number of tokens served from cache. |
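To read these headers programmatically, one option is the OpenAI Python SDK's raw-response interface (a minimal sketch, assuming the v1.x SDK; adapt the model and key to your project):

```python
from openai import OpenAI

client = OpenAI(
    api_key="lgw_proj_your_key_here",
    base_url="https://proxy.watchllm.dev/v1",
)

# with_raw_response exposes the underlying HTTP response, including headers
raw = client.chat.completions.with_raw_response.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello"}],
)

print(raw.headers.get("X-WatchLLM-Cache"))     # HIT / HIT-SEMANTIC / MISS
print(raw.headers.get("X-WatchLLM-Cost-USD"))  # estimated request cost

completion = raw.parse()  # the usual ChatCompletion object
print(completion.choices[0].message.content)
```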
Semantic Caching Configuration#
Threshold Settings#
Adjust similarity thresholds in project settings:
| Use Case | Threshold | Description |
|---|---|---|
| Strict | 0.95 | Only near-identical prompts match |
| Balanced | 0.85 | Good balance of hits vs. accuracy |
| Permissive | 0.75 | Catches more variations |
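Conceptually, the threshold is compared against the similarity between the incoming prompt's embedding and a cached prompt's embedding. The sketch below illustrates the idea with cosine similarity; it is not WatchLLM's internal implementation, and the vectors are made-up examples:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def is_semantic_hit(prompt_vec: np.ndarray, cached_vec: np.ndarray,
                    threshold: float = 0.85) -> bool:
    """Treat a cached entry as a hit when similarity meets the configured threshold."""
    return cosine_similarity(prompt_vec, cached_vec) >= threshold

# With the "Balanced" setting (0.85), two nearly identical prompt embeddings match
print(is_semantic_hit(np.array([0.1, 0.9, 0.2]), np.array([0.12, 0.88, 0.25]), 0.85))
```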
Prompt Normalization Features#
WatchLLM automatically normalizes:
- Case: `HELLO` → `hello`
- Whitespace: multiple spaces → single space
- Punctuation: smart removal of filler punctuation
- Questions: `What is X?` → `what is x`
- Math: `2 + 2 = ?` → `2+2=?`
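For intuition, the rules above correspond roughly to a transformation like the one below. This is an illustrative sketch only, not WatchLLM's actual normalizer; the exact punctuation handling is an assumption:

```python
import re

def normalize_prompt(prompt: str) -> str:
    """Illustrative normalization: lowercase, collapse whitespace, strip filler punctuation."""
    text = prompt.lower()                      # Case: HELLO -> hello
    text = re.sub(r"\s+", " ", text).strip()   # Whitespace: collapse runs of spaces
    text = re.sub(r"[?!.,;:]+$", "", text)     # Punctuation: drop trailing filler (assumed rule)
    return text

print(normalize_prompt("What  is   X?"))  # -> "what is x"
print(normalize_prompt("HELLO"))          # -> "hello"
```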
Common Error Codes#
| Code | Status | Description |
|---|---|---|
| `invalid_api_key` | 401 | The API key is missing, invalid, or revoked. |
| `rate_limit_exceeded` | 429 | Your project or IP has reached its rate limit. |
| `insufficient_quota` | 403 | Your monthly usage limit has been reached. |
| `provider_error` | 502 | Upstream AI provider returned an error. |
| `model_not_found` | 404 | Model name not recognized (check BYOK setup). |
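Because the proxy is OpenAI-compatible, these errors surface through the OpenAI SDK's standard exception types. A minimal retry sketch (assuming the OpenAI Python SDK v1.x; the backoff policy is an example, not a recommendation):

```python
import time
import openai
from openai import OpenAI

client = OpenAI(api_key="lgw_proj_your_key_here",
                base_url="https://proxy.watchllm.dev/v1")

def chat_with_retry(messages, model="gpt-4o", attempts=3):
    for attempt in range(attempts):
        try:
            return client.chat.completions.create(model=model, messages=messages)
        except openai.RateLimitError:
            # 429 rate_limit_exceeded: back off and retry
            time.sleep(2 ** attempt)
        except openai.AuthenticationError:
            # 401 invalid_api_key: retrying will not help
            raise
        except openai.APIStatusError as e:
            # 502 provider_error and other upstream failures
            if e.status_code >= 500 and attempt < attempts - 1:
                time.sleep(2 ** attempt)
            else:
                raise
    raise RuntimeError("Exhausted retries")
```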
CLI / cURL Examples#
BYOK Request#
```bash
curl https://proxy.watchllm.dev/v1/chat/completions \
  -H "Authorization: Bearer lgw_proj_..." \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Hello from BYOK"}]
  }'
```
OpenRouter Request#
```bash
curl https://proxy.watchllm.dev/v1/chat/completions \
  -H "Authorization: Bearer lgw_proj_..." \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic/claude-3-sonnet",
    "messages": [{"role": "user", "content": "Hello from OpenRouter"}]
  }'
```
Check Cache Status#
```bash
curl -i https://proxy.watchllm.dev/v1/chat/completions \
  -H "Authorization: Bearer lgw_proj_..." \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt-4o", "messages": [{"role": "user", "content": "Test"}]}'
# -i includes response headers in the output; look for X-WatchLLM-Cache
```