Getting Started with WatchLLM

Welcome to WatchLLM! This guide will help you integrate semantic caching into your LLM application in under 5 minutes.

What is WatchLLM?

WatchLLM is an edge proxy that sits between your application and AI providers (OpenAI, Anthropic, Groq). It automatically caches repetitive queries, reducing costs by 30-50% with no changes to your application logic beyond pointing your client at the proxy.

Quick Start (2 Minutes)

1. Sign Up & Create a Project

  1. Go to watchllm.dev and sign up
  2. Create a new project in the dashboard
  3. Generate an API key (starts with lgw_proj_)

2. Choose Your Integration Method

Option A (Recommended): Bring Your Own Key (BYOK)

  • You keep your existing OpenAI/Anthropic/Groq accounts
  • WatchLLM acts as a caching layer
  • You control billing directly with providers

Option B: OpenRouter

  • Use OpenRouter for unified access to multiple models
  • Single integration point
  • Simpler setup for prototyping

3. Update Your Code (3 Lines)

Node.js Example (BYOK):

import OpenAI from "openai";
 
const client = new OpenAI({
  apiKey: "lgw_proj_YOUR_KEY_HERE",  // Your WatchLLM key
  baseURL: "https://proxy.watchllm.dev/v1"  // Point to proxy
});
 
// Use normally - caching is automatic
const response = await client.chat.completions.create({
  model: "gpt-4o",  // Direct provider model name
  messages: [{ role: "user", content: "Hello!" }],
});

Python Example (BYOK):

from openai import OpenAI
 
client = OpenAI(
    api_key="lgw_proj_YOUR_KEY_HERE",
    base_url="https://proxy.watchllm.dev/v1"
)
 
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}]
)
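
If you chose Option B, the setup is presumably identical apart from the model identifier. The sketch below assumes WatchLLM passes through OpenRouter's provider/model naming (the openai/gpt-4o format is OpenRouter's convention, not confirmed here); check your dashboard for the exact model string to use.

Python Example (OpenRouter, assumed):

from openai import OpenAI

# Hypothetical Option B setup: same WatchLLM proxy endpoint,
# with models addressed via OpenRouter-style "provider/model" ids.
client = OpenAI(
    api_key="lgw_proj_YOUR_KEY_HERE",
    base_url="https://proxy.watchllm.dev/v1"
)

response = client.chat.completions.create(
    model="openai/gpt-4o",  # OpenRouter-style model id (assumed format)
    messages=[{"role": "user", "content": "Hello!"}]
)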

4. Configure Provider Keys (BYOK Only)

If using BYOK, add your provider API keys in the dashboard:

  1. Go to Settings → AI Providers
  2. Add your OpenAI key (starts with sk-)
  3. Add your Anthropic key (starts with sk-ant-)
  4. Add your Groq key (starts with gsk_)

That's it! Your requests now automatically cache at the edge.
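
A quick way to verify this is to send the same prompt twice and compare latencies: on a cache hit, the second response should come back much faster because it never reaches the provider. A minimal sketch using the BYOK setup above (it checks the latency gap only, not any proxy-specific response field):

import time
from openai import OpenAI

client = OpenAI(
    api_key="lgw_proj_YOUR_KEY_HERE",
    base_url="https://proxy.watchllm.dev/v1"
)

def timed_request(prompt: str) -> float:
    """Send one chat request through the proxy and return elapsed seconds."""
    start = time.perf_counter()
    client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
    )
    return time.perf_counter() - start

first = timed_request("What is Python?")   # likely a cache miss
second = timed_request("What is Python?")  # identical query: should hit the cache
print(f"first: {first:.2f}s, second: {second:.2f}s")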

How Caching Works

WatchLLM uses semantic similarity to match queries:

  • "What is Python?" and "Explain Python programming" → Cache Hit
  • "What is Python?" and "What is JavaScript?" → Cache Miss

You can adjust the similarity threshold in project settings.
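
WatchLLM's embedding model and default threshold are internal details, but the matching idea can be illustrated directly: embed both queries, compute their cosine similarity, and compare it against a threshold. A conceptual sketch using OpenAI's embeddings API (the 0.85 threshold is an arbitrary illustration, not WatchLLM's actual setting):

import math
from openai import OpenAI

client = OpenAI()  # plain OpenAI client, used here only to get embeddings

def embed(text: str) -> list[float]:
    """Return the embedding vector for a piece of text."""
    return client.embeddings.create(
        model="text-embedding-3-small",
        input=text,
    ).data[0].embedding

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

THRESHOLD = 0.85  # illustrative only; tune the real one in project settings

sim = cosine_similarity(embed("What is Python?"),
                        embed("Explain Python programming"))
print("cache hit" if sim >= THRESHOLD else "cache miss", f"(similarity={sim:.2f})")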

View Savings & Analytics

  1. Go to Dashboard → Analytics
  2. See real-time metrics:
    • Total requests vs cached requests
    • Money saved (estimated)
    • Cache hit rate
    • Average latency
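
The dashboard computes these figures for you, but the underlying arithmetic is simple. A sketch with made-up numbers (the request counts and per-call cost below are placeholders, not real pricing):

# Hypothetical numbers showing how the analytics figures relate.
total_requests = 10_000
cached_requests = 3_500       # served from the cache instead of the provider
cost_per_llm_call = 0.01      # USD, placeholder average cost per request

hit_rate = cached_requests / total_requests
savings = cached_requests * cost_per_llm_call

print(f"cache hit rate: {hit_rate:.0%}")      # 35%
print(f"estimated savings: ${savings:.2f}")   # $35.00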

Need Help?

  • Email: kiwi092020@gmail.com
  • Discord: Join our community (link in dashboard)
  • GitHub: Report issues or contribute

Pro Tip: Start with the free tier (50k requests/month) to validate savings before upgrading. Most users see 30-40% cost reduction in the first week.
