Getting Started with WatchLLM#
Welcome to WatchLLM! This guide will help you integrate semantic caching into your LLM application in under 5 minutes.
What is WatchLLM?#
WatchLLM is an edge proxy that sits between your application and AI providers (OpenAI, Anthropic, Groq). It automatically caches repetitive queries, reducing costs by 30-50% without code changes.
Quick Start (2 Minutes)#
1. Sign Up & Create a Project#
- Go to watchllm.dev and sign up
- Create a new project in the dashboard
- Generate an API key (starts with `lgw_proj_`)
2. Choose Your Integration Method#
Option A: Bring Your Own Key (BYOK) (Recommended)
- You keep your existing OpenAI/Anthropic/Groq accounts
- WatchLLM acts as a caching layer
- You control billing directly with providers
Option B: OpenRouter
- Use OpenRouter for unified access to multiple models
- Single integration point
- Simpler setup for prototyping
3. Update Your Code (3 Lines)#
Node.js Example (BYOK):
```js
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: "lgw_proj_YOUR_KEY_HERE", // Your WatchLLM key
  baseURL: "https://proxy.watchllm.dev/v1", // Point to the proxy
});

// Use normally - caching is automatic
const response = await client.chat.completions.create({
  model: "gpt-4o", // Direct provider model name
  messages: [{ role: "user", content: "Hello!" }],
});
```

Python Example (BYOK):

```python
from openai import OpenAI

client = OpenAI(
    api_key="lgw_proj_YOUR_KEY_HERE",
    base_url="https://proxy.watchllm.dev/v1",
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
)
```

4. Configure Provider Keys (BYOK Only)#
If using BYOK, add your provider API keys in the dashboard:
- Go to Settings → AI Providers
- Add your OpenAI key (starts with `sk-`)
- Add your Anthropic key (starts with `sk-ant-`)
- Add your Groq key (starts with `gsk_`)
That's it! Your requests now automatically cache at the edge.
How Caching Works#
WatchLLM uses semantic similarity to match queries:
- "What is Python?" and "Explain Python programming" → Cache Hit
- "What is Python?" and "What is JavaScript?" → Cache Miss
You can adjust the similarity threshold in project settings.
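To build intuition for how a similarity threshold decides hit vs. miss, here is a minimal sketch. The vectors, the `0.85` threshold, and the `is_cache_hit` helper are illustrative assumptions for this example, not WatchLLM's actual embedding pipeline or default settings:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def is_cache_hit(query_vec, cached_vec, threshold=0.85):
    # Hypothetical threshold; in WatchLLM you tune this in project settings
    return cosine_similarity(query_vec, cached_vec) >= threshold

# Toy 3-dimensional "embeddings" (real ones have hundreds of dimensions)
what_is_python = [0.9, 0.1, 0.2]
explain_python = [0.85, 0.15, 0.25]  # semantically close phrasing
what_is_js     = [0.1, 0.9, 0.3]     # different topic

print(is_cache_hit(what_is_python, explain_python))  # True  -> cache hit
print(is_cache_hit(what_is_python, what_is_js))      # False -> cache miss
```

Raising the threshold makes matching stricter (fewer hits, less risk of serving a stale answer to a genuinely different question); lowering it does the opposite.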
View Savings & Analytics#
- Go to Dashboard → Analytics
- See real-time metrics:
- Total requests vs cached requests
- Money saved (estimated)
- Cache hit rate
- Average latency
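As a back-of-envelope check on the dashboard's numbers, the hit rate and estimated savings follow directly from request counts. The figures below are illustrative assumptions, not real pricing:

```python
def estimated_savings(total_requests, cached_requests, avg_cost_per_request):
    """Rough savings: cached requests never reach the paid provider."""
    hit_rate = cached_requests / total_requests
    saved = cached_requests * avg_cost_per_request
    return hit_rate, saved

# Illustrative numbers only; actual per-request cost depends on model and tokens
hit_rate, saved = estimated_savings(
    total_requests=100_000,
    cached_requests=35_000,
    avg_cost_per_request=0.002,  # assumed $ per request
)
print(f"Cache hit rate: {hit_rate:.0%}")   # 35%
print(f"Estimated saved: ${saved:.2f}")    # $70.00
```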
Next Steps#
- Code Examples - More integration patterns
- API Reference - Complete endpoint documentation
- Troubleshooting - Common issues & fixes
- BYOK Setup - Detailed BYOK configuration guide
Need Help?#
- Email: kiwi092020@gmail.com
- Discord: Join our community (link in dashboard)
- GitHub: Report issues or contribute
Pro Tip: Start with the free tier (50k requests/month) to validate savings before upgrading. Most users see 30-40% cost reduction in the first week.