Rate Limits
Understand and work within API rate limits to build reliable applications.
Overview
Rate limits protect the API from abuse and ensure fair usage for all users. Limits are applied per API key and measured in:
- Requests per minute (RPM) — How many API calls you can make
- Tokens per minute (TPM) — How many tokens you can process
- Requests per day (RPD) — Daily request quota
Default Limits
| Tier | RPM | TPM | RPD |
|---|---|---|---|
| Free (new accounts) | 20 | 40,000 | 500 |
| Pro (€20+ spend) | 100 | 200,000 | 5,000 |
| Business (€100+ spend) | 500 | 1,000,000 | 25,000 |
| Enterprise (contact sales) | Custom | Custom | Unlimited |
Your tier automatically upgrades based on your cumulative spending. Check your current limits in the dashboard.
Rate Limit Headers
Every response includes headers to help you track your usage:
| Header | Description |
|---|---|
| `x-ratelimit-limit-requests` | Maximum requests per minute |
| `x-ratelimit-limit-tokens` | Maximum tokens per minute |
| `x-ratelimit-remaining-requests` | Requests remaining in the current window |
| `x-ratelimit-remaining-tokens` | Tokens remaining in the current window |
| `x-ratelimit-reset-requests` | Seconds until the request limit resets |
| `x-ratelimit-reset-tokens` | Seconds until the token limit resets |
| `retry-after` | Seconds to wait before retrying (sent with 429 responses) |
```typescript
// Check rate limit headers in the response
const response = await fetch('https://api.llmhub.dev/v1/chat/completions', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer your-api-key',
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    model: 'gpt-4o',
    messages: [{ role: 'user', content: 'Hello!' }]
  })
});

// Rate limit headers
console.log('Requests remaining:', response.headers.get('x-ratelimit-remaining-requests'));
console.log('Tokens remaining:', response.headers.get('x-ratelimit-remaining-tokens'));
console.log('Request limit resets in:', response.headers.get('x-ratelimit-reset-requests'));
```
Rate Limit Errors
When you exceed rate limits, the API returns a 429 Too Many Requests response:
```json
{
  "error": {
    "message": "Rate limit exceeded. Please retry after 30 seconds.",
    "type": "rate_limit_error",
    "param": null,
    "code": "rate_limit_exceeded"
  }
}
```
Retry with Exponential Backoff
Implement automatic retries with exponential backoff for resilient applications:
```typescript
class NonRetryableError extends Error {}

async function callWithRetry(
  fn: () => Promise<Response>,
  maxRetries = 3,
  baseDelay = 1000
): Promise<Response> {
  let lastError: Error | null = null;

  for (let attempt = 0; attempt < maxRetries; attempt++) {
    try {
      const response = await fn();

      // Success
      if (response.ok) {
        return response;
      }

      // Rate limited - wait and retry
      if (response.status === 429) {
        const retryAfter = response.headers.get('retry-after');
        const delay = retryAfter
          ? parseInt(retryAfter, 10) * 1000
          : baseDelay * Math.pow(2, attempt);
        console.log(`Rate limited. Retrying in ${delay}ms...`);
        await sleep(delay);
        continue;
      }

      // Other HTTP error - don't retry
      throw new NonRetryableError(`API error: ${response.status}`);
    } catch (error) {
      if (error instanceof NonRetryableError) throw error;
      lastError = error as Error;

      // Network error - retry with backoff
      if (attempt < maxRetries - 1) {
        const delay = baseDelay * Math.pow(2, attempt);
        console.log(`Request failed. Retrying in ${delay}ms...`);
        await sleep(delay);
      }
    }
  }

  throw lastError || new Error('Max retries exceeded');
}

const sleep = (ms: number) => new Promise(r => setTimeout(r, ms));
```
Request Throttling
Proactively limit your request rate to avoid hitting limits:
```typescript
class RateLimiter {
  private queue: Array<() => void> = [];
  private running = 0;
  private maxConcurrent: number;
  private minDelay: number;
  private lastRequest = 0;

  constructor(maxConcurrent = 5, requestsPerSecond = 10) {
    this.maxConcurrent = maxConcurrent;
    this.minDelay = 1000 / requestsPerSecond;
  }

  async execute<T>(fn: () => Promise<T>): Promise<T> {
    // Wait for a slot
    while (this.running >= this.maxConcurrent) {
      await new Promise<void>(resolve => this.queue.push(resolve));
    }

    // Ensure minimum delay between requests
    const now = Date.now();
    const timeSinceLastRequest = now - this.lastRequest;
    if (timeSinceLastRequest < this.minDelay) {
      await new Promise(r => setTimeout(r, this.minDelay - timeSinceLastRequest));
    }

    this.running++;
    this.lastRequest = Date.now();

    try {
      return await fn();
    } finally {
      this.running--;
      const next = this.queue.shift();
      if (next) next();
    }
  }
}

// Usage
const limiter = new RateLimiter(5, 10); // 5 concurrent, 10 req/sec

const results = await Promise.all(
  prompts.map(prompt =>
    limiter.execute(() =>
      client.chat.completions.create({
        model: 'gpt-4o-mini',
        messages: [{ role: 'user', content: prompt }]
      })
    )
  )
);
```
Batch Processing
Process large workloads efficiently with controlled batching:
```typescript
// Instead of sending requests one by one...
// ❌ Bad: 100 separate requests
for (const item of items) {
  await processItem(item);
}

// ✅ Good: Batch requests with controlled concurrency
async function processBatch<T, R>(
  items: T[],
  processor: (item: T) => Promise<R>,
  batchSize = 10
): Promise<R[]> {
  const results: R[] = [];

  for (let i = 0; i < items.length; i += batchSize) {
    const batch = items.slice(i, i + batchSize);
    const batchResults = await Promise.all(batch.map(processor));
    results.push(...batchResults);

    // Optional: Add delay between batches
    if (i + batchSize < items.length) {
      await new Promise(r => setTimeout(r, 100));
    }
  }

  return results;
}

// Process 100 items in batches of 10
const results = await processBatch(items, processItem, 10);
```
Best Practices
Monitor Rate Limit Headers
Track x-ratelimit-remaining-* headers and slow down before hitting limits.
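As a minimal sketch, a helper can read the remaining-request header and signal when to back off (the `shouldThrottle` name and the threshold of 5 are illustrative, not part of the API):

```typescript
// Returns true when the remaining-request budget is at or below a threshold.
// A threshold of 5 is an arbitrary default; tune it for your traffic.
function shouldThrottle(headers: Headers, threshold = 5): boolean {
  const remaining = headers.get('x-ratelimit-remaining-requests');
  return remaining !== null && parseInt(remaining, 10) <= threshold;
}
```

Call this after each response and insert a pause before the next request whenever it returns true.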
Use Exponential Backoff
Start with a 1-second delay and double it on each retry (1s, 2s, 4s, 8s). Add jitter to prevent thundering herd.
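The jittered delay can be sketched as follows ("full jitter": draw a uniform delay between zero and the exponential ceiling; the base and cap values here are illustrative defaults):

```typescript
// Exponential backoff with full jitter.
// baseMs doubles each attempt; capMs bounds the worst-case wait.
function backoffDelay(attempt: number, baseMs = 1000, capMs = 32_000): number {
  const ceiling = Math.min(capMs, baseMs * Math.pow(2, attempt));
  return Math.random() * ceiling; // uniform in [0, ceiling)
}
```

Because each client draws a different random delay, retries spread out over time instead of arriving in synchronized waves.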
Implement Request Queuing
Queue requests and process them at a controlled rate instead of sending all at once.
Cache Responses
Cache identical requests to reduce API calls. Embeddings are particularly good candidates for caching.
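A minimal in-memory cache keyed by a caller-supplied string (e.g. model plus prompt) might look like this; it is a sketch with no TTL or eviction, and the `cached` helper is illustrative rather than part of any SDK:

```typescript
// Cache the promise, not the value, so concurrent identical
// requests share one in-flight API call.
const responseCache = new Map<string, Promise<unknown>>();

function cached<T>(key: string, fn: () => Promise<T>): Promise<T> {
  let hit = responseCache.get(key);
  if (!hit) {
    hit = fn();
    responseCache.set(key, hit);
  }
  return hit as Promise<T>;
}
```

Storing the promise (rather than awaiting first) deduplicates simultaneous requests for the same key as well as repeated ones.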
Use Smaller Models
GPT-4o-mini processes faster and has higher token limits than GPT-4o. Use it when quality requirements allow.
Increasing Your Limits
Automatic Tier Upgrades
Your limits automatically increase as you spend more. Each tier unlocks higher limits.
Enterprise Plans
Need custom limits? Contact enterprise@llmhub.dev to discuss your requirements.

