Best Practices
Guidelines for building reliable, cost-effective, and performant AI applications.
Prompt Engineering
Be Specific and Clear
Vague prompts lead to unpredictable results. Be explicit about what you want.
❌ Vague
"Summarize this article"
✓ Specific
"Summarize this article in 3 bullet points, focusing on the key takeaways for software developers. Use plain language and avoid jargon."
Use System Messages Effectively
The system message sets context and constraints that persist throughout the conversation.
const response = await client.chat.completions.create({
  model: 'gpt-4o',
  messages: [
    {
      role: 'system',
      content: `You are a helpful customer support agent for TechCo.
Rules:
- Be friendly and professional
- If you don't know something, say so
- Never discuss competitors
- Keep responses under 100 words
- Always offer to escalate to a human if the issue is complex`
    },
    {
      role: 'user',
      content: 'My order hasn\'t arrived yet.'
    }
  ]
});
Provide Examples (Few-Shot)
Show the model what you want with examples.
const response = await client.chat.completions.create({
  model: 'gpt-4o',
  messages: [
    {
      role: 'system',
      content: 'Convert natural language to SQL queries.'
    },
    { role: 'user', content: 'Get all users from California' },
    { role: 'assistant', content: "SELECT * FROM users WHERE state = 'CA';" },
    { role: 'user', content: 'Find orders over $100 from last month' },
    { role: 'assistant', content: "SELECT * FROM orders WHERE amount > 100 AND created_at >= DATE_SUB(NOW(), INTERVAL 1 MONTH);" },
    { role: 'user', content: 'Count active subscriptions by plan' }
    // Model will follow the established pattern
  ]
});
Cost Optimization
Choose the Right Model
Not every task needs GPT-4o. Use gpt-4o-mini for simple tasks—it's 15-20x cheaper and often sufficient.
Limit Response Length
Set max_tokens to prevent unexpectedly long responses. Output tokens cost 2-4x more than input tokens.
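The cap can live in a small request builder. A minimal sketch; `cappedRequest` is a hypothetical helper, while `max_tokens`, the model name, and `finish_reason` come from the Chat Completions API:

```typescript
// Hypothetical helper that builds a request body with a hard output cap.
function cappedRequest(prompt: string, maxTokens = 150) {
  return {
    model: 'gpt-4o-mini',
    max_tokens: maxTokens, // hard cap on output tokens
    messages: [{ role: 'user', content: prompt }]
  };
}

// const response = await client.chat.completions.create(cappedRequest('Summarize...'));
// If response.choices[0].finish_reason === 'length', the cap was hit —
// raise max_tokens or tighten the prompt.
```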
Trim Context
Only include relevant conversation history. Summarize or truncate older messages instead of sending the full chat log.
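One simple policy, sketched here with a hypothetical `trimHistory` helper: always keep the system message and only the most recent turns. A production version might summarize the dropped turns instead of discarding them.

```typescript
type Msg = { role: 'system' | 'user' | 'assistant'; content: string };

// Keep the system message(s) plus only the last `keep` conversation turns.
function trimHistory(messages: Msg[], keep = 6): Msg[] {
  const system = messages.filter(m => m.role === 'system');
  const rest = messages.filter(m => m.role !== 'system');
  return [...system, ...rest.slice(-keep)];
}
```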
Cache Responses
Cache identical requests. For embeddings, cache vectors in a database instead of re-computing.
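For embeddings, the memoization idea looks like this; `embed` stands in for whatever wrapper you have around the embeddings endpoint, and the in-memory `Map` is an assumption — a real app would use a persistent store.

```typescript
const embeddingCache = new Map<string, number[]>();

// Return a cached vector when available; otherwise compute and store it.
async function getEmbedding(
  text: string,
  embed: (t: string) => Promise<number[]>
): Promise<number[]> {
  const hit = embeddingCache.get(text);
  if (hit) return hit; // cache hit: no API call
  const vector = await embed(text);
  embeddingCache.set(text, vector);
  return vector;
}
```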
Batch Requests
For embeddings and bulk processing, batch multiple items in a single request to reduce overhead.
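The embeddings endpoint accepts an array of inputs, so many items can share one request. A sketch with a hypothetical `chunk` helper to stay under per-request limits:

```typescript
// Split items into fixed-size batches.
function chunk<T>(items: T[], size: number): T[][] {
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    out.push(items.slice(i, i + size));
  }
  return out;
}

// for (const batch of chunk(texts, 100)) {
//   await client.embeddings.create({ model: 'text-embedding-3-small', input: batch });
// }
```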
Reliability
Implement Retries
Use exponential backoff for transient failures:
async function withRetry<T>(
  fn: () => Promise<T>,
  maxRetries = 3,
  baseDelay = 1000
): Promise<T> {
  for (let i = 0; i < maxRetries; i++) {
    try {
      return await fn();
    } catch (error) {
      if (i === maxRetries - 1) throw error;
      // Exponential backoff with jitter: base * 2^i plus up to 1s of noise
      const delay = baseDelay * Math.pow(2, i) + Math.random() * 1000;
      await new Promise(r => setTimeout(r, delay));
    }
  }
  throw new Error('Max retries exceeded'); // unreachable; satisfies the return type
}
Set Timeouts
Always set reasonable timeouts for API calls. LLM responses can occasionally take longer than expected.
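A generic sketch using `AbortController`; the hypothetical `withTimeout` wrapper aborts the call after `ms` milliseconds. The OpenAI Node SDK also accepts a `timeout` option at client construction and a per-request `signal`.

```typescript
// Abort `fn` if it hasn't settled within `ms` milliseconds.
async function withTimeout<T>(
  fn: (signal: AbortSignal) => Promise<T>,
  ms: number
): Promise<T> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), ms);
  try {
    return await fn(controller.signal);
  } finally {
    clearTimeout(timer); // don't leave the timer running after success
  }
}

// const response = await withTimeout(
//   signal => client.chat.completions.create({ ... }, { signal }),
//   30_000
// );
```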
Validate Outputs
Don't trust model outputs blindly. Validate JSON structure, check for required fields, and sanitize before use.
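A minimal validator sketch; the `Ticket` shape and field names are made up for illustration, and a schema library like Zod gives you the same thing with less boilerplate.

```typescript
type Ticket = { category: string; priority: 'low' | 'medium' | 'high' };

// Parse the model's reply and reject anything that doesn't match the schema.
function parseTicket(raw: string): Ticket | null {
  try {
    const data = JSON.parse(raw);
    if (typeof data.category !== 'string') return null;
    if (!['low', 'medium', 'high'].includes(data.priority)) return null;
    return { category: data.category, priority: data.priority };
  } catch {
    return null; // malformed JSON — retry or fall back
  }
}
```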
Handle Rate Limits
Monitor rate limit headers and implement request queuing for high-volume applications.
Security
Protect API Keys
- Never expose API keys in client-side code
- Use environment variables
- Rotate keys regularly
- Use separate keys for dev/staging/production
Sanitize User Input
Prevent prompt injection by validating and sanitizing user inputs. Consider using structured formats instead of free text.
Don't Log Sensitive Data
Avoid logging full prompts or responses that may contain PII or sensitive information.
Validate Webhook Signatures
Always verify HMAC signatures on incoming webhooks to prevent spoofing.
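A generic HMAC-SHA256 check in Node; header names and signing schemes vary by provider, so treat this as a sketch rather than any vendor's exact format.

```typescript
import { createHmac, timingSafeEqual } from 'crypto';

// Recompute the signature from the raw payload and compare in constant time.
function verifySignature(payload: string, signature: string, secret: string): boolean {
  const expected = createHmac('sha256', secret).update(payload).digest('hex');
  const a = Buffer.from(expected);
  const b = Buffer.from(signature);
  // timingSafeEqual throws on length mismatch, so check length first
  return a.length === b.length && timingSafeEqual(a, b);
}
```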
Performance
Use Streaming
Enable streaming for chat UIs to show responses as they generate. This dramatically improves perceived latency.
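A sketch of the consumption side; `renderStream` is a hypothetical helper that works with any async iterable of chunks, which is what the SDK returns when `stream: true` is passed to `chat.completions.create`.

```typescript
type Chunk = { choices: { delta?: { content?: string } }[] };

// Append each delta to the UI as it arrives and return the full text.
async function renderStream(
  stream: AsyncIterable<Chunk>,
  onDelta: (text: string) => void
): Promise<string> {
  let full = '';
  for await (const chunk of stream) {
    const delta = chunk.choices[0]?.delta?.content ?? '';
    if (delta) {
      full += delta;
      onDelta(delta); // e.g. append to the chat bubble
    }
  }
  return full; // complete response, useful for logging once the stream ends
}
```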
Parallelize Independent Calls
When processing multiple items, run them concurrently:
// Process 10 items in parallel
const results = await Promise.all(
  items.map(item =>
    client.chat.completions.create({
      model: 'gpt-4o-mini',
      messages: [{ role: 'user', content: item }]
    })
  )
);
Use Smaller Models for Routing
Use a fast model to classify or route requests, then use a powerful model only when needed.
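A sketch of the routing idea; `classify` stands in for a cheap call (e.g. `gpt-4o-mini` with a short classification prompt), and the `'simple'`/`'complex'` labels are assumptions.

```typescript
// Route to the expensive model only when the cheap classifier says so.
async function pickModel(
  query: string,
  classify: (q: string) => Promise<'simple' | 'complex'>
): Promise<'gpt-4o-mini' | 'gpt-4o'> {
  return (await classify(query)) === 'complex' ? 'gpt-4o' : 'gpt-4o-mini';
}

// const model = await pickModel(userQuery, classifyWithMini);
// const answer = await client.chat.completions.create({ model, messages });
```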
Testing & Evaluation
Create Test Suites
Build a dataset of test cases with expected outputs. Run regression tests when changing prompts.
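A minimal harness sketch; `runSuite` and the case shape are hypothetical, and `run` wraps your actual prompt-plus-model call.

```typescript
type Case = { input: string; check: (output: string) => boolean };

// Run every case through the model and count how many checks pass.
async function runSuite(
  cases: Case[],
  run: (input: string) => Promise<string>
): Promise<number> {
  let passed = 0;
  for (const c of cases) {
    if (c.check(await run(c.input))) passed++;
  }
  return passed; // compare against cases.length before shipping a prompt change
}
```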
Use LLM-as-Judge
For subjective quality, use a separate model to evaluate outputs against criteria.
Monitor in Production
Track latency, token usage, error rates, and user feedback. Log enough to debug issues without storing sensitive data.
A/B Test Prompts
When iterating on prompts, run A/B tests to measure impact on key metrics.

