Embeddings
Convert text into numerical vectors for semantic search, clustering, recommendations, and more.
Overview
Embeddings are numerical representations of text that capture semantic meaning. Similar texts have similar embeddings, making them useful for:
- Semantic search (find content by meaning, not just keywords)
- Recommendations (find similar items)
- Clustering (group related content)
- Classification (categorize text)
- RAG (Retrieval-Augmented Generation)
Endpoint
POST /v1/embeddings
Basic Usage
const response = await client.embeddings.create({
model: 'text-embedding-3-small',
input: 'The quick brown fox jumps over the lazy dog.',
});
console.log(response.data[0].embedding);
// [0.023, -0.042, 0.018, ...] (1536 dimensions)

Batch Processing
Embed multiple texts in a single request for better performance:
const response = await client.embeddings.create({
model: 'text-embedding-3-small',
input: [
'First document about machine learning',
'Second document about web development',
'Third document about data science',
'Fourth document about artificial intelligence',
]
});
// Each input gets its own embedding
for (const item of response.data) {
console.log(`Index ${item.index}: ${item.embedding.length} dimensions`);
}

Tip: Batch up to 2048 texts per request. Processing in batches is 5-10x faster than individual requests.
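For corpora larger than one request can hold, the batch call above can be wrapped in a loop. A minimal sketch, where `embedAll` and `embedBatch` are hypothetical helpers (`embedBatch` wraps a single `client.embeddings.create` call like the one shown above):

```typescript
// Embed an arbitrarily large list of texts, batchSize texts per API call.
// embedBatch wraps one request, e.g.:
//   (batch) => client.embeddings.create({ model, input: batch })
//                .then((r) => r.data.map((d) => d.embedding))
async function embedAll(
  texts: string[],
  embedBatch: (batch: string[]) => Promise<number[][]>,
  batchSize = 2048,
): Promise<number[][]> {
  const embeddings: number[][] = [];
  for (let i = 0; i < texts.length; i += batchSize) {
    // Each slice stays within the per-request limit.
    embeddings.push(...(await embedBatch(texts.slice(i, i + batchSize))));
  }
  return embeddings;
}
```

Results come back in input order, so the flattened array lines up with `texts`.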
Available Models
| Model | Dimensions | Max Tokens | Use Case |
|---|---|---|---|
| text-embedding-3-small | 1536 | 8191 | Recommended |
| text-embedding-3-large | 3072 | 8191 | Highest accuracy |
| text-embedding-ada-002 | 1536 | 8191 | Legacy, widely compatible |
Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| model | string | Required | Model ID to use |
| input | string or string[] | Required | Text(s) to embed |
| encoding_format | string | Optional | "float" (default) or "base64" |
| dimensions | integer | Optional | Reduce output dimensions (v3 models only) |
Comparing Embeddings
Use cosine similarity to measure how similar two texts are:
function cosineSimilarity(a: number[], b: number[]): number {
let dotProduct = 0;
let normA = 0;
let normB = 0;
for (let i = 0; i < a.length; i++) {
dotProduct += a[i] * b[i];
normA += a[i] * a[i];
normB += b[i] * b[i];
}
return dotProduct / (Math.sqrt(normA) * Math.sqrt(normB));
}
// Compare two texts
const text1 = "I love programming";
const text2 = "Coding is my passion";
const text3 = "The weather is nice today";
const [emb1, emb2, emb3] = await Promise.all([
getEmbedding(text1),
getEmbedding(text2),
getEmbedding(text3),
]);
console.log('Text 1 vs Text 2:', cosineSimilarity(emb1, emb2)); // ~0.85
console.log('Text 1 vs Text 3:', cosineSimilarity(emb1, emb3)); // ~0.30

Semantic Search Example
Build a simple semantic search system:
// 1. Index your documents (do once, store in database)
const documents = [
{ id: 1, text: 'How to reset your password' },
{ id: 2, text: 'Billing and subscription FAQ' },
{ id: 3, text: 'Getting started with the API' },
{ id: 4, text: 'Troubleshooting common errors' },
];
// Generate embeddings for each document
const indexed = await Promise.all(
documents.map(async (doc) => ({
...doc,
embedding: await getEmbedding(doc.text)
}))
);
// 2. Search (do for each query)
async function search(query: string, topK = 3) {
const queryEmbedding = await getEmbedding(query);
const results = indexed
.map(doc => ({
...doc,
score: cosineSimilarity(queryEmbedding, doc.embedding)
}))
.sort((a, b) => b.score - a.score)
.slice(0, topK);
return results;
}
// Example search
const results = await search("I can't log into my account");
// Returns: "How to reset your password" (highest match)

RAG (Retrieval-Augmented Generation)
Combine embeddings with chat completions for knowledge-grounded responses:
async function askWithContext(question: string) {
// 1. Find relevant documents
const relevantDocs = await search(question, 3);
// 2. Build context from top results
const context = relevantDocs
.map(doc => doc.text)
.join('\n\n');
// 3. Ask LLM with context
const response = await client.chat.completions.create({
model: 'gpt-4o',
messages: [
{
role: 'system',
content: `Answer based on this context:\n\n${context}`
},
{
role: 'user',
content: question
}
]
});
return response.choices[0].message.content;
}
const answer = await askWithContext("How do I reset my password?");

Best Practices
Chunk Long Documents
Split long texts into smaller chunks (200-500 tokens) for better search results. Include some overlap between chunks.
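A minimal sketch of overlap chunking (`chunkText` is a hypothetical helper; token counts are approximated by word counts here — a real tokenizer such as tiktoken would be more accurate):

```typescript
// Split text into word-based chunks, with `overlap` words shared
// between neighboring chunks so context isn't cut mid-thought.
function chunkText(text: string, chunkSize = 300, overlap = 50): string[] {
  const words = text.split(/\s+/).filter(Boolean);
  const step = Math.max(1, chunkSize - overlap); // guard against overlap >= chunkSize
  const chunks: string[] = [];
  for (let start = 0; start < words.length; start += step) {
    chunks.push(words.slice(start, start + chunkSize).join(' '));
    if (start + chunkSize >= words.length) break; // last chunk reached the end
  }
  return chunks;
}
```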
Normalize Text
Clean and normalize your text before embedding. Remove excessive whitespace, fix encoding issues, and consider lowercasing.
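A minimal normalization pass along those lines (`normalizeForEmbedding` is a hypothetical helper; lowercasing is a judgment call and can hurt case-sensitive domains like code search):

```typescript
function normalizeForEmbedding(text: string): string {
  return text
    .normalize('NFC')     // canonicalize Unicode to fix common encoding inconsistencies
    .replace(/\s+/g, ' ') // collapse runs of whitespace and newlines
    .trim()
    .toLowerCase();
}
```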
Use a Vector Database
For production, store embeddings in a vector database like Pinecone, Weaviate, Qdrant, or pgvector for efficient similarity search.
Cache Embeddings
Embeddings are deterministic—the same input always produces the same output. Cache them to avoid redundant API calls.
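Because of that determinism, any embedding function can sit behind a simple in-memory cache. A minimal sketch (`withCache` is a hypothetical wrapper; production code would key on model + input and persist to a real store):

```typescript
// Wrap an embedding function with a Map-based cache keyed on the input text.
function withCache(
  embed: (text: string) => Promise<number[]>,
): (text: string) => Promise<number[]> {
  const cache = new Map<string, number[]>();
  return async (text: string) => {
    const hit = cache.get(text);
    if (hit) return hit; // cache hit: no API call
    const embedding = await embed(text);
    cache.set(text, embedding);
    return embedding;
  };
}
```

Usage: `const cachedGetEmbedding = withCache(getEmbedding);` — repeated calls with the same text then cost a single API request.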
Vector Database Integrations
Pinecone
Managed vector database with auto-scaling
Qdrant
Open-source with excellent filtering
Weaviate
GraphQL API, hybrid search
pgvector
PostgreSQL extension, use existing DB

