Embeddings

Convert text into numerical vectors for semantic search, clustering, recommendations, and more.

Overview

Embeddings are numerical representations of text that capture semantic meaning. Similar texts have similar embeddings, making them useful for:

  • Semantic search (find content by meaning, not just keywords)
  • Recommendations (find similar items)
  • Clustering (group related content)
  • Classification (categorize text)
  • RAG (Retrieval-Augmented Generation)

Endpoint

POST https://api.llmhub.dev/v1/embeddings

Basic Usage

TypeScript
const response = await client.embeddings.create({
  model: 'text-embedding-3-small',
  input: 'The quick brown fox jumps over the lazy dog.',
});

console.log(response.data[0].embedding);
// [0.023, -0.042, 0.018, ...] (1536 dimensions)

Batch Processing

Embed multiple texts in a single request for better performance:

TypeScript
const response = await client.embeddings.create({
  model: 'text-embedding-3-small',
  input: [
    'First document about machine learning',
    'Second document about web development',
    'Third document about data science',
    'Fourth document about artificial intelligence',
  ]
});

// Each input gets its own embedding
for (const item of response.data) {
  console.log(`Index ${item.index}: ${item.embedding.length} dimensions`);
}

Tip: Batch up to 2048 texts per request. Batching can be 5-10x faster than sending individual requests.
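If you have more inputs than fit in one request, you can split them client-side before sending. A minimal sketch (the `chunkArray` helper is illustrative, not part of the SDK):

```typescript
// Split an array into consecutive chunks of at most `size` items.
function chunkArray<T>(items: T[], size: number): T[][] {
  const chunks: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    chunks.push(items.slice(i, i + size));
  }
  return chunks;
}

// Usage sketch: one request per chunk of up to 2048 texts.
// for (const batch of chunkArray(texts, 2048)) {
//   await client.embeddings.create({
//     model: 'text-embedding-3-small',
//     input: batch,
//   });
// }
```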

Available Models

| Model | Dimensions | Max Tokens | Use Case |
| --- | --- | --- | --- |
| text-embedding-3-small | 1536 | 8191 | Recommended |
| text-embedding-3-large | 3072 | 8191 | Highest accuracy |
| text-embedding-ada-002 | 1536 | 8191 | Legacy, widely compatible |

Note: text-embedding-3-small offers the best balance of quality and cost for most applications.

Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| model | string | Required | Model ID to use |
| input | string \| string[] | Required | Text(s) to embed |
| encoding_format | string | Optional | "float" (default) or "base64" |
| dimensions | integer | Optional | Reduce dimensions (v3 models only) |
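With the v3 models, the optional `dimensions` parameter asks the server for shorter vectors, which cuts storage and speeds up similarity search. A hedged sketch, with the client call factored out so the shape is explicit (the `embedReduced` name and the minimal type are illustrative, not SDK types):

```typescript
// Minimal shape of the create call used below (not the full SDK type).
type EmbeddingsCreate = (params: {
  model: string;
  input: string;
  dimensions?: number;
}) => Promise<{ data: { embedding: number[] }[] }>;

// Request a reduced-dimension embedding (v3 models only).
async function embedReduced(
  create: EmbeddingsCreate,
  text: string,
  dimensions = 512,
): Promise<number[]> {
  const response = await create({
    model: 'text-embedding-3-small',
    input: text,
    dimensions, // server returns a shortened vector of this length
  });
  return response.data[0].embedding;
}
```

Against a real client this would be called as something like `embedReduced((p) => client.embeddings.create(p), 'hello', 512)`.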

Comparing Embeddings

Use cosine similarity to measure how similar two texts are:

TypeScript
function cosineSimilarity(a: number[], b: number[]): number {
  let dotProduct = 0;
  let normA = 0;
  let normB = 0;
  
  for (let i = 0; i < a.length; i++) {
    dotProduct += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  
  return dotProduct / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Compare two texts
const text1 = "I love programming";
const text2 = "Coding is my passion";
const text3 = "The weather is nice today";

const [emb1, emb2, emb3] = await Promise.all([
  getEmbedding(text1),
  getEmbedding(text2),
  getEmbedding(text3),
]);

console.log('Text 1 vs Text 2:', cosineSimilarity(emb1, emb2)); // ~0.85
console.log('Text 1 vs Text 3:', cosineSimilarity(emb1, emb3)); // ~0.30
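The comparison and search examples call a `getEmbedding` helper that this page does not define. One possible sketch, written as a factory so the client can be injected (the factory name and the minimal client interface are illustrative; the underlying call mirrors `client.embeddings.create` from the examples above):

```typescript
// Minimal client shape assumed by the helper.
interface EmbeddingClient {
  embeddings: {
    create(params: { model: string; input: string }): Promise<{
      data: { index?: number; embedding: number[] }[];
    }>;
  };
}

// Wrap the endpoint so a single string maps to a single vector.
function makeGetEmbedding(client: EmbeddingClient) {
  return async function getEmbedding(text: string): Promise<number[]> {
    const response = await client.embeddings.create({
      model: 'text-embedding-3-small',
      input: text,
    });
    return response.data[0].embedding;
  };
}

// const getEmbedding = makeGetEmbedding(client);
```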

Semantic Search Example

Build a simple semantic search system:

TypeScript
// 1. Index your documents (do once, store in database)
const documents = [
  { id: 1, text: 'How to reset your password' },
  { id: 2, text: 'Billing and subscription FAQ' },
  { id: 3, text: 'Getting started with the API' },
  { id: 4, text: 'Troubleshooting common errors' },
];

// Generate embeddings for each document
const indexed = await Promise.all(
  documents.map(async (doc) => ({
    ...doc,
    embedding: await getEmbedding(doc.text)
  }))
);

// 2. Search (do for each query)
async function search(query: string, topK = 3) {
  const queryEmbedding = await getEmbedding(query);
  
  const results = indexed
    .map(doc => ({
      ...doc,
      score: cosineSimilarity(queryEmbedding, doc.embedding)
    }))
    .sort((a, b) => b.score - a.score)
    .slice(0, topK);
  
  return results;
}

// Example search
const results = await search("I can't log into my account");
// Returns: "How to reset your password" (highest match)

RAG (Retrieval-Augmented Generation)

Combine embeddings with chat completions for knowledge-grounded responses:

TypeScript
async function askWithContext(question: string) {
  // 1. Find relevant documents
  const relevantDocs = await search(question, 3);
  
  // 2. Build context from top results
  const context = relevantDocs
    .map(doc => doc.text)
    .join('\n\n');
  
  // 3. Ask LLM with context
  const response = await client.chat.completions.create({
    model: 'gpt-4o',
    messages: [
      {
        role: 'system',
        content: `Answer based on this context:\n\n${context}`
      },
      {
        role: 'user',
        content: question
      }
    ]
  });
  
  return response.choices[0].message.content;
}

const answer = await askWithContext("How do I reset my password?");

Best Practices

Chunk Long Documents

Split long texts into smaller chunks (200-500 tokens) for better search results. Include some overlap between chunks.
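A word-based sketch of overlapping chunks (production chunkers usually count tokens with a tokenizer such as tiktoken rather than words; the sizes here are illustrative):

```typescript
// Split text into overlapping chunks of roughly `chunkSize` words.
// Requires chunkSize > overlap.
function chunkText(text: string, chunkSize = 300, overlap = 50): string[] {
  const words = text.trim().split(/\s+/);
  const chunks: string[] = [];
  for (let start = 0; start < words.length; start += chunkSize - overlap) {
    chunks.push(words.slice(start, start + chunkSize).join(' '));
    if (start + chunkSize >= words.length) break; // final chunk reached
  }
  return chunks;
}
```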

Normalize Text

Clean and normalize your text before embedding. Remove excessive whitespace, fix encoding issues, and consider lowercasing.
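A small normalization pass along those lines (whether to lowercase depends on your domain; this sketch is one reasonable default):

```typescript
// Collapse whitespace, trim, and lowercase before embedding.
function normalizeText(text: string): string {
  return text
    .replace(/\s+/g, ' ') // collapse runs of whitespace, incl. newlines/tabs
    .trim()
    .toLowerCase();
}
```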

Use a Vector Database

For production, store embeddings in a vector database like Pinecone, Weaviate, Qdrant, or pgvector for efficient similarity search.

Cache Embeddings

Embeddings are deterministic—the same input always produces the same output. Cache them to avoid redundant API calls.
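Given that determinism, a simple in-memory map works as a cache. A sketch keyed on model plus input (a production system would more likely persist to a store such as Redis; the embed function is passed in so any helper can be wrapped):

```typescript
// In-memory cache keyed by model + input text (illustrative only).
const embeddingCache = new Map<string, number[]>();

async function getCachedEmbedding(
  text: string,
  embed: (t: string) => Promise<number[]>, // e.g. a getEmbedding helper
  model = 'text-embedding-3-small',
): Promise<number[]> {
  const key = `${model}:${text}`;
  const hit = embeddingCache.get(key);
  if (hit) return hit; // cache hit: no API call
  const embedding = await embed(text);
  embeddingCache.set(key, embedding);
  return embedding;
}
```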

Vector Database Integrations

  • Pinecone: managed vector database with auto-scaling
  • Qdrant: open-source with excellent filtering
  • Weaviate: GraphQL API, hybrid search
  • pgvector: PostgreSQL extension, use your existing DB

Next Steps