Vision

Analyze images using AI vision models. Extract text, describe scenes, compare images, and more.

Supported Models

Model	Provider	Max Images	Notes
gpt-4o	OpenAI	20	Recommended
gpt-4o-mini	OpenAI	20	Faster, cheaper
claude-3.5-sonnet	Anthropic	20	Excellent at details
gemini-2.0-flash	Google	16	Fast, good for video frames
llama-3.2-90b-vision	Meta	10	Open source
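
The per-model image limits in the table can be checked on the client before a request is sent. The following is a minimal sketch, not part of any SDK: the `MAX_IMAGES` map and the `assertImageCount` helper are hypothetical names, and the limits are copied from the table above.

```typescript
// Per-model image limits, copied from the Supported Models table.
const MAX_IMAGES = new Map<string, number>([
  ['gpt-4o', 20],
  ['gpt-4o-mini', 20],
  ['claude-3.5-sonnet', 20],
  ['gemini-2.0-flash', 16],
  ['llama-3.2-90b-vision', 10],
]);

// Hypothetical helper: throws if a request carries more images than the model accepts.
function assertImageCount(model: string, imageCount: number): void {
  const limit = MAX_IMAGES.get(model);
  if (limit === undefined) {
    throw new Error(`Unknown vision model: ${model}`);
  }
  if (imageCount > limit) {
    throw new Error(`${model} accepts at most ${limit} images, got ${imageCount}`);
  }
}
```

Calling `assertImageCount('gemini-2.0-flash', 17)` would throw, while a count of 16 passes silently.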

Basic Usage

Pass images in the message content array alongside text:

TypeScript

const response = await client.chat.completions.create({
  model: 'gpt-4o',
  messages: [
    {
      role: 'user',
      content: [
        { type: 'text', text: 'What is in this image?' },
        {
          type: 'image_url',
          image_url: {
            url: 'https://example.com/image.jpg'
          }
        }
      ]
    }
  ]
});

console.log(response.choices[0].message.content);

Base64 Images

Send images as base64-encoded data URLs for local files or generated images:

TypeScript

import fs from 'fs';

// Read image and convert to base64
const imageBuffer = fs.readFileSync('path/to/image.png');
const base64Image = imageBuffer.toString('base64');

const response = await client.chat.completions.create({
  model: 'gpt-4o',
  messages: [
    {
      role: 'user',
      content: [
        { type: 'text', text: 'Describe this image in detail.' },
        {
          type: 'image_url',
          image_url: {
            url: `data:image/png;base64,${base64Image}`
          }
        }
      ]
    }
  ]
});

Supported formats: JPEG, PNG, GIF (first frame only), WebP
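
The data URL should carry a MIME type that matches the actual file format. As a sketch of one way to do that, the hypothetical helper `toImageDataUrl` below picks the MIME type from the file extension for the formats listed above. It takes an already-encoded base64 string, so it works with the `base64Image` value from the previous example.

```typescript
// MIME types for the supported formats, keyed by lowercase file extension.
const IMAGE_MIME_TYPES = new Map<string, string>([
  ['jpg', 'image/jpeg'],
  ['jpeg', 'image/jpeg'],
  ['png', 'image/png'],
  ['gif', 'image/gif'],
  ['webp', 'image/webp'],
]);

// Hypothetical helper: builds a data URL from base64 data and a file name.
function toImageDataUrl(base64Image: string, fileName: string): string {
  const extension = fileName.split('.').pop()?.toLowerCase() ?? '';
  const mimeType = IMAGE_MIME_TYPES.get(extension);
  if (mimeType === undefined) {
    throw new Error(`Unsupported image format: ${fileName}`);
  }
  return `data:${mimeType};base64,${base64Image}`;
}
```

Relying on the extension is a simplification. A file with a wrong extension would need content sniffing, which this sketch does not attempt.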

Multiple Images

Analyze multiple images in a single request for comparison or context:

TypeScript

const response = await client.chat.completions.create({
  model: 'gpt-4o',
  messages: [
    {
      role: 'user',
      content: [
        { type: 'text', text: 'Compare these two images. What are the differences?' },
        {
          type: 'image_url',
          image_url: { url: 'https://example.com/image1.jpg' }
        },
        {
          type: 'image_url',
          image_url: { url: 'https://example.com/image2.jpg' }
        }
      ]
    }
  ]
});
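
When the number of images varies, the content array can be built from a list of URLs instead of being written out by hand. This is a minimal sketch: the `ContentPart` type and the `buildImageContent` helper are hypothetical names that mirror the message shape used in the examples above.

```typescript
// Shape of the content parts used in the examples above (a local, illustrative type).
type ContentPart =
  | { type: 'text'; text: string }
  | { type: 'image_url'; image_url: { url: string } };

// Hypothetical helper: one text part followed by one image part per URL, in order.
function buildImageContent(prompt: string, imageUrls: string[]): ContentPart[] {
  const imageParts = imageUrls.map(
    (url): ContentPart => ({ type: 'image_url', image_url: { url } })
  );
  return [{ type: 'text', text: prompt }, ...imageParts];
}
```

The result can be passed as the `content` field of a user message, for example `buildImageContent('Compare these two images.', [url1, url2])`.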

Detail Level

Control image analysis quality with the detail parameter:

TypeScript

// An image part for the content array
const imagePart = {
  type: 'image_url',
  image_url: {
    url: 'https://example.com/image.jpg',
    detail: 'high'  // 'low', 'high', or 'auto'
  }
};

// 'low' - 512x512 fixed, faster and cheaper
// 'high' - Detailed analysis, uses more tokens
// 'auto' - Model decides based on image size (default)

Detail	Tokens	Best For
`low`	~85 tokens	Quick classification, thumbnails, simple scenes
`high`	~765+ tokens	OCR, detailed analysis, small text, fine details
`auto`	Varies	Let the model choose based on image size

Common Use Cases

OCR / Text Extraction

Extract text from documents, screenshots, or photos:

TypeScript

const response = await client.chat.completions.create({
  model: 'gpt-4o',
  messages: [
    {
      role: 'user',
      content: [
        {
          type: 'text',
          text: 'Extract all text from this image. Return it as plain text.'
        },
        {
          type: 'image_url',
          image_url: { url: 'https://example.com/document.png' }
        }
      ]
    }
  ]
});
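
Since the Detail Level table lists `high` as the best fit for OCR and small text, an OCR request can set it explicitly rather than relying on `auto`. The sketch below builds the `messages` array for such a request. `buildOcrMessages` and `DetailLevel` are hypothetical names, and the prompt is the one from the example above.

```typescript
type DetailLevel = 'low' | 'high' | 'auto';

// Hypothetical helper: builds an OCR message that requests high detail by default.
function buildOcrMessages(imageUrl: string, detail: DetailLevel = 'high') {
  return [
    {
      role: 'user' as const,
      content: [
        {
          type: 'text' as const,
          text: 'Extract all text from this image. Return it as plain text.',
        },
        {
          type: 'image_url' as const,
          image_url: { url: imageUrl, detail },
        },
      ],
    },
  ];
}
```

The returned array can be used as the `messages` value in the request shown above.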

Chart & Data Analysis

Analyze charts, graphs, and data visualizations:

TypeScript

const response = await client.chat.completions.create({
  model: 'gpt-4o',
  messages: [
    {
      role: 'user',
      content: [
        {
          type: 'text',
          text: `Analyze this chart and provide:
1. The type of chart
2. Main trends or patterns
3. Key data points
4. Any insights or conclusions`
        },
        {
          type: 'image_url',
          image_url: { url: 'https://example.com/chart.png' }
        }
      ]
    }
  ]
});

Image Size & Costs

Size Limits

• Maximum file size: 20 MB per image
• Maximum dimensions: 2048 × 2048 pixels (larger images are resized to fit)
• Minimum dimensions: 10 × 10 pixels
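
These limits can be checked locally before uploading. The following is a rough sketch with hypothetical names (`ImageInfo`, `checkImageLimits`). It assumes "20 MB" means 20 × 1024 × 1024 bytes, and it treats oversized dimensions as a warning rather than an error, since the list above says such images are resized.

```typescript
const MAX_FILE_BYTES = 20 * 1024 * 1024; // assumption: 20 MB read as 20 * 1024 * 1024 bytes
const MIN_SIDE_PX = 10;
const MAX_SIDE_PX = 2048;

interface ImageInfo {
  bytes: number;
  width: number;
  height: number;
}

// Hypothetical helper: reports hard errors and soft warnings for one image.
function checkImageLimits(image: ImageInfo): { errors: string[]; warnings: string[] } {
  const errors: string[] = [];
  const warnings: string[] = [];
  if (image.bytes > MAX_FILE_BYTES) {
    errors.push('file is larger than 20 MB');
  }
  if (image.width < MIN_SIDE_PX || image.height < MIN_SIDE_PX) {
    errors.push('image is smaller than 10 x 10 pixels');
  }
  if (image.width > MAX_SIDE_PX || image.height > MAX_SIDE_PX) {
    warnings.push('image exceeds 2048 x 2048 pixels and will be resized');
  }
  return { errors, warnings };
}
```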

Token Calculation

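Images are converted to tokens based on their size and detail level. Low-detail images cost a flat ~85 tokens. High-detail images are split into 512×512 tiles, each costing ~170 tokens, on top of the same ~85-token base. A 1024×1024 high-detail image therefore uses approximately 765 tokens (4 tiles × 170 + 85).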
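
The arithmetic can be sketched as a small estimator. This is a simplified model under stated assumptions, not the provider's exact accounting: it assumes a fixed base of about 85 tokens (the low-detail cost, and the amount needed for 4 tiles × 170 to reach the 765 figure), and it only models the resize to fit 2048 × 2048 from the size limits. The function name `estimateImageTokens` and the constants are hypothetical.

```typescript
const BASE_TOKENS = 85;       // assumption: flat base cost, equal to the low-detail cost
const TOKENS_PER_TILE = 170;  // per 512x512 tile, from the text above
const TILE_SIDE_PX = 512;
const RESIZE_LIMIT_PX = 2048; // larger images are scaled down to fit this box

// Hypothetical estimator: approximate token cost for one image.
function estimateImageTokens(
  width: number,
  height: number,
  detail: 'low' | 'high'
): number {
  if (detail === 'low') {
    return BASE_TOKENS;
  }
  // Scale down (never up) so the longest side fits within the resize limit.
  const scale = Math.min(1, RESIZE_LIMIT_PX / Math.max(width, height));
  const scaledWidth = Math.round(width * scale);
  const scaledHeight = Math.round(height * scale);
  const tiles =
    Math.ceil(scaledWidth / TILE_SIDE_PX) * Math.ceil(scaledHeight / TILE_SIDE_PX);
  return BASE_TOKENS + tiles * TOKENS_PER_TILE;
}
```

For a 1024×1024 image this gives 85 + 4 × 170 = 765 tokens at high detail and 85 at low detail. Actual billing may apply further scaling steps, so treat the result as an estimate.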

Best Practices

Optimize Image Size

Resize large images before sending to reduce costs. For most tasks, 1024×1024 is sufficient quality.
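
The target dimensions for a resize can be computed before handing the image to whatever image library is in use. A minimal sketch, with the hypothetical names `Size` and `fitWithin`, that scales an image down to fit a 1024 × 1024 box while preserving its aspect ratio:

```typescript
interface Size {
  width: number;
  height: number;
}

// Hypothetical helper: returns dimensions that fit within maxSide x maxSide.
// Images that already fit are returned unchanged (never upscaled).
function fitWithin(size: Size, maxSide = 1024): Size {
  const longest = Math.max(size.width, size.height);
  if (longest <= maxSide) {
    return { ...size };
  }
  const scale = maxSide / longest;
  return {
    width: Math.max(1, Math.round(size.width * scale)),
    height: Math.max(1, Math.round(size.height * scale)),
  };
}
```

For example, a 4000 × 3000 photo maps to 1024 × 768. The actual pixel resampling is left to an image library of your choice.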

Use Low Detail When Possible

For simple tasks like classification or general description, use `detail: "low"` to save tokens.

Be Specific in Prompts

Tell the model exactly what to look for. "What text is in the top-right corner?" is better than "What's in this image?"

Cache URL Images

When using image URLs, ensure they're stable and fast to load. Consider using a CDN for frequently analyzed images.

Limitations

• Cannot identify specific people (for privacy)
• May struggle with very small text or low-contrast images
• Animated GIFs: only the first frame is analyzed
• Cannot process videos directly (extract frames first)
• May misinterpret highly stylized or abstract images
