
Verify critical details (pricing, licensing, availability) with the model's source before business decisions.

Models · Mistral AI

Codestral Embed

Model family: embeddings

Context
8,192 tokens
Released
2025-05-27
Openness
closed-api
License
Cost tier
paid-api
Rating
4.0 — State-of-the-art code retrieval performance with aggressive pricing, and Matryoshka-style variable output dimensions (256 to 3072) let you trade retrieval quality for storage cost smoothly. The go-to first-party choice for code-specific RAG and code-agent retrieval workflows. Closed weights keep this from being the obvious universal pick.
Modalities
text
Capabilities
coding, embeddings
Access
api-first-party, api-third-party

Quick Take

Mistral's code-specialized embedding model — purpose-built for code retrieval, outperforms Voyage Code 3 and OpenAI's embeddings on code benchmarks, and lets you pick your output dimensions to trade quality for storage cost.

Plain-English Description

Codestral Embed does one thing very well: convert source code into vector embeddings optimized for retrieval. That narrow focus matters because general-purpose text embedding models (OpenAI's text-embedding-3-large, Cohere Embed, Voyage AI's general models) treat code as just another text input. Codestral Embed was trained specifically on code with the downstream tasks of code search, repository retrieval, and code-agent context retrieval in mind. The results show it — on Mistral's own benchmarks and on independent code-retrieval evaluations, Codestral Embed outperforms the leading general-purpose embedders for code-specific retrieval, including at smaller output dimensions where general embedders lose quality rapidly.

The most interesting technical feature is variable output dimension. Codestral Embed supports Matryoshka-style nested embeddings: the model produces up to 3,072-dimensional vectors, and you can take the first N dimensions (256, 512, 1024, etc.) as a valid lower-dimensional representation without re-embedding. This is a practical operational win because storage costs for embedding indexes scale linearly with dimensionality — cutting from 3,072 to 512 reduces storage by 6× with modest quality loss. Codestral Embed at dimension 256 and int8 precision still outperforms general-purpose embedders at their full dimensions for code retrieval, which makes it genuinely attractive for very large code indexes.
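The truncation step described above can be sketched in a few lines. This is an illustration only, not Mistral's implementation: the helper names `truncate_embedding` and `cosine_similarity` are hypothetical, the sample vector is made up, and re-normalizing after truncation is a common practice for Matryoshka-style embeddings that the source does not itself prescribe.

```python
import math


def truncate_embedding(vector, dim):
    """Keep the first `dim` components and rescale them to unit length.

    Assumption: re-normalizing after truncation so that dot products
    between truncated vectors behave like cosine similarity.
    """
    if not 0 < dim <= len(vector):
        raise ValueError("dim must be between 1 and len(vector)")
    head = list(vector[:dim])
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return head  # nothing to rescale for an all-zero prefix
    return [x / norm for x in head]


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Made-up 8-dimensional vector standing in for a 3,072-dimensional embedding.
full = [0.5, -0.5, 0.5, 0.5, 0.1, -0.1, 0.05, 0.02]
short = truncate_embedding(full, 4)
```

Because the shorter vector is a prefix of the stored one, an index can be rebuilt at a smaller dimension from existing embeddings without calling the API again.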

Codestral Embed is closed-weight: it is available only through Mistral's hosted API at $0.15 per million tokens (50% discount on batch API for offline indexing). For enterprise customers, Mistral offers on-premise deployment agreements, but the weights themselves aren't publicly released. This is consistent with Mistral's broader pattern for API-first products: Codestral Embed, Mistral Embed (general-purpose), Mistral Moderation, Mistral OCR, and Mistral Saba all follow this model.

Best For

  • Retrieval-Augmented Generation (RAG) systems over large codebases. Feeding relevant code context into a coding agent or IDE copilot. This is the workload Codestral Embed was built for.
  • Code search across enterprise repositories. Natural language or code-query search against proprietary code. Strong retrieval quality at variable dimensions matches enterprise-scale search requirements.
  • Near-duplicate detection and code similarity analysis. Finding functionally similar code across files or repositories. Useful for deduplication, license-policy enforcement, and refactoring-candidate identification.
  • Code clustering and repository analytics. Unsupervised grouping of code by functionality or architectural pattern. Useful for automated documentation, codebase visualization, and architecture analysis.
  • High-volume indexing workloads. The 50% batch API discount makes one-time indexing of very large codebases (millions of files) affordable.

Not For

  • Teams with strict open-weights requirements. Codestral Embed is closed. For open-weight code embeddings, alternatives exist (various community-released Mistral/Ministral fine-tunes, Nomic Embed Code, and others); none match Codestral Embed's first-party performance, but they're inspectable.
  • General-purpose text embeddings. Codestral Embed is code-specialized. For mixed content (documentation, natural language, code), Mistral Embed (the general-purpose counterpart) or a hybrid approach is better.
  • Tiny-budget workloads where embedding quality is secondary. $0.15/M tokens is competitive but not free. For small hobby projects, free community code embedders suffice.
  • Cross-modal retrieval (code + images, code + diagrams). Text-only. Vision-language code retrieval needs different architecture.

License — Plain-English Summary

Proprietary closed-API model. You pay per token to call Mistral's API and you get the right to use the embeddings in your applications. You don't get the model weights, you can't fine-tune it, and you can't redistribute it. For enterprise customers, Mistral will discuss on-premise deployment under separate commercial terms. Standard proprietary-API arrangement.

How It Compares

  • vs. OpenAI text-embedding-3-large — OpenAI is a general-purpose embedder with broader language coverage; Codestral Embed is code-specialized and outperforms OpenAI on code-retrieval benchmarks, especially at lower dimensions. OpenAI's model is better for mixed content; Codestral Embed is better for code-specific retrieval.
  • vs. Voyage Code 3 — Direct competitor; Mistral's benchmarks show Codestral Embed outperforming Voyage Code 3 on retrieval tasks. Voyage has broader model offerings and longer track record in embedding-specific products.
  • vs. Cohere Embed v4.0 — Cohere's general-purpose embedder, not code-specialized. Codestral Embed wins on code; Cohere wins on multilingual and general-purpose coverage.
  • vs. Mistral Embed (general-purpose) — Mistral's own general-purpose text embedder. Use Codestral Embed for code-heavy content; use Mistral Embed for general documentation, mixed content, or natural-language-dominant workloads.

Under the Hood

Codestral Embed generates output vectors of up to 3,072 dimensions with Matryoshka-style nesting — the first N dimensions of a full vector form a valid lower-dimensional embedding, ordered by retained relevance. This is a practical win over models that require separate model variants or re-embedding for dimensional trade-offs. The model supports float (32-bit default) and int8 precision outputs, with int8 at dimension 256 still outperforming competitors at their full dimensions.
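To make the dimension and precision trade-off concrete, the raw vector storage can be estimated with simple arithmetic. This is a rough sketch under stated assumptions: the function name `index_size_bytes` and the one-million-vector corpus are hypothetical, and the figure covers vector payload only, not the overhead any particular vector database adds.

```python
# Bytes per vector component for the two output precisions named above.
BYTES_PER_COMPONENT = {"float": 4, "int8": 1}


def index_size_bytes(num_vectors, dim, dtype="float"):
    """Raw storage for `num_vectors` embeddings of `dim` components each."""
    if dtype not in BYTES_PER_COMPONENT:
        raise ValueError(f"unsupported dtype: {dtype}")
    return num_vectors * dim * BYTES_PER_COMPONENT[dtype]


# Hypothetical corpus of one million chunks.
full_size = index_size_bytes(1_000_000, 3072, "float")
small_size = index_size_bytes(1_000_000, 256, "int8")
ratio = full_size / small_size
```

Under these assumptions the full-precision, full-dimension index needs about 12.3 GB of raw vectors and the 256-dimension int8 index about 0.26 GB, a 48-fold reduction.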

Context window is 8,192 tokens per chunk. Mistral's recommended chunking strategy for code retrieval is 3,000 characters with 1,000-character overlap; larger chunks measurably degrade retrieval performance according to their documentation. The embedding space is optimized for code-to-code retrieval (given a query snippet, find similar code) and text-to-code retrieval (given a natural-language query or code documentation, find the relevant code).
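The recommended chunking parameters translate into a simple sliding window over the source text. The sketch below is one possible reading of that recommendation, not Mistral's own chunker: the function name `chunk_source` is hypothetical, and splitting purely by character count (ignoring function or class boundaries) is a simplifying assumption.

```python
def chunk_source(text, chunk_size=3000, overlap=1000):
    """Split `text` into character windows that overlap by `overlap` chars.

    Defaults follow the 3,000-character / 1,000-character recommendation.
    Assumption: plain character offsets, with no awareness of syntax.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
        start += step
    return chunks


# Synthetic 7,000-character input standing in for a source file.
sample = "".join(chr(97 + i % 26) for i in range(7000))
pieces = chunk_source(sample)
```

Each chunk then becomes one embedding request input. A production pipeline might prefer to snap window edges to function or class boundaries, which this sketch does not attempt.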

The model is accessible via Mistral's Python and TypeScript SDKs, through Spring AI's Mistral integration, via OpenRouter's OpenAI-compatible embeddings API, and through Mistral's batch API for offline indexing workloads. Fine-tuning is not publicly supported — adaptation to domain-specific code is handled through Mistral's enterprise on-premise engagements.
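For readers who want to see what a request might look like without installing an SDK, here is a sketch that only builds an HTTP request using the Python standard library. Everything specific in it is an assumption to check against Mistral's API reference before use: the endpoint URL, the model ID `codestral-embed`, and the JSON field names `input`, `output_dimension`, and `output_dtype`. The helper name `build_embedding_request` and the placeholder key are hypothetical.

```python
import json
import urllib.request

# Assumptions (verify against Mistral's API reference): endpoint URL,
# model ID, and the JSON field names used in the payload below.
EMBEDDINGS_URL = "https://api.mistral.ai/v1/embeddings"
MODEL_ID = "codestral-embed"


def build_embedding_request(snippets, api_key, output_dimension=512,
                            output_dtype="int8"):
    """Build (but do not send) a POST request for a batch of code snippets."""
    payload = {
        "model": MODEL_ID,
        "input": list(snippets),
        "output_dimension": output_dimension,
        "output_dtype": output_dtype,
    }
    return urllib.request.Request(
        EMBEDDINGS_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder key; a real key would normally come from the environment.
request = build_embedding_request(
    ["def add(a, b):\n    return a + b"], api_key="YOUR_API_KEY"
)
```

Sending it would be a matter of passing `request` to `urllib.request.urlopen` and parsing the JSON body. The response shape is not shown here because the source does not describe it.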

Cost

API input (per 1M tokens)
$0.15
API providers
mistral, openrouter
Notes
$0.15 per million tokens via Mistral's API. 50% discount available through the batch API for large offline indexing jobs. On-premise deployment is available for enterprise customers — contact Mistral's applied AI team. Recommended chunking: 3,000 characters with 1,000-character overlap for retrieval use cases (larger chunks degrade retrieval quality).
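The pricing above is easy to turn into a budget estimate. This is a back-of-the-envelope sketch: the function name `embedding_cost_usd` and the two-billion-token corpus are hypothetical, and the rates are the ones quoted on this page, which may change.

```python
PRICE_PER_MILLION_TOKENS = 0.15  # USD, as quoted above
BATCH_DISCOUNT = 0.50            # 50% off through the batch API


def embedding_cost_usd(total_tokens, use_batch=False):
    """Estimated cost in USD to embed `total_tokens` tokens once."""
    if total_tokens < 0:
        raise ValueError("total_tokens must be non-negative")
    cost = total_tokens / 1_000_000 * PRICE_PER_MILLION_TOKENS
    if use_batch:
        cost *= 1 - BATCH_DISCOUNT
    return cost


# Hypothetical one-time indexing job of two billion tokens.
standard_cost = embedding_cost_usd(2_000_000_000)
batch_cost = embedding_cost_usd(2_000_000_000, use_batch=True)
```

At these rates the hypothetical job comes to roughly $300 at standard pricing and roughly $150 through the batch API. Overlapping chunks increase the token count, so the total should be measured after chunking rather than from raw file sizes.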
