Hybrid Search: Combining Semantic and Keyword Search
Stackviv Team
13 min read

Key takeaways

  • Hybrid search combines keyword matching (BM25) with semantic vector search to deliver more accurate, contextually relevant results
  • Reciprocal Rank Fusion (RRF) merges ranked results from both methods without complex score normalization
  • The alpha parameter controls the balance between keyword precision and semantic understanding, with useful values typically falling between 0.3 and 0.7
  • RAG applications benefit significantly from hybrid retrieval, with benchmarks showing 18% or higher improvements in ranking precision
  • Vector databases like Weaviate, Pinecone, and Elasticsearch offer native hybrid search support

What Is Hybrid Search and Why Does It Matter?

Hybrid search is a retrieval technique that runs keyword matching and semantic vector search simultaneously, then merges their results into a single ranked list. Instead of choosing one method over the other, you get the strengths of both.

Keyword search excels at finding exact matches. If someone searches for a product code like "SKU-7742" or a specific error message, traditional BM25 will find it. Semantic search, powered by embeddings, understands meaning and context. A search for "affordable running shoes" can surface products described as "budget sneakers" or "low-cost trainers" even when those exact words don't appear.

The problem? Each method has blind spots.

Keyword search misses synonyms and related concepts. Semantic search can overlook specific technical terms, proper nouns, and domain-specific jargon that embedding models weren't trained on. For many real-world applications, neither approach alone is enough.

That's where hybrid retrieval comes in. By combining both methods, you capture documents that match exact keywords while also pulling in semantically relevant results that pure keyword matching would miss. Google Search itself uses this approach, running semantic search alongside its token-based keyword algorithm.

How Does Keyword Search Work?

Before understanding how to combine semantic and keyword approaches, it helps to know what's happening under the hood.

BM25 (Best Matching 25) is the dominant keyword ranking algorithm, used by search engines like Elasticsearch and OpenSearch through the Apache Lucene library they are built on. It's been around since the 1990s and remains the default in most production systems because it's fast, reliable, and surprisingly effective.

The algorithm scores documents based on three factors:

Term frequency: How often does the search term appear in the document? Documents mentioning "hybrid search" five times score higher than those mentioning it once, but with diminishing returns. The algorithm uses saturation so that repeating a term 50 times doesn't give you 50 times the score.

Inverse document frequency (IDF): How rare is the term across all documents? Common words like "the" or "is" contribute almost nothing to the score because they appear everywhere. Rare, specific terms like "reciprocal rank fusion" carry much more weight.

Document length normalization: Shorter, focused documents score higher than long documents with incidental mentions. A 100-word article specifically about BM25 beats a 10,000-word ebook that mentions it once.

The result is a sparse vector representation. Most values are zero (for terms that don't appear), with a few non-zero values for terms that do. This makes BM25 extremely efficient, answering queries in milliseconds even over millions of documents.

But BM25 only understands tokens, not meaning. Search for "doctor" and you won't find documents about "physicians." Search for "how to fix a broken window" and you might miss a guide titled "repairing cracked glass."
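
To make the three scoring factors concrete, here is a minimal sketch of BM25 in plain Python. It is an illustration rather than how any particular engine implements it: the whitespace tokenizer, the toy documents, and the parameter values k1=1.2 and b=0.75 are assumptions chosen for the example, and the IDF uses the non-negative variant found in Lucene-based engines.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (real systems also stem and drop stopwords)."""
    return text.lower().split()


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score every document in `docs` against `query` with BM25."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(toks) for toks in tokenized) / n_docs
    # Document frequency: how many documents contain each term.
    df = Counter(term for toks in tokenized for term in set(toks))

    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue  # BM25 only sees exact token matches
            # Inverse document frequency: rare terms carry more weight.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Length normalization: longer-than-average documents are penalized.
            norm = 1 - b + b * len(toks) / avg_len
            # Term frequency with saturation: repeats help, with diminishing returns.
            score += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * norm)
        scores.append(score)
    return scores


# Toy documents, invented for this example.
docs = [
    "the doctor will see you now",
    "our physicians are accepting new patients",
    "doctor doctor doctor doctor doctor",
]
scores = bm25_scores("doctor", docs)
print([round(s, 3) for s in scores])
```

The second document scores exactly zero because it never contains the token "doctor", which is the synonym blind spot described above. The third document repeats the term five times yet scores well under five times the first document, which is saturation at work.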

How Does Semantic Search Work?

Semantic search takes a completely different approach. Instead of matching words, it matches meaning.

Understanding how embeddings power semantic retrieval is essential here. An embedding model (like those from OpenAI, Cohere, or open-source alternatives) converts text into dense vectors, typically with 768 to 1536 dimensions. Every dimension has a value, creating a rich numerical representation of the text's meaning.

When you search, your query gets embedded into the same vector space. The system then finds documents whose vectors are closest to your query vector using similarity measures like cosine similarity.

The magic is that semantically similar concepts end up near each other in vector space. "Doctor," "physician," and "medical practitioner" cluster together. "How to fix a broken window" lands near "repairing cracked glass" because the underlying meaning is the same.
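
A small sketch can show what "closest in vector space" means in practice. The four-dimensional vectors below are made up for illustration (real embedding models produce hundreds or thousands of dimensions), and `nearest` is a hypothetical helper, not part of any library.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical toy "embeddings"; a real model would produce these from text.
toy_embeddings = {
    "doctor": [0.90, 0.80, 0.10, 0.00],
    "physician": [0.85, 0.75, 0.20, 0.05],
    "window": [0.00, 0.10, 0.90, 0.80],
}


def nearest(query_vec, embeddings):
    """Return names ordered from most to least similar to the query vector."""
    return sorted(
        embeddings,
        key=lambda name: cosine_similarity(query_vec, embeddings[name]),
        reverse=True,
    )


ranking = nearest(toy_embeddings["doctor"], toy_embeddings)
print(ranking)  # ['doctor', 'physician', 'window']
```

With these toy values, "physician" lands right next to "doctor" while "window" points in a nearly unrelated direction, even though none of the words share a single character-level token.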

This solves the synonym problem. It handles paraphrasing, conceptual similarity, and natural language queries beautifully. But semantic search has its own weaknesses.

Embedding models are trained on specific data. Out-of-domain terms, newly coined product names, internal company jargon, technical codes, and proper nouns often aren't represented well. A semantic search for "MacBook M3 Pro" might return results about laptops generally rather than that specific product.

Comparing semantic and keyword search reveals complementary strengths. Keyword search provides precision for exact terms. Semantic search provides recall for conceptually related content. Hybrid search delivers both.

How Does Hybrid Search Combine Both Methods?

Hybrid search runs keyword and semantic queries in parallel, then merges the results. The challenge is combining two ranked lists that use completely different scoring scales.

BM25 scores are unbounded. They might range from 0 to 20 or 0 to 200 depending on your documents and queries. Semantic scores are bounded: cosine similarity falls between -1 and 1 (in practice often between 0 and 1 for text embeddings), while cosine distance falls between 0 and 2. You can't simply add the two together because the keyword scores would dominate.

Two main fusion approaches solve this:

Reciprocal Rank Fusion (RRF): This method ignores raw scores entirely and focuses on rank positions. Each document receives a score of 1/(k + rank) where k is a constant (typically 60) and rank is its position in each result list. Documents that appear high in both lists get boosted significantly.

RRF is popular because it requires little or no tuning. It works out of the box across different datasets and scoring systems. The k=60 constant provides good balance between top-ranked and lower-ranked items.

Linear combination with normalization: This approach normalizes scores from each method (using min-max scaling or L2 normalization) and then combines them with weights. The formula looks like: final_score = alpha × dense_score + (1 - alpha) × sparse_score.

Linear combination offers more control. You can emphasize keyword precision by lowering alpha or boost semantic understanding by raising it. But it requires tuning weights for your specific dataset.
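
The linear combination formula can be sketched in a few lines. This is one possible reading, assuming min-max scaling and assuming that a document missing from one result list contributes 0 for that method; the function names and the scores are invented for the example.

```python
def min_max_normalize(scores):
    """Rescale a {doc_id: score} mapping into the range [0, 1]."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores are equal, so treat every document as equally strong.
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def linear_fusion(dense_scores, sparse_scores, alpha=0.5):
    """final_score = alpha * dense_score + (1 - alpha) * sparse_score."""
    dense = min_max_normalize(dense_scores)
    sparse = min_max_normalize(sparse_scores)
    fused = {}
    for doc in set(dense) | set(sparse):
        # Assumption: a document absent from one list scores 0 for that method.
        fused[doc] = alpha * dense.get(doc, 0.0) + (1 - alpha) * sparse.get(doc, 0.0)
    # Sort by score (descending), then by id so ties are deterministic.
    return sorted(fused.items(), key=lambda item: (-item[1], item[0]))


# Invented scores on very different scales.
sparse_scores = {"A": 18.2, "B": 12.7, "C": 4.1}   # BM25-like
dense_scores = {"B": 0.91, "D": 0.88, "A": 0.52}   # cosine-like

for alpha in (0.0, 0.5, 1.0):
    ranking = [doc for doc, _ in linear_fusion(dense_scores, sparse_scores, alpha)]
    print(alpha, ranking)
```

With these numbers, alpha=0 puts A first (the keyword favorite), alpha=1 puts B then D first (the semantic favorites), and alpha=0.5 ranks B on top because it does well in both lists. Note that min-max scaling is sensitive to outliers, which is part of why this approach needs tuning.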

For most teams starting out, RRF is the better choice. It's simpler, more robust, and produces strong results without experimentation. Linear combination becomes valuable when you have labeled data to tune against and need precise control over the balance.

What Is Reciprocal Rank Fusion?

Let's break down how reciprocal rank fusion actually works, since it's the most common approach in production systems.

Suppose you search for "Q3 economic data" and get these results:

From keyword search (BM25): Document A (rank 1), Document B (rank 2), Document C (rank 3)

From semantic search: Document B (rank 1), Document D (rank 2), Document A (rank 3)

With k=60, the RRF scores are:

Document A: 1/(60+1) + 1/(60+3) = 0.0164 + 0.0159 = 0.0323

Document B: 1/(60+2) + 1/(60+1) = 0.0161 + 0.0164 = 0.0325

Document C: 1/(60+3) + 0 = 0.0159

Document D: 0 + 1/(60+2) = 0.0161

The final ranking: B, A, D, C.

Notice that Document B wins even though it was ranked second in keyword search. Its strong performance in both lists gives it the highest combined score. Documents appearing in only one list score lower.
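
The same calculation takes only a few lines of code. The sketch below reproduces the worked example; the function name `reciprocal_rank_fusion` is our own, and ties are broken alphabetically purely so the output is deterministic.

```python
def reciprocal_rank_fusion(ranked_lists, k=60):
    """Fuse ranked lists: each appearance adds 1 / (k + rank) to a document's score."""
    scores = {}
    for ranking in ranked_lists:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by document id.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


keyword_results = ["A", "B", "C"]    # BM25 order from the example
semantic_results = ["B", "D", "A"]   # vector search order from the example

fused = reciprocal_rank_fusion([keyword_results, semantic_results])
for doc, score in fused:
    print(doc, round(score, 4))
# B 0.0325
# A 0.0323
# D 0.0161
# C 0.0159
```

Because only rank positions enter the formula, the function never needs to know whether a list came from BM25, a vector index, or anything else.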

This approach naturally prioritizes documents with broad relevance. If something ranks high in both keyword and semantic results, it's probably what the user wants.

How Do You Balance Keyword and Semantic Weight?

When using linear combination instead of RRF, the alpha parameter controls everything. Tuning it well is a core part of RAG implementation best practices.

Alpha ranges from 0 to 1:

  • alpha = 0: Pure keyword search (BM25 only)
  • alpha = 1: Pure semantic search (vectors only)
  • alpha = 0.5: Equal weight to both

But 0.5 isn't always optimal. Research from Pinecone suggests different strategies:

Out-of-domain embedding models (alpha 0.3 to 0.6): When your embedding model wasn't trained on your specific content type, keyword matching should carry more weight. The semantic scores are less reliable, so you compensate by leaning toward BM25.

Fine-tuned embedding models (alpha 0.7 to 0.9): When embeddings are trained on your domain, semantic scores are more trustworthy. You can weight them higher while still capturing exact keyword matches.

Query-dependent tuning: Some advanced systems dynamically adjust alpha based on query characteristics. Technical queries with product codes might use lower alpha. Natural language questions might use higher alpha.
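
As a rough illustration of query-dependent tuning, the heuristic below picks an alpha from surface features of the query. Everything here is an assumption made for the sketch: the `choose_alpha` function, the "contains a digit" test for code-like tokens, the question-word list, and the specific alpha values. A production system would more likely learn these rules from labeled queries.

```python
QUESTION_WORDS = {"how", "what", "why", "when", "where", "which", "who", "can", "does", "is"}


def choose_alpha(query, keyword_alpha=0.3, semantic_alpha=0.7, default_alpha=0.5):
    """Pick a fusion weight: lower favors BM25, higher favors embeddings."""
    tokens = query.split()
    # Tokens containing digits (SKU-7742, WH-1000XM5) usually need exact matching.
    has_code_like_token = any(any(ch.isdigit() for ch in tok) for tok in tokens)
    # Natural language questions usually benefit from semantic matching.
    looks_like_question = query.strip().endswith("?") or (
        bool(tokens) and tokens[0].lower() in QUESTION_WORDS
    )
    if has_code_like_token:
        return keyword_alpha
    if looks_like_question:
        return semantic_alpha
    return default_alpha


for q in ["SKU-7742", "how do I handle authentication", "running shoes"]:
    print(q, "->", choose_alpha(q))
```

The code-like check runs first on purpose: a question that mentions a specific model number or error code still depends on that exact token being matched.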

Benchmarking from LlamaIndex shows that hybrid search with alpha around 0.5 to 0.6 consistently outperforms pure keyword or pure semantic search across most query types, especially when combined with reranking retrieved results.

When Should You Use Hybrid Search?

Hybrid search isn't always necessary. Understanding when it helps, and when simpler approaches work better, saves engineering effort.

Hybrid search works best when:

Your content contains both domain-specific terms and natural language descriptions. E-commerce is a perfect example. Product searches include exact model names ("Sony WH-1000XM5") alongside descriptive queries ("comfortable noise-canceling headphones for travel").

Users switch between precise and vague queries. Technical documentation falls here. Developers search for exact function names, error codes, and API endpoints, but also ask conceptual questions like "how do I handle authentication."

Your embedding model wasn't fine-tuned on your data. Out-of-the-box embedding models from OpenAI or Cohere work well for general content but struggle with specialized vocabulary. BM25 catches what the embeddings miss.

RAG applications need reliable retrieval. As covered in any guide to RAG and vector databases, the quality of retrieved context directly impacts generation quality. Hybrid search reduces retrieval failures by combining multiple signals.

Keyword-only search works when:

Exact matching is all that matters. Code repositories where developers search for specific function names or error messages. Legal databases where precise clause identification is essential.

Semantic-only search works when:

Users don't know the exact terminology. Mental health forums where people describe feelings rather than clinical terms. Creative platforms where conceptual similarity matters more than specific words.

For most RAG and enterprise search applications, hybrid search delivers better results. The 18% improvement in ranking precision cited in the key takeaways is one reported example; the gain on your own data will depend on your content and queries.

How Do You Implement Hybrid Search?

Here's what a typical hybrid search implementation looks like in practice.

Step 1: Index your content for both methods.

Each document needs two representations: a sparse representation for BM25 (generated by tokenizing text and building an inverted index) and a dense representation for semantic search (generated by running text through an embedding model).

Most vector databases handle the sparse indexing automatically. You focus on generating embeddings. Chunking strategies matter here because both retrieval methods benefit from properly sized content chunks.

Step 2: Configure your hybrid query.

Queries also need dual representations. For keyword matching, the query is tokenized. For semantic matching, the query is embedded using the same model that processed your documents.

Step 3: Apply fusion.

RRF or linear combination merges the results. Configure parameters like k (for RRF) or alpha (for linear combination) based on your use case.

Step 4: Optional reranking.

For highest accuracy, add a reranking step. A cross-encoder model rescores the top candidates, further improving relevance. This is computationally expensive but valuable when precision is critical.

Step 5: Return results to the LLM or user.

The final ranked list feeds into your application. For RAG, these chunks become context for generation. Better prompt engineering combined with better retrieval yields better answers.
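
The five steps can be wired together in a short skeleton. This is a sketch under heavy assumptions: the corpus is a toy dictionary, `keyword_retriever` is a token-overlap stand-in for BM25, `semantic_retriever` is a hardcoded lookup standing in for a real embedding index, and the reranker is an optional callable you would supply yourself. Only the control flow is the point.

```python
# Step 1 (indexing) is assumed done; this toy corpus stands in for both indexes.
CORPUS = {
    "doc1": "repairing cracked glass",
    "doc2": "how to fix a broken window latch",
    "doc3": "choosing budget sneakers for running",
    "doc4": "broken window replacement cost",
}

# Hypothetical precomputed neighbors, standing in for a vector index lookup.
SEMANTIC_NEIGHBORS = {"fix a broken window": ["doc1", "doc2"]}


def keyword_retriever(query, limit):
    """Stand-in for BM25: rank documents by number of shared tokens."""
    q = set(query.lower().split())
    scored = [(len(q & set(text.split())), doc_id) for doc_id, text in CORPUS.items()]
    ranked = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for overlap, doc_id in ranked if overlap > 0][:limit]


def semantic_retriever(query, limit):
    """Stand-in for vector search: returns hardcoded neighbors for known queries."""
    return SEMANTIC_NEIGHBORS.get(query, [])[:limit]


def rrf(ranked_lists, k=60):
    """Reciprocal Rank Fusion over lists of document ids."""
    scores = {}
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda d: (-scores[d], d))


def hybrid_search(query, top_k=3, candidates=20, rrf_k=60, reranker=None):
    # Step 2: run the same query through both retrievers.
    keyword_hits = keyword_retriever(query, candidates)
    semantic_hits = semantic_retriever(query, candidates)
    # Step 3: fuse the two ranked lists.
    fused = rrf([keyword_hits, semantic_hits], k=rrf_k)
    # Step 4: optionally rescore the fused candidates.
    if reranker is not None:
        fused = reranker(query, fused)
    # Step 5: hand the top results to the LLM or user.
    return fused[:top_k]


results = hybrid_search("fix a broken window")
print(results)  # ['doc2', 'doc1', 'doc4']
```

In this toy run the keyword stand-in never finds doc1 ("repairing cracked glass") because it shares no tokens with the query, yet the document still reaches second place thanks to the semantic list. A real reranker would be passed as `reranker=my_function`, where the function takes the query and the fused ids and returns them in a new order.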

Which Vector Databases Support Hybrid Search?

Several vector databases now support hybrid search natively. Your choice depends on operational requirements, scale, and budget.

Weaviate is often cited as the hybrid search specialist. It processes keyword, vector, and metadata queries simultaneously through a unified architecture. The alpha parameter controls weighting directly in queries. GraphQL APIs and excellent documentation make it approachable for proof-of-concept work.

Elasticsearch introduced hybrid search through its retriever APIs. You can combine BM25 queries with kNN vector search using either RRF or linear combination. The advantage is Elasticsearch's mature ecosystem, enterprise features, and familiarity for teams already using it for traditional search.

Pinecone recommends creating separate dense and sparse indexes, running searches in parallel, then combining results with their hosted reranking models.

OpenSearch added reciprocal rank fusion support in version 2.19. The Neural Search plugin merges results from keyword, kNN, and Boolean queries into a single ranked list.

Regardless of which database you choose, the AI-powered search tools ecosystem continues to evolve rapidly. Native hybrid support is becoming table stakes rather than a differentiator.

How Do You Tune Hybrid Search for Better Results?

Hybrid search isn't set-and-forget. Tuning matters.

Start with RRF at k=60. This is the default in most systems and works well for general cases. RRF is robust to score differences and requires no weight calibration.

Measure before optimizing. Use evaluation metrics like Mean Reciprocal Rank (MRR) and Hit Rate. MRR tells you how highly the correct answer ranks on average. Hit Rate tells you how often the correct answer appears in your top-k results.
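
Both metrics are simple to compute once you have a set of test queries with known correct answers. The sketch below assumes one correct document per query; the function names and the sample data are invented for illustration.

```python
def reciprocal_rank(ranked_ids, relevant_id):
    """1 / position of the correct document, or 0 if it was not retrieved."""
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id == relevant_id:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(results, expected):
    """Average reciprocal rank across all test queries."""
    return sum(reciprocal_rank(r, e) for r, e in zip(results, expected)) / len(expected)


def hit_rate(results, expected, k=3):
    """Fraction of queries whose correct document appears in the top k."""
    hits = sum(1 for r, e in zip(results, expected) if e in r[:k])
    return hits / len(expected)


# Retrieved ids per query, and the single correct id for each query.
results = [["A", "B", "C"], ["D", "A", "B"], ["C", "D", "E"]]
expected = ["A", "B", "Z"]

print(round(mean_reciprocal_rank(results, expected), 4))  # 0.4444
print(round(hit_rate(results, expected, k=3), 4))         # 0.6667
print(round(hit_rate(results, expected, k=2), 4))         # 0.3333
```

Running the same evaluation set against keyword-only, semantic-only, and hybrid configurations gives you a like-for-like comparison before you commit to any tuning.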

Test different alpha values if using linear combination. Run experiments with alpha at 0.3, 0.5, 0.7, and 1.0. Different query types often favor different settings.

Consider query-dependent weighting. Advanced systems detect query characteristics and adjust dynamically. A query containing a product code triggers lower alpha. A conversational question triggers higher alpha.

Add reranking for high-stakes applications. Cross-encoder rerankers rescore the top 20 to 50 results using more expensive but more accurate relevance models. The added latency (often 100 to 300 milliseconds) is worth it when precision is critical.

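Balance latency and accuracy. Hybrid search is usually slower than single-method retrieval because you're running two retrievers plus a fusion step, and even when the retrievers run in parallel, the slower one sets the floor. For real-time applications, ensure combined latency stays under your threshold.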

What Are the Common Challenges with Hybrid Search?

Teams implementing hybrid search run into predictable problems.

Challenge: Different score scales make tuning confusing.

Solution: Use RRF instead of linear combination. RRF works on ranks, not scores, eliminating scale issues entirely.

Challenge: Keyword search dominates results.

Solution: Lower the weight on keyword scores (increase alpha if using linear combination) or use RRF, which naturally balances both signals.

Challenge: Semantic search returns irrelevant results.

Solution: The embedding model may not be suited to your domain. Consider fine-tuning or using a model trained on similar content. Also verify your chunking strategy.

Challenge: Latency is too high.

Solution: Cache embedding calls when possible. Reduce the number of results retrieved from each method before fusion.
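
Caching repeated query embeddings can be as simple as wrapping the embedding call. In this sketch `embed_text` is a placeholder that fakes an embedding and counts its own calls; in a real system it would call your embedding model or API. The cache size is an arbitrary assumption.

```python
from functools import lru_cache


def embed_text(text):
    """Placeholder for a real (slow, possibly paid) embedding call."""
    embed_text.calls += 1
    # Fake vector derived from the characters; NOT a real embedding.
    return tuple(float(ord(ch) % 7) for ch in text[:8])


embed_text.calls = 0


@lru_cache(maxsize=10_000)
def cached_embedding(text):
    """Return the embedding for `text`, computing it at most once while cached."""
    return embed_text(text)


def embed_query(query):
    # Normalizing case and whitespace raises the cache hit rate.
    return cached_embedding(" ".join(query.lower().split()))


embed_query("Hybrid search")
embed_query("hybrid   search")  # same normalized text, served from the cache
print(embed_text.calls)  # 1
```

Returning a tuple keeps the cached value immutable, so one caller cannot accidentally modify the vector another caller receives.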

Challenge: Zero-result queries.

Solution: Hybrid search actually helps here. When keyword search returns nothing, semantic search can still find conceptually related content.

Conclusion

Hybrid search solves a fundamental tension in information retrieval. Keyword matching gives you precision. Semantic understanding gives you context. Running both and merging results with techniques like reciprocal rank fusion delivers better relevance than either approach alone.

Start with RRF if you're implementing hybrid search for the first time. It's robust, tuning-free, and performs well across most use cases. Use BM25 plus embeddings as your baseline, test with real queries, and add reranking if precision gains justify the latency cost.

For RAG applications, hybrid retrieval is particularly impactful. The quality of retrieved context directly determines generation quality. When your retriever misses relevant chunks, no amount of prompt engineering fixes the problem.

The tools are mature. Weaviate, Elasticsearch, Pinecone, and OpenSearch all support hybrid patterns. The concepts are well-understood. What remains is matching the approach to your specific data, queries, and performance requirements.

Frequently Asked Questions

What is hybrid search?

Hybrid search combines keyword-based matching (using algorithms like BM25) with semantic vector search (using embeddings) to improve retrieval accuracy. It runs both methods simultaneously and merges results into a single ranked list.

How does reciprocal rank fusion work?

RRF combines ranked results from multiple search methods based on position rather than raw scores. Each document receives a score of 1/(k + rank), where k is typically 60. Documents that rank highly in multiple lists get boosted in the final ranking.

When should I use hybrid search instead of semantic search alone?

Use hybrid search when your content contains exact terms users might search for (product codes, technical terms, proper nouns) alongside natural language descriptions. Semantic search alone can miss specific keywords that embedding models weren't trained on.

What is the alpha parameter in hybrid search?

Alpha controls the balance between keyword and semantic scores in linear combination fusion. An alpha of 0 means pure keyword search, and an alpha of 1 means pure semantic search. A value of 0.5 gives equal weight to both methods.

Does hybrid search work with all vector databases?

Not all, but most modern vector databases support hybrid search. Weaviate, Elasticsearch, OpenSearch, and Pinecone offer native hybrid capabilities. Implementation varies, so check your database's documentation for specifics.