Hybrid Search
Combine keyword and semantic search for production-grade retrieval. Neither alone is enough.
The Problem: Neither Search is Perfect
In production systems, you quickly discover that pure semantic search and pure keyword search each have critical blind spots. Hybrid search combines both to eliminate these weaknesses.
Semantic Search Failures
- Misses exact matches: "error code 0x8007045D"
- Struggles with proper nouns and IDs
- May return "similar" results rather than what the user actually typed
- Embeddings can conflate similar concepts
Keyword Search Failures
- Misses synonyms: "car" vs "automobile" (see the quick demo below)
- Fails on paraphrases and rewordings
- No understanding of context or intent
- Vocabulary mismatch between query and docs
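To make the synonym blind spot concrete, here is a tiny illustration using the rank_bm25 library (the same library used in the BM25 examples later in this lesson); the two toy documents are invented for the demo:
from rank_bm25 import BM25Okapi

# Toy corpus: the first document is clearly about cars
docs = ["car maintenance guide", "train schedule for commuters"]
bm25 = BM25Okapi([d.lower().split() for d in docs])

# "automobile" is a synonym of "car", but BM25 only counts exact token overlap,
# so both documents score 0.0 for this query
print(bm25.get_scores("automobile".split()))  # [0. 0.]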
Concrete Example
Query: "how to fix authentication failure in OAuth2"
Keyword (BM25) returns: "OAuth2 authentication failure troubleshooting guide" (exact keyword match).
Semantic returns: "Debugging login issues with identity providers" (meaning match, no "OAuth2" keyword).
Hybrid search returns both - the exact match AND the semantic match, giving users the best of both worlds.
The Solution: Hybrid Search
Hybrid search runs both keyword (BM25) and vector search in parallel, then combines the results using a fusion algorithm. The most common approach is Reciprocal Rank Fusion (RRF).
Architecture Overview
Query → BM25 (keyword path) and vector search (semantic path) run in parallel → RRF fusion → ranked results.
BM25: The Keyword Engine
BM25 (Best Matching 25) is the industry-standard keyword ranking algorithm. It scores documents based on term frequency (TF) and inverse document frequency (IDF), with saturation to prevent keyword stuffing.
from rank_bm25 import BM25Okapi
import numpy as np
documents = [
"OAuth2 authentication failure troubleshooting guide",
"How to configure SSO with SAML providers",
"Debugging login issues with identity providers",
"REST API authentication best practices",
"Token refresh flow implementation guide"
]
# Tokenize documents (simple whitespace split, can use better tokenizer)
tokenized_docs = [doc.lower().split() for doc in documents]
# Create BM25 index
bm25 = BM25Okapi(tokenized_docs)
# Search
query = "authentication failure OAuth2"
tokenized_query = query.lower().split()
bm25_scores = bm25.get_scores(tokenized_query)
# Get ranked results
ranked_indices = np.argsort(bm25_scores)[::-1]
print("BM25 Results:")
for idx in ranked_indices[:3]:
    print(f" {bm25_scores[idx]:.3f}: {documents[idx]}")

Output:
BM25 Results:
 2.847: OAuth2 authentication failure troubleshooting guide
 1.123: REST API authentication best practices
 0.892: Debugging login issues with identity providers
BM25 Parameters
k1 (default: 1.5)
Controls term frequency saturation. Higher = more weight to repeated terms.
b (default: 0.75)
Length normalization. 0 = no normalization, 1 = full normalization.
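To show exactly where these two parameters enter the scoring, here is a small sketch of the per-term BM25 score (standard Okapi/Lucene-style formula; rank_bm25's internals differ slightly in the IDF details, so treat this as illustrative):
import math

def bm25_term_score(tf: int, df: int, n_docs: int, doc_len: int, avg_doc_len: float,
                    k1: float = 1.5, b: float = 0.75) -> float:
    """One query term's contribution to one document's BM25 score.
    tf: term frequency in the doc, df: docs containing the term, n_docs: corpus size."""
    idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))    # rare terms count more
    length_norm = 1 - b + b * doc_len / avg_doc_len         # b: document-length normalization
    return idf * (tf * (k1 + 1)) / (tf + k1 * length_norm)  # k1: term-frequency saturation

# Repeating a term gives diminishing returns (saturation controlled by k1):
print(bm25_term_score(tf=1, df=2, n_docs=5, doc_len=6, avg_doc_len=6.0))  # ~0.88
print(bm25_term_score(tf=5, df=2, n_docs=5, doc_len=6, avg_doc_len=6.0))  # ~1.68, far less than 5x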
Vector Search: The Semantic Engine
Vector search finds semantically similar content even when the exact words differ. We covered this in Lesson 1.1, but here's a quick refresher for the hybrid context.
from sentence_transformers import SentenceTransformer
import numpy as np
model = SentenceTransformer('BAAI/bge-small-en-v1.5')
documents = [
"OAuth2 authentication failure troubleshooting guide",
"How to configure SSO with SAML providers",
"Debugging login issues with identity providers",
"REST API authentication best practices",
"Token refresh flow implementation guide"
]
# Encode all documents
doc_embeddings = model.encode(documents, normalize_embeddings=True)
# Encode query
query = "fix login problems"
query_embedding = model.encode(query, normalize_embeddings=True)
# Compute cosine similarity (dot product since normalized)
vector_scores = np.dot(doc_embeddings, query_embedding)
# Get ranked results
ranked_indices = np.argsort(vector_scores)[::-1]
print("Vector Search Results:")
for idx in ranked_indices[:3]:
    print(f" {vector_scores[idx]:.3f}: {documents[idx]}")

Output:
Vector Search Results:
 0.721: Debugging login issues with identity providers
 0.654: OAuth2 authentication failure troubleshooting guide
 0.589: REST API authentication best practices
Note: "fix login problems" matches "Debugging login issues" even without shared keywords.
Reciprocal Rank Fusion (RRF)
RRF combines ranked lists by giving each document a score based on its position in each list. Documents that rank well in both lists bubble to the top.
RRF Formula
RRF(d) = 1 / (k + rank_BM25(d)) + 1 / (k + rank_vector(d))
k is typically 60; a larger k flattens the contribution of top-ranked documents, so no single list's #1 result can dominate the fused score.
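For example, with k = 60 and 1-indexed ranks, a document ranked 1st by BM25 and 3rd by vector search gets 1/61 + 1/63 ≈ 0.0323, while a document that appears at rank 1 in only one of the lists gets just 1/61 ≈ 0.0164; agreement between the two retrievers is what pushes a document to the top.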
from rank_bm25 import BM25Okapi
from sentence_transformers import SentenceTransformer
import numpy as np
def rrf_score(rank: int, k: int = 60) -> float:
    """Calculate RRF score for a given rank (0-indexed)"""
    return 1.0 / (k + rank + 1)

def hybrid_search(query: str, documents: list, bm25: BM25Okapi,
                  model: SentenceTransformer, doc_embeddings: np.ndarray,
                  k: int = 60, top_n: int = 5) -> list:
    """
    Perform hybrid search combining BM25 and vector search with RRF fusion.
    """
    # BM25 scores and ranks
    tokenized_query = query.lower().split()
    bm25_scores = bm25.get_scores(tokenized_query)
    bm25_ranks = np.argsort(np.argsort(-bm25_scores))  # Rank positions (0 = best)

    # Vector scores and ranks
    query_emb = model.encode(query, normalize_embeddings=True)
    vector_scores = np.dot(doc_embeddings, query_emb)
    vector_ranks = np.argsort(np.argsort(-vector_scores))

    # Calculate RRF scores
    rrf_scores = []
    for i in range(len(documents)):
        rrf = rrf_score(bm25_ranks[i], k) + rrf_score(vector_ranks[i], k)
        rrf_scores.append((i, rrf, bm25_scores[i], vector_scores[i]))

    # Sort by RRF score
    rrf_scores.sort(key=lambda x: x[1], reverse=True)
    return rrf_scores[:top_n]
# Usage example
documents = [
"OAuth2 authentication failure troubleshooting guide",
"How to configure SSO with SAML providers",
"Debugging login issues with identity providers",
"REST API authentication best practices",
"Token refresh flow implementation guide"
]
tokenized_docs = [doc.lower().split() for doc in documents]
bm25 = BM25Okapi(tokenized_docs)
model = SentenceTransformer('BAAI/bge-small-en-v1.5')
doc_embeddings = model.encode(documents, normalize_embeddings=True)
query = "how to fix authentication failure in OAuth2"
results = hybrid_search(query, documents, bm25, model, doc_embeddings)
print("Hybrid Search Results (RRF):")
for idx, rrf, bm25_s, vec_s in results:
    print(f" RRF: {rrf:.4f} | BM25: {bm25_s:.2f} | Vec: {vec_s:.2f}")
    print(f" {documents[idx]}")

When to Weight Keyword vs Semantic
Standard RRF weights both sources equally, but you can adjust weights based on your use case. Use alpha blending to control the balance.
def weighted_rrf_score(bm25_rank: int, vector_rank: int,
                       alpha: float = 0.5, k: int = 60) -> float:
    """
    Weighted RRF: alpha controls balance between BM25 and vector.
    alpha = 0.5: equal weight (default)
    alpha = 0.7: favor keyword search
    alpha = 0.3: favor semantic search
    """
    bm25_rrf = rrf_score(bm25_rank, k)
    vector_rrf = rrf_score(vector_rank, k)
    return alpha * bm25_rrf + (1 - alpha) * vector_rrf

Favor Keyword (alpha = 0.7)
Best when:
- Technical docs with specific terms
- Error codes and identifiers
- Legal/compliance documents
- Product SKUs or model numbers
Equal Weight (alpha = 0.5)
Best when:
- General purpose search
- Mixed query types
- Unknown user behavior
- Starting point for tuning
Favor Semantic (alpha = 0.3)
Best when:
- Natural language questions
- FAQ and support content
- Conceptual searches
- Users describe problems in their own words
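Putting this together, here is a sketch of a weighted variant of the earlier hybrid_search; the weighted_hybrid_search name and its wiring are illustrative rather than reference code, and it reuses the documents, bm25, model, and doc_embeddings objects built above:
def weighted_hybrid_search(query: str, documents: list, bm25: BM25Okapi,
                           model: SentenceTransformer, doc_embeddings: np.ndarray,
                           alpha: float = 0.5, k: int = 60, top_n: int = 5) -> list:
    """Same as hybrid_search above, but fuses the two rankings with weighted_rrf_score."""
    # Rank positions (0 = best) from each retriever
    bm25_ranks = np.argsort(np.argsort(-bm25.get_scores(query.lower().split())))
    query_emb = model.encode(query, normalize_embeddings=True)
    vector_ranks = np.argsort(np.argsort(-np.dot(doc_embeddings, query_emb)))

    scored = [(i, weighted_rrf_score(bm25_ranks[i], vector_ranks[i], alpha, k))
              for i in range(len(documents))]
    scored.sort(key=lambda x: x[1], reverse=True)
    return scored[:top_n]

# Favor the keyword side for error-code-heavy technical docs
results = weighted_hybrid_search(query, documents, bm25, model, doc_embeddings, alpha=0.7)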
Real Benchmarks: Hybrid Beats Both
Across multiple retrieval benchmarks, hybrid search consistently outperforms either method alone. Here are results from the BEIR benchmark suite.
BEIR Benchmark Results (NDCG@10)
| Dataset | BM25 | Vector | Hybrid | Gain vs. best single |
|---|---|---|---|---|
| MS MARCO | 0.228 | 0.343 | 0.389 | +13.4% |
| Natural Questions | 0.329 | 0.463 | 0.502 | +8.4% |
| TREC-COVID | 0.656 | 0.567 | 0.712 | +8.5% |
| FiQA (Finance) | 0.236 | 0.295 | 0.341 | +15.6% |
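On TREC-COVID, for example, the stronger single method is BM25, so the gain is 0.712 / 0.656 ≈ 1.085, i.e. about +8.5%.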
Key Insight
Notice that TREC-COVID is one case where BM25 alone outperforms vector search. This is because COVID-related queries often contain specific medical terms that benefit from exact matching. Hybrid search captures the best of both.
Production Implementation
In production, you typically use a vector database that supports hybrid search natively. Here's how to do it with popular options.
Weaviate (Built-in Hybrid)
import weaviate
client = weaviate.Client("http://localhost:8080")
# Hybrid search with alpha parameter
result = client.query.get(
    "Document",
    ["content", "title"]
).with_hybrid(
    query="authentication failure OAuth2",
    alpha=0.5  # 0 = pure BM25, 1 = pure vector
).with_limit(10).do()

Qdrant (Fusion Mode)
from qdrant_client import QdrantClient, models
client = QdrantClient("localhost", port=6333)
# Hybrid with RRF fusion
results = client.query_points(
    collection_name="documents",
    prefetch=[
        models.Prefetch(query=query_embedding, using="dense", limit=20),
        # The sparse branch expects a sparse representation of the query (indices + weights),
        # e.g. a models.SparseVector produced by a BM25/SPLADE-style encoder -- not the raw query string
        models.Prefetch(query=query_sparse_vector, using="sparse", limit=20),
    ],
    query=models.FusionQuery(fusion=models.Fusion.RRF),
    limit=10
)

Elasticsearch (kNN + BM25)
# Hybrid search with Elasticsearch
{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "content": "authentication failure OAuth2"
          }
        },
        {
          "knn": {
            "field": "embedding",
            "query_vector": [0.1, 0.2, ...],
            "k": 10,
            "num_candidates": 100
          }
        }
      ]
    }
  }
}
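One caveat with this bool query: it adds the raw BM25 score and the kNN similarity, which live on very different scales, so one side can quietly dominate the ranking. Recent Elasticsearch releases also ship native reciprocal rank fusion for combining the keyword and vector branches, which avoids hand-tuning that score blend.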
Key Takeaways
1. Neither search type is sufficient alone - keyword search misses synonyms, semantic search misses exact matches. Use both.
2. RRF is the standard fusion method - simple, effective, and no training required. k=60 is a good default.
3. Tune alpha based on your content - technical docs favor keyword, natural language queries favor semantic.
4. Hybrid consistently beats single methods - 8-15% improvement on retrieval benchmarks. Worth the extra complexity.