Level 3: Production (~25 min)

Hybrid Search

Combine keyword and semantic search for production-grade retrieval. Neither alone is enough.

The Problem: Neither Search is Perfect

In production systems, you quickly discover that pure semantic search and pure keyword search each have critical blind spots. Hybrid search combines both to eliminate these weaknesses.

Semantic Search Failures

  • Misses exact matches: "error code 0x8007045D"
  • Struggles with proper nouns and IDs
  • May find "similar" but not what user typed
  • Embeddings can conflate similar concepts

Keyword Search Failures

  • Misses synonyms: "car" vs "automobile"
  • Fails on paraphrases and rewordings
  • No understanding of context or intent
  • Vocabulary mismatch between query and docs

Concrete Example

Query: "how to fix authentication failure in OAuth2"

Keyword (BM25) returns: "OAuth2 authentication failure troubleshooting guide" (exact keyword match)

Semantic returns: "Debugging login issues with identity providers" (meaning match, no "OAuth2" keyword)

Hybrid search returns both - the exact match AND the semantic match, giving users the best of both worlds.

The Solution: Hybrid Search

Hybrid search runs both keyword (BM25) and vector search in parallel, then combines the results using a fusion algorithm. The most common approach is Reciprocal Rank Fusion (RRF).

Architecture Overview

Query  ->  [ Path 1: BM25 (keyword)  |  Path 2: Vector (semantic) ]  ->  RRF fusion  ->  Ranked results

BM25: The Keyword Engine

BM25 (Best Matching 25) is the industry-standard keyword ranking algorithm. It scores documents based on term frequency (TF) and inverse document frequency (IDF), with saturation to prevent keyword stuffing.
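
For reference, the standard Okapi BM25 scoring function (implementations differ slightly in the exact IDF variant) is:

score(D, Q) = sum over query terms t of IDF(t) * ( f(t, D) * (k1 + 1) ) / ( f(t, D) + k1 * (1 - b + b * |D| / avgdl) )

where f(t, D) is how often t appears in document D, |D| is the document length, and avgdl is the average document length in the corpus. The k1 and b parameters are explained under "BM25 Parameters" below.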

# Install: pip install rank-bm25
from rank_bm25 import BM25Okapi
import numpy as np

documents = [
    "OAuth2 authentication failure troubleshooting guide",
    "How to configure SSO with SAML providers",
    "Debugging login issues with identity providers",
    "REST API authentication best practices",
    "Token refresh flow implementation guide"
]

# Tokenize documents (simple whitespace split, can use better tokenizer)
tokenized_docs = [doc.lower().split() for doc in documents]

# Create BM25 index
bm25 = BM25Okapi(tokenized_docs)

# Search
query = "authentication failure OAuth2"
tokenized_query = query.lower().split()
bm25_scores = bm25.get_scores(tokenized_query)

# Get ranked results
ranked_indices = np.argsort(bm25_scores)[::-1]
print("BM25 Results:")
for idx in ranked_indices[:3]:
    print(f"  {bm25_scores[idx]:.3f}: {documents[idx]}")
Expected output:
BM25 Results:
  2.847: OAuth2 authentication failure troubleshooting guide
  1.123: REST API authentication best practices
  0.892: Debugging login issues with identity providers

BM25 Parameters

k1 (default: 1.5)

Controls term frequency saturation. Higher = more weight to repeated terms.

b (default: 0.75)

Length normalization. 0 = no normalization, 1 = full normalization.
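
The rank-bm25 library exposes both parameters as keyword arguments when you build the index; a minimal sketch (the values below are illustrative, not recommendations):

# Illustrative parameter tuning for rank-bm25 (values are examples only)
from rank_bm25 import BM25Okapi

# Weaker length normalization, e.g. for short titles or snippets
bm25_titles = BM25Okapi(tokenized_docs, k1=1.2, b=0.3)

# Stronger length normalization, e.g. for long, variable-length articles
bm25_articles = BM25Okapi(tokenized_docs, k1=2.0, b=0.9)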

Vector Search: The Semantic Engine

Vector search finds semantically similar content even when the exact words differ. We covered this in Lesson 1.1, but here's a quick refresher for the hybrid context.

# pip install sentence-transformers
from sentence_transformers import SentenceTransformer
import numpy as np

model = SentenceTransformer('BAAI/bge-small-en-v1.5')

documents = [
    "OAuth2 authentication failure troubleshooting guide",
    "How to configure SSO with SAML providers",
    "Debugging login issues with identity providers",
    "REST API authentication best practices",
    "Token refresh flow implementation guide"
]

# Encode all documents
doc_embeddings = model.encode(documents, normalize_embeddings=True)

# Encode query
query = "fix login problems"
query_embedding = model.encode(query, normalize_embeddings=True)

# Compute cosine similarity (dot product since normalized)
vector_scores = np.dot(doc_embeddings, query_embedding)

# Get ranked results
ranked_indices = np.argsort(vector_scores)[::-1]
print("Vector Search Results:")
for idx in ranked_indices[:3]:
    print(f"  {vector_scores[idx]:.3f}: {documents[idx]}")
Expected output:
Vector Search Results:
  0.721: Debugging login issues with identity providers
  0.654: OAuth2 authentication failure troubleshooting guide
  0.589: REST API authentication best practices

Note: "fix login problems" matches "Debugging login issues" even without shared keywords.

Reciprocal Rank Fusion (RRF)

RRF combines ranked lists by giving each document a score based on its position in each list. Documents that rank well in both lists bubble to the top.

RRF Formula

RRF(d) = 1 / (k + rank_BM25(d)) + 1 / (k + rank_vector(d))

k is typically set to 60. It keeps any single denominator from becoming very small, which limits how much extra weight the very top ranks receive relative to everything below them.
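
As a quick worked example with k = 60 and 1-indexed ranks: a document ranked 5th in both lists gets 1/65 + 1/65 ≈ 0.0308, while a document ranked 1st in one list but 50th in the other gets 1/61 + 1/110 ≈ 0.0255, so consistent agreement across both lists beats a single top rank.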

from rank_bm25 import BM25Okapi
from sentence_transformers import SentenceTransformer
import numpy as np

def rrf_score(rank: int, k: int = 60) -> float:
    """Calculate RRF score for a given rank (0-indexed)"""
    return 1.0 / (k + rank + 1)

def hybrid_search(query: str, documents: list, bm25: BM25Okapi,
                  model: SentenceTransformer, doc_embeddings: np.ndarray,
                  k: int = 60, top_n: int = 5) -> list:
    """
    Perform hybrid search combining BM25 and vector search with RRF fusion.
    """
    # BM25 scores and ranks
    tokenized_query = query.lower().split()
    bm25_scores = bm25.get_scores(tokenized_query)
    bm25_ranks = np.argsort(np.argsort(-bm25_scores))  # Rank positions

    # Vector scores and ranks
    query_emb = model.encode(query, normalize_embeddings=True)
    vector_scores = np.dot(doc_embeddings, query_emb)
    vector_ranks = np.argsort(np.argsort(-vector_scores))

    # Calculate RRF scores
    rrf_scores = []
    for i in range(len(documents)):
        rrf = rrf_score(bm25_ranks[i], k) + rrf_score(vector_ranks[i], k)
        rrf_scores.append((i, rrf, bm25_scores[i], vector_scores[i]))

    # Sort by RRF score
    rrf_scores.sort(key=lambda x: x[1], reverse=True)

    return rrf_scores[:top_n]

# Usage example
documents = [
    "OAuth2 authentication failure troubleshooting guide",
    "How to configure SSO with SAML providers",
    "Debugging login issues with identity providers",
    "REST API authentication best practices",
    "Token refresh flow implementation guide"
]

tokenized_docs = [doc.lower().split() for doc in documents]
bm25 = BM25Okapi(tokenized_docs)
model = SentenceTransformer('BAAI/bge-small-en-v1.5')
doc_embeddings = model.encode(documents, normalize_embeddings=True)

query = "how to fix authentication failure in OAuth2"
results = hybrid_search(query, documents, bm25, model, doc_embeddings)

print("Hybrid Search Results (RRF):")
for idx, rrf, bm25_s, vec_s in results:
    print(f"  RRF: {rrf:.4f} | BM25: {bm25_s:.2f} | Vec: {vec_s:.2f}")
    print(f"       {documents[idx]}")

When to Weight Keyword vs Semantic

Standard RRF weights both sources equally, but you can adjust weights based on your use case. Use alpha blending to control the balance.

def weighted_rrf_score(bm25_rank: int, vector_rank: int,
                       alpha: float = 0.5, k: int = 60) -> float:
    """
    Weighted RRF: alpha controls balance between BM25 and vector.
    alpha = 0.5: equal weight (default)
    alpha = 0.7: favor keyword search
    alpha = 0.3: favor semantic search
    """
    bm25_rrf = rrf_score(bm25_rank, k)
    vector_rrf = rrf_score(vector_rank, k)
    return alpha * bm25_rrf + (1 - alpha) * vector_rrf
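
To apply this, swap it into the fusion step of hybrid_search above; a minimal sketch, assuming bm25_ranks and vector_ranks have been computed the same way as in that function (the alpha value here is purely illustrative):

# Sketch: weighted fusion, reusing bm25_ranks / vector_ranks computed as in hybrid_search
alpha = 0.7  # illustrative: lean toward keyword matches for technical content

weighted = [
    (i, weighted_rrf_score(bm25_ranks[i], vector_ranks[i], alpha=alpha, k=60))
    for i in range(len(documents))
]
weighted.sort(key=lambda x: x[1], reverse=True)

for idx, score in weighted[:3]:
    print(f"  {score:.4f}: {documents[idx]}")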

Favor Keyword (alpha = 0.7)

Best when:

  • Technical docs with specific terms
  • Error codes and identifiers
  • Legal/compliance documents
  • Product SKUs or model numbers

Equal Weight (alpha = 0.5)

Best when:

  • General purpose search
  • Mixed query types
  • Unknown user behavior
  • Starting point for tuning

Favor Semantic (alpha = 0.3)

Best when:

  • Natural language questions
  • FAQ and support content
  • Conceptual searches
  • Users describe problems in own words

Real Benchmarks: Hybrid Beats Both

Across multiple retrieval benchmarks, hybrid search consistently outperforms either method alone. Here are results from the BEIR benchmark suite.

BEIR Benchmark Results (NDCG@10)

Dataset              BM25    Vector   Hybrid   Gain
MS MARCO             0.228   0.343    0.389    +13.4%
Natural Questions    0.329   0.463    0.502    +8.4%
TREC-COVID           0.656   0.567    0.712    +8.5%
FiQA (Finance)       0.236   0.295    0.341    +15.6%

Key Insight

Notice that TREC-COVID is one case where BM25 alone outperforms vector search. This is because COVID-related queries often contain specific medical terms that benefit from exact matching. Hybrid search captures the best of both.

Production Implementation

In production, you typically use a vector database that supports hybrid search natively. Here's how to do it with popular options.

Weaviate (Built-in Hybrid)

import weaviate

client = weaviate.Client("http://localhost:8080")

# Hybrid search with alpha parameter
result = client.query.get(
    "Document",
    ["content", "title"]
).with_hybrid(
    query="authentication failure OAuth2",
    alpha=0.5  # 0 = pure BM25, 1 = pure vector (note: opposite convention from the weighted_rrf_score alpha above)
).with_limit(10).do()

Qdrant (Fusion Mode)

from qdrant_client import QdrantClient, models

client = QdrantClient("localhost", port=6333)

# Hybrid with RRF fusion. Assumes the collection defines a dense vector named
# "dense" and a sparse vector named "sparse"; the sparse query must be encoded
# as a SparseVector (e.g. by a BM25/SPLADE encoder), not passed as raw text.
results = client.query_points(
    collection_name="documents",
    prefetch=[
        models.Prefetch(query=query_embedding, using="dense", limit=20),
        models.Prefetch(
            query=models.SparseVector(indices=sparse_indices, values=sparse_values),
            using="sparse",
            limit=20,
        ),
    ],
    query=models.FusionQuery(fusion=models.Fusion.RRF),
    limit=10,
)

Elasticsearch (kNN + BM25)

# Hybrid search with Elasticsearch
{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "content": "authentication failure OAuth2"
          }
        },
        {
          "knn": {
            "field": "embedding",
            "query_vector": [0.1, 0.2, ...],
            "k": 10,
            "num_candidates": 100
          }
        }
      ]
    }
  }
}

Key Takeaways

  1. Neither search type is sufficient alone - Keyword search misses synonyms, semantic search misses exact matches. Use both.

  2. RRF is the standard fusion method - Simple, effective, and no training required. k=60 is a good default.

  3. Tune alpha based on your content - Technical docs favor keyword, natural language queries favor semantic.

  4. Hybrid consistently beats single methods - 8-15% improvement on retrieval benchmarks. Worth the extra complexity.