One-Line Summary: Real-world agent queries always involve both kinds of constraint — "find similar tickets for this customer" — so the production pattern is structured filters first, semantic ranking second.
Prerequisites: Lessons 01-structured-retrieval-bigquery-as-a-tool.md and 02-semantic-retrieval-embeddings-and-vector-search.md.
What's the Concept?
Pure semantic search returns the globally most-similar rows. That's almost never what you want. A real query is "similar tickets for this customer," or "documents matching this topic from the last 30 days," or "products related to this query in stock and under $50."
Hybrid retrieval combines:
- Structured filters — applied first, narrowing the candidate set by exact predicates (customer ID, date range, status).
- Semantic ranking — applied to the filtered set, ordering by similarity.
The order matters: filter first, then rank. Running ANN over the full table and then filtering wastes both compute and recall.
How It Works
BigQuery makes hybrid trivially expressible because VECTOR_SEARCH is just a table function — you can apply WHERE and JOIN like any other SQL:
```sql
WITH query AS (
  -- Alias must match the column_to_search name ('embedding'):
  -- VECTOR_SEARCH looks for the same column name in the query table by default.
  SELECT ml_generate_embedding_result AS embedding
  FROM ML.GENERATE_EMBEDDING(
    MODEL `myco.embedding_models.text_embedding_005`,
    (SELECT 'double charge issue' AS content)
  )
),
candidate_tickets AS (
  SELECT * FROM `myco.gold.support_tickets`
  WHERE customer_id = @customer_id  -- structured filter
    AND created_at >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 90 DAY)
    AND status IN ('open', 'pending')
)
SELECT base.ticket_id, base.subject, base.body, distance
FROM VECTOR_SEARCH(
  TABLE candidate_tickets,
  'embedding',
  TABLE query,
  top_k => 5,
  distance_type => 'COSINE'
)
ORDER BY distance ASC;
```

This pattern keeps the strong guarantees of each tier:
- The structured filter is exact. Customer scoping, time windows, PII boundaries — all honored before any semantic comparison happens.
- The semantic ranking is fuzzy. It finds matches the user couldn't have named.
Wrapped as an agent tool, the parameters split the same way:
```python
def search_similar_tickets(
    customer_id: str,
    query: str,
    since_days: int = 90,
    statuses: tuple[str, ...] = ("open", "pending"),  # tuple default avoids the mutable-default pitfall
    top_k: int = 5,
) -> list[dict]:
    ...
```

The agent provides the natural-language query string; the structured constraints come from context the model already knows (the current customer's ID, the time window relevant to the conversation).
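One way to fill in the tool body is to assemble a parameterized query in Python and hand it to the BigQuery client. The sketch below only builds the SQL text and a parameter map; the helper name is hypothetical, the table and model names are carried over from the example above, and actual execution would go through a BigQuery client with query parameters.

```python
def build_ticket_search_sql(
    customer_id: str,
    query: str,
    since_days: int = 90,
    statuses: tuple[str, ...] = ("open", "pending"),
    top_k: int = 5,
) -> tuple[str, dict]:
    """Assemble the hybrid query: structured predicates narrow the
    candidate CTE, then VECTOR_SEARCH ranks only those candidates."""
    sql = f"""
    WITH query AS (
      SELECT ml_generate_embedding_result AS embedding
      FROM ML.GENERATE_EMBEDDING(
        MODEL `myco.embedding_models.text_embedding_005`,
        (SELECT @query AS content))
    ),
    candidate_tickets AS (
      SELECT * FROM `myco.gold.support_tickets`
      WHERE customer_id = @customer_id
        AND created_at >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL {since_days} DAY)
        AND status IN UNNEST(@statuses)
    )
    SELECT base.ticket_id, base.subject, base.body, distance
    FROM VECTOR_SEARCH(TABLE candidate_tickets, 'embedding', TABLE query,
                       top_k => {top_k}, distance_type => 'COSINE')
    ORDER BY distance ASC
    """
    # User-supplied values travel as query parameters, never via string
    # interpolation; only trusted integers are inlined.
    params = {"customer_id": customer_id, "query": query, "statuses": list(statuses)}
    return sql, params
```

Keeping the user-controlled strings in the parameter map (rather than the SQL text) preserves the tenant-isolation guarantee even if the agent passes something strange as the query.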
Why It Matters
- Pure semantic isn't safe for multi-tenant systems. Without a customer-ID filter, an embedding query can return another tenant's rows. Structured filters enforce isolation.
- Recall improves dramatically. Filtering to ~1000 candidates and then ranking finds the right answer more often than searching across 100M rows globally.
- Latency stays bounded. ANN over filtered candidates is fast. ANN over the universe followed by post-filtering can blow the latency budget.
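The recall point can be shown with a toy, stdlib-only sketch (all rows and vectors are made up): rank-then-filter keeps only the global top-k before applying the tenant predicate, so the tenant's best match can fall outside that window, while filter-then-rank cannot miss it.

```python
import math

def cosine_distance(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (na * nb)

# Tiny corpus: one row for "our" tenant buried among other tenants' rows
# that all sit closer to the query vector.
rows = [{"customer_id": f"cust_{i}", "vec": [1.0, 0.01 * i]} for i in range(10)]
rows.append({"customer_id": "cust_42", "vec": [0.2, 1.0]})
query_vec = [1.0, 0.0]

# Rank-then-filter: global top-3 by distance, then apply the predicate.
global_top3 = sorted(rows, key=lambda r: cosine_distance(query_vec, r["vec"]))[:3]
post_filtered = [r for r in global_top3 if r["customer_id"] == "cust_42"]

# Filter-then-rank: predicate first, then rank only the candidates.
candidates = [r for r in rows if r["customer_id"] == "cust_42"]
filtered_top = sorted(candidates, key=lambda r: cosine_distance(query_vec, r["vec"]))[:3]

print(len(post_filtered))  # 0 — the tenant's row fell outside the global top-k
print(len(filtered_top))   # 1 — filter-first always sees the tenant's rows
```

The same failure mode scales up: against 100M rows, a global top-k of 5 or 50 is vanishingly unlikely to contain a small tenant's best match.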
Key Technical Details
- BigQuery's `VECTOR_SEARCH` over a filtered candidate set still uses the vector index if one exists — the index is consulted within the filter scope.
- For very narrow filters (one customer's 50 rows), you don't even need an index — a plain `ORDER BY VECTOR_DISTANCE(...) LIMIT k` is faster than an index lookup.
- Mix in a keyword match (`CONTAINS_SUBSTR`) for terms the embedding might miss — proper nouns, error codes, model numbers. This is BM25-style hybrid in miniature.
- Always return `distance` to the agent. A high distance tells the model "this is a weak match," and it can choose to say "I couldn't find a confident answer."
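On the agent side, the returned distance can gate the answer. A minimal sketch of that last point — the helper name and the 0.4 cosine-distance threshold are illustrative assumptions to be tuned per corpus, not fixed values:

```python
WEAK_MATCH_THRESHOLD = 0.4  # cosine distance; illustrative, tune per corpus

def summarize_matches(results: list[dict]) -> dict:
    """Split tool results into confident and weak matches so the model
    can decline to answer instead of guessing from a weak hit."""
    confident = [r for r in results if r["distance"] <= WEAK_MATCH_THRESHOLD]
    weak = [r for r in results if r["distance"] > WEAK_MATCH_THRESHOLD]
    return {
        "confident_matches": confident,
        "weak_matches": weak,
        "note": None if confident else "No confident match; say so rather than guess.",
    }

out = summarize_matches([
    {"ticket_id": "t1", "distance": 0.12},
    {"ticket_id": "t2", "distance": 0.61},
])
print(len(out["confident_matches"]), len(out["weak_matches"]))  # 1 1
```

Returning the split (rather than silently dropping weak matches) leaves the model the option of mentioning a borderline hit with an explicit caveat.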
Common Misconceptions
"Reciprocal Rank Fusion is the canonical hybrid." RRF blends two ranked lists (lexical + semantic). It's the canonical hybrid for search engines. For agent tools, structured-filter-then-semantic-rank is simpler and usually performs better — because the filter dimensions aren't lexical, they're true predicates.
"The agent can do the filtering itself." It can, but each filter adds a tool round-trip. Filter in SQL, rank in SQL, return one result — fewer round-trips, lower latency, simpler reasoning trace.
"Same embedding for query and corpus is always optimal." Usually yes for symmetric tasks. For asymmetric retrieval (short query → long passage), models like text-embedding-005 have separate "query" and "document" modes — use them.
Connections to Other Concepts
- `04-the-retrieval-contract-between-pipeline-and-agent.md` — How hybrid tools fit into the broader interface design.
- Course `07-operating-the-system/02-cost-control-on-bigquery-and-vertex-ai.md` — Hybrid is usually the cheapest retrieval mode at scale, because filters cut the indexed set.
- Course `04-refinement-in-bigquery/02-silver-to-gold-modeling-for-agents.md` — The gold table feeds hybrid retrieval.
Further Reading
- Edge et al., "From Local to Global: A Graph RAG Approach to Query-Focused Summarization" (Microsoft Research, 2024) — GraphRAG builds an entity-relationship graph at index time and traverses it at query time alongside the vector index. Strongest on multi-hop questions; expensive to build. https://arxiv.org/abs/2404.16130
- Anthropic, "Introducing Contextual Retrieval" (Sept 2024) — Combines contextual embeddings with BM25 + reranking; their reported best configuration is the current industry reference point for hybrid RAG. https://www.anthropic.com/research/contextual-retrieval
- Cormack, Clarke, Buettcher, "Reciprocal Rank Fusion outperforms Condorcet and individual Rank Learning Methods" (SIGIR, 2009) — The original RRF paper. Still the workhorse blend method for fusing lexical and semantic rankings when you go that route.
- Karpukhin et al., "Dense Passage Retrieval" (2020) — The semantic half.
- Robertson & Zaragoza, "The Probabilistic Relevance Framework: BM25 and Beyond" — The lexical half; understanding BM25 is the price of admission for tuning hybrid retrieval.
- Google Cloud, "BigQuery hybrid search" docs + Vertex AI Search blog posts — Current product-level guidance; specifics evolve quickly.