One-Line Summary: Real-world agent queries mix exact constraints with semantic similarity — "find similar tickets for this customer" — so the production pattern is structured filters first, semantic ranking second.

Prerequisites: Lessons 01-structured-retrieval-bigquery-as-a-tool.md and 02-semantic-retrieval-embeddings-and-vector-search.md.

What's the Concept?

Pure semantic search returns the globally most-similar rows. That's almost never what you want. A real query is "similar tickets for this customer," or "documents matching this topic from the last 30 days," or "products related to this query in stock and under $50."

Hybrid retrieval combines:

  1. Structured filters — applied first, narrowing the candidate set by exact predicates (customer ID, date range, status).
  2. Semantic ranking — applied to the filtered set, ordering by similarity.

The order matters: filter first, then rank. Running ANN over the full table and then filtering wastes compute and can silently drop relevant rows — the global top-k may contain nothing that survives the filter.
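The failure mode is easy to reproduce in miniature. A toy sketch in plain Python (synthetic 2-D vectors and invented tenant data, purely for illustration):

```python
import math

def cosine_distance(a, b):
    """1 - cosine similarity; lower means more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (na * nb)

# Toy corpus: (customer_id, embedding). Customer "A" has two moderately
# similar rows; another tenant's rows dominate the global neighborhood.
rows = [
    ("A", (0.6, 0.8)),
    ("A", (0.5, 0.9)),
    ("B", (0.99, 0.1)),
    ("B", (0.98, 0.2)),
    ("B", (1.0, 0.05)),
]
query = (1.0, 0.0)
k = 2

# Rank-then-filter: global top-k first, tenant filter second.
global_top = sorted(rows, key=lambda r: cosine_distance(query, r[1]))[:k]
post_filtered = [r for r in global_top if r[0] == "A"]   # may be empty!

# Filter-then-rank: tenant filter first, then rank the survivors.
candidates = [r for r in rows if r[0] == "A"]
pre_filtered = sorted(candidates, key=lambda r: cosine_distance(query, r[1]))[:k]

print(len(post_filtered))  # 0 — customer A never made the global top-k
print(len(pre_filtered))   # 2 — both of customer A's rows, correctly ranked
```

With post-filtering, the globally nearest rows all belong to another tenant, so the filter empties the result set; pre-filtering guarantees up to k candidates from the right scope.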

How It Works

BigQuery makes hybrid trivially expressible because VECTOR_SEARCH is just a table function — you can apply WHERE and JOIN like any other SQL:

WITH query AS (
  SELECT ml_generate_embedding_result AS query_vec
  FROM ML.GENERATE_EMBEDDING(
    MODEL `myco.embedding_models.text_embedding_005`,
    (SELECT 'double charge issue' AS content)
  )
),
candidate_tickets AS (
  SELECT * FROM `myco.gold.support_tickets`
  WHERE customer_id = @customer_id              -- structured filter
    AND created_at >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 90 DAY)
    AND status IN ('open', 'pending')
)
SELECT base.ticket_id, base.subject, base.body, distance
FROM VECTOR_SEARCH(
  (SELECT * FROM candidate_tickets),            -- pre-filtered base
  'embedding',
  (SELECT query_vec FROM query),
  query_column_to_search => 'query_vec',
  top_k => 5,
  distance_type => 'COSINE'
)
ORDER BY distance ASC;

This pattern keeps the strong guarantees of each tier:

  • The structured filter is exact. Customer scoping, time windows, PII boundaries — all honored before any semantic comparison happens.
  • The semantic ranking is fuzzy. It finds matches the user couldn't have named.

Wrapped as an agent tool, the parameters split the same way:

def search_similar_tickets(
    customer_id: str,
    query: str,
    since_days: int = 90,
    statuses: tuple[str, ...] = ("open", "pending"),
    top_k: int = 5,
) -> list[dict]:
    ...

The agent provides the natural-language query string; the structured constraints come from context the model already knows (the current customer's ID, the time window relevant to the conversation).
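A minimal sketch of the tool body's query construction, mirroring the SQL above — the helper name is invented, and the resulting string would be executed via a parameterized BigQuery client call (binding @query, @customer_id, and @statuses as query parameters, omitted here):

```python
def build_similar_tickets_query(
    since_days: int = 90,
    top_k: int = 5,
) -> str:
    """Build the hybrid-retrieval SQL. User-supplied strings are bound as
    query parameters, never interpolated; numeric knobs are coerced to int
    before interpolation so they cannot carry injection payloads."""
    return f"""
    WITH query AS (
      SELECT ml_generate_embedding_result AS query_vec
      FROM ML.GENERATE_EMBEDDING(
        MODEL `myco.embedding_models.text_embedding_005`,
        (SELECT @query AS content)
      )
    ),
    candidate_tickets AS (
      SELECT * FROM `myco.gold.support_tickets`
      WHERE customer_id = @customer_id
        AND created_at >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(),
                                        INTERVAL {int(since_days)} DAY)
        AND status IN UNNEST(@statuses)
    )
    SELECT base.ticket_id, base.subject, base.body, distance
    FROM VECTOR_SEARCH(
      (SELECT * FROM candidate_tickets), 'embedding',
      (SELECT query_vec FROM query),
      top_k => {int(top_k)}, distance_type => 'COSINE')
    ORDER BY distance ASC
    """

sql = build_similar_tickets_query(since_days=30, top_k=3)
```

The agent supplies only the @query text; @customer_id comes from the session, and the numeric knobs are validated server-side — the split between free-text and structured inputs is enforced in code, not left to the model.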

Why It Matters

  • Pure semantic isn't safe for multi-tenant systems. Without a customer-ID filter, an embedding query can return another tenant's rows. Structured filters enforce isolation.
  • Recall improves dramatically. Filtering to ~1000 candidates and then ranking finds the right answer more often than searching across 100M rows globally.
  • Latency stays bounded. ANN over filtered candidates is fast. ANN over the universe followed by post-filtering can blow the latency budget.

Key Technical Details

  • BigQuery's VECTOR_SEARCH can still use a vector index over a filtered candidate set — provided the filter columns are stored in the index (the stored-columns option of CREATE VECTOR INDEX), the pre-filter is evaluated within the index scan.
  • For very narrow filters (one customer's 50 rows), you don't need an index at all — a brute-force ORDER BY COSINE_DISTANCE(embedding, query_vec) LIMIT k over the filtered rows is exact and faster than an index lookup.
  • Mix in a keyword match (CONTAINS_SUBSTR) for terms the embedding might miss — proper nouns, error codes, model numbers. This is BM25-style hybrid in miniature.
  • Always return distance to the agent. A large distance tells the model "this is a weak match," and it can choose to say "I couldn't find a confident answer."

Common Misconceptions

"Reciprocal Rank Fusion is the canonical hybrid." RRF blends two ranked lists (lexical + semantic). It's the canonical hybrid for search engines. For agent tools, structured-filter-then-semantic-rank is simpler and usually performs better — because the filter dimensions aren't lexical, they're true predicates.

"The agent can do the filtering itself." It can, but each filter adds a tool round-trip. Filter in SQL, rank in SQL, return one result — fewer round-trips, lower latency, simpler reasoning trace.

"Same embedding for query and corpus is always optimal." Usually yes for symmetric tasks. For asymmetric retrieval (short query → long passage), models like text-embedding-005 have separate "query" and "document" modes — use them.

Connections to Other Concepts

  • 04-the-retrieval-contract-between-pipeline-and-agent.md — How hybrid tools fit into the broader interface design.
  • Course 07-operating-the-system/02-cost-control-on-bigquery-and-vertex-ai.md — Hybrid is usually the cheapest retrieval mode at scale, because filters cut the indexed set.
  • Course 04-refinement-in-bigquery/02-silver-to-gold-modeling-for-agents.md — The gold table feeds hybrid retrieval.

Further Reading

  • Edge et al., "From Local to Global: A Graph RAG Approach to Query-Focused Summarization" (Microsoft Research, 2024) — GraphRAG builds an entity-relationship graph at index time and traverses it at query time alongside the vector index. Strongest on multi-hop questions; expensive to build. https://arxiv.org/abs/2404.16130
  • Anthropic, "Introducing Contextual Retrieval" (Sept 2024) — Combines contextual embeddings with BM25 + reranking; their reported best configuration is the current industry reference point for hybrid RAG. https://www.anthropic.com/research/contextual-retrieval
  • Cormack, Clarke, Buettcher, "Reciprocal Rank Fusion outperforms Condorcet and individual Rank Learning Methods" (SIGIR, 2009) — The original RRF paper. Still the workhorse blend method for fusing lexical and semantic rankings when you go that route.
  • Karpukhin et al., "Dense Passage Retrieval" (2020) — The semantic half.
  • Robertson & Zaragoza, "The Probabilistic Relevance Framework: BM25 and Beyond" — The lexical half; understanding BM25 is the price of admission for tuning hybrid retrieval.
  • Google Cloud, "BigQuery hybrid search" docs + Vertex AI Search blog posts — Current product-level guidance; specifics evolve quickly.