One-Line Summary: Real-world agent queries mix exact constraints with semantic similarity — "find similar tickets for this customer" — so the production pattern is structured filters first, semantic ranking second.

Prerequisites: Lessons 01-structured-retrieval-bigquery-as-a-tool.md and 02-semantic-retrieval-embeddings-and-vector-search.md.

What's the Concept?

Pure semantic search returns the globally most-similar rows. That's almost never what you want. A real query is "similar tickets for this customer," or "documents matching this topic from the last 30 days," or "products related to this query in stock and under $50."

Hybrid retrieval combines:

  1. Structured filters — applied first, narrowing the candidate set by exact predicates (customer ID, date range, status).
  2. Semantic ranking — applied to the filtered set, ordering by similarity.

The order matters: filter first, then rank. Running ANN over the full table and then filtering wastes compute and can silently drop relevant rows — the global top-k may contain nothing that survives the filter.
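The failure mode is easy to reproduce in miniature. A toy sketch in plain Python (synthetic 2-D vectors and invented tenant data, purely for illustration):

```python
import math

def cosine_distance(a, b):
    """1 - cosine similarity; lower means more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (na * nb)

# Toy corpus: (customer_id, embedding). Customer "A" has two moderately
# similar rows; another tenant's rows dominate the global neighborhood.
rows = [
    ("A", (0.6, 0.8)),
    ("A", (0.5, 0.9)),
    ("B", (0.99, 0.1)),
    ("B", (0.98, 0.2)),
    ("B", (1.0, 0.05)),
]
query = (1.0, 0.0)
k = 2

# Rank-then-filter: global top-k first, tenant filter second.
global_top = sorted(rows, key=lambda r: cosine_distance(query, r[1]))[:k]
post_filtered = [r for r in global_top if r[0] == "A"]   # may be empty!

# Filter-then-rank: tenant filter first, then rank the survivors.
candidates = [r for r in rows if r[0] == "A"]
pre_filtered = sorted(candidates, key=lambda r: cosine_distance(query, r[1]))[:k]

print(len(post_filtered))  # 0 — customer A never made the global top-k
print(len(pre_filtered))   # 2 — both of customer A's rows, correctly ranked
```

With post-filtering, the globally nearest rows all belong to another tenant, so the filter empties the result set; pre-filtering guarantees up to k candidates from the right scope.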

How It Works

BigQuery makes hybrid trivially expressible because VECTOR_SEARCH is just a table function — you can apply WHERE and JOIN like any other SQL:

WITH query AS (
  SELECT ml_generate_embedding_result AS query_vec
  FROM ML.GENERATE_EMBEDDING(
    MODEL `myco.embedding_models.text_embedding_005`,
    (SELECT 'double charge issue' AS content)
  )
),
candidate_tickets AS (
  SELECT * FROM `myco.gold.support_tickets`
  WHERE customer_id = @customer_id              -- structured filter
    AND created_at >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 90 DAY)
    AND status IN ('open', 'pending')
)
SELECT base.ticket_id, base.subject, base.body, distance
FROM VECTOR_SEARCH(
  (SELECT * FROM candidate_tickets),            -- pre-filtered base
  'embedding',
  (SELECT query_vec FROM query),
  query_column_to_search => 'query_vec',
  top_k => 5,
  distance_type => 'COSINE'
)
ORDER BY distance ASC;

This pattern keeps the strong guarantees of each tier:

  • The structured filter is exact. Customer scoping, time windows, PII boundaries — all honored before any semantic comparison happens.
  • The semantic ranking is fuzzy. It finds matches the user couldn't have named.

Wrapped as an agent tool, the parameters split the same way:

def search_similar_tickets(
    customer_id: str,
    query: str,
    since_days: int = 90,
    statuses: tuple[str, ...] = ("open", "pending"),
    top_k: int = 5,
) -> list[dict]:
    ...

The agent provides the natural-language query string; the structured constraints come from context the model already knows (the current customer's ID, the time window relevant to the conversation).
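A minimal sketch of the tool body's query construction, mirroring the SQL above — the helper name is invented, and the resulting string would be executed via a parameterized BigQuery client call (binding @query, @customer_id, and @statuses as query parameters, omitted here):

```python
def build_similar_tickets_query(
    since_days: int = 90,
    top_k: int = 5,
) -> str:
    """Build the hybrid-retrieval SQL. User-supplied strings are bound as
    query parameters, never interpolated; numeric knobs are coerced to int
    before interpolation so they cannot carry injection payloads."""
    return f"""
    WITH query AS (
      SELECT ml_generate_embedding_result AS query_vec
      FROM ML.GENERATE_EMBEDDING(
        MODEL `myco.embedding_models.text_embedding_005`,
        (SELECT @query AS content)
      )
    ),
    candidate_tickets AS (
      SELECT * FROM `myco.gold.support_tickets`
      WHERE customer_id = @customer_id
        AND created_at >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(),
                                        INTERVAL {int(since_days)} DAY)
        AND status IN UNNEST(@statuses)
    )
    SELECT base.ticket_id, base.subject, base.body, distance
    FROM VECTOR_SEARCH(
      (SELECT * FROM candidate_tickets), 'embedding',
      (SELECT query_vec FROM query),
      top_k => {int(top_k)}, distance_type => 'COSINE')
    ORDER BY distance ASC
    """

sql = build_similar_tickets_query(since_days=30, top_k=3)
```

The agent supplies only the @query text; @customer_id comes from the session, and the numeric knobs are validated server-side — the split between free-text and structured inputs is enforced in code, not left to the model.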

Why It Matters

  • Pure semantic isn't safe for multi-tenant systems. Without a customer-ID filter, an embedding query can return another tenant's rows. Structured filters enforce isolation.
  • Recall improves dramatically. Filtering to ~1000 candidates and then ranking finds the right answer more often than searching across 100M rows globally.
  • Latency stays bounded. ANN over filtered candidates is fast. ANN over the universe followed by post-filtering can blow the latency budget.

Key Technical Details

  • BigQuery's VECTOR_SEARCH can still use a vector index over a filtered candidate set — provided the filter columns are stored in the index (the stored-columns option of CREATE VECTOR INDEX), the pre-filter is evaluated within the index scan.
  • For very narrow filters (one customer's 50 rows), you don't need an index at all — a brute-force ORDER BY COSINE_DISTANCE(embedding, query_vec) LIMIT k over the filtered rows is exact and faster than an index lookup.
  • Mix in a keyword match (CONTAINS_SUBSTR) for terms the embedding might miss — proper nouns, error codes, model numbers. This is BM25-style hybrid in miniature.
  • Always return distance to the agent. A large distance tells the model "this is a weak match," and it can choose to say "I couldn't find a confident answer."

Common Misconceptions

"Reciprocal Rank Fusion is the canonical hybrid." RRF blends two ranked lists (lexical + semantic). It's the canonical hybrid for search engines. For agent tools, structured-filter-then-semantic-rank is simpler and usually performs better — because the filter dimensions aren't lexical, they're true predicates.

"The agent can do the filtering itself." It can, but each filter adds a tool round-trip. Filter in SQL, rank in SQL, return one result — fewer round-trips, lower latency, simpler reasoning trace.

"Same embedding for query and corpus is always optimal." Usually yes for symmetric tasks. For asymmetric retrieval (short query → long passage), models like text-embedding-005 have separate "query" and "document" modes — use them.

Connections to Other Concepts

  • 04-the-retrieval-contract-between-pipeline-and-agent.md — How hybrid tools fit into the broader interface design.
  • Course 07-operating-the-system/02-cost-control-on-bigquery-and-vertex-ai.md — Hybrid is usually the cheapest retrieval mode at scale, because filters cut the indexed set.
  • Course 04-refinement-in-bigquery/02-silver-to-gold-modeling-for-agents.md — The gold table feeds hybrid retrieval.

Further Reading

  • Edge et al., "From Local to Global: A Graph RAG Approach to Query-Focused Summarization" (Microsoft Research, 2024) — GraphRAG builds an entity-relationship graph at index time and traverses it at query time alongside the vector index. Strongest on multi-hop questions; expensive to build. https://arxiv.org/abs/2404.16130
  • Anthropic, "Introducing Contextual Retrieval" (Sept 2024) — Combines contextual embeddings with BM25 + reranking; their reported best configuration is the current industry reference point for hybrid RAG. https://www.anthropic.com/research/contextual-retrieval
  • Cormack, Clarke, Buettcher, "Reciprocal Rank Fusion outperforms Condorcet and individual Rank Learning Methods" (SIGIR, 2009) — The original RRF paper. Still the workhorse blend method for fusing lexical and semantic rankings when you go that route.
  • Karpukhin et al., "Dense Passage Retrieval" (2020) — The semantic half.
  • Robertson & Zaragoza, "The Probabilistic Relevance Framework: BM25 and Beyond" — The lexical half; understanding BM25 is the price of admission for tuning hybrid retrieval.
  • Google Cloud, "BigQuery hybrid search" docs + Vertex AI Search blog posts — Current product-level guidance; specifics evolve quickly.