**One-Line Summary:** Embeddings turn arbitrary text into high-dimensional vectors (768 to 3072 dims, depending on the model); cosine-similarity search over those vectors finds rows by meaning, not by exact keys — essential when the agent doesn't know what to ask for by name.
**Prerequisites:** Lesson `01-structured-retrieval-bigquery-as-a-tool.md`.
## What's the Concept?
Structured retrieval works when the agent already knows the lookup key — a customer ID, an order number, a date range. Semantic retrieval handles the other half: "find documents that discuss billing disputes" or "look up tickets that sound like this new one."
The mechanism: pre-compute an embedding — a high-dimensional vector — for every chunk of text in the relevant gold table. At query time, embed the agent's search string the same way, then find the rows whose vectors are closest to the query vector (usually by cosine similarity). The result is a ranked list of semantically similar rows.
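To make "closest" concrete: BigQuery's built-in `ML.DISTANCE` function computes the same cosine distance that `VECTOR_SEARCH` uses. A toy sketch, with 4-dim literal arrays standing in for real 768-dim embeddings:

```sql
-- Cosine distance = 1 - cosine similarity, so smaller means more similar.
-- The 4-dim literals below are toy stand-ins for real embeddings.
SELECT
  ML.DISTANCE([0.1, 0.8, 0.3, 0.2], [0.1, 0.7, 0.4, 0.2], 'COSINE') AS near_pair,
  ML.DISTANCE([0.1, 0.8, 0.3, 0.2], [0.9, 0.0, 0.1, 0.8], 'COSINE') AS far_pair;
```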
On GCP, this is a two-service combination:
- Vertex AI Embeddings API to compute embeddings (`text-embedding-005`, `text-embedding-004`, the earlier gecko family).
- BigQuery's native vector search (`ARRAY<FLOAT64>` embedding columns plus the `VECTOR_SEARCH` function) for storage and search, or Vertex AI Vector Search for high-scale managed ANN.
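Before `ML.GENERATE_EMBEDDING` can call the Embeddings API from SQL, BigQuery needs a remote model pointing at the Vertex AI endpoint through a Cloud resource connection. A one-time setup sketch; the dataset and connection names (`embedding_models`, `us.vertex_conn`) are placeholders:

```sql
-- One-time setup: a remote model that proxies the Vertex AI
-- text-embedding-005 endpoint. `myco.us.vertex_conn` is a hypothetical
-- BigQuery Cloud resource connection with access to Vertex AI.
CREATE OR REPLACE MODEL `myco.embedding_models.text_embedding_005`
  REMOTE WITH CONNECTION `myco.us.vertex_conn`
  OPTIONS (ENDPOINT = 'text-embedding-005');
```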
## How It Works
The pipeline:
```
silver ───▶ gold.support_tickets
            ┌───────────┬─────────┬──────┬──────────────────┐
            │ ticket_id │ subject │ body │ embedding        │
            │           │         │      │ ARRAY<FLOAT64>   │
            └───────────┴─────────┴──────┴──────────────────┘
                                                 ▲
                                                 │ daily refresh
                                        ┌────────┴───────┐
                                        │   Vertex AI    │
                                        │ Embeddings API │
                                        └────────────────┘

                  ┌────────────────┐
user query ─────▶ │   Vertex AI    │ ──▶ query_vec ARRAY<FLOAT64> (768 dims)
                  │ Embeddings API │
                  └────────────────┘
                           │
                           ▼
                  VECTOR_SEARCH(
                    TABLE gold.support_tickets,
                    'embedding',
                    query_vec,
                    top_k => 5
                  )
                           │
                           ▼
                  top 5 similar ticket rows
```

The gold table generation in BigQuery uses the built-in `ML.GENERATE_EMBEDDING` table function:
```sql
CREATE OR REPLACE TABLE `myco.gold.support_tickets`
PARTITION BY DATE(created_at)
AS
SELECT
  ticket_id,
  customer_id,
  subject,
  body,
  status,
  created_at,
  ml_generate_embedding_result AS embedding
FROM ML.GENERATE_EMBEDDING(
  MODEL `myco.embedding_models.text_embedding_005`,
  (
    -- ML.GENERATE_EMBEDDING embeds the column named `content` and
    -- passes every other input column through to its output.
    SELECT
      ticket_id,
      customer_id,
      subject,
      body,
      status,
      created_at,
      subject || '\n\n' || body AS content
    FROM `myco.silver.tickets`
  ),
  STRUCT(TRUE AS flatten_json_output)
);
```

Search at query time, also in SQL:
```sql
SELECT base.ticket_id, base.subject, base.body, distance
FROM VECTOR_SEARCH(
  TABLE `myco.gold.support_tickets`,
  'embedding',
  (
    -- Embed the search string with the same model used for the table.
    -- Aliasing the output to `embedding` matches column_to_search,
    -- which is the default query column name.
    SELECT ml_generate_embedding_result AS embedding
    FROM ML.GENERATE_EMBEDDING(
      MODEL `myco.embedding_models.text_embedding_005`,
      (SELECT 'customer complained about a double-charge' AS content),
      STRUCT(TRUE AS flatten_json_output)
    )
  ),
  top_k => 5,
  distance_type => 'COSINE'
);
```

The agent's tool wraps this query the same way structured tools wrap their queries.
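In practice the tool should never splice user text into the SQL string. A minimal sketch of the parameterized form the tool would actually execute; the `@search_text` name is a placeholder bound through your BigQuery client's query-parameter API:

```sql
-- Same search, with the user's text bound as a named query parameter
-- rather than interpolated into the SQL string.
SELECT base.ticket_id, base.subject, distance
FROM VECTOR_SEARCH(
  TABLE `myco.gold.support_tickets`,
  'embedding',
  (
    SELECT ml_generate_embedding_result AS embedding
    FROM ML.GENERATE_EMBEDDING(
      MODEL `myco.embedding_models.text_embedding_005`,
      (SELECT @search_text AS content),
      STRUCT(TRUE AS flatten_json_output)
    )
  ),
  top_k => 5,
  distance_type => 'COSINE'
);
```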
## Why It Matters
- The agent can search by meaning. A user asking "did anyone else hit this bug?" doesn't know which ticket IDs to look up; semantic search does.
- Chunks become first-class data. Long documents get split into paragraph-sized chunks, each embedded separately. The agent retrieves the relevant chunks, not whole documents (see the chunking sketch after this list).
- In-warehouse search keeps the architecture flat. BigQuery's `VECTOR_SEARCH` runs alongside your regular SQL. No separate vector DB to operate until you genuinely need one.
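The chunking step can also stay in SQL. A sketch that splits documents on blank lines, assuming a hypothetical `myco.silver.documents` table with `doc_id` and `body` columns:

```sql
-- Explode each document into paragraph-sized chunks, one row per chunk.
-- chunk_idx preserves the chunk's position within its document.
SELECT
  d.doc_id,
  chunk_idx,
  chunk
FROM `myco.silver.documents` AS d,
UNNEST(SPLIT(d.body, '\n\n')) AS chunk WITH OFFSET AS chunk_idx
WHERE TRIM(chunk) != '';
```

Each chunk row then flows through the same `ML.GENERATE_EMBEDDING` pattern as the tickets table above.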
## Key Technical Details
- Embedding dimensions: 768 for `text-embedding-005`, 3072 for `gemini-embedding-001`. Higher dims are slightly more accurate but cost more in storage and search latency.
- BigQuery vector index types: IVF (cheap, fast build, fine for <10M rows) and `TREE_AH`, a ScaNN-based index (recommended for >10M); see the DDL sketch after this list. Both index types are eventually consistent — there's a small delay between insert and searchability.
- `text-embedding-005` is strong on English and code and cheap (~$0.0001 per 1,000 input chars). Re-embedding a million rows of customer-support data is single-digit dollars.
- Distance metric: cosine for normalized text embeddings (always with Vertex AI text models), L2/Euclidean for image embeddings.
- Embedding cost is recurring — every refresh re-embeds new or changed rows. Cache aggressively; we'll cover that in Module 07.
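The index itself is one DDL statement. A sketch for the IVF case; the index name is arbitrary, and swapping `index_type` to `'TREE_AH'` gives the ScaNN-based variant:

```sql
-- Optional ANN index. Without one, VECTOR_SEARCH brute-forces the scan:
-- exact results, but it reads every row's embedding.
CREATE VECTOR INDEX ticket_embedding_idx
ON `myco.gold.support_tickets` (embedding)
OPTIONS (
  index_type = 'IVF',
  distance_type = 'COSINE'
);
```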
## Common Misconceptions
"Embeddings replace structured search." They complement it. Structured retrieval is faster, cheaper, and exact for known-key lookups. Use both — see 03-hybrid-retrieval-structured-plus-semantic.md.
"Bigger embedding model = better retrieval." Not always. Domain match matters more than parameter count. A general-purpose 768-dim model often beats a specialist 1536-dim model on out-of-domain content.
"Just use a vector database." You eventually might, but starting with BigQuery's native vector search means one fewer service to operate, monitor, and pay for. Graduate when scale forces it.
## Connections to Other Concepts
- `03-hybrid-retrieval-structured-plus-semantic.md` — Combining the two retrieval modes.
- Course `04-refinement-in-bigquery/02-silver-to-gold-modeling-for-agents.md` — Where the embedded gold table is built.
- Course `07-operating-the-system/02-cost-control-on-bigquery-and-vertex-ai.md` — Embedding cost dominates this part of the bill.
## Further Reading
- Anthropic, "Introducing Contextual Retrieval" (Sept 2024) — Prepend a model-generated 50–100 token "context" to each chunk before embedding. The simplest large win in retrieval quality this decade; bake it into your chunking step. https://www.anthropic.com/research/contextual-retrieval
- Muennighoff et al., "MTEB: Massive Text Embedding Benchmark" (2022; leaderboard continuously updated) — How you actually pick an embedding model in 2026. The Hugging Face leaderboard at huggingface.co/spaces/mteb/leaderboard is the source of truth; always re-test the shortlist on your own corpus.
- Karpukhin et al., "Dense Passage Retrieval for Open-Domain Question Answering" (2020) — Foundational dual-encoder paper; the conceptual base under every Vertex AI text-embedding model. https://arxiv.org/abs/2004.04906
- Khattab & Zaharia, "ColBERT: Late Interaction over BERT" (2020) — Multi-vector retrieval as a stronger but more expensive alternative to single-vector embeddings. https://arxiv.org/abs/2004.12832
- Liu et al., "Lost in the Middle: How Language Models Use Long Contexts" (2023) — Why ranking matters even when you can fit everything; informs your top-k choice. https://arxiv.org/abs/2307.03172
- Google Cloud, "Vertex AI text embeddings" + "BigQuery vector embeddings and similarity search" — Authoritative product docs; check often, the model family rotates fast.
- Cohere Rerank v3 / Voyage rerank-2 docs — Cross-encoder rerankers; the standard two-stage retrieval (ANN candidate set → reranker) is the production pattern when relevance trumps latency.
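The contextual-retrieval trick from the Anthropic entry above can also run in-warehouse before the embedding step. A hedged sketch using `ML.GENERATE_TEXT`, assuming a hypothetical remote Gemini model `myco.llm_models.gemini_flash` and a hypothetical `myco.gold.doc_chunks` table carrying each chunk alongside its parent `doc_body`:

```sql
-- For each chunk, generate a short situating context, then embed
-- context + chunk instead of the bare chunk (Contextual Retrieval).
SELECT
  doc_id,
  chunk_idx,
  ml_generate_text_llm_result || '\n\n' || chunk AS content  -- feed to ML.GENERATE_EMBEDDING
FROM ML.GENERATE_TEXT(
  MODEL `myco.llm_models.gemini_flash`,  -- hypothetical remote Gemini model
  (
    SELECT
      doc_id,
      chunk_idx,
      chunk,
      'Document:\n' || doc_body ||
      '\n\nIn 2-3 sentences, situate this chunk within the document:\n' ||
      chunk AS prompt
    FROM `myco.gold.doc_chunks`          -- hypothetical chunk table
  ),
  STRUCT(TRUE AS flatten_json_output)
);
```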