One idea, given the time and the interactive surface area it deserves. Drip pieces are essays you read, not chapters you skim — designed to leave you with a working mental model by the last paragraph.
Why language is hard for machines, what tokens are, and how words become vectors an attention mechanism can compare.
Generation as un-corruption. Drag the slider, see noise resolve into a picture, one denoising step at a time.
softmax(QKᵀ/√d)·V — one operation, repeated. Click, hover, and break a real attention matrix as you read.
Fine-tune massive LLMs on consumer hardware. Learn how Low-Rank Adaptation (LoRA) and 4-bit quantization make it possible.
Before AI can read, it must chop. Learn how text is broken down into tokens, the basic units a model actually reads.
How do LLMs decide what to say next? Explore greedy vs. probabilistic sampling and log probabilities.
Beyond the prompt: Curate the perfect information to feed your LLM's limited attention span.
Learn how Zero-Shot, Few-Shot, and Chain-of-Thought prompting steer LLM probabilities.
Why doesn't ChatGPT re-read your whole chat every time it types a word? Memory optimization explained.
Predicting the future by assuming simplicity. Learn how this probabilistic classifier applies Bayes' Theorem while treating every feature as independent.
Strength in numbers. See how an ensemble of diverse decision trees can vote to make robust predictions.
The classic algorithm that finds the widest possible street between two classes of data.
From collaborative filtering to matrix factorization: how Netflix knows what you want before you do.
The architecture that drove machine translation before Transformers. Learn about the context vector bottleneck.
The 2012 breakthrough that started the deep learning era. ReLU, Dropout, and GPUs.
Before Transformers, RNNs learned to focus. The mechanism that solved the bottleneck problem.
Understand how AI processes sequential data using hidden states and memory loops.
See how 'Masked Language Model' training enables deep context from both directions.
Learn how breaking images into 16×16 patches allowed pure Transformers to beat CNNs.
Discover the architecture behind precise image segmentation and how it preserves fine details.
'You Only Look Once': Real-time object detection framed as a single regression problem.
New Research: 10 connected MCP servers = ~8K tokens of schemas loaded before your prompt starts. Why many small servers beat one big one, and how to lazy-load them.
New Research: Prompts are code. The four-stage CI pipeline (local dev → PR check → deploy gate → production monitor) that replaces saturated benchmarks.
New Research: 66% of developers said the same thing — AI code that is almost right. The verification cascade and the four error classes it catches.
New Research: 60% of all LLM errors are rate limits, not model errors. The five harness layers that decide reliability.
New Research: Why your agent forgets the rules by turn 15 — and the four operations (Write, Select, Compress, Isolate) production teams converged on.
New Research: What 1,200+ production deployments taught us about putting LLM agents into the extract-transform-load loop — and the two-layer sandwich that survives contact with real data.
Build Along: 80 lines, no dependencies. Three small MCP servers, the tool-bloat problem they create together, and the router that fixes it. Companion code to Multi-MCP Architecture.
Build Along: the smallest honest eval suite — dataset, system under test, scorers, and a CI gate that exits non-zero below threshold. Companion code to Eval-Driven Development.
Build Along: 70 lines that show the two-layer sandwich — a fuzzy LLM transform wrapped in deterministic validation on both sides. Companion code to Agentic ETL.
Build Along: the generate → verify → feed-failure-back loop in 60 lines. The model writes; the verifier decides. Companion code to Verifying AI Code.
New Research: A hybrid, parameter-adaptive RAG system designed specifically for high-stakes legal applications.
When RAG gets smart. Learn how adding an autonomous agent loop enables multi-hop reasoning and self-correction.
New Research: Combining GraphRAG and VectorRAG with an autonomous router for scientific literature review.
Google Cloud Architecture: From simple prompts to complex multi-agent systems.
Give AI an open-book test. Connect LLMs to external knowledge bases for accurate answers.
Go beyond basic vector search with Reranking, Hybrid Search, and Query Expansion for production-grade accuracy.
Research Deep Dive: Why Small Language Models (SLMs) are replacing monolithic LLMs.
New Research: Hyperagents (Meta FAIR), Recursive Language Models (MIT), and GMPO (Microsoft / ICLR). Three architectural moves from a single month.
New Research: When models think too much, they often talk themselves out of the correct answer.
New Research: What if LLMs didn't have to 'think' in words? Explore reasoning directly in continuous latent space.
New Research: A single model that can dynamically switch between fast responses and deep reasoning modes.
New Research: How a 7B model approached GPT-4 math performance by ditching the RL 'Critic' model.
New Research: An open-source thinking agent that interleaves reasoning with tool use (300+ steps).
New Research: Compressing long documents into highly efficient 2D visual tokens instead of text.
New Research: Can AI models learn to hide their dangerous thoughts from safety monitors?
New Research: Why are Transformers so robust? They naturally learn 'low sensitivity' functions.
New Research: An unsupervised method that uses 'sticky' keywords to find topic boundaries.
New Research: Does Supervised Fine-Tuning just memorize while RL actually learns rules?
New Research: Anthropic's 2026 method for translating Claude's internal activations directly into human-readable English.