Introducing MiniGPT: Learn How LLMs Work by Building One
A hands-on series: Build a GPT from scratch and finally understand how LLMs actually work
Most developers use LLMs every day, but few understand how they actually work.
When prompts break or models hallucinate, we treat it like magic failing instead of software misbehaving.
This series changes that.
MiniGPT is a hands-on guide to understanding language models by building one from scratch—small enough to grasp fully, real enough to work.
What You’ll Build
By the end of this series, you’ll have a working MiniGPT that can:
Process text and generate human-like responses
Understand context across multiple sentences
Learn patterns from training data
Predict the next word with surprising accuracy
More importantly, you’ll understand:
Why LLMs sometimes “hallucinate”
Why prompt engineering works (and when it doesn’t)
How to debug LLM behaviour in production
The real constraints and tradeoffs of these systems
Who This Is For
This series is for you if:
✅ You’re a developer who uses LLMs (ChatGPT API, Copilot, etc.)
✅ You want to understand how they actually work
✅ You’re comfortable with Python and basic math (matrix multiplication, probability)
✅ You learn best by building, not just reading
This series is NOT for you if:
❌ You want a quick “10 ChatGPT prompts” listicle
❌ You’re looking for cutting-edge research papers
❌ You want to use OpenAI’s API without understanding the internals
The Approach: Learn by Building
Each post follows the same structure:
A real problem - Why does GPT behave this way?
The concept - Clear explanation with visuals
Build it - Working code you can run and modify
Real-world implications - How this affects production systems
No hand-waving. If we use a concept, we implement it.
No prerequisites beyond Python. I’ll explain the math as we go.
No fluff. Every section moves you toward understanding.
The Roadmap
Part 1: Tokenization (This Week)
Why GPT can’t count letters in “strawberry”
How text becomes numbers
Build a simple tokenizer
Why token limits break your prompts
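To preview the idea, here’s a toy greedy longest-match tokenizer (with a tiny made-up vocabulary, not the real GPT vocabulary) showing why the model never sees individual letters:

```python
# Hypothetical toy vocabulary for illustration only
VOCAB = {"straw", "berry", "st", "raw", "s", "t", "r", "a", "w", "b", "e", "y"}

def tokenize(text):
    """Greedily match the longest vocabulary entry at each position."""
    tokens = []
    i = 0
    while i < len(text):
        for j in range(len(text), i, -1):  # try the longest substring first
            if text[i:j] in VOCAB:
                tokens.append(text[i:j])
                i = j
                break
        else:
            raise ValueError(f"no token for {text[i]!r}")
    return tokens

print(tokenize("strawberry"))  # ['straw', 'berry'] — 2 tokens, not 10 letters
```

From the model’s point of view, “strawberry” is two opaque IDs, which is exactly why counting its letters is hard.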
Part 2: Embeddings
How “king - man + woman = queen” actually works
Turning tokens into vectors
Semantic similarity
Building an embedding layer
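As a teaser, the famous analogy can be sketched with hand-picked toy vectors (real embeddings are learned and have hundreds of dimensions; these three made-up axes are just for illustration):

```python
import math

# Toy 3-d "embeddings": hypothetical axes [royalty, masculinity, femininity]
vecs = {
    "king":  [0.9, 0.9, 0.1],
    "queen": [0.9, 0.1, 0.9],
    "man":   [0.1, 0.9, 0.1],
    "woman": [0.1, 0.1, 0.9],
}

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# king - man + woman, then find the nearest word by cosine similarity
target = [k - m + w for k, m, w in zip(vecs["king"], vecs["man"], vecs["woman"])]
best = max(vecs, key=lambda word: cosine(vecs[word], target))
print(best)  # queen
```

The arithmetic works because direction in embedding space encodes relationships, a point we’ll make precise with real learned vectors.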
Part 3: Attention Is All You Need
The mechanism that changed everything
Self-attention from scratch
Why transformers replaced LSTMs
Implementing multi-head attention
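Here’s a minimal single-head sketch of scaled dot-product attention over plain Python lists (the numbers are made up; the full PyTorch version comes in the post itself):

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(Q, K, V):
    """Scaled dot-product attention: each output is a weighted mix of the
    value vectors, weighted by query-key similarity."""
    d = len(K[0])
    out = []
    for q in Q:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in K]
        weights = softmax(scores)
        out.append([sum(w * v[i] for w, v in zip(weights, V))
                    for i in range(len(V[0]))])
    return out

# Three token positions, 2-d vectors (illustrative values)
Q = K = V = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
for row in attention(Q, K, V):
    print([round(x, 2) for x in row])
```

Every output row is a convex combination of the value vectors, which is the core trick: each token gathers information from every other token in one step.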
Part 4: The Transformer Architecture
Putting all the pieces together
Decoder-only structure (and how it differs from encoder-decoder models)
Positional encoding
Building a mini transformer
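One piece we can preview now is the sinusoidal positional encoding from the original Transformer paper, here as a dependency-free sketch:

```python
import math

def positional_encoding(max_len, d_model):
    """Sinusoidal positional encoding:
    PE[pos, 2i]   = sin(pos / 10000**(2i / d_model))
    PE[pos, 2i+1] = cos(pos / 10000**(2i / d_model))"""
    pe = []
    for pos in range(max_len):
        row = []
        for i in range(d_model):
            angle = pos / (10000 ** ((2 * (i // 2)) / d_model))
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        pe.append(row)
    return pe

pe = positional_encoding(max_len=4, d_model=8)
print([round(x, 3) for x in pe[0]])  # position 0: sin terms are 0, cos terms are 1
```

Without these position signals, self-attention is order-blind: “dog bites man” and “man bites dog” would look identical.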
Part 5: Training and Generation
Making it actually work
Training on real text
Sampling strategies
Why temperature matters
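To hint at why temperature matters, here’s a small sketch (toy logits, stdlib only) of temperature-scaled sampling:

```python
import math, random

def sample_with_temperature(logits, temperature, rng=random):
    """Divide logits by temperature before softmax: low T sharpens the
    distribution (near-greedy), high T flattens it (more random).
    Returns (sampled index, probability distribution)."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    r = rng.random()
    cum = 0.0
    for i, p in enumerate(probs):
        cum += p
        if r < cum:
            return i, probs
    return len(probs) - 1, probs

logits = [2.0, 1.0, 0.1]  # made-up next-token scores
_, cold = sample_with_temperature(logits, temperature=0.1)
_, hot = sample_with_temperature(logits, temperature=10.0)
print(round(cold[0], 3))  # near 1.0: low temperature almost always picks the top logit
print(round(hot[0], 3))   # near 1/3: high temperature approaches uniform
```

Same model, same logits: the only thing temperature changes is how much of the probability mass spills onto the runner-up tokens.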
Part 6: Fine-tuning and Prompt Engineering
Making it useful
Transfer learning
Prompt design
Production deployment patterns
What You’ll Need
Python 3.8+ (we’ll use PyTorch, but I’ll explain every line)
Basic linear algebra (don’t worry, I’ll review as we go)
30-45 minutes per week (reading + coding exercises)
Curiosity (most important)
All code is available on GitHub with Colab notebooks you can run in your browser. No GPU required.
What Makes This Different
There are plenty of transformer tutorials out there. Here’s what makes this one different:
1. Production-focused
Every concept connects to real problems you’ll face building LLM apps. Not just “here’s how attention works,” but “here’s why your context window fills up faster than expected.”
2. Complete implementation
We build everything from scratch. No mysterious library calls. When we use PyTorch, you’ll understand what it’s doing under the hood.
3. Progressive complexity
Each part builds on the last. By Part 3, you’ll be reading transformer papers and actually understanding them.
4. Debuggable intuition
The goal isn’t memorization; it’s developing intuition. When something breaks, you’ll know where to look and why.
The Philosophy
Understanding > Completion
I’d rather you deeply understand Parts 1-3 than skim through all 6. Each part is designed to give you a mental model you can build on.
Build > Read
Every concept includes working code. Type it out. Break it. Fix it. That’s where understanding happens.
Why > How
We don’t just implement—we explain the tradeoffs. Why BPE instead of character-level? Why self-attention instead of RNNs? Understanding the “why” makes you a better engineer.
Join Me
Part 1 drops in 3 days: “How GPT Reads Your Words (And Why It Can’t Count Letters)”
We’ll start with the most fundamental question: How does GPT actually “read” your text?
Spoiler: It doesn’t see letters at all.
Want to follow along?
⭐ Star the repo on GitHub to get notified
💻 Clone the code to code along
💬 Join the discussions to ask questions
🔗 Connect on LinkedIn for updates
The best way to learn is to build. Let’s build together.