← AI

AI basics: a getting-started guide

This is a map, not a textbook. If you want to get into AI — either as a researcher pushing on how models work, or as a builder wiring them into real software — this page lays out what's worth learning, in roughly the order it pays off, and points you at where to go deeper. None of it is secret. The field rewards people who are comfortable being confused for a while and keep going.

The two roads

“Doing AI” splits into two overlapping crafts, and being honest with yourself about which you're chasing saves a lot of wasted effort.

Getting into AI splits into two roads: research, whose output is knowledge, and application, whose output is a working product. Doing AI Research invent the methods · output: knowledge Application build with them · output: a product
The two crafts. Most people lean to one — but the best drift between them.

You don't have to pick forever, and the best people drift between them. But you can be a superb application builder without ever deriving a gradient by hand, and you can do real research without shipping a polished product. Read the section that fits, skim the other.

The groundwork everyone needs

Programming

Learn Python. It is the lingua franca of the field — nearly every library, tutorial, and paper's reference code is in it. You need to be comfortable with functions, classes, lists and dictionaries, reading other people's code, and working in a notebook (Jupyter or Google Colab, which gives you a free GPU to start). You do not need to be a software architect on day one.

A little maths

For application work you can get a long way with intuition and pick maths up as you hit it. For research it's not optional. The three pillars — and the single best way to build intuition for them is Grant Sanderson's visual 3Blue1Brown series:

The good news: you learn these alongside the models, not as a prerequisite gate. Touch the maths when a concept demands it and it sticks far better than a year of abstract study up front.

How the models actually work

Build the mental model in this order — each layer rests on the one before it.

1. Machine learning, the core idea

Instead of writing rules by hand, you show a program many examples and let it learn the rule. You define a loss (a number measuring how wrong the model is) and nudge the model's parameters to make that number smaller — gradient descent. That single loop, repeated billions of times, is the engine under everything else.

A bowl-shaped loss curve with points stepping downhill toward the minimum, illustrating gradient descent. a model parameter → loss → start: high loss minimum
Gradient descent: each step nudges the parameters downhill, lowering the loss until it bottoms out. Real models do this across millions of dimensions at once.

2. Neural networks

Layers of simple units, each doing a weighted sum followed by a non-linear squash, stacked deep. “Deep learning” just means many layers. With enough of them and enough data, the network learns its own useful features rather than relying on ones you hand-craft. Training them is the gradient-descent loop above, with backpropagation the bookkeeping that tells each parameter which way to move. (For a from-the-ground-up walkthrough, 3Blue1Brown's neural-networks series is hard to beat.)

A small neural network: three input nodes connected through two hidden layers to two output nodes. input hidden layers output
A toy network. Information flows left to right; training adjusts the weight on every connection. “Deep” just means more hidden layers than this.

3. Transformers and attention

The 2017 paper Attention Is All You Need introduced the transformer, the architecture behind essentially every modern large model. Its key trick, attention, lets the model weigh how much every piece of the input should influence every other piece — so it can connect a pronoun to the noun it refers to, or a question to the relevant fact, no matter how far apart they sit. If you read one paper this year, read this one (then read Jay Alammar's Illustrated Transformer, the canonical annotated walkthrough).

A sentence where the word "it" attends most strongly to "animal", and more weakly to other words. The animal didn't cross the street because it was too tired
Attention in one picture: arc width ≈ how much “it” attends to each word. To resolve what “it” means, the model leans hardest on “animal”.

4. Large language models

An LLM is a very large transformer trained on a very large pile of text to do one humble thing: predict the next token (roughly, a word-piece — you can see text split into tokens with a live tokenizer). Do that well enough, at enough scale, and the ability to summarise, translate, write code, and reason starts to emerge. Modern chat models add two stages on top of that raw “pre-training”: instruction tuning (teaching it to follow requests) and alignment via human or AI feedback (teaching it to be helpful, honest, and harmless). Knowing these three stages exist explains most of how a model behaves.

For the prompt "The capital of France is", a bar chart of next-token probabilities dominated by "Paris" at 92 percent. The capital of France is P(next token) Paris 92% the 2% a 1.5% located 1.2% home 0.8%
Under the hood, the model turns your prompt into a probability for every possible next token — here it's overwhelmingly sure of “Paris”. It picks the top one (or samples for variety), appends it, and repeats, one token at a time.

A sixty-second history

None of this appeared overnight. The ideas stacked up over decades, with a sharp acceleration once attention met scale.

A timeline of AI milestones from the 1958 perceptron through the 2017 transformer to agents and reasoning models in 2025. 1958 Perceptron 1986 Backprop 2012 AlexNet 2017 Transformer 2020 GPT-3 2022 ChatGPT 2025 Agents
A rough arc, not the whole story — follow the links below for the primary sources.

Building applications

The headline of the last few years: you can build remarkably capable products without training a model at all. You call one that already exists. Here's the toolkit, simplest first — and the golden rule is to climb down this ladder only when the rung above you demonstrably falls short.

A ladder of application techniques from prompting at the top to fine-tuning at the bottom, with effort and control increasing downward. more effort & control Prompting shape the request — start here, it's the cheapest win RAG retrieve your own documents into the prompt Tools & agents let the model act — search, run code, call APIs Fine-tuning change the model's weights — the last resort
Each rung adds power — and cost, latency and maintenance. Reach for the lowest one only when you have evidence the simpler ones aren't enough.

Calling a model

Every major lab exposes its models over an HTTP API — you send text, you get text back. Anthropic's Claude, OpenAI's GPT, Google's Gemini, plus open-weight models you can run yourself (Llama, Mistral, Qwen and others via tools like Ollama or vLLM). The chat experiment on this site is exactly this: a browser page talking to a model through a small server-side proxy so the API key never reaches the page — a pattern worth copying.

Prompting

The prompt is your main control surface. A few reliable habits: be specific about the task and the format you want; give a worked example or two (few-shot); ask the model to think step by step for harder problems; and tell it what not to do. Prompting is empirical — try, read the output, adjust. It is the cheapest skill to practise and the one with the fastest payoff. (Anthropic's prompt-engineering guide is a good, vendor-neutral-ish starting point.)

Giving the model your data (RAG)

A model only knows what it was trained on, and it can't cite a document it's never seen. Retrieval-augmented generation fixes that: you store your documents as embeddings (vectors that capture meaning) in a vector database, find the chunks most relevant to a question, and paste them into the prompt as context. Most “chat with your docs” products are RAG under the hood.

Tools and agents

Let a model call functions — search the web, run code, query a database, hit an API — and decide when to use them, and you have an agent: a loop where the model acts, observes the result, and acts again until the task is done. This is where a lot of the current frontier is, and where most of the hard engineering (reliability, error handling, knowing when to stop) lives. Anthropic's Building effective agents is a clear-eyed guide to doing it without over-engineering.

Fine-tuning

Reach for this last. Fine-tuning nudges a model's weights on your own examples to lock in a style, format, or narrow skill. It's more work than prompting or RAG and usually isn't the first answer — try those first, and fine-tune only when you have clear evidence they fall short.

Evaluation

The thing that separates a demo from a product. Decide how you'll know it's working before you scale up: a set of test cases with expected outcomes, a way to score outputs (exact match, a rubric, or another model as judge), and a habit of re-running it whenever you change a prompt or swap a model. Without evals you're tuning blind.

Doing research

If the methods themselves are what draw you, the path looks different.

The frameworks to know are PyTorch (dominant in research) and increasingly JAX; for using pre-trained models, Hugging Face's transformers library and its model hub are indispensable.

A sane learning path

  1. Get comfortable in Python, in a Colab notebook.
  2. Take one structured course end to end — Andrew Ng's Machine Learning and Deep Learning specialisations, or fast.ai's Practical Deep Learning, which starts top-down with working code.
  3. Build a tiny project that calls a model's API. Ship it, however small.
  4. Add RAG, then tools, to that project as you need them.
  5. If research pulls at you, work through Karpathy's series and build a transformer from scratch.
  6. Read one paper a week. Pick the maths up where it blocks you, not before.

Staying current — and grounded

The field moves fast, and most of that motion is noise. You do not need to track every model release. Pick a few signals — a newsletter, a handful of researchers, the labs' own engineering blogs — and ignore the rest. Depth on fundamentals ages far more slowly than the headlines suggest: attention, gradient descent, and good evaluation will still matter long after today's leaderboard is forgotten.

Two cautions worth carrying from the start. Models hallucinate — they produce confident, fluent text that is simply wrong — so verify anything that matters against a primary source. And the field has real questions about safety, bias, and misuse that aren't someone else's department; building responsibly is part of the craft, not an afterthought.

Start small, build something, and let your curiosity pick the next thing to learn. That beats any reading list.

Some of the figures in the charts and diagrams on this page were compiled with the help of AI tools and may contain errors or be out of date. They are shared in good faith for general interest only — not as professional, financial, investment or purchasing advice — and should be checked against the cited primary sources before you rely on them.