← AI

AI basics: a getting-started guide

This is a map, not a textbook. If you want to get into AI (either as a researcher pushing on how models work, or as a builder wiring them into real software), this page lays out what's worth learning, in roughly the order it pays off, and points you at where to go deeper. None of it is secret. The field rewards people who are comfortable being confused for a while and keep going.

The two roads

“Doing AI” splits into two overlapping crafts, and being honest with yourself about which you're chasing saves a lot of wasted effort.

Research — inventing or improving the methods: new architectures, training techniques, ways to evaluate or align models. This leans on maths, reading papers, and running experiments. The output is knowledge.
Application — taking models that already exist and building something useful with them: products, tools, internal systems. This leans on software engineering and good judgement about what models can and can't do. The output is a working thing people use.

The two crafts. Most people lean to one, but the best drift between them.

You don't have to pick forever, and the best people drift between them. But you can be a superb application builder without ever deriving a gradient by hand, and you can do real research without shipping a polished product. Read the section that fits, skim the other.

The groundwork everyone needs

Programming

Learn Python. It is the lingua franca of the field: nearly every library, tutorial, and paper's reference code is in it. You need to be comfortable with functions, classes, lists and dictionaries, reading other people's code, and working in a notebook (Jupyter or Google Colab, which gives you a free GPU to start). You do not need to be a software architect on day one.

A little maths

For application work you can get a long way with intuition and pick maths up as you hit it. For research it's not optional. The three pillars are below, and the single best way to build intuition for them is Grant Sanderson's visual 3Blue1Brown series:

Linear algebra — vectors, matrices, dot products. Everything a model does is matrix multiplication, and concepts like embeddings only make sense once vectors feel natural.
Calculus: derivatives and the chain rule. Training is just following gradients downhill; backpropagation is the chain rule applied at scale.
Probability & statistics — distributions, expectation, why a language model is really predicting a probability over the next token.

The good news: you learn these alongside the models, not as a prerequisite gate. Touch the maths when a concept demands it and it sticks far better than a year of abstract study up front.

How the models actually work

Build the mental model in this order; each layer rests on the one before it.

1. Machine learning, the core idea

Instead of writing rules by hand, you show a program many examples and let it learn the rule. You define a loss (a number measuring how wrong the model is) and nudge the model's parameters to make that number smaller: gradient descent. That single loop, repeated billions of times, is the engine under everything else.

Gradient descent: each step nudges the parameters downhill, lowering the loss until it bottoms out. Real models do this across millions of dimensions at once.

2. Neural networks

Layers of simple units, each doing a weighted sum followed by a non-linear squash, stacked deep. “Deep learning” just means many layers. With enough of them and enough data, the network learns its own useful features rather than relying on ones you hand-craft. Training them is the gradient-descent loop above, with backpropagation the bookkeeping that tells each parameter which way to move. (For a from-the-ground-up walkthrough, 3Blue1Brown's neural-networks series is hard to beat.)

A toy network. Information flows left to right; training adjusts the weight on every connection. “Deep” just means more hidden layers than this.

3. Transformers and attention

The 2017 paper Attention Is All You Need introduced the transformer, the architecture behind essentially every modern large model. Its key trick, attention, lets the model weigh how much every piece of the input should influence every other piece, so it can connect a pronoun to the noun it refers to, or a question to the relevant fact, no matter how far apart they sit. If you read one paper this year, read this one (then read Jay Alammar's Illustrated Transformer, the canonical annotated walkthrough).

Attention in one picture: arc width ≈ how much “it” attends to each word. To resolve what “it” means, the model leans hardest on “animal”.

4. Large language models

An LLM is a very large transformer trained on a very large pile of text to do one humble thing: predict the next token (roughly, a word-piece; you can see text split into tokens with a live tokenizer). Do that well enough, at enough scale, and the ability to summarise, translate, write code, and reason starts to emerge. Modern chat models add two stages on top of that raw “pre-training”: instruction tuning (teaching it to follow requests) and alignment via human or AI feedback (teaching it to be helpful, honest, and harmless). Knowing these three stages exist explains most of how a model behaves.

Under the hood, the model turns your prompt into a probability for every possible next token; here it's overwhelmingly sure of “Paris”. It picks the top one (or samples for variety), appends it, and repeats, one token at a time.

A sixty-second history

None of this appeared overnight. The ideas stacked up over decades, with a sharp acceleration once attention met scale.

A rough arc, not the whole story; follow the links below for the primary sources.

1958: Rosenblatt's perceptron, a single trainable artificial neuron.
1986: Rumelhart, Hinton & Williams popularise backpropagation, making it practical to train multi-layer networks.
2012: AlexNet wins ImageNet by a landslide and kicks off the deep-learning boom.
2017: Attention Is All You Need introduces the transformer.
2020: GPT-3 shows that sheer scale buys surprising few-shot ability.
2022: ChatGPT puts a capable chat model in front of everyone and the field goes mainstream.
2024–25: reasoning models and agents, models that plan, call tools, and act in a loop.

Building applications

The headline of the last few years: you can build remarkably capable products without training a model at all. You call one that already exists. Here's the toolkit, simplest first, and the golden rule is to climb down this ladder only when the rung above you demonstrably falls short.

Each rung adds power, but also cost, latency and maintenance. Reach for the lowest one only when you have evidence the simpler ones aren't enough.

Calling a model

Every major lab exposes its models over an HTTP API: you send text, you get text back. Anthropic's Claude, OpenAI's GPT, Google's Gemini, plus open-weight models you can run yourself (Llama, Mistral, Qwen and others via tools like Ollama or vLLM). The chat experiment on this site is exactly this: a browser page talking to a model through a small server-side proxy so the API key never reaches the page, a pattern worth copying.

Prompting

The prompt is your main control surface. A few reliable habits: be specific about the task and the format you want; give a worked example or two (few-shot); ask the model to think step by step for harder problems; and tell it what not to do. Prompting is empirical: try, read the output, adjust. It is the cheapest skill to practise and the one with the fastest payoff. (Anthropic's prompt-engineering guide is a good, vendor-neutral-ish starting point.)

Giving the model your data (RAG)

A model only knows what it was trained on, and it can't cite a document it's never seen. Retrieval-augmented generation fixes that: you store your documents as embeddings (vectors that capture meaning) in a vector database, find the chunks most relevant to a question, and paste them into the prompt as context. Most “chat with your docs” products are RAG under the hood.

Tools and agents

Let a model call functions (search the web, run code, query a database, hit an API) and decide when to use them, and you have an agent: a loop where the model acts, observes the result, and acts again until the task is done. This is where a lot of the current frontier is, and where most of the hard engineering (reliability, error handling, knowing when to stop) lives. Anthropic's Building effective agents is a clear-eyed guide to doing it without over-engineering.

Fine-tuning

Reach for this last. Fine-tuning nudges a model's weights on your own examples to lock in a style, format, or narrow skill. It's more work than prompting or RAG and usually isn't the first answer: try those first, and fine-tune only when you have clear evidence they fall short.

Evaluation

The thing that separates a demo from a product. Decide how you'll know it's working before you scale up: a set of test cases with expected outcomes, a way to score outputs (exact match, a rubric, or another model as judge), and a habit of re-running it whenever you change a prompt or swap a model. Without evals you're tuning blind.

Doing research

If the methods themselves are what draw you, the path looks different.

Implement things from scratch. Nothing teaches like building a small neural net, then a tiny transformer, in plain code. Andrej Karpathy's “Zero to Hero” series and his nanoGPT are the canonical starting points.
Learn to read papers. Skim the abstract, figures, and conclusion first; only then read deeply. arXiv is where the field publishes, often months before anywhere else, and Hugging Face Papers (which also hosts the Papers with Code archive) links many to runnable implementations.
Reproduce a result before you try to beat it. Getting someone else's experiment to actually run is half the skill and teaches you where the real difficulties hide.
Pick a narrow question. Research is made of small, sharp questions, not grand ambitions. Find one corner (an evaluation, an efficiency trick, a failure mode) and go deep.

The frameworks to know are PyTorch (dominant in research) and increasingly JAX; for using pre-trained models, Hugging Face's transformers library and its model hub are indispensable.

A sane learning path

Get comfortable in Python, in a Colab notebook.
Take one structured course end to end: Andrew Ng's Machine Learning and Deep Learning specialisations, or fast.ai's Practical Deep Learning, which starts top-down with working code.
Build a tiny project that calls a model's API. Ship it, however small.
Add RAG, then tools, to that project as you need them.
If research pulls at you, work through Karpathy's series and build a transformer from scratch.
Read one paper a week. Pick the maths up where it blocks you, not before.

Staying current — and grounded

The field moves fast, and most of that motion is noise. You do not need to track every model release. Pick a few signals (a newsletter, a handful of researchers, the labs' own engineering blogs) and ignore the rest. Depth on fundamentals ages far more slowly than the headlines suggest: attention, gradient descent, and good evaluation will still matter long after today's leaderboard is forgotten.

Two cautions worth carrying from the start. Models hallucinate (they produce confident, fluent text that is simply wrong), so verify anything that matters against a primary source. And the field has real questions about safety, bias, and misuse that aren't someone else's department; building responsibly is part of the craft, not an afterthought.

Start small, build something, and let your curiosity pick the next thing to learn. That beats any reading list.

Some of the figures in the charts and diagrams on this page were compiled with the help of AI tools and may contain errors or be out of date. They are shared in good faith for general interest only (not as professional, financial, investment or purchasing advice) and should be checked against the cited primary sources before you rely on them.