AI basics: a getting-started guide
This is a map, not a textbook. If you want to get into AI — either as a researcher pushing on how models work, or as a builder wiring them into real software — this page lays out what's worth learning, in roughly the order it pays off, and points you at where to go deeper. None of it is secret. The field rewards people who are comfortable being confused for a while and keep going.
The two roads
“Doing AI” splits into two overlapping crafts, and being honest with yourself about which you're chasing saves a lot of wasted effort.
- Research — inventing or improving the methods: new architectures, training techniques, ways to evaluate or align models. This leans on maths, reading papers, and running experiments. The output is knowledge.
- Application — taking models that already exist and building something useful with them: products, tools, internal systems. This leans on software engineering and good judgement about what models can and can't do. The output is a working thing people use.
You don't have to pick forever, and the best people drift between them. But you can be a superb application builder without ever deriving a gradient by hand, and you can do real research without shipping a polished product. Read the section that fits, skim the other.
The groundwork everyone needs
Programming
Learn Python. It is the lingua franca of the field — nearly every library, tutorial, and paper's reference code is in it. You need to be comfortable with functions, classes, lists and dictionaries, reading other people's code, and working in a notebook (Jupyter or Google Colab, which gives you a free GPU to start). You do not need to be a software architect on day one.
A little maths
For application work you can get a long way with intuition and pick maths up as you hit it. For research it's not optional. The three pillars — and the single best way to build intuition for them is Grant Sanderson's visual 3Blue1Brown series:
- Linear algebra — vectors, matrices, dot products. Everything a model does is matrix multiplication, and concepts like embeddings only make sense once vectors feel natural.
- Calculus — derivatives and the chain rule. Training is just following gradients downhill; backpropagation is the chain rule applied at scale.
- Probability & statistics — distributions, expectation, why a language model is really predicting a probability over the next token.
The good news: you learn these alongside the models, not as a prerequisite gate. Touch the maths when a concept demands it and it sticks far better than a year of abstract study up front.
How the models actually work
Build the mental model in this order — each layer rests on the one before it.
1. Machine learning, the core idea
Instead of writing rules by hand, you show a program many examples and let it learn the rule. You define a loss (a number measuring how wrong the model is) and nudge the model's parameters to make that number smaller — gradient descent. That single loop, repeated billions of times, is the engine under everything else.
2. Neural networks
Layers of simple units, each doing a weighted sum followed by a non-linear squash, stacked deep. “Deep learning” just means many layers. With enough of them and enough data, the network learns its own useful features rather than relying on ones you hand-craft. Training them is the gradient-descent loop above, with backpropagation the bookkeeping that tells each parameter which way to move. (For a from-the-ground-up walkthrough, 3Blue1Brown's neural-networks series is hard to beat.)
3. Transformers and attention
The 2017 paper Attention Is All You Need introduced the transformer, the architecture behind essentially every modern large model. Its key trick, attention, lets the model weigh how much every piece of the input should influence every other piece — so it can connect a pronoun to the noun it refers to, or a question to the relevant fact, no matter how far apart they sit. If you read one paper this year, read this one (then read Jay Alammar's Illustrated Transformer, the canonical annotated walkthrough).
4. Large language models
An LLM is a very large transformer trained on a very large pile of text to do one humble thing: predict the next token (roughly, a word-piece — you can see text split into tokens with a live tokenizer). Do that well enough, at enough scale, and the ability to summarise, translate, write code, and reason starts to emerge. Modern chat models add two stages on top of that raw “pre-training”: instruction tuning (teaching it to follow requests) and alignment via human or AI feedback (teaching it to be helpful, honest, and harmless). Knowing these three stages exist explains most of how a model behaves.
A sixty-second history
None of this appeared overnight. The ideas stacked up over decades, with a sharp acceleration once attention met scale.
- 1958 — Rosenblatt's perceptron, a single trainable artificial neuron.
- 1986 — Rumelhart, Hinton & Williams popularise backpropagation, making it practical to train multi-layer networks.
- 2012 — AlexNet wins ImageNet by a landslide and kicks off the deep-learning boom.
- 2017 — Attention Is All You Need introduces the transformer.
- 2020 — GPT-3 shows that sheer scale buys surprising few-shot ability.
- 2022 — ChatGPT puts a capable chat model in front of everyone and the field goes mainstream.
- 2024–25 — reasoning models and agents — models that plan, call tools, and act in a loop.
Building applications
The headline of the last few years: you can build remarkably capable products without training a model at all. You call one that already exists. Here's the toolkit, simplest first — and the golden rule is to climb down this ladder only when the rung above you demonstrably falls short.
Calling a model
Every major lab exposes its models over an HTTP API — you send text, you get text back. Anthropic's Claude, OpenAI's GPT, Google's Gemini, plus open-weight models you can run yourself (Llama, Mistral, Qwen and others via tools like Ollama or vLLM). The chat experiment on this site is exactly this: a browser page talking to a model through a small server-side proxy so the API key never reaches the page — a pattern worth copying.
Prompting
The prompt is your main control surface. A few reliable habits: be specific about the task and the format you want; give a worked example or two (few-shot); ask the model to think step by step for harder problems; and tell it what not to do. Prompting is empirical — try, read the output, adjust. It is the cheapest skill to practise and the one with the fastest payoff. (Anthropic's prompt-engineering guide is a good, vendor-neutral-ish starting point.)
Giving the model your data (RAG)
A model only knows what it was trained on, and it can't cite a document it's never seen. Retrieval-augmented generation fixes that: you store your documents as embeddings (vectors that capture meaning) in a vector database, find the chunks most relevant to a question, and paste them into the prompt as context. Most “chat with your docs” products are RAG under the hood.
Tools and agents
Let a model call functions — search the web, run code, query a database, hit an API — and decide when to use them, and you have an agent: a loop where the model acts, observes the result, and acts again until the task is done. This is where a lot of the current frontier is, and where most of the hard engineering (reliability, error handling, knowing when to stop) lives. Anthropic's Building effective agents is a clear-eyed guide to doing it without over-engineering.
Fine-tuning
Reach for this last. Fine-tuning nudges a model's weights on your own examples to lock in a style, format, or narrow skill. It's more work than prompting or RAG and usually isn't the first answer — try those first, and fine-tune only when you have clear evidence they fall short.
Evaluation
The thing that separates a demo from a product. Decide how you'll know it's working before you scale up: a set of test cases with expected outcomes, a way to score outputs (exact match, a rubric, or another model as judge), and a habit of re-running it whenever you change a prompt or swap a model. Without evals you're tuning blind.
Doing research
If the methods themselves are what draw you, the path looks different.
-
Implement things from scratch. Nothing teaches like
building a small neural net, then a tiny transformer, in plain code. Andrej
Karpathy's
“Zero to Hero”
series and his
nanoGPTare the canonical starting points. - Learn to read papers. Skim the abstract, figures, and conclusion first; only then read deeply. arXiv is where the field publishes, often months before anywhere else, and Papers with Code links many to runnable implementations.
- Reproduce a result before you try to beat it. Getting someone else's experiment to actually run is half the skill and teaches you where the real difficulties hide.
- Pick a narrow question. Research is made of small, sharp questions, not grand ambitions. Find one corner — an evaluation, an efficiency trick, a failure mode — and go deep.
The frameworks to know are PyTorch
(dominant in research) and increasingly
JAX; for using
pre-trained models, Hugging Face's
transformers
library and its model hub are
indispensable.
A sane learning path
- Get comfortable in Python, in a Colab notebook.
- Take one structured course end to end — Andrew Ng's Machine Learning and Deep Learning specialisations, or fast.ai's Practical Deep Learning, which starts top-down with working code.
- Build a tiny project that calls a model's API. Ship it, however small.
- Add RAG, then tools, to that project as you need them.
- If research pulls at you, work through Karpathy's series and build a transformer from scratch.
- Read one paper a week. Pick the maths up where it blocks you, not before.
Staying current — and grounded
The field moves fast, and most of that motion is noise. You do not need to track every model release. Pick a few signals — a newsletter, a handful of researchers, the labs' own engineering blogs — and ignore the rest. Depth on fundamentals ages far more slowly than the headlines suggest: attention, gradient descent, and good evaluation will still matter long after today's leaderboard is forgotten.
Two cautions worth carrying from the start. Models hallucinate — they produce confident, fluent text that is simply wrong — so verify anything that matters against a primary source. And the field has real questions about safety, bias, and misuse that aren't someone else's department; building responsibly is part of the craft, not an afterthought.
Start small, build something, and let your curiosity pick the next thing to learn. That beats any reading list.
Some of the figures in the charts and diagrams on this page were compiled with the help of AI tools and may contain errors or be out of date. They are shared in good faith for general interest only — not as professional, financial, investment or purchasing advice — and should be checked against the cited primary sources before you rely on them.