Post-Training vs. In-Context Learning


If you’ve spent any time working with large language models, you’ve probably encountered two very different ways of getting them to do what you want: post-training and in-context learning. They solve the same fundamental problem — adapting a general-purpose model to a specific task — but they do it in completely different ways.

What Is In-Context Learning?

In-context learning (ICL) is the ability of a language model to perform a task based on examples or instructions provided directly in the prompt. No weights are updated. No training happens. You simply show the model what you want, and it figures out the pattern.

The term was popularized by the GPT-3 paper (Brown et al., 2020), which demonstrated that sufficiently large models could perform tasks they were never explicitly trained on, just by conditioning on a few examples in the prompt.

There are a few flavors:

  1. Zero-shot — You describe the task with no examples. “Translate this sentence to French.”
  2. Few-shot — You provide a handful of input-output pairs before your actual query.
  3. Many-shot — With longer context windows, you can now stuff dozens or even hundreds of examples into the prompt.
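The three flavors differ only in how the prompt is assembled. Here is a minimal sketch in Python (the `build_prompt` helper and the translation examples are invented for illustration, not any library's API):

```python
def build_prompt(task, examples, query):
    """Assemble a zero-, few-, or many-shot prompt from the same pieces."""
    parts = [task]
    for inp, out in examples:  # an empty list yields a zero-shot prompt
        parts.append(f"Input: {inp}\nOutput: {out}")
    parts.append(f"Input: {query}\nOutput:")
    return "\n\n".join(parts)

# Zero-shot: just the instruction.
zero = build_prompt("Translate English to French.", [], "cheese")

# Few-shot: a handful of demonstrations before the actual query.
few = build_prompt(
    "Translate English to French.",
    [("sea otter", "loutre de mer"), ("plush giraffe", "girafe peluche")],
    "cheese",
)
```

Many-shot is the same construction with a longer `examples` list; the only limit is the context window.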

Here’s the remarkable thing: none of this changes the model. The weights stay frozen. All the “learning” happens during the forward pass — the model’s attention mechanism identifies patterns in your examples and applies them to the new input. Some researchers have argued that this process resembles an implicit form of gradient descent carried out entirely at inference time (Dai et al., 2023).

Why It Works

The leading explanation is that during pre-training on massive corpora, the model encounters such a diverse range of tasks and patterns that it implicitly learns a kind of meta-algorithm for task recognition. When you provide examples in the prompt, you’re not teaching the model something new — you’re helping it locate the right task-solving circuit already encoded in its weights.

Anthropic’s research on “induction heads” (Olsson et al., 2022) identified specific attention head circuits that appear to be a key mechanism behind this capability.

The Tradeoffs

ICL is incredibly flexible. You can change the task on every API call just by changing the prompt. No GPUs, no training pipeline, no datasets. But it comes with real limitations:

  • Ephemeral — The model forgets everything when the context window resets.
  • Prompt-sensitive — The ordering and formatting of examples can swing accuracy dramatically. Zhao et al. (2021) showed that just reordering few-shot examples could move performance from near-chance to near-state-of-the-art.
  • Bounded by context length — You can only fit so many examples before you run out of tokens.
  • Inference cost — Those demonstration tokens cost money on every single call.

What Is Post-Training?

Post-training is any training procedure applied after the initial pre-training phase. Unlike ICL, post-training actually modifies the model’s weights. The changes are permanent, baked into the model itself.

Pre-training gives a model broad knowledge and linguistic competence by predicting the next token across trillions of tokens of text. Post-training then refines that foundation for specific purposes. Think of pre-training as a general education and post-training as professional specialization.

The Major Forms

Supervised Fine-Tuning (SFT) is the most straightforward approach. You train the model on curated (instruction, response) pairs so it learns to follow instructions and produce useful outputs. This is the basis of “instruction tuning” — what made models like FLAN (Wei et al., 2022) and InstructGPT (Ouyang et al., 2022) so much more usable than raw base models.
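Under the hood, most SFT pipelines concatenate the instruction and response into one token sequence and mask the loss on the instruction tokens, so the model learns to answer rather than to reproduce the prompt. The sketch below is a simplified illustration (the whitespace tokenizer and `make_sft_example` helper are stand-ins, not a real library API):

```python
IGNORE = -100  # conventional "no loss" label id in many training frameworks

def make_sft_example(instruction, response, tokenize):
    """Build one SFT training example with the prompt masked out.

    The model is still trained with next-token cross-entropy, but only
    the response tokens contribute to the loss.
    """
    prompt_ids = tokenize(instruction)
    response_ids = tokenize(response)
    input_ids = prompt_ids + response_ids
    labels = [IGNORE] * len(prompt_ids) + response_ids
    return input_ids, labels

# Toy whitespace "tokenizer" standing in for a real one.
vocab = {}
def tok(text):
    return [vocab.setdefault(w, len(vocab)) for w in text.split()]

ids, labels = make_sft_example("Summarize : the cat sat", "a cat sat down", tok)
```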

RLHF (Reinforcement Learning from Human Feedback) takes it further. A separate reward model is trained on human preference comparisons — “response A is better than response B” — and then used to optimize the language model via reinforcement learning. This is the technique behind ChatGPT and is critical for alignment: making models helpful, harmless, and honest.
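The reward model is typically trained with a Bradley–Terry style loss on those comparisons. A minimal sketch, assuming the reward model has already produced a scalar score for each response:

```python
import math

def preference_loss(r_chosen, r_rejected):
    """Bradley-Terry loss used to train the reward model:
    -log sigmoid(r_chosen - r_rejected).

    Small when the reward model already scores the preferred response
    higher; large when it has the pair reversed."""
    margin = r_chosen - r_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Reward model agrees with the human label -> low loss.
good = preference_loss(2.0, 0.5)
# Reward model prefers the rejected response -> high loss.
bad = preference_loss(0.5, 2.0)
```

The trained reward model then serves as the optimization target for the reinforcement learning step.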

DPO (Direct Preference Optimization) simplifies RLHF by skipping the reward model entirely. Rafailov et al. (2023) showed you can optimize directly on preference pairs, getting comparable results with a much simpler pipeline.
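A minimal sketch of the DPO objective for a single preference pair (the function name and signature are invented for illustration; real implementations batch this and sum token-level log-probabilities over whole responses):

```python
import math

def dpo_loss(pi_logp_w, pi_logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """DPO loss for one preference pair.

    pi_logp_*  : log-prob of the (w)inning / (l)osing response under
                 the model being trained
    ref_logp_* : same quantities under the frozen reference model
    beta       : strength of the implicit KL constraint
    """
    margin = beta * ((pi_logp_w - ref_logp_w) - (pi_logp_l - ref_logp_l))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

Note what is absent: no reward model and no RL loop. The loss is an ordinary supervised objective over preference pairs, which is exactly why the pipeline is simpler.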

Parameter-Efficient Fine-Tuning (PEFT) methods like LoRA (Hu et al., 2022) freeze the base weights and train only a small number of added low-rank parameters, dramatically reducing the compute required while retaining most of the benefits. This has made post-training far more accessible: you no longer need a cluster of GPUs to fine-tune a model.
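The core idea behind LoRA can be shown with toy numbers: instead of updating a weight matrix W directly, it learns two small factors B and A whose product is the update. This is a plain-Python illustration with a 4x4 matrix and rank 1; at real scale the rank is far smaller than the matrix dimensions, which is where the savings come from:

```python
def matmul(A, B):
    """Plain-Python matrix multiply for the toy example below."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

# Frozen pretrained weight W (4x4: 16 parameters)...
W = [[1, 0, 0, 0],
     [0, 1, 0, 0],
     [0, 0, 1, 0],
     [0, 0, 0, 1]]

# ...adapted via two trainable low-rank factors with rank r=1:
# B is 4x1 and A is 1x4, so only 8 trainable numbers instead of 16.
B = [[1], [0], [0], [0]]
A = [[0, 0, 0, 2]]
delta = matmul(B, A)  # the low-rank update B @ A

# Effective weight during the forward pass: W + B @ A.
W_adapted = [[w + d for w, d in zip(w_row, d_row)]
             for w_row, d_row in zip(W, delta)]
```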

How They Differ

| | In-Context Learning | Post-Training |
| --- | --- | --- |
| Weight updates | None | Yes |
| Persistence | Per-session only | Permanent |
| Infrastructure | Just API access | Training pipeline + GPUs |
| Data needed | A handful of examples | Hundreds to millions |
| Flexibility | Change behavior instantly | Requires retraining |
| Inference cost | Higher (long prompts) | Lower (behavior is internalized) |
| Performance ceiling | Bounded by existing capabilities | Can exceed ICL, especially on complex tasks |
| Risk of forgetting | None | Fine-tuning can degrade general capabilities |

When to Use Which

Reach for in-context learning when:

  • You’re prototyping or experimenting
  • The task changes frequently
  • You have very few examples
  • You can’t modify the model (e.g., using a closed API)
  • The task is within the model’s existing capabilities

Reach for post-training when:

  • You need consistent, reliable performance at scale
  • Inference cost matters (those few-shot examples add up)
  • You need the model to learn genuinely new knowledge or behaviors
  • You want durable alignment (safety, tone, format)
  • You’re distilling a larger model’s capabilities into a smaller one
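One rough way to frame the cost question above: compare the few-shot token overhead you pay on every call against a one-time fine-tuning budget. The numbers below are hypothetical, and the comparison ignores that training tokens cost more to process than inference tokens; this is back-of-the-envelope arithmetic, not a pricing model:

```python
# Hypothetical, purely illustrative numbers (both measured in tokens).
demo_tokens_per_call = 1_500       # few-shot examples repeated on every call
finetune_token_budget = 3_000_000  # one-time training cost of fine-tuning

# Number of calls after which the recurring prompt overhead alone
# exceeds the one-time fine-tuning cost.
break_even_calls = finetune_token_budget // demo_tokens_per_call
```

At high call volumes the recurring overhead dominates quickly, which is why "inference cost matters" pushes toward post-training.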

They Work Together

In practice, these aren’t competing approaches — they’re complementary layers in a stack:

  1. Pre-training provides the foundation: broad knowledge and language understanding.
  2. Post-training shapes the model into something useful: instruction-following, aligned, specialized.
  3. In-context learning provides the final layer of customization at inference time.

Here’s an important subtlety: post-training makes in-context learning much better. Instruction-tuned models respond far more reliably to few-shot prompts than base models do. The post-training teaches the model to pay attention to the structure and intent of prompts, which directly improves its ability to learn from in-context examples.

So the question isn’t really “which one should I use?” It’s “what’s the right mix?” For most practitioners working with modern LLMs, you’re already benefiting from post-training (the API model you’re calling has been instruction-tuned and RLHF’d), and you’re applying ICL on top of that every time you write a prompt. The real decision is whether your use case justifies the additional investment of custom fine-tuning on top of what’s already there.
