Post-Training vs. In-Context Learning


If you’ve spent any time working with large language models, you’ve probably encountered two very different ways of getting them to do what you want: post-training and in-context learning. They solve the same fundamental problem — adapting a general-purpose model to a specific task — but they do it in completely different ways.

What Is In-Context Learning?

In-context learning (ICL) is the ability of a language model to perform a task based on examples or instructions provided directly in the prompt. No weights are updated. No training happens. You simply show the model what you want, and it figures out the pattern.

The term was popularized by the GPT-3 paper (Brown et al., 2020), which demonstrated that sufficiently large models could perform tasks they were never explicitly trained on, just by conditioning on a few examples in the prompt.

There are a few flavors:

  1. Zero-shot — You describe the task with no examples. “Translate this sentence to French.”
  2. Few-shot — You provide a handful of input-output pairs before your actual query.
  3. Many-shot — With longer context windows, you can now stuff dozens or even hundreds of examples into the prompt.
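The three flavors differ only in how the prompt is assembled. Here is a minimal sketch in Python (the `build_prompt` helper and the translation examples are invented for illustration, not any library's API):

```python
def build_prompt(task, examples, query):
    """Assemble a zero-, few-, or many-shot prompt from the same pieces."""
    parts = [task]
    for inp, out in examples:  # an empty list yields a zero-shot prompt
        parts.append(f"Input: {inp}\nOutput: {out}")
    parts.append(f"Input: {query}\nOutput:")
    return "\n\n".join(parts)

# Zero-shot: just the instruction.
zero = build_prompt("Translate English to French.", [], "cheese")

# Few-shot: a handful of demonstrations before the actual query.
few = build_prompt(
    "Translate English to French.",
    [("sea otter", "loutre de mer"), ("plush giraffe", "girafe peluche")],
    "cheese",
)
```

Many-shot is the same construction with a longer `examples` list; the only limit is the context window.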

Here’s the remarkable thing: none of this changes the model. The weights stay frozen. All the “learning” happens during the forward pass — the model’s attention mechanism identifies patterns in your examples and applies them to the new input. Some researchers have argued that this process resembles an implicit form of gradient descent carried out entirely at inference time (Dai et al., 2023).

Why It Works

The leading explanation is that during pre-training on massive corpora, the model encounters such a diverse range of tasks and patterns that it implicitly learns a kind of meta-algorithm for task recognition. When you provide examples in the prompt, you’re not teaching the model something new — you’re helping it locate the right task-solving circuit already encoded in its weights.

Anthropic’s research on “induction heads” (Olsson et al., 2022) identified specific attention head circuits that appear to be a key mechanism behind this capability.

The Tradeoffs

ICL is incredibly flexible. You can change the task on every API call just by changing the prompt. No GPUs, no training pipeline, no datasets. But it comes with real limitations:

  • Ephemeral — The model forgets everything when the context window resets.
  • Prompt-sensitive — The ordering and formatting of examples can swing accuracy dramatically. Zhao et al. (2021) showed that just reordering few-shot examples could move performance from near-chance to near-state-of-the-art.
  • Bounded by context length — You can only fit so many examples before you run out of tokens.
  • Inference cost — Those demonstration tokens cost money on every single call.

What Is Post-Training?

Post-training is any training procedure applied after the initial pre-training phase. Unlike ICL, post-training actually modifies the model’s weights. The changes are permanent, baked into the model itself.

Pre-training gives a model broad knowledge and linguistic competence by predicting the next token across trillions of tokens of text. Post-training then refines that foundation for specific purposes. Think of pre-training as a general education and post-training as professional specialization.

The Major Forms

Supervised Fine-Tuning (SFT) is the most straightforward approach. You train the model on curated (instruction, response) pairs so it learns to follow instructions and produce useful outputs. This is the basis of “instruction tuning” — what made models like FLAN (Wei et al., 2022) and InstructGPT (Ouyang et al., 2022) so much more usable than raw base models.
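Under the hood, most SFT pipelines concatenate the instruction and response into one token sequence and mask the loss on the instruction tokens, so the model learns to answer rather than to reproduce the prompt. The sketch below is a simplified illustration (the whitespace tokenizer and `make_sft_example` helper are stand-ins, not a real library API):

```python
IGNORE = -100  # conventional "no loss" label id in many training frameworks

def make_sft_example(instruction, response, tokenize):
    """Build one SFT training example with the prompt masked out.

    The model is still trained with next-token cross-entropy, but only
    the response tokens contribute to the loss.
    """
    prompt_ids = tokenize(instruction)
    response_ids = tokenize(response)
    input_ids = prompt_ids + response_ids
    labels = [IGNORE] * len(prompt_ids) + response_ids
    return input_ids, labels

# Toy whitespace "tokenizer" standing in for a real one.
vocab = {}
def tok(text):
    return [vocab.setdefault(w, len(vocab)) for w in text.split()]

ids, labels = make_sft_example("Summarize : the cat sat", "a cat sat down", tok)
```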

RLHF (Reinforcement Learning from Human Feedback) takes it further. A separate reward model is trained on human preference comparisons — “response A is better than response B” — and then used to optimize the language model via reinforcement learning. This is the technique behind ChatGPT and is critical for alignment: making models helpful, harmless, and honest.
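The reward model is typically trained with a Bradley–Terry style loss on those comparisons. A minimal sketch, assuming the reward model has already produced a scalar score for each response:

```python
import math

def preference_loss(r_chosen, r_rejected):
    """Bradley-Terry loss used to train the reward model:
    -log sigmoid(r_chosen - r_rejected).

    Small when the reward model already scores the preferred response
    higher; large when it has the pair reversed."""
    margin = r_chosen - r_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Reward model agrees with the human label -> low loss.
good = preference_loss(2.0, 0.5)
# Reward model prefers the rejected response -> high loss.
bad = preference_loss(0.5, 2.0)
```

The trained reward model then serves as the optimization target for the reinforcement learning step.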

DPO (Direct Preference Optimization) simplifies RLHF by skipping the reward model entirely. Rafailov et al. (2023) showed you can optimize directly on preference pairs, getting comparable results with a much simpler pipeline.
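A minimal sketch of the DPO objective for a single preference pair (the function name and signature are invented for illustration; real implementations batch this and sum token-level log-probabilities over whole responses):

```python
import math

def dpo_loss(pi_logp_w, pi_logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """DPO loss for one preference pair.

    pi_logp_*  : log-prob of the (w)inning / (l)osing response under
                 the model being trained
    ref_logp_* : same quantities under the frozen reference model
    beta       : strength of the implicit KL constraint
    """
    margin = beta * ((pi_logp_w - ref_logp_w) - (pi_logp_l - ref_logp_l))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

Note what is absent: no reward model and no RL loop. The loss is an ordinary supervised objective over preference pairs, which is exactly why the pipeline is simpler.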

Parameter-Efficient Fine-Tuning (PEFT) methods like LoRA (Hu et al., 2022) freeze the base weights and train only a small number of added low-rank parameters, dramatically reducing the compute required while retaining most of the benefits. This has made post-training far more accessible: you no longer need a cluster of GPUs to fine-tune a model.
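The core idea behind LoRA can be shown with toy numbers: instead of updating a weight matrix W directly, it learns two small factors B and A whose product is the update. This is a plain-Python illustration with a 4x4 matrix and rank 1; at real scale the rank is far smaller than the matrix dimensions, which is where the savings come from:

```python
def matmul(A, B):
    """Plain-Python matrix multiply for the toy example below."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

# Frozen pretrained weight W (4x4: 16 parameters)...
W = [[1, 0, 0, 0],
     [0, 1, 0, 0],
     [0, 0, 1, 0],
     [0, 0, 0, 1]]

# ...adapted via two trainable low-rank factors with rank r=1:
# B is 4x1 and A is 1x4, so only 8 trainable numbers instead of 16.
B = [[1], [0], [0], [0]]
A = [[0, 0, 0, 2]]
delta = matmul(B, A)  # the low-rank update B @ A

# Effective weight during the forward pass: W + B @ A.
W_adapted = [[w + d for w, d in zip(w_row, d_row)]
             for w_row, d_row in zip(W, delta)]
```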

How They Differ

| | In-Context Learning | Post-Training |
| --- | --- | --- |
| Weight updates | None | Yes |
| Persistence | Per-session only | Permanent |
| Infrastructure | Just API access | Training pipeline + GPUs |
| Data needed | A handful of examples | Hundreds to millions |
| Flexibility | Change behavior instantly | Requires retraining |
| Inference cost | Higher (long prompts) | Lower (behavior is internalized) |
| Performance ceiling | Bounded by existing capabilities | Can exceed ICL, especially on complex tasks |
| Risk of forgetting | None | Fine-tuning can degrade general capabilities |

When to Use Which

Reach for in-context learning when:

  • You’re prototyping or experimenting
  • The task changes frequently
  • You have very few examples
  • You can’t modify the model (e.g., using a closed API)
  • The task is within the model’s existing capabilities

Reach for post-training when:

  • You need consistent, reliable performance at scale
  • Inference cost matters (those few-shot examples add up)
  • You need the model to learn genuinely new knowledge or behaviors
  • You want durable alignment (safety, tone, format)
  • You’re distilling a larger model’s capabilities into a smaller one
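One rough way to frame the cost question above: compare the few-shot token overhead you pay on every call against a one-time fine-tuning budget. The numbers below are hypothetical, and the comparison ignores that training tokens cost more to process than inference tokens; this is back-of-the-envelope arithmetic, not a pricing model:

```python
# Hypothetical, purely illustrative numbers (both measured in tokens).
demo_tokens_per_call = 1_500       # few-shot examples repeated on every call
finetune_token_budget = 3_000_000  # one-time training cost of fine-tuning

# Number of calls after which the recurring prompt overhead alone
# exceeds the one-time fine-tuning cost.
break_even_calls = finetune_token_budget // demo_tokens_per_call
```

At high call volumes the recurring overhead dominates quickly, which is why "inference cost matters" pushes toward post-training.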

They Work Together

In practice, these aren’t competing approaches — they’re complementary layers in a stack:

  1. Pre-training provides the foundation: broad knowledge and language understanding.
  2. Post-training shapes the model into something useful: instruction-following, aligned, specialized.
  3. In-context learning provides the final layer of customization at inference time.

Here’s an important subtlety: post-training makes in-context learning much better. Instruction-tuned models respond far more reliably to few-shot prompts than base models do. The post-training teaches the model to pay attention to the structure and intent of prompts, which directly improves its ability to learn from in-context examples.

So the question isn’t really “which one should I use?” It’s “what’s the right mix?” For most practitioners working with modern LLMs, you’re already benefiting from post-training (the API model you’re calling has been instruction-tuned and RLHF’d), and you’re applying ICL on top of that every time you write a prompt. The real decision is whether your use case justifies the additional investment of custom fine-tuning on top of what’s already there.
