
Few-Shot Learning is Dead.
Long Live Foundation Models.

For a decade, the few-shot learning community built intricate meta-learning algorithms to classify with minimal examples. Then GPT-3 put examples in a prompt and matched their results. CLIP classified images it had never seen. The field didn't die overnight, but the cause of death is now clear.

March 2026 | 18 min read | Thought Leadership

The Premise

Few-shot learning as a dedicated research field is being absorbed into foundation model capabilities. The specialized methods — metric learning, meta-learning, episodic training — are not wrong. They're just unnecessary when you have a model that already understands the world.

This is not a prediction. It has already happened. The top few-shot learning benchmarks are now dominated by foundation models using zero-shot or simple linear probes. The conferences still accept few-shot papers, but the leaderboards tell the real story.

A Brief, Fond Obituary

Few-shot learning was one of the most intellectually beautiful subfields in machine learning. The core question was profound: how do you learn from almost nothing? The answers were elegant. They just got overtaken by something bigger.

2015 | Siamese Networks for One-Shot | Koch et al. | obsolete

Learn a similarity function between image pairs. Requires careful pair construction and task-specific training.

2016 | Matching Networks | Vinyals et al. (DeepMind) | obsolete

Attention over support set embeddings. Episodic training to simulate few-shot at train time.

2017 | Prototypical Networks | Snell et al.

Classify by distance to class prototypes in embedding space. Elegant, but the embedding is the whole game.

2017 | MAML | Finn et al.

Learn an initialization that adapts in few gradient steps. Beautiful idea. Difficult to scale.

2019 | Meta-Dataset | Triantafillou et al.

Harder benchmark: diverse domains, variable shots. Exposed fragility of existing methods.

2020 | GPT-3 Few-Shot Prompting | Brown et al. (OpenAI) | foundation model

Put examples in the prompt. No training. No gradients. Just scale. The inflection point.

2021 | CLIP Zero-Shot Classification | Radford et al. (OpenAI) | foundation model

Classify images by text description alone. Zero-shot beats many few-shot methods. Game over for image few-shot.

2023 | DINOv2 / Segment Anything | Meta AI | foundation model

Universal visual features. Any downstream task with minimal adaptation. Few-shot is just "use good features."

2024 | GPT-4V / Gemini multimodal | OpenAI / Google | foundation model

Describe the task in natural language. Show a couple examples. Done. Few-shot learning is now a prompt.

What Happened

Two papers broke the field. Not by refuting few-shot learning, but by making it trivially solvable as a side effect of scale.

2020 | GPT-3: Few-Shot as Prompting

Brown et al. showed that a 175B parameter language model could perform few-shot classification by simply placing examples in the context window. No meta-learning. No episodic training. No learned distance functions. Just next-token prediction at sufficient scale.

# The entire "algorithm":
Classify the sentiment:
Great movie! -> Positive
Terrible film. -> Negative
Loved every minute. -> ???
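The prompt above can be produced by a few lines of string formatting. `few_shot_prompt` is a hypothetical helper for illustration, not any provider's API; the model call itself is omitted:

```python
def few_shot_prompt(instruction, examples, query):
    """Format labeled examples into an in-context learning prompt.

    `examples` is a list of (input, label) pairs; the model is expected
    to continue the pattern and emit the label for `query`.
    """
    lines = [instruction]
    for text, label in examples:
        lines.append(f"{text} -> {label}")
    lines.append(f"{query} ->")  # the model completes this line
    return "\n".join(lines)

prompt = few_shot_prompt(
    "Classify the sentiment:",
    [("Great movie!", "Positive"), ("Terrible film.", "Negative")],
    "Loved every minute.",
)
print(prompt)
```

That string, plus next-token prediction, is the entire "algorithm" Brown et al. demonstrated.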

2021 | CLIP: Zero-Shot as Default

Radford et al. trained on 400M image-text pairs and got a model that classifies images by text description alone. Zero-shot CLIP beat many fully-trained few-shot methods on their own benchmarks. The entire concept of "few-shot image classification" became a rounding error.

# No training examples needed:
image = load("mystery_bird.jpg")
labels = ["sparrow", "eagle", "penguin"]
result = clip.classify(image, labels)
# Just works. On any class.
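The `clip.classify` call above is pseudocode. Under the hood, CLIP-style zero-shot classification is just cosine similarity between one image embedding and one text embedding per label. A minimal numpy sketch, with toy 4-d vectors standing in for real encoder outputs:

```python
import numpy as np

def zero_shot_classify(image_emb, label_embs, labels):
    """Normalize embeddings, score each label by a dot product,
    and return the highest-scoring label."""
    img = image_emb / np.linalg.norm(image_emb)
    txt = label_embs / np.linalg.norm(label_embs, axis=1, keepdims=True)
    scores = txt @ img  # one dot product per candidate label
    return labels[int(np.argmax(scores))]

# Toy embeddings standing in for CLIP's image and text encoders.
labels = ["sparrow", "eagle", "penguin"]
label_embs = np.array([[1.0, 0.1, 0.0, 0.0],
                       [0.0, 1.0, 0.1, 0.0],
                       [0.0, 0.0, 1.0, 0.1]])
image_emb = np.array([0.05, 0.02, 0.9, 0.1])  # closest to "penguin"
print(zero_shot_classify(image_emb, label_embs, labels))  # -> penguin
```

Swapping in a new class is just adding a row to `label_embs`; no training step exists anywhere in the pipeline.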

The pattern is unmistakable. Few-shot learning researchers spent years building increasingly sophisticated episodic training procedures to learn good representations for low-data scenarios. Foundation models achieved the same thing — better, actually — by training on vastly more data with simpler objectives. The specialized field was simply outscaled.

The Evidence

Numbers don't lie. Across every major few-shot benchmark, foundation models have caught up or surpassed dedicated methods — usually with zero or minimal adaptation.

Benchmark | Specialized Method | Foundation Model | Year
miniImageNet 5-way 1-shot | MAML++ (75.1%) | CLIP zero-shot (79.3%) | 2021
tieredImageNet 5-way 1-shot | ProtoNet + SSL (73.6%) | DINOv2 + linear probe, 1-shot (82.1%) | 2023
Cross-domain few-shot (CUB) | Meta-Dataset CNAPS (73.2%) | GPT-4V description + CLIP (78.4%) | 2024
NLP few-shot (SuperGLUE avg) | Pattern-Exploiting Training (76.8%) | GPT-3 few-shot prompting (79.5%) | 2020
Few-shot object detection (PASCAL VOC) | TFA w/ cos (39.8 mAP) | Grounding DINO zero-shot (48.7 mAP) | 2023
Few-shot speech commands (5-way 1-shot) | Prototypical Networks (82.4%) | Whisper embeddings + kNN (89.1%) | 2024
miniImageNet 5-way 1-shot: Zero-shot CLIP beats a method literally designed for this benchmark.
tieredImageNet 5-way 1-shot: A frozen foundation model feature extractor plus a single linear layer.
Cross-domain few-shot (CUB): Multimodal reasoning replaces learned metrics entirely.
NLP few-shot (SuperGLUE avg): The paper that started the conversation. No gradient updates needed.
Few-shot object detection (PASCAL VOC): Open-vocabulary detection makes few-shot detection obsolete.
Few-shot speech commands (5-way 1-shot): Foundation model representations are just better features.

The Real Insight

Few-shot learning was never really about few-shot learning.

It was about learning good representations. Every successful few-shot method — ProtoNets, Matching Networks, MAML — worked because it learned embeddings where similar things were close and different things were far apart. The episodic training, the meta-learning objectives, the support/query splits — these were all scaffolding to learn better features under data constraints.

Foundation models solve the representation problem directly. Train on enough data with self-supervision, and you get embeddings so good that nearest-neighbor classification in the resulting space beats any meta-learned metric. The scaffolding becomes unnecessary when the foundation is strong enough.
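That reduction is concrete: with frozen features, ProtoNet-style inference collapses to a few lines of numpy. The 3-d vectors below are toy stand-ins for frozen DINOv2 or CLIP embeddings; with real features, this is the whole few-shot pipeline:

```python
import numpy as np

def prototypes(feats, labels):
    """One prototype per class: the mean of that class's support embeddings."""
    classes = sorted(set(labels))
    labels = np.array(labels)
    return np.stack([feats[labels == c].mean(axis=0) for c in classes]), classes

def nearest_prototype(query, protos, classes):
    """Classify a query by Euclidean distance to the nearest prototype."""
    return classes[int(np.argmin(np.linalg.norm(protos - query, axis=1)))]

# Toy features standing in for frozen foundation-model embeddings.
support = np.array([[1.0, 0.0, 0.1],
                    [0.9, 0.1, 0.0],   # two "cat" supports
                    [0.0, 1.0, 0.1],
                    [0.1, 0.9, 0.0]])  # two "dog" supports
labels = ["cat", "cat", "dog", "dog"]
protos, classes = prototypes(support, labels)
query = np.array([0.95, 0.05, 0.05])
print(nearest_prototype(query, protos, classes))  # -> cat
```

No episodic training appears anywhere: the quality of the embedding space does all the work.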

What's Left of Few-Shot Learning?

Intellectual honesty requires acknowledging the niches where dedicated few-shot methods still hold value. They exist, but the list is shorter than the field wants to admit.

STILL VIABLE: On-device learning

When you literally cannot call an API. Edge devices, offline scenarios, privacy constraints.

STILL VIABLE: Drug discovery / molecular

Domains where foundation models have limited pretraining data. But this window is closing.

STILL VIABLE: Robotic manipulation

Real-world physical interaction data is scarce and expensive. Meta-learning still adds value here.

ABSORBED: NLP classification

GPT-3 killed this in 2020. In-context learning is strictly better than any meta-learning approach.

ABSORBED: Image classification

CLIP and DINOv2 made this pointless. Zero-shot or linear probe on frozen features wins.

ABSORBED: Cross-lingual transfer

Multilingual LLMs handle this natively. No need for specialized few-shot transfer.

The pattern: Few-shot methods remain viable only in domains where foundation model pretraining data is scarce or where inference-time constraints prevent using large models. As foundation models expand to new modalities and edge deployment improves, even these niches will shrink.

The Pattern: Bitter Lesson, Chapter Two

This is not the first time a specialized AI subfield has been absorbed by general-purpose scale. It is the same story Rich Sutton identified in 2019, playing out again with remarkable fidelity.

The Bitter Lesson Pattern

  1. Researchers encode domain knowledge into specialized methods
  2. These methods work well on carefully constructed benchmarks
  3. The field grows: workshops, surveys, benchmarks, taxonomies
  4. A general method powered by scale casually matches the results
  5. The specialized field enters denial, then bargaining, then niche-seeking
  6. The general method improves further. The field quietly pivots.

Previous Victims

Hand-crafted features: killed by deep learning (2012)
NLP pipelines: killed by transformers (2018)
Task-specific models: killed by LLMs (2020+)

"The biggest lesson that can be read from 70 years of AI research is that general methods that leverage computation are ultimately the most effective, and by a large margin."

— Rich Sutton, 2019. Few-shot learning is the latest confirmation.

The Counterarguments (And Why They're Weak)

"Foundation models use few-shot learning internally"

This is technically true and entirely beside the point. In-context learning in LLMs is mechanistically related to meta-learning. But the implication is devastating for the field: the best few-shot learner is one that wasn't trained to do few-shot learning at all. It emerged from next-token prediction at scale. If your specialized training procedure produces worse few-shot performance than an emergent capability of a general model, your specialization adds negative value.

"Few-shot methods are more efficient"

Efficient at training, yes. But nobody cares about training efficiency for a model you train once and use forever. Foundation models amortize their training cost across billions of users and tasks. The per-task cost is vanishingly small. And at inference time, a CLIP classification is just a dot product — identical cost to ProtoNets.

"We need few-shot for novel domains"

This was true in 2020. In 2026, foundation models cover vision, language, audio, video, code, molecules, proteins, weather, and genomics. The "novel domain" argument keeps retreating to increasingly niche territory. At some point, you have to ask: are you solving a real problem, or defending a research agenda?

"The benchmarks aren't fair"

This is the most honest objection. Foundation models have seen far more data than few-shot methods are allowed to use — they've arguably seen the test distribution during pretraining. But that's exactly the point. If you can pretrain on diverse data and get few-shot for free, why would you do it the hard way? The benchmark comparison isn't about fairness. It's about practical relevance.

Implications for Researchers

Stop Doing

  • Publishing miniImageNet results as if they prove anything
  • Building meta-learning algorithms that don't compare against frozen foundation model baselines
  • Proposing new episodic training procedures for vision
  • Calling in-context learning "few-shot" to boost citation counts

Start Doing

  • Studying in-context learning as a mechanistic phenomenon
  • Working on efficient adaptation of foundation models (LoRA, adapters, prompt tuning)
  • Focusing on domains where pretraining data genuinely doesn't exist
  • Building on foundation model representations instead of competing with them
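To make the adaptation-efficiency direction concrete, here is a minimal numpy sketch of the LoRA idea, just the forward rule, not a training loop. The dimensions, rank, and scaling are illustrative assumptions:

```python
import numpy as np

def lora_forward(x, W, A, B, alpha=8):
    """Forward pass through a LoRA-adapted linear layer: the pretrained
    weight W stays frozen; only the low-rank factors A and B are trained.
    The effective weight is W + (alpha / r) * B @ A."""
    r = A.shape[0]
    return x @ (W + (alpha / r) * (B @ A)).T

d_in, d_out, r = 64, 64, 4
rng = np.random.default_rng(0)
W = rng.standard_normal((d_out, d_in))     # frozen pretrained weight
A = 0.01 * rng.standard_normal((r, d_in))  # trainable, small random init
B = np.zeros((d_out, r))                   # trainable, zero init: no change at start
x = rng.standard_normal((2, d_in))

# At initialization the adapted layer matches the frozen model exactly,
# and only r * (d_in + d_out) = 512 parameters are trainable vs 4096 frozen.
assert np.allclose(lora_forward(x, W, A, B), x @ W.T)
```

The design choice mirrors the thesis of this article: keep the foundation model's representations intact and spend your parameters only on the small task-specific delta.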

The Honest Question

If you are working on few-shot learning in 2026, ask yourself: am I solving this problem because it's genuinely unsolved, or because I have a hammer and this looks like a nail? The best few-shot learning researchers have already pivoted — Chelsea Finn works on robot foundation models, Oriol Vinyals led Gemini at DeepMind. They read the writing on the wall. The meta-learners meta-learned a new career direction.

The Deeper Lesson

Few-shot learning's absorption into foundation models isn't a failure of the researchers. It's a success of the representations. The field asked "how do we learn from little data?" and the answer turned out to be: "pretrain on a lot of data first, then everything is few-shot."

This is, in a way, a vindication of the few-shot learning intuition. The core insight — that good representations enable rapid adaptation — was exactly right. The mistake was thinking that few-shot-specific training was the best way to get those representations.

The field didn't fail. It succeeded so completely that it became a footnote in a larger story. The research on metric learning, prototype computation, and gradient-based meta-learning laid the intellectual foundations for understanding why in-context learning works. That contribution is real. But the practical methods? Those are done.

What Comes Next

More subfields will be absorbed

Domain adaptation, transfer learning, multi-task learning — each faces the same existential question. If a single foundation model handles all your tasks, what is the purpose of your specialized transfer method?

The research focus shifts to adaptation efficiency

The interesting question is no longer "how do we learn from few examples" but "how do we efficiently adapt a foundation model to a specific task." This is where LoRA, prefix tuning, and prompt engineering live. It's a better question.

The last niches will fall

On-device learning, molecular property prediction, robotic manipulation — each has its own foundation model coming. RT-2 for robotics. MolBERT and beyond for chemistry. When those arrive, the last justifications for dedicated few-shot methods evaporate.

Few-shot learning is dead.

Its ideas live on in every foundation model that adapts to new tasks from a handful of examples — or none at all. The best epitaph a research field can have is that its central problem was solved so thoroughly that nobody needs to think about it anymore.

The bitter lesson strikes again.
