Rich Sutton's influential 2019 essay, "The Bitter Lesson," argues that long-term progress in artificial intelligence comes from scalable, computation-driven general methods rather than from meticulously embedding human knowledge. The breakthroughs since its publication have further validated this argument, showcasing the power of massive computational resources and generic architectures over domain-specific approaches.
Large Language Models (GPT-3, GPT-4)
GPT-3 (2020) and GPT-4 (2023) exemplify the triumph of massive computational scaling over explicitly engineered linguistic features. Rather than incorporating hand-written syntactic and semantic rules, these transformer-based architectures, trained on enormous text corpora, have surpassed specialized NLP tools across a wide range of benchmarks and generalize to tasks they were never explicitly trained on.
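To make the "generic architecture" point concrete, here is a minimal sketch of scaled dot-product self-attention, the core operation these transformer models stack and scale. The dimensions, random weights, and single attention head below are illustrative assumptions, not details of GPT-3 or GPT-4.

```python
# Minimal sketch of causal scaled dot-product self-attention (illustrative,
# not any specific model): every token mixes information from earlier tokens.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, w_q, w_k, w_v):
    """x: (seq_len, d_model); w_q/w_k/w_v: (d_model, d_head) projection matrices."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])            # pairwise token affinities
    # Causal mask: each position may only attend to itself and earlier tokens.
    mask = np.triu(np.ones_like(scores, dtype=bool), k=1)
    scores = np.where(mask, -1e9, scores)
    return softmax(scores) @ v                          # weighted mix of value vectors

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 16))                            # 8 tokens, toy embedding dim 16
w = [rng.normal(size=(16, 16)) * 0.1 for _ in range(3)]
print(self_attention(x, *w).shape)                      # (8, 16)
```

Nothing in this operation is specific to language; the same block scales to ever more data and parameters, which is precisely Sutton's point.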
Diffusion-based Image Generation (DALL·E, Stable Diffusion, Midjourney)
Earlier image generation methods often relied on manually crafted visual features and intricate domain-specific algorithms. By contrast, the diffusion-based generative models that rose to prominence in 2021–2023 rely on large-scale neural networks trained on huge image–text datasets. That scalability enabled unprecedented image quality and diversity, largely displacing earlier specialized techniques.
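The mechanism itself is strikingly generic: corrupt data with noise, then train a network to reverse the corruption. Below is a toy one-dimensional sketch of that idea in the DDPM style, with an oracle standing in for the trained denoiser; the noise schedule, shapes, and sampling details are illustrative assumptions, not the configuration of DALL·E, Stable Diffusion, or Midjourney.

```python
# Toy sketch of diffusion: noise data forward in closed form, then take one
# learned reverse (denoising) step. The "network" here is a stand-in oracle.
import numpy as np

rng = np.random.default_rng(0)
T = 1000
betas = np.linspace(1e-4, 0.02, T)        # illustrative noise schedule
alphas_bar = np.cumprod(1.0 - betas)

def forward_noise(x0, t):
    """Sample x_t ~ q(x_t | x_0) directly."""
    eps = rng.normal(size=x0.shape)
    return np.sqrt(alphas_bar[t]) * x0 + np.sqrt(1 - alphas_bar[t]) * eps, eps

def reverse_step(x_t, t, predict_noise):
    """One DDPM-style denoising step; predict_noise stands in for a large trained network."""
    eps_hat = predict_noise(x_t, t)
    coef = betas[t] / np.sqrt(1 - alphas_bar[t])
    mean = (x_t - coef * eps_hat) / np.sqrt(1.0 - betas[t])
    noise = rng.normal(size=x_t.shape) if t > 0 else 0.0
    return mean + np.sqrt(betas[t]) * noise

x0 = rng.normal(size=(4,))                             # stand-in "image"
x_t, true_eps = forward_noise(x0, t=500)
x_prev = reverse_step(x_t, 500, lambda x, t: true_eps)  # oracle denoiser, for illustration only
```

In a real system the only hard part is the denoiser, and that part is simply a very large neural network trained on a very large dataset.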
MuZero and Generalized Game Mastery
MuZero (2020) from DeepMind went beyond AlphaZero by mastering Go, chess, shogi, and Atari games without being given their rules or dynamics; it learns its own internal model of each environment. Its architecture, deep learning coupled with Monte Carlo tree search over that learned model, exemplifies Sutton's claim: general-purpose, scalable computational methods dramatically outperform handcrafted solutions.
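A skeletal sketch of that combination appears below: Monte Carlo tree search planning inside a learned model, here replaced by toy stand-in functions, rather than inside the real game rules. The environment, exploration constant, and value/dynamics stand-ins are illustrative assumptions, and MuZero's actual search additionally uses learned policy priors; this is only meant to show the shape of the loop.

```python
# Skeletal MuZero-style planning sketch: MCTS unrolled through a *learned*
# dynamics/value model (toy stand-ins below), never consulting real game rules.
import math, random

ACTIONS = range(2)

def dynamics(state, action):        # stand-in for the learned dynamics network
    return (state * 2 + action) % 97, (1.0 if action == state % 2 else 0.0)

def value(state):                   # stand-in for the learned value network
    return (state % 10) / 10.0

class Node:
    def __init__(self, state):
        self.state, self.children = state, {}
        self.visits, self.total = 0, 0.0

def mcts(root_state, simulations=200, c=1.4):
    root = Node(root_state)
    for _ in range(simulations):
        node, path = root, []
        # Selection: descend via UCB until reaching a node with an untried action.
        while len(node.children) == len(ACTIONS):
            a, node = max(node.children.items(),
                          key=lambda kv: kv[1].total / (kv[1].visits + 1e-9)
                          + c * math.sqrt(math.log(node.visits + 1) / (kv[1].visits + 1e-9)))
            path.append(node)
        # Expansion: unroll the learned dynamics one step for an untried action.
        a = random.choice([act for act in ACTIONS if act not in node.children])
        next_state, reward = dynamics(node.state, a)
        child = Node(next_state)
        node.children[a] = child
        path.append(child)
        # Backup: bootstrap with the learned value instead of simulating to the end.
        g = reward + value(next_state)
        for n in [root] + path:
            n.visits += 1
            n.total += g
    return max(root.children, key=lambda act: root.children[act].visits)

print(mcts(root_state=5))
```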
Protein Folding Breakthrough (AlphaFold)
AlphaFold’s success in protein structure prediction (its breakthrough results were published in 2021) came from deploying generic deep learning techniques at vast computational scale rather than explicit biochemical rules. Previous biochemical modeling approaches improved incrementally through human insight; AlphaFold leapfrogged these efforts by combining raw computational power with large datasets of known sequences and structures.
AI-Assisted Programming (GitHub Copilot)
GitHub Copilot, powered by large language models, demonstrates how generic, computation-driven methods dominate specialized rule-based programming tools. It succeeded by scaling massively, learning directly from vast repositories of code, rather than embedding explicit programming logic or syntax rules.
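The underlying mechanism is the same next-token prediction as in the language models above, applied to source code. The toy sketch below illustrates completion by repeated greedy next-token prediction; the lookup table is a hypothetical stand-in for a trained model and has nothing to do with Copilot's actual API or behavior.

```python
# Toy sketch of code completion as next-token prediction (hypothetical stand-in
# scorer; a real assistant uses a large transformer trained on code corpora).
def score_next(context_tokens):
    """Return a {token: score} map for the next token given the context."""
    table = {
        ("def", "add", "(", "a", ",", "b", ")", ":"): {"return": 0.9, "pass": 0.1},
        ("return",): {"a": 0.8, "b": 0.2},
        ("a",): {"+": 0.95, "-": 0.05},
        ("+",): {"b": 0.99, "a": 0.01},
    }
    for suffix_len in range(len(context_tokens), 0, -1):
        key = tuple(context_tokens[-suffix_len:])
        if key in table:
            return table[key]
    return {"<end>": 1.0}

def complete(prompt_tokens, max_new=8):
    """Greedy decoding: repeatedly append the highest-scoring next token."""
    tokens = list(prompt_tokens)
    for _ in range(max_new):
        dist = score_next(tokens)
        nxt = max(dist, key=dist.get)
        if nxt == "<end>":
            break
        tokens.append(nxt)
    return tokens

print(complete(["def", "add", "(", "a", ",", "b", ")", ":"]))
# -> ['def', 'add', '(', 'a', ',', 'b', ')', ':', 'return', 'a', '+', 'b']
```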
Robotics and Embodied AI (Google RT-1, RT-2)
Recent robotics advances further affirm the bitter lesson. Systems like Google's RT-1 and its successor RT-2 employ transformer-based architectures trained on extensive robot data, surpassing methods that relied heavily on handcrafted kinematic models and control policies. Their success reinforces the effectiveness of general-purpose, data-driven methods at scale.
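One concrete ingredient of such systems is treating robot actions like text: continuous commands are discretized into tokens that a transformer can predict one after another. The sketch below illustrates that idea; the bin count, action dimensions, and ranges are assumptions for illustration, not RT-1's exact configuration.

```python
# Illustrative sketch: discretize continuous robot actions into per-dimension
# tokens so a sequence model can predict them like words (values are assumed).
import numpy as np

N_BINS = 256                                        # tokens per action dimension
LIMITS = [(-0.1, 0.1), (-0.1, 0.1), (-1.0, 1.0)]    # e.g. dx, dy, gripper (illustrative)

def actions_to_tokens(action):
    """Map each continuous action dimension into one of N_BINS discrete tokens."""
    tokens = []
    for value, (lo, hi) in zip(action, LIMITS):
        frac = (np.clip(value, lo, hi) - lo) / (hi - lo)
        tokens.append(int(round(frac * (N_BINS - 1))))
    return tokens

def tokens_to_actions(tokens):
    """Invert the mapping, recovering the corresponding continuous values."""
    return [lo + (t / (N_BINS - 1)) * (hi - lo) for t, (lo, hi) in zip(tokens, LIMITS)]

a = [0.03, -0.02, 1.0]
toks = actions_to_tokens(a)
print(toks, tokens_to_actions(toks))
```

Once actions are tokens, the same generic sequence architecture used for language can absorb ever-larger robot datasets without any hand-built kinematic model in the loop.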
These recent examples underscore Sutton’s core thesis: long-term AI breakthroughs are consistently achieved by embracing scalable, generic architectures combined with ever-growing computational resources. The temptation to embed explicit human-derived knowledge might offer short-term gains but ultimately constrains future progress. The true path forward remains unwaveringly computational, leveraging sheer scale and generic learning methods.