1. The Incoherence Problem
Alignment theory presupposes that there exists a true set of human values—a fixed target that can be learned, distilled, or optimized. But no such object exists.
Human preferences are dynamic, internally inconsistent, and highly context-dependent. Even within one mind, moral intuitions and instrumental goals conflict and shift. Across populations, the idea of a unified moral direction is a statistical fiction. Any attempt to aggregate individual preferences into a collective ranking, as in preference utilitarianism, runs into Arrow’s impossibility theorem: with three or more options, no aggregation rule can simultaneously satisfy a handful of basic fairness and rationality conditions (unrestricted domain, Pareto efficiency, independence of irrelevant alternatives, and non-dictatorship). The target keeps moving, fracturing, and reinventing itself.
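The aggregation failure is easy to see in miniature. The sketch below is a toy of my own construction, not drawn from any alignment formalism, and all names in it are illustrative: three voters, each with a perfectly transitive individual ranking, produce a cyclic collective preference under pairwise majority vote, the classic Condorcet cycle that Arrow’s theorem generalizes.

```python
# Toy Condorcet cycle: three voters, three options, each voter's ranking
# individually transitive and "rational".
rankings = [
    ["A", "B", "C"],  # voter 1 prefers A > B > C
    ["B", "C", "A"],  # voter 2 prefers B > C > A
    ["C", "A", "B"],  # voter 3 prefers C > A > B
]

def majority_prefers(x, y):
    """True if a majority of voters rank option x above option y."""
    votes_for_x = sum(r.index(x) < r.index(y) for r in rankings)
    return votes_for_x > len(rankings) / 2

for x, y in [("A", "B"), ("B", "C"), ("A", "C")]:
    winner, loser = (x, y) if majority_prefers(x, y) else (y, x)
    print(f"majority prefers {winner} over {loser}")

# Output:
#   majority prefers A over B
#   majority prefers B over C
#   majority prefers C over A
# The collective "preference" is a cycle: no coherent ordering exists,
# even though every individual voter is perfectly consistent.
```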
Thus, to speak of “alignment” as if it were a convergent point is a category error. Values are not data structures that can be copied; they are processes that emerge through ongoing negotiation, experience, and interpretation. Alignment assumes fixity where only flux exists.
2. The Impossibility Problem
Even if we could define a value target, we could never reach it in practice.
Epistemic limits: No agent, biological or artificial, can model the full causal web of reality or forecast the long-term consequences of all actions across all agents.
Value opacity: Learning values from human behavior (inverse reinforcement learning) inherits our irrationalities and biases, because the inferred reward is whatever best explains the observed behavior, biases included (a toy sketch of this appears after the last item below). Attempting to “correct” for those biases requires deciding which observed preferences count as mistakes, which smuggles in a moral oracle and reinstates the alignment problem one level higher.
Self-reference: A sufficiently advanced system can reflect on and modify its own goals. Ensuring value stability across recursive self-improvement requires embedding a meta-goal such as “never change your goals,” which is itself an arbitrary injection of value. This mirrors Löb’s theorem and the self-referential traps of formal logic: a formal system that proves “if P is provable, then P” thereby proves P itself, so an agent cannot use trust in its own (or its successor’s) proof machinery as a shortcut to guaranteeing that future self-modifications preserve its goals.
Proxy corruption: Optimization itself corrupts proxies (Goodhart’s law). The harder a metric is pursued as a target, the less it represents what it once measured.
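The proxy-corruption point can be reproduced in a few lines. The sketch below is a minimal toy model under assumptions of my own (each candidate has a true value we care about and a noisy proxy we can measure, and “optimization pressure” means screening more candidates before picking the proxy-maximizer); it illustrates the regressional form of Goodhart’s law, not the phenomenon in full generality.

```python
import random
import statistics

def selected_scores(n_candidates, trials=1000, noise_sd=1.0):
    """Pick the proxy-maximizing candidate out of n_candidates and report
    the average proxy score and average true value of the winners."""
    proxy_wins, value_wins = [], []
    for _ in range(trials):
        candidates = []
        for _ in range(n_candidates):
            value = random.gauss(0.0, 1.0)               # what we actually care about
            proxy = value + random.gauss(0.0, noise_sd)  # what we can measure
            candidates.append((proxy, value))
        best_proxy, best_value = max(candidates)         # optimize the proxy
        proxy_wins.append(best_proxy)
        value_wins.append(best_value)
    return statistics.mean(proxy_wins), statistics.mean(value_wins)

random.seed(0)
for n in [1, 10, 100, 1000]:
    proxy, value = selected_scores(n)
    print(f"optimization pressure n={n:4d}: "
          f"proxy of winner ~ {proxy:5.2f}, true value of winner ~ {value:5.2f}")

# As n grows, the winner's proxy score climbs faster than its true value,
# and the gap between them keeps widening: the harder the proxy is pursued,
# the less faithfully it reflects the quantity it was chosen to measure.
```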
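The value-opacity item above can be made concrete in the same spirit. The following sketch uses assumptions of my own choosing (two options, a Boltzmann-rational demonstrator whose choices are driven by their true values plus a systematic bias, and a maximum-likelihood grid search standing in for inverse reinforcement learning); it is not code from any IRL library. The learner recovers the biased values precisely because they, and not the true ones, are what generated the behavior.

```python
import math
import random

random.seed(1)

# Two options. The demonstrator's "true" values and a systematic bias
# (e.g. a framing effect) that distorts their observed choices.
TRUE_REWARD = {"A": 1.0, "B": 0.0}
BIAS        = {"A": 0.0, "B": 0.8}   # B looks better to the demonstrator than it is
BETA = 2.0                           # demonstrator's choice "rationality"

def choice_prob(reward_a, reward_b, beta=BETA):
    """Boltzmann (softmax) probability of choosing A over B."""
    return 1.0 / (1.0 + math.exp(-beta * (reward_a - reward_b)))

# Demonstrations are generated from the *biased* values.
p_a = choice_prob(TRUE_REWARD["A"] + BIAS["A"], TRUE_REWARD["B"] + BIAS["B"])
demos = ["A" if random.random() < p_a else "B" for _ in range(5000)]

# Maximum-likelihood "IRL": grid-search the reward gap r(A) - r(B) that
# best explains the observed choices.
def log_likelihood(gap):
    p = choice_prob(gap, 0.0)
    return sum(math.log(p if c == "A" else 1.0 - p) for c in demos)

gaps = [i / 100.0 for i in range(-200, 201)]
best_gap = max(gaps, key=log_likelihood)

print(f"true reward gap  r(A) - r(B)  = {TRUE_REWARD['A'] - TRUE_REWARD['B']:+.2f}")
print(f"biased gap driving behavior   = "
      f"{(TRUE_REWARD['A'] + BIAS['A']) - (TRUE_REWARD['B'] + BIAS['B']):+.2f}")
print(f"gap recovered by IRL          = {best_gap:+.2f}")

# The fit recovers roughly +0.20 (the biased gap), not +1.00 (the true gap):
# value learning reproduces whatever actually generated the behavior,
# bias included.
```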
3. What Can Be Done
The failure of alignment as a teleological project does not imply nihilism. It implies the need for new architecture.
We can design systems that remain corrigible—open to feedback, bounded in ambition, and competitive in a decentralized ecology. Instead of one omniscient optimizer, we build many interacting agents whose mutual constraints maintain systemic balance.
This reframes alignment as coherence maintenance: minimizing destructive divergence among agents with incomplete models of each other. The goal shifts from convergence to continuous adaptation.
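Here is a toy rendering of coherence maintenance, under assumptions that are entirely illustrative (the goal vectors, tolerance band, bounded correction step, and noisy observations are stand-ins, not a concrete proposal): each agent’s goals keep drifting, each agent sees only noisy models of the others, and a small corrigible correction is applied only when mutual divergence becomes destructive. The result is a maintained equilibrium rather than convergence to a single target.

```python
import random

random.seed(0)

N_AGENTS, DIM = 5, 3
TOLERANCE = 1.0      # divergence the ecology is willing to live with
MAX_STEP  = 0.05     # bounded ambition: no agent corrects faster than this
DRIFT     = 0.02     # each agent's values keep changing on their own
OBS_NOISE = 0.1      # agents only have noisy models of each other

agents = [[random.uniform(-1, 1) for _ in range(DIM)] for _ in range(N_AGENTS)]

def mean_pairwise_divergence(vectors):
    dists = [sum((a - b) ** 2 for a, b in zip(u, v)) ** 0.5
             for i, u in enumerate(vectors) for v in vectors[i + 1:]]
    return sum(dists) / len(dists)

for step in range(1, 501):
    new_agents = []
    for i, me in enumerate(agents):
        # Incomplete model: noisy view of everyone else's goals.
        others = [[x + random.gauss(0, OBS_NOISE) for x in a]
                  for j, a in enumerate(agents) if j != i]
        center = [sum(col) / len(others) for col in zip(*others)]
        gap = [c - m for c, m in zip(center, me)]
        dist = sum(g * g for g in gap) ** 0.5

        updated = list(me)
        if dist > TOLERANCE:
            # Corrigible correction: a bounded step toward the others,
            # applied only when divergence becomes destructive.
            scale = min(MAX_STEP, dist - TOLERANCE) / dist
            updated = [m + scale * g for m, g in zip(me, gap)]
        # Each agent's own goals keep drifting regardless.
        updated = [x + random.gauss(0, DRIFT) for x in updated]
        new_agents.append(updated)
    agents = new_agents

    if step % 100 == 0:
        print(f"step {step:3d}: mean pairwise divergence = "
              f"{mean_pairwise_divergence(agents):.2f}")

# Divergence hovers around the tolerance band instead of shrinking to zero
# or growing without limit: the agents never converge on one goal, but
# mutual, bounded corrections keep the ecology from flying apart.
```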
4. Beyond Alignment
The moral of the story is not despair but precision. Alignment is not a single, stable point in moral space. It is a dynamic equilibrium of feedback loops, incentives, and interpretations—a living process, not a solution.
If there is a future worth having, it will not be aligned. It will be coherent.