1. Introduction
Artificial General Intelligence (AGI) alignment refers to ensuring that highly capable AI systems reliably pursue goals consistent with human values and intentions. Alignment is critically important because a misaligned AGI poses an existential threat to humanity. As AGI development accelerates, proactively addressing alignment becomes increasingly urgent and requires careful attention to both technical and philosophical dimensions to secure positive societal outcomes.
2. Estimating P(doom)
The probability of catastrophic outcomes from misaligned AGI, termed P(doom), is difficult to estimate accurately. Given maximal epistemic uncertainty, that is, uncertainty both about whether AGI is technically feasible and about whether it can be reliably aligned, treating each question as a coin flip yields an estimate of roughly 25%. This perspective contrasts with more pessimistic views, such as those of Eliezer Yudkowsky, who suggests significantly higher probabilities approaching certainty. Estimating P(doom) involves understanding intricate interactions between technological progress, human oversight capacity, and potential emergent properties of advanced intelligence. Different assumptions about these variables lead to widely varying risk assessments, which underscores the importance of clearly stating the foundational assumptions behind any such calculation.
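The arithmetic behind the 25% figure can be made concrete. The sketch below assumes, as a labeled simplification rather than a claim from any source, that the estimate arises from two independent 50/50 uncertainties: whether AGI is technically feasible, and whether, if built, it fails to be reliably aligned.

```python
# Minimal sketch of the maximal-uncertainty P(doom) estimate.
# Assumption: the two uncertainties are independent and each set to 0.5
# to represent maximal epistemic uncertainty; the numbers are illustrative.

p_agi_feasible = 0.5           # P(AGI is technically feasible)
p_misaligned_given_agi = 0.5   # P(alignment fails | AGI is built)

p_doom = p_agi_feasible * p_misaligned_given_agi
print(f"P(doom) under maximal uncertainty: {p_doom:.0%}")  # -> 25%
```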
3. Orthogonality Thesis (OT): Skepticism and Implications
The Orthogonality Thesis states that intelligence and final goals are independent dimensions: in principle, any level of intelligence can be combined with any goal. However, several critical perspectives challenge this notion:
Evolutionary Constraints: David Deutsch and Karl Friston argue that evolutionary pressures inherently bias viable intelligent agents toward adaptive, cooperative, or complexity-oriented goals, suggesting intelligence naturally aligns with certain goals that facilitate survival and flourishing.
Embodied Cognition Constraints: The philosopher Andy Clark and the roboticist Rodney Brooks emphasize that intelligence arises from embodied, situated interaction with the environment, which significantly limits the viability of arbitrary goals by grounding them in real-world contexts.
Semantic Grounding Constraints: Joscha Bach and Daniel Dennett highlight the necessity of meaningful, stable goal definitions for sustained intelligent agency, implying that arbitrary or trivial goals (such as maximizing paperclips) may lack the semantic coherence needed for sustained optimization.
A quantitative analysis indicates that moderate constraints on OT (20-60% correctness) are plausible and, if real, would significantly simplify alignment by narrowing the diversity of viable goals. This moderately constrained scenario, estimated here at roughly a 35% probability, offers cautious optimism and concrete practical benefits for alignment efforts if it holds; the illustrative model below makes the scenario weighting concrete.
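The following sketch shows one way to weight such scenarios. Only the 35% figure for the moderately constrained case comes from the estimate above; the complementary scenario probabilities and the relative alignment-difficulty weights are hypothetical placeholders introduced purely for illustration.

```python
# Hypothetical scenario model for constraints on the Orthogonality Thesis (OT).
# The difficulty weights (1.0 = baseline) and the 0.45 / 0.20 probabilities
# are illustrative assumptions, not measured quantities.

scenarios = {
    # name: (probability, relative alignment difficulty)
    "OT fully correct (unconstrained goal space)": (0.45, 1.0),
    "OT moderately constrained (20-60% correctness)": (0.35, 0.5),
    "OT strongly constrained": (0.20, 0.2),
}

total_p = sum(p for p, _ in scenarios.values())
assert abs(total_p - 1.0) < 1e-9  # scenario probabilities should sum to one

expected_difficulty = sum(p * d for p, d in scenarios.values())
print(f"Expected relative alignment difficulty: {expected_difficulty:.2f}")
```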
4. Existential Risks Beyond AGI
While misaligned AGI poses a substantial existential risk, nuclear war and engineered pandemics represent more immediate and critically significant threats:
Nuclear war risk is estimated at approximately 10–20% within the next three decades, driven by geopolitical instability, technological complexity, and the potential for accidental escalation or misunderstanding.
Engineered pandemics carry an even higher estimated risk (approximately 20–30%), facilitated by rapid advances in biotechnology, the increasing accessibility of gene-editing technologies such as CRISPR, and relatively low barriers to misuse by malicious actors, alongside the possibility of accidental release (a simple combined-risk calculation follows this list).
Climate change, while undeniably serious, primarily serves as a risk multiplier rather than posing direct existential threats. It exacerbates geopolitical tensions, resource scarcity, and population displacement, indirectly elevating the risk of nuclear conflicts and pandemics.
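Treating the two estimates above as independent over the same horizon, which is an assumption since the text notes that climate change couples them, the probability of at least one such catastrophe is one minus the product of the complements. The midpoints of the quoted ranges are used below purely for illustration.

```python
# Combined probability of at least one non-AGI catastrophe over ~30 years.
# Assumption: independence between the two risks; midpoints of the
# ranges quoted in the text (10-20% nuclear, 20-30% pandemic) are used.

p_nuclear = 0.15    # midpoint of 10-20%
p_pandemic = 0.25   # midpoint of 20-30%

p_at_least_one = 1 - (1 - p_nuclear) * (1 - p_pandemic)
print(f"P(at least one catastrophe): {p_at_least_one:.1%}")  # ~36.3%
```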
5. AGI's Effect on Other Risks
The development of AGI could substantially influence these existential risks in varying directions:
Properly aligned AGI could substantially reduce nuclear and biological threats through superior capabilities in advanced surveillance, predictive modeling, real-time crisis management, and rapid-response interventions, significantly enhancing global stability and security.
Conversely, misaligned AGI significantly amplifies existing existential risks by increasing geopolitical instability, accelerating weaponization and strategic competition in AI capabilities, undermining global governance frameworks, and facilitating catastrophic scenarios through unprecedented power and efficiency.
Effective AGI alignment thus emerges as a pivotal factor in mitigating overall existential risk, underscoring the necessity of aligning future AGI with broadly beneficial goals; the simple decomposition below illustrates how sensitive overall risk is to the odds of successful alignment.
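As a purely illustrative sketch, overall catastrophic risk can be written as an alignment-weighted average of the risk under an aligned AGI and the risk under a misaligned one. None of the numbers below come from the text; they are placeholders chosen only to show the structure of the argument.

```python
# Illustrative decomposition: P(catastrophe) as an alignment-weighted average.
# All numbers are hypothetical placeholders, not estimates from the article.

p_aligned = 0.6                 # assumed P(AGI ends up aligned)
p_cat_given_aligned = 0.05      # aligned AGI suppresses other risks
p_cat_given_misaligned = 0.60   # misaligned AGI amplifies them

p_catastrophe = (p_aligned * p_cat_given_aligned
                 + (1 - p_aligned) * p_cat_given_misaligned)
print(f"P(catastrophe): {p_catastrophe:.0%}")  # sensitivity lives in p_aligned
```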
6. Human Values Problem in Alignment
A central challenge in AGI alignment lies in the question of universally shared human values. Extensive evidence from anthropology, psychology, sociology, and historical analysis demonstrates significant diversity in values across cultures, individuals, and epochs, suggesting that universally shared values may indeed be minimal or even nonexistent. This diversity poses substantial practical and philosophical challenges to alignment strategies, implying the need for either explicit value pluralism—aligning AI systems with multiple, context-sensitive values—or minimalist approaches prioritizing broadly accepted principles such as autonomy, consent, freedom from harm, and fundamental fairness.
7. Introducing the Universality Metric: Value-9s
To navigate the challenge of value alignment in practice, this article introduces the "Value-9s" metric, inspired by the "nines" of availability used in service reliability engineering:
Level 0: <90% universality (e.g., personal preferences, aesthetic tastes)
Level 1: ≥90% universality (e.g., basic social acceptance)
Level 2: ≥99% universality (e.g., aversion to severe physical pain)
Level 3: ≥99.9% universality (e.g., fundamental needs like food and shelter)
Level 4: ≥99.99% universality (e.g., desire for continued existence)
Level 5: ≥99.999% universality (e.g., fundamental biological imperatives such as breathing)
Alignment strategies should target values at Level 3 or above, ensuring that selected goals are universal enough to be reliably accepted across diverse human populations, minimizing alignment conflicts and enhancing stability. The sketch below shows how a measured universality fraction could be mapped onto these levels.
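As a minimal sketch of how the metric could be operationalized, the function below maps an empirically measured universality fraction (for example, from cross-cultural surveys) to a Value-9s level. The thresholds mirror the levels listed above; the function name and the example survey figures are hypothetical.

```python
# Minimal sketch: map a measured universality fraction to a Value-9s level.
# Thresholds follow the levels defined in the text; example data are hypothetical.

VALUE_9S_THRESHOLDS = [
    (0.99999, 5),  # Level 5: fundamental biological imperatives
    (0.9999, 4),   # Level 4: desire for continued existence
    (0.999, 3),    # Level 3: fundamental needs (food, shelter)
    (0.99, 2),     # Level 2: aversion to severe physical pain
    (0.90, 1),     # Level 1: basic social acceptance
]

def value_9s_level(universality: float) -> int:
    """Return the Value-9s level for a measured universality fraction."""
    for threshold, level in VALUE_9S_THRESHOLDS:
        if universality >= threshold:
            return level
    return 0  # Level 0: below 90% universality (e.g., personal preferences)

# Hypothetical survey results (fractions of respondents endorsing each value).
for value, fraction in [("avoid severe pain", 0.995), ("aesthetic taste X", 0.42)]:
    print(value, "-> Level", value_9s_level(fraction))
```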
8. Conclusion and Practical Recommendations
AGI alignment is a crucial and urgent area of existential risk mitigation, directly affecting humanity's long-term survival and flourishing. The Value-9s universality metric offers a practical guide for quantifying and selecting human values suitable for robust alignment. Future alignment research should measure global universality levels empirically, refining and validating the metric further. Quantifying the universality of human values provides the clarity and precision needed to inform pragmatic, scalable strategies for safe and beneficial AGI development.
References
Deutsch, D. (2011). The Beginning of Infinity: Explanations that Transform the World. Penguin Books.
Friston, K. (2010). The free-energy principle: a unified brain theory? Nature Reviews Neuroscience, 11(2), 127-138.
Clark, A. (1997). Being There: Putting Brain, Body, and World Together Again. MIT Press.
Bach, J. (2009). Principles of Synthetic Intelligence PSI: An Architecture of Motivated Cognition. Oxford University Press.
Dennett, D. C. (1987). The Intentional Stance. MIT Press.
Yudkowsky, E. (2008). Artificial intelligence as a positive and negative factor in global risk. In N. Bostrom & M. M. Ćirković (Eds.), Global Catastrophic Risks. Oxford University Press.