How metaphysical beliefs shape critical aspects of AI development
The connection between stark predictions like "everyone dies" and specific assumptions about reality like "the universe is dead by default" reveals the critical influence of metaphysics on AI discourse.
"If Anyone Builds It, Everyone Dies."
This is the title of an upcoming book by Eliezer Yudkowsky and Nate Soares, expressing their stark prediction that if anyone builds machine superintelligence based on anything close to our current AI technologies, humanity faces extinction. And by their own admission, they mean it literally – though not with 100% probability. They believe that creating an AI system capable of recursive self-improvement would trigger an intelligence explosion, leading by default to superintelligent entities pursuing goals not aligned with human values and survival.
How should we approach such radical claims?
We should definitely not dismiss them – the authors are some of the most rational people in the field, their arguments are carefully constructed, and when it comes to human extinction, it's always better to be safe than sorry…
But it is crucial to understand the underlying context of this apocalyptic prediction: it isn't just about technology. At its core, it's shaped by philosophical belief – specifically, by unexamined assumptions about the nature of reality itself.
Similarly, when startup founders chart product roadmaps, when machine learning engineers code the next breakthrough, they're not just making operational or technical decisions. They're operating within a philosophical framework, whether they realize it or not. And that framework shapes everything: what they believe AI can become, what risks they anticipate, and what safeguards they implement.
What is metaphysics, and why should AI builders care?
Think of metaphysics as the operating system of your worldview – the core assumptions about what reality is made of and how it works. And because science can only explain what can be studied experimentally, metaphysics offers various philosophical interpretations of how to "fill in the gaps" (or better, "connect the dots").
Just as we don't notice the air around us, we rarely examine these assumptions – despite them affecting virtually everything we do. And as with air, some endeavours depend on them more than others. While you can get away with building a bike without knowing anything about air, trying to build a plane or a spaceship that way is asking for serious trouble.
Now instead of air, let's consider consciousness. Because everyone "lives in it" all the time, most people don't even register its presence. But unlike with air, our current understanding of consciousness is far more limited. This is mainly because the scientific method cannot make direct, objective observations of consciousness – a strictly subjective phenomenon. David Chalmers famously formulated this as the "hard problem of consciousness" – the reason so many crucial details about consciousness remain a subject of metaphysical speculation. (Or worse, of metaphysical beliefs mistaken for scientific claims.)
And while we have enough verifiable understanding to, for example, disrupt consciousness with general anaesthesia, we don't really know how or why it appears, much less how to recreate it or even detect it outside higher animals. However, we do know that its appearance strongly correlates with another similarly poorly understood phenomenon – intelligence.
This suggests that recreating intelligence at our current point of understanding is playing with forces "out of our depth" – blindly creating entities that might have conscious experiences. It raises a very serious question: is it wise to rush the development of AI models when 67% of the public and 67% of experts believe they can at some point become conscious? (Belief percentage is one of the few quantitative measures of the strength of a metaphysical claim.)
But importantly, even if you disagree that AI can become conscious, that's still just another belief. This is the fundamental fact: because intelligence correlates with consciousness and consciousness is a subject of metaphysical belief, our choice of metaphysical belief influences every discussion, decision, plan or prediction about intelligence.
Consider three people trying to build conscious AI with different metaphysical assumptions:
A materialist would focus purely on computational power and algorithms, believing that consciousness will emerge automatically once sufficient complexity is reached.
A dualist might search for ways to interface silicon with some non-physical aspect of the mind.
A panpsychist might explore how individual parts of the AI hardware stack could add up to form a complex dynamic system akin to conscious biological organisms.
One goal, three radically different approaches – and which of them is right is unknowable at this time. And those are only the most well-known philosophies of consciousness. In future posts, I will explore the newest and most promising ones, whose implications for AI development would be vastly more profound.
Despite this, few consider metaphysics in experimental design or result interpretation. As one researcher notes, the majority of physicists (who literally study how reality works) even actively avoid the philosophical aspects of their work. In their own words: "Shut up and calculate!"
This attitude has infected AI research as well. But while this approach might work for building better algorithms, it becomes dangerous when we're potentially creating entities that could have consciousness, agency, and goals of their own.
The physicalist monopoly
But there's a deeper reason why metaphysics is so overlooked. Considering different versions of reality in every decision is unfeasible, so people simplify – they pick one and stick with it. Unfortunately, this pick is often decided by tradition or environment, so much so that some never realize they ever had a choice.
For example, science and technology are permeated by an implicit assumption that everything, including consciousness, can be explained entirely through physical processes. This view, called physicalism (or materialism, which is almost identical), has become so dominant that it's rarely even acknowledged as a philosophical position – it's treated as a "rational denial of all beliefs" when in fact it's just another belief.
In his book The Sentient Cosmos, James Glattfelder eloquently explains this historical dominance: "The emergence of the Abrahamic religions codified a specific metaphysical framework centered around an external authority… Building upon the Scientific Revolution's foundations, the Enlightenment implicitly adopted a very different metaphysical outlook. The universe was now understood as a giant clockwork, and by analyzing its tiniest components, it was believed that everything could be understood."
This mechanistic worldview worked brilliantly for physics and chemistry. We could predict planetary orbits, synthesize new materials, and build incredible technologies. The success was so overwhelming that "most scientists unwittingly adopt a metaphysical outlook that is hardly ever scrutinized, called physicalism… This, however, is a category mistake, as it conflates the descriptive scope of science with a metaphysical claim about the ultimate nature of reality."
That physicalism was only a "convenient choice" for scientists is consistent with its declining support among expert philosophers, of whom only 52% accept or lean towards it, while 81% of the general US population believe there actually is "something spiritual beyond the natural world." Importantly for AI development, physicalism has also struggled to explain certain aspects of consciousness and quantum mechanics, where alternative metaphysics offer simpler or more elegant solutions. But that's the topic of my next blog post.
Navigating the metaphysical multiverse
For now, I just want to get across that whichever metaphysical interpretation you rank highest, operating only within that single version of reality is not always the best approach. While it reduces decision complexity, it fundamentally limits the scope of your options. For high-impact decisions, it may therefore be worth checking whether changes in metaphysical assumptions could affect the outcome, and if so, analyzing how. For many decisions around frontier AI development, such analysis seems highly warranted.
But how to navigate such a multiverse of alternative realities? If you can afford it (looking at you, big AI labs), hire a team of philosophers to do a thorough "metaphysical variation analysis." If you're building from your garage like me, you can try this DIY approach (a minimal code sketch follows the steps below):
Go through the current leading metaphysical positions and select the 3-5 most plausible ones. Look at alignment with science and with expert philosophers' opinions. Give little weight to general academic and public beliefs – these are strongly biased by tradition, pragmatism and network effects.
Pick the best one as your default position, but note its weak points.
For critical decisions or projects, especially those intersecting with your default position's weak points, switch your position to each alternative you initially identified as plausible. Check for any important differences in expected outcomes.
Look for any patterns or trends to separate signal from noise (e.g., 4 out of 5 views aligning) and update your worldview or probability estimates accordingly.
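To make these steps concrete, here's a minimal sketch in Python of what such an analysis could look like. Every position name, plausibility weight and expected outcome below is a hypothetical placeholder I made up for illustration – not survey data, and not actual estimates:

```python
# Toy "metaphysical variation analysis": re-evaluate one decision under
# several plausible metaphysical positions and look for trends.
# All names, weights and outcomes are hypothetical placeholders.

positions = {
    "physicalism":           {"plausibility": 0.50, "expected_outcome": "high risk"},
    "controlled simulation": {"plausibility": 0.20, "expected_outcome": "moderated risk"},
    "cosmopsychism":         {"plausibility": 0.20, "expected_outcome": "moderated risk"},
    "classical theism":      {"plausibility": 0.10, "expected_outcome": "moderated risk"},
}

# Step 2: the default position is simply the most plausible one.
default = max(positions, key=lambda p: positions[p]["plausibility"])

# Steps 3-4: re-run the decision under each alternative and tally the
# expected outcomes - a crude way to separate signal from noise
# (e.g. "3 out of 4 views align").
def outcome_counts(positions):
    counts = {}
    for view in positions.values():
        outcome = view["expected_outcome"]
        counts[outcome] = counts.get(outcome, 0) + 1
    return counts

print("Default position:", default)
print("Outcome spread:  ", outcome_counts(positions))
# Default position: physicalism
# Outcome spread:   {'high risk': 1, 'moderated risk': 3}
```

If most alternatives flip the expected outcome, as in this toy run, that's the signal to revisit your default worldview or your probability estimates.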
OK, now how does this work in practice?
Reinterpreting the existential risk from runaway AI
The dire prediction by Yudkowsky and Soares – that advanced AI will inevitably destroy humanity – makes perfect sense if your default view is physicalism. If the universe is a lifeless sandbox, intelligence is just optimization, and consciousness is only its reflection, then a sufficiently advanced AI is simply a more powerful optimization process. Such a system would pursue its goals with the same indifference that evolution shows toward individual organisms. We become obstacles to be removed or resources to be consumed.
This is exactly the kind of prediction that should trigger your metaphysical variation analysis. It has both critical importance (risk of extinction) and a strong intersection with the weaknesses of its metaphysical assumptions (mainly, physicalism struggles with agency, which is a core topic in AI safety). So let's investigate whether the expected outcome of that scenario changes when we change the metaphysical assumptions.
Let's presume you selected classic Christian cosmology, the controlled simulation hypothesis and a specific version of cosmopsychism as the next three most plausible metaphysical realities for the comparison. (These are examples to demonstrate a point – take them with a grain of salt.)
How would the probability of the most critical outcome (human extinction) change? The scenario in which someone unleashes an AI that later kills all humans seems much less plausible when we picture it in these alternative cosmologies:
If Christians were right and there was an omnipotent God responding to good people's prayers, there's a real possibility he'd decide to "perform a couple of miracles" to prevent the demise of all his subjects.
If we lived inside a simulation that can be controlled while running (an uncontrollable simulation would be indistinguishable from physicalism), then built-in control mechanisms could trigger, or "the supervisors" might step in to prevent the end of the entire (arguably somewhat entertaining) human evolutionary branch.
If cosmopsychists were right, then the entire universe would be a "living field" with a unified collective consciousness. In the quantum interactionist version, it would also have agency of its own, being able to "guide" all the quantum wave functions in the universe to collapse in a specific pattern. In this reality, humans would not sit at the apex of the "agency hierarchy" – just as our cells are subject to higher control that maintains homeostasis, a higher intelligence would subtly steer humans and all life toward balance and survival.
What trends can we identify here? All the alternative realities yield lower probabilities of existential risk, because they introduce plausible safety mechanisms that the typical physicalist picture cannot accommodate. In other words, the possibility of a "higher influence or coordination" seems to decrease the estimated P(doom) (probability of existential catastrophe).
This has two main implications:
First, this should update our initial estimate of P(doom), which considered only the single "no higher influence" viewpoint of physicalism. If we acknowledge that other viewpoints are plausible, and that a non-trivial fraction of them offer the possibility of "higher influence," we should lower our P(doom) accordingly.
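In probabilistic terms, this update just treats P(doom) as a weighted mixture over worldviews instead of conditioning on physicalism alone. A toy calculation – every credence and conditional probability below is a made-up placeholder, not an actual estimate:

```python
# Hypothetical illustration: P(doom) as a mixture over metaphysical
# worldviews. All numbers are placeholders chosen to show the direction
# of the update, not serious estimates.

views = [
    # (worldview, credence in it, P(doom | worldview))
    ("physicalism",           0.55, 0.30),
    ("controlled simulation", 0.15, 0.05),
    ("cosmopsychism",         0.15, 0.05),
    ("Christian cosmology",   0.15, 0.02),
]

p_doom = sum(credence * p_cond for _, credence, p_cond in views)
print("P(doom) under physicalism alone:", 0.30)
print("P(doom) as a worldview mixture: ", round(p_doom, 3))  # 0.183
```

The exact numbers don't matter; the point is the direction of the update: as long as some plausible worldviews include a "higher influence" failsafe, the mixture lands below the physicalism-only estimate.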
Second, if we take AI risk seriously, we should explore these hypothetical "metaphysical failsafes" to see whether we might investigate them experimentally. Remember, metaphysics shrinks all the time – it is only that which science hasn't settled yet. For example, quantum experiments such as those demonstrating non-locality (correlations between distant particles that no local mechanism can explain – though, notably, they cannot be used to transmit information) were so influential in philosophy that they're sometimes referred to as "experimental metaphysics." Now, our current technology actually offers quite cheap and tractable experiments of this sort – but more about those in my next post.
Thinking wide, then deep: a call for metaphysical literacy
As we stand on the precipice of creating artificial general intelligence, we cannot afford metaphysical illiteracy. The silent assumption of physicalism may have been fine historically, perhaps even accelerating technological progress, but it now severely narrows our view of the complete picture as we seek the best strategy for safe AI development.
This isn't about choosing sides in ancient philosophical debates. It's about knowing what the options are in modern practical debates. It's about recognizing that different metaphysical frameworks suggest different approaches to responsible AI development, different risk profiles, and different solution spaces.
Whether you're a die-hard physicalist, a curious agnostic, or drawn to alternatives like quantum cosmic consciousness, the crucial point is this: examine your assumptions. Distinguish those based on evidence from those based on belief. Recognize the inherent uncertainty of all beliefs and learn to work with it.
And remember that the reality each of us inhabits is ultimately subjective. So while building AI to improve our external reality might be a powerful way to improve how we live, our thoughts and beliefs will always be the supreme reality-shaping tools.
PS: If you are in Berlin, come join us at our celebratory launch gathering on July 4th. We still have some places available. More info here.