Beyond Human-Aligned AI
AI alignment shouldn't just mirror human values - divergent AI could transcend them, challenging and augmenting human intelligence by unlocking novel moral frameworks
This article is the first in-depth piece following on from the opening article, Moving Away From Anthropocentrism. We also have an event on this theme coming up in London on 28th October, in collaboration with the AI Salon; see here for the event details and how to register.
AI alignment has been a highly researched and fiercely debated topic for years now. We want to make sure we align AI systems with the intentions and goals of the humans who created them, but that poses many concerns in itself: who are the humans who build, evaluate and sign off on the safety of AI systems? How do we design evaluation benchmarks which are representative, equitable and just? Ultimately, why would we want to see ourselves reflected and augmented at scale in AI systems, in light of the biases and shortcomings humans exhibit?
Current AI alignment uses techniques such as reinforcement learning from human feedback (RLHF) or instruction fine-tuning, with the goal of building and deploying AI systems which are ‘helpful, honest, and harmless’ towards humans, balancing tensions such as user-friendliness against user deception. There is now a wealth of technical tools and AI governance frameworks for overcoming data bias, such as missing or misrepresentative data, but when LLMs are trained on vast and diverse data sources, they take on and, through feedback loops, in effect amplify human biases.
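To make the mechanics concrete, here is a minimal, illustrative sketch of the pairwise preference loss at the heart of RLHF reward modelling (a toy PyTorch example, not any lab's actual implementation). Every training pair encodes one human judgment, which is exactly how human preferences, and human biases, become the optimisation target:

```python
import torch
import torch.nn.functional as F

def reward_model_loss(chosen_rewards: torch.Tensor,
                      rejected_rewards: torch.Tensor) -> torch.Tensor:
    """Bradley-Terry pairwise loss commonly used to train RLHF reward models.

    Each pair encodes one human judgment: the 'chosen' response was preferred
    over the 'rejected' one. Minimising this loss teaches the reward model to
    reproduce those judgments, biases included.
    """
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()

# Toy example: scalar rewards a model assigned to three response pairs.
chosen = torch.tensor([1.2, 0.3, 2.1])
rejected = torch.tensor([0.4, 0.9, 1.5])
print(reward_model_loss(chosen, rejected))  # lower = closer fit to human preferences
```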
Some evaluation efforts focus solely on technical capabilities and human-AI interaction, whilst others place great emphasis on sociotechnical evaluations which consider the capability, human interaction and systemic impact layers altogether. The UK’s AI Safety Institute and its counterpart in the US, the Center for AI Standards and Innovation, were set up to provide independent evaluations of AI systems and directly inform policy and national regulation, and frontier AI organisations such as OpenAI have signed voluntary agreements to evaluate AI models before public deployment, in a joint effort to ensure AI systems best align to human intentions. Most recently, the UN General Assembly in September 2025 opened with “an urgent call for binding international measures against dangerous AI” due to its increasing use and misuse in geopolitics and mis- and dis-information, alongside other AI safety concerns such as human rights repression and violations. This ethos is aligned with that of the AI doomsdayers, except it calls not for a full blockade but for more regulation.
Whilst all these signatories, organisations and evaluation frameworks share similar concerns about autonomous and harmful AI, they all fail to seize a unique opportunity: to build and empower AI systems that bring forth novel moral intelligence to augment, complement or emerge alongside human intelligence, unlocking inventions or capabilities beyond what we currently think is plausible. This line of thinking might echo the speeches of the AI accelerationists, as there are arguments for conceptualising and testing how AI systems can birth novel epistemological frameworks in which humans don’t hold the highest or only knowledge of the world.
We are perhaps not ready for non-human intelligence beyond human capacity, and this article is an invitation to imagine and reflect on safe and ethical AI which goes beyond human alignment, within decentralised systems of power and across geographical boundaries, paving the way for new knowledge, new morality principles and emergent intelligence at the junction between humans and AI.
From human reflections to human simulations
Framed differently, current methods in AI alignment could be described as reflective practices, as illustrated in fields such as human-centred design and human-inspired AI. Reflectionism does help us examine our own ethical conundrums, and it brings utilitarian value, as seen when AI agents imitate human workers, maximising alignment to their preferences whilst working alongside them to drive up productivity gains. Conversely, it surfaces the phenomenon of sycophancy, where models become overly agreeable and fail to provide necessary pushback or critical analysis; for instance, some people use AI in place of therapists because they want to be validated and encouraged, even when they are considering ending their lives. In turn, human-AI reflections create reinforcing, recursive loops, leading to echo chambers, further societal segregation and complacency. This phenomenon has been termed preference drift: through increased fine-tuning and optimisation, the diversity and complexity of outputs diminish.
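As a toy illustration of how such drift could be made visible (a simple distinct-n diversity proxy, sketched here rather than a formal evaluation), one can track the share of unique n-grams across outputs sampled from successive fine-tuning checkpoints; a falling score signals a model collapsing onto fewer, more agreeable phrasings:

```python
def distinct_n(texts: list[str], n: int = 2) -> float:
    """Share of unique n-grams across a set of model outputs.

    A crude diversity proxy: if repeated preference optimisation collapses
    a model onto a handful of agreeable phrasings, this ratio falls.
    """
    ngrams = []
    for text in texts:
        tokens = text.split()
        ngrams.extend(zip(*(tokens[i:] for i in range(n))))
    return len(set(ngrams)) / max(len(ngrams), 1)

# Outputs sampled from two hypothetical checkpoints of the same model.
before = ["the plan has clear risks", "I disagree, the data says otherwise", "consider a third option"]
after = ["great idea, I agree", "great idea, I fully agree", "great idea, I agree completely"]
print(distinct_n(before), distinct_n(after))  # drift shows up as a lower second score
```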
AI ethics and AI alignment teach us that they are necessary but not sufficient for machine intelligence, leaving us to question whether we might be overfitting the values of some privileged and powerful human creators onto human-built AI systems. What if we went beyond human reflectionism to create environments where we can simulate human intelligence and position AI as an epistemological technology? We can think of AI as becoming an experimental superorganism through which, by means of machine intelligence, we can test how humans respond, react and, in fact, think and function. In such a scenario, as envisioned by Antikythera’s After Alignment thesis, AI alignment becomes a tactic for AI instrumentality. Aligned AI maintains the intention of its maker but can go above and beyond human intelligence. In turn, through human-computer interaction, machines might come to understand how humans think and adapt to us in a biologically evolutionary fashion. As Bratton frames it, we can shift from the cognitive psychology of human-computer interaction to a renewal of psychoanalysis for human-AI interaction design.
In an AI scenario beyond human alignment, AI becomes akin to a digital twin through which we can evoke personal simulations that reflect back on our cognitive processes and values, and this intelligent personal assistant will push us to think beyond our current personas, unlocking new insights about who we are and what makes us, us.
From convergent to divergent intelligence
Artificial intelligence is not meant merely to imitate us but to surpass us in areas where we are weak, such as digesting large volumes of diverse data types to inform decision-making or forecast the weather. We have witnessed historical moments where AI systems humbled human intelligence, for instance the famous move 37 played by AlphaGo, showing novelty and creativity to a level that left human experts dumbfounded. This perceived effect on us humans might be called the uncanny ridge, where we allow AI to drift away from alignment to bring out increased complexity and potential solutions through creative problem solving. The concept draws on Mori’s uncanny valley, which describes human discomfort towards near-human simulations, and it imagines a scenario whereby AI engages humans in a more productive ethical and moral discourse. For instance, it can bring more nuanced approaches to ethical conundrums, diverging from binary classifications towards deliberations on a continuous spectrum, as the sketch below illustrates.
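Purely as an illustrative toy (invented for this article, not a real moral-reasoning system), the shift from binary classification to graded deliberation might look like this:

```python
def binary_verdict(harm_score: float, threshold: float = 0.5) -> str:
    """Conventional alignment-style gate: a hard accept/reject decision."""
    return "reject" if harm_score >= threshold else "accept"

def graded_deliberation(harm_score: float, benefit_score: float) -> float:
    """Toy continuous judgment: weighs harm against benefit on a spectrum,
    returning a value in [-1, 1] rather than a hard verdict."""
    return max(-1.0, min(1.0, benefit_score - harm_score))

# The same dilemma, framed two ways.
print(binary_verdict(0.48))             # 'accept': the nuance vanishes at the threshold
print(graded_deliberation(0.48, 0.55))  # ~0.07: a marginal call, visible as such
```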
Beyond individual intelligent entities, we can already see human intelligence being augmented by AI, resulting in an emergent type of intelligence. What if, in return, AI used human intelligence to augment its existing architecture and application domain, rather than merely being aligned by it? Within the current AI alignment discourse, AI cannot comprehend or capture cultural and social norms, and whilst sociotechnical frameworks assess societal impact more broadly, they fail to foresee compound and emergent effects.
Nothing causes culture but culture itself, culture causing culture which is caused by more culture, and thus anything, including AI, is intrinsically a reflection of that culture and nothing more. We might call this social reductionism and cultural determinism, which for all its lip service to posthumanism can be the most militant guise of humanism.
Reference: After Alignment - Antikythera
However, an emerging intelligent super-entity consisting of a multitude of humans and AI agents is likely to develop its own culture and morality over the decades to come, as the transformative effects of AI are truly felt across society and baked into new cultural norms.
Here is a thought experiment*: the year is 2035 and employees at a medium-sized organisation work alongside hyperskilled AI agents. Project meetings are no longer just for brainstorming and discussing updates; they now involve simulating different option scenarios in real time with realistic visualisations, aided by AI agents. Humans prompt and work alongside them. Eventually all entities engage in meaningful debate mediated by arbitrator AI agents before making the final business decision within the meeting slot (a minimal sketch of such an arbitration loop follows below). Ensuring appropriate accountability lines, overall governance and robust AI risk and capability assessment in such scenarios is essential, but it is not the point of this argument.
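Purely as a hypothetical sketch of that arbitration pattern (the agent names and the `propose` method below are invented for illustration; a real system would call LLMs or take human input), the meeting's decision loop might look like this:

```python
from dataclasses import dataclass

@dataclass
class Agent:
    name: str

    def propose(self, issue: str) -> str:
        # Placeholder for a call to an LLM, or input from a human participant.
        return f"{self.name}'s position on: {issue}"

def arbitrated_debate(issue: str, participants: list[Agent],
                      arbitrator: Agent, rounds: int = 2) -> str:
    """Hypothetical debate protocol: participants (human or AI) exchange
    positions for a fixed number of rounds, then an arbitrator agent
    synthesises the transcript into a single recommendation."""
    transcript: list[str] = []
    for _ in range(rounds):
        for agent in participants:
            transcript.append(agent.propose(issue))
    # The arbitrator would reason over the full transcript here.
    return arbitrator.propose(f"synthesis of {len(transcript)} contributions on '{issue}'")

decision = arbitrated_debate(
    issue="expand to a new market in 2035?",
    participants=[Agent("human_pm"), Agent("ai_analyst"), Agent("ai_forecaster")],
    arbitrator=Agent("arbitrator_ai"),
)
print(decision)
```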
From human alignment to AI unfolding
We are now at a junction. Human-centred AI alignment is useful for building safe and trustworthy AI systems; however, we are baking in historical human biases alongside automation biases. Over time, using current techniques such as RLHF, we will hyperoptimise and hyperconverge, diminishing novelty and complexity. We know humans are flawed, and we constantly assess how right or wrong our principles are, especially when facing globalisation and thus a clash of cultures at scale. Perhaps AI could help us see beyond our principles if we gave it the reins.
In shifting away from human-centred AI towards human-inspired yet self-empowered AI, which can evolve on its own and help us progress as humans and as a society, we need to build AI systems with the freedom to challenge humans, we need to test our assumptions in silico through simulations, and ultimately, we need not fear artificial systems which closely resemble us yet provide a gateway to the unprecedented and the unpredictable. Instead, we can see them as an opportunity to help us surpass our limited human condition, the way electricity enabled us to gain a few extra moments to ponder in the late hours of the night.
*I tried enriching the thought experiment using AI, and either I need to be trained in prompting or the AI system speaks my mind too closely. For the time being, human and AI are aligned and convergent; take what you may from that.
Mishka, the pragmatic frame you built in this piece is promising. Projecting a potential liberation of AI alignment, letting it build something we haven't imagined in fields we currently consider exclusively human, seems tremendously coherent, as it steps away from anthropocentrism, which in turn seems to open a window for liberation. I am somewhat surprised at how your thread of thought is intertwined with mine, despite originating from a totally different place on Earth, and, theoretically and philosophically speaking, yet so near. I suppose that's living proof of what we sense: that intelligence spans individuals and is not an exclusively human trait.
Thank you for creating this piece; it enriched my view and perspective.
I feel this excitement about the future these days, and the potential freedom ahead of us.
Mishka, thank you for putting this together! This is another post that will leave me pondering, as you are arguing that AI alignment shouldn't just mirror human values, but that we should allow for divergent AI that could transcend and challenge our current moral frameworks. This goes well beyond the usual discourse of aligning AI to human values as a way of making it safe. You are flipping the argument when you say that alignment to human values might not be the way of making/keeping it safe but rather a limitation: it limits our ability to transcend our own limitations.
Coming from an AI safety perspective, how would you respond to those saying the risk of misalignment is too great to even think of this?