From Hypotheses to Hallucinations: Science in the Generative Age
Generative AI gives us the form of science without its function. The appearance of rigor without its discipline. The illusion of truth without the means to test it.
“The most exciting phrase to hear in science, the one that heralds new discoveries, is not ‘Eureka!’ but ‘That’s funny.’”
— Often attributed to Isaac Asimov
Science, as it has long been understood, is defined not by what we know, but by how we come to know it: through slow, often painstaking processes of observation, experimentation, falsification, and replication. Scientific progress has therefore rarely been linear. It emerged from long periods of uncertainty, from the humility of not knowing, and from collective practices built on skepticism, transparency, and empirical accountability. Whether in Galileo’s telescopic observations, Pasteur’s microbial experiments, or McClintock’s solitary work on maize genetics, science was understood as a discipline of method: a way of thinking in tension, of exploring the unknown with patience rather than ease.
Today, that friction is rapidly disappearing.
With the rise of generative AI, we’ve entered an era where text that looks like knowledge can be conjured instantly, with perfect grammar and simulated authority. These systems generate fluent, confident explanations of complex topics: from dark matter to mRNA synthesis, from the structure of DNA to Gödel’s incompleteness theorems.
Fluency is, however, not the same as understanding.
What these tools produce are artifacts that disturbingly resemble scientific knowledge: citations, summaries, abstract-like syntax. Yet they lack the methodological scaffolding that gives such forms their identity and authority.
This shift is as much technological as it is epistemological.
Generative AI offers a very convincing illusion of closure in a domain that depends on open-endedness. It accelerates access but bypasses inquiry. It produces knowledge-shaped text without the slow, iterative, error-prone labor that defines actual scientific discovery. In doing so, it risks dulling the very instincts science depends on: the ability to doubt, to question, to test, to be wrong.
Beyond hallucinations, the deeper problem is that generative AI flattens the difference between conjecture and consensus, between something that sounds plausible and something that has been tested, challenged, and replicated. It simulates the outputs of science while bypassing its processes, and in doing so threatens to displace our understanding of what counts as knowing.
This essay traces this displacement by revisiting the core stages of the scientific method: observation, hypothesis, experimentation, falsifiability, and replication. We then ask what becomes of each when filtered through the rose-tinted lens of generative AI. The goal is not to dismiss these tools (they are powerful and here to stay) but to reassert the value of the method behind the knowledge, and to recognize what we lose when we mistake fluency for truth and wisdom.
Observation: The Disappearing Spark
Science has historically begun in friction. Some of its greatest breakthroughs emerged from the refusal to resolve confusion too quickly: Kepler obsessing over Mars’ irregular orbit, Darwin agonizing over Galápagos finches. Each was an act of perceptual resistance: seeing what others had missed.
That pause, that moment of unresolved tension, becomes a question. And the question becomes a pursuit, rooted in the unsettling recognition that something in the world does not fit what we thought we knew.
These were not acts of information retrieval. They were acts of noticing and asserting possible explanations.
Generative AI, on the other hand, starts from the answer offering clarity before confusion, resolution before inquiry. A single prompt yields confident, neatly packaged and grammatically perfect summaries before we’ve even had the chance to dwell in the discomfort of not understanding. The foundational moment of science, the “that’s funny”, is bypassed by design.
The smooth, sleek interface further reinforces this collapse. The prompt-response paradigm is not optimized to encourage the spirit of inquiry. Ask a question, get an answer. There is no pause, no ambiguity, no insistence on uncertainty. This is the inverse of scientific observation, which depends on lingering with the unexplained.
In cognitive terms, generative tools blunt the productive tension of cognitive dissonance, which psychologists have long identified as a key trigger for deep learning and insight.
When generative systems provide closure on demand, we risk losing that sensibility. The slow epistemic spark is extinguished before it can ignite. And without it, science risks losing its starting point.
Hypothesis: The Illusion of Insight
A true hypothesis doesn’t just describe; it postulates, it explains, it gambles. It isolates a possibility, frames a claim, and steps into the uncertainty of the unknown. It’s not just an idea; it’s a wager that reality might prove wrong. To hypothesize is to carve a line between what we suspect and what we can test and validate, in that order.
Generative AI doesn’t take that leap. It doesn’t risk being wrong because it never commits. This is not to say it is always right. It fills in, extrapolates, and completes, drawing from patterns already present. The outputs are seamless, even provocative, but they emerge from probability, not curiosity.
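To make “probability, not curiosity” concrete, consider a minimal sketch of the completion step at the heart of these systems (Python; the vocabulary and probabilities below are toy values invented for illustration, not drawn from any real model). In an actual system the distribution comes from a neural network conditioned on the prompt, but the logic is the same: sample what is likely, and nothing more.

```python
import random

# Toy next-token distribution. In a real model these probabilities come
# from a network conditioned on the prompt; these values are made up
# purely for illustration.
next_token_probs = {
    "suggests": 0.45,
    "demonstrates": 0.30,
    "refutes": 0.15,
    "proves": 0.10,
}

def sample_next_token(probs: dict[str, float]) -> str:
    """Draw one token in proportion to its probability -- the entire 'act'."""
    tokens, weights = zip(*probs.items())
    return random.choices(tokens, weights=weights, k=1)[0]

prompt = "The experiment"
print(prompt, sample_next_token(next_token_probs))
# e.g. "The experiment suggests" -- a plausible continuation, not a tested claim
```

Nothing in that loop frames a claim or stakes a position; rerun it and you get a different continuation, offered with equal indifference.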
This distinction matters more than it might seem. Hypothesis formation is a central muscle in scientific thinking. It requires judgment, imagination, and a sense of what’s worth investigating. Outsourcing this stage to generative tools means we stop pushing the boundaries of thought and instead recycle its center.
Feyerabend reminded us that science is often chaotic, driven as much by instinct and surprise as by logic. A real hypothesis disrupts. Generative AI cannot produce that disruption, because it makes no commitment and takes no risk. It can mimic the form but not the act.
Experimentation: What’s Missing When We Don’t Test?
A hypothesis, to matter, must be tested. Experiments give shape to uncertainty. They involve method, design, measurement, and iteration. They can and should fail, and often do.
Generative AI does not experiment. It outputs. There is no controlled condition, no manipulation of variables, no uncertainty to resolve. There is only completion.
The danger here is that science becomes flattened into summary, something that looks finished before it has even begun. Students may skip experimentation entirely. Journalists may use AI to write content that sounds rigorous without ever engaging with primary data. Even researchers may find it tempting to use GenAI to brainstorm instead of design.
Yet the labor of experimentation is essential. Consider Barbara McClintock’s decades of cytogenetic work, dismissed and misunderstood for years. Or the painstaking, decades-long collaboration behind LIGO’s gravitational-wave detection. These were not products of fluency: they were born from trial, error, and a refusal to rush.
AI does not resist uncertainty. It avoids it.
Falsifiability: Where There Is No Being Wrong
Karl Popper famously defined science as the domain of falsifiability. A theory must risk being wrong in order to be scientific. Without the possibility of failure, we have dogma but no knowledge.
Generative AI cannot be wrong in this sense. When it hallucinates a citation or invents a plausible-sounding claim, it isn’t lying per se. It’s guessing, in probabilistic good faith. It does not distinguish between verified consensus and speculative fringe. It has no mechanism for self-correction or verification.
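A deliberately crude sketch makes the point (Python; the scoring heuristic and the fabricated citation are invented for this illustration). It stands in for a model’s likelihood function: it rewards citation-shaped text, and at no point does it consult any bibliographic record, or the world itself.

```python
import re

def plausibility(text: str) -> float:
    """Crude stand-in for model likelihood: rewards citation-shaped text."""
    score = 0.0
    if re.search(r"\(\d{4}\)", text):  # a year in parentheses
        score += 0.5
    if "et al." in text:               # a familiar scholarly phrase
        score += 0.3
    if re.search(r"\d+-\d+", text):    # page-range-looking digits
        score += 0.2
    return score

real = "Popper, K. (1959). The Logic of Scientific Discovery."
fake = "Smith et al. (2014). Quantum coherence in maize genomes, 12-34."  # invented

print(plausibility(real), plausibility(fake))  # 0.5 1.0 -- the fake scores higher
# There is no step here that can be falsified: nothing consults reality.
```

The point is not that real systems use anything this naive, but that their objective has the same shape: fluency is scored, truth is not.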
Science thrives on being wrong. Generative AI is indifferent to the distinction.
Replication: Rewilding Through Process
Replication is science’s immune system. It filters the signal from the noise. Findings become credible when others can reproduce them. This reinforces transparency, rigor, and community standards.
But generative AI does not produce claims that can be replicated. Its sources may be fabricated, its citations scrambled, its phrasing detached from methodological traceability. Even when accurate, its claims are often unmoored from the context that makes them meaningful.
This is more than a sourcing problem. When outputs are seamless and searchable, they give the impression of settled knowledge. But scientific knowledge is never truly settled: it is dynamic, contested, and contingent.
To safeguard this dynamism, we must rewild our relationship to knowledge and prioritize process over product: embracing the slow, recursive rhythms of inquiry, encouraging skepticism, transparency, and doubt.
Conclusion: The Strange Familiar
The shift from hypotheses to hallucinations is not just semantic. It reflects a deeper transformation in how we relate to knowledge itself. Generative AI gives us the form of science without its function. The appearance of rigor without its discipline. The illusion of truth without the means to test it.
But science was never meant to be frictionless. Its value lies precisely in its discomfort: in the way it teaches us to be wrong, to ask better questions, and to stay in the uncertainty a little longer.
Generative tools are here to stay. And rightly so. They can help us write, review, and even speculate. But they should be contextualized, not canonized. We must train ourselves, and our students, not just to consume information, but to interrogate how it came to be.
Let us use our new tools well. But let us also remember:
Science was never meant to be seamless. It was meant to be true.