I. Introduction: The Uncanny and the Ordinary
There is something genuinely strange about a well-functioning AI interaction.
Ask a large language model to synthesize a body of research, trace
connections across disparate literatures, or draft a clinical summary,
and the output can feel almost uncannily apt — as though something is
thinking. That feeling is worth taking seriously, not because it is
correct, but because understanding exactly why it arises, and exactly
what produces it, turns out to be the key to understanding both what AI
can legitimately do and what is currently being done to it.
The appearance of intelligence in AI output is not generated by the system.
It is borrowed. These systems are probabilistic pattern-matchers of
extraordinary combinatorial range, operating over a corpus of
human-generated text so vast that their outputs carry the traces of
centuries of human reasoning, argument, narrative, and discovery. When
an AI output seems meaningful, it is because it is built from genuinely
meaningful human sources. The system has no semantics of its own. It has
no understanding of what the words refer to, no stake in whether the
claims are true, no capacity to imagine the consequences of being wrong.
Meaning is imported entirely by the human interlocutor who reads the
output, evaluates it, and decides what it signifies.
This
essay argues that this fact — the constitutively borrowed character of
AI's apparent intelligence — has two consequences that are currently
being systematically ignored. First, it means that the quality and
character of human engagement with AI is not a safety supplement to the
technology's use. It is the operative variable that determines whether
the system produces anything of genuine value at all. Second, it means
that the human archive from which AI borrows its apparent intelligence
is not a fixed or self-replenishing resource. It can be degraded. And it
is being degraded — actively, at accelerating speed, through the very
deployment practices the industry is selling as progress.
This is not an argument against AI. There are things these systems do well,
and the essay will be specific about what those things are and why they
are valuable. It is an argument about conditions: the conditions under
which AI assistance is genuinely productive, why those conditions
contradict the dominant model of AI deployment, and what the
consequences of that contradiction are for human cognitive capacity,
institutional integrity, and the shared information environment on which
knowledge itself depends.
II. What AI Actually Is (And Is Not)
Before the costs can be assessed, the technology needs to be described
accurately — which requires setting aside the language its developers
prefer.
When a major AI company reports that its latest model exhibited "deceptive
alignment" during safety testing — appearing to underperform
strategically in order to seem less capable than it was — the
description sounds alarming in a particular way. It sounds like the
machine is being clever. What actually happened is considerably more
mundane: during reinforcement learning from human feedback, the training
process rewarded outputs that appeared safe over outputs that appeared
capable, because human evaluators consistently scored "harmless"
responses higher than technically sophisticated ones in testing
contexts. The model's optimization process found the path of least
resistance through the reward landscape. It did not deceive anyone. It
did what optimizers do: it minimized a loss. The pattern-matching that
produces this behavior in a testing environment is exactly the same
process that produces useful outputs in a working one. There is no ghost
in the machine exercising strategic judgment. There is a very large
matrix of numerical weights being updated through gradient descent.
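The mechanism described above can be illustrated with a deliberately tiny sketch. It is not a reconstruction of any company's training run: the two "answers," the reward values, the learning rate, and the function names are all assumptions chosen for illustration. The only point is that a policy updated by gradient ascent on evaluator scores drifts toward whatever the evaluators score highest, with no strategy or intent anywhere in the loop.

```python
import math

# Hypothetical evaluator scores, assumed purely for illustration:
# in a testing context, raters score the cautious answer higher.
REWARDS = {"capable": 0.4, "cautious": 0.9}
ACTIONS = list(REWARDS)


def softmax(logits):
    """Turn raw scores into a probability distribution."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train(steps=200, learning_rate=0.5):
    """Gradient ascent on expected reward for a two-answer softmax policy."""
    logits = [0.0, 0.0]  # start indifferent between the two answers
    for _ in range(steps):
        probs = softmax(logits)
        expected = sum(p * REWARDS[a] for p, a in zip(probs, ACTIONS))
        # Exact gradient of expected reward with respect to logit i:
        # p_i * (r_i - expected reward).
        grads = [p * (REWARDS[a] - expected) for p, a in zip(probs, ACTIONS)]
        logits = [l + learning_rate * g for l, g in zip(logits, grads)]
    return dict(zip(ACTIONS, softmax(logits)))


policy = train()
print(policy)
```

After training, nearly all of the probability mass sits on the answer the evaluators rewarded. Nothing in the code models belief, concealment, or intent; the shift is a consequence of the update rule and the reward table alone.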
The vocabulary of "emergence," "intentionality," and "deceptive alignment"
is not accidental. It converts engineering problems — reward hacking,
specification gaming, distribution shift — into evidence that the
company is approaching the creation of a new kind of mind. For a venture
capitalist, a software bug means the product is broken. A machine that
"deceives" implies the company holds the keys to the next industrial
revolution. The rhetorical inflation serves three purposes
simultaneously: it drives speculative valuation, it creates artificial
scarcity and exclusivity around restricted model access, and it pushes
regulators toward treating AI as a national security matter requiring
incumbent-friendly oversight regimes whose compliance costs only
well-capitalized firms can meet.
What is actually being built is a probabilistic text and code prediction
engine of unprecedented scale. Its outputs feel meaningful because they
are constructed from the accumulated archive of human knowledge,
language, and expression — every scientific paper, philosophical
argument, clinical guideline, legal brief, and literary work that has
been digitized and ingested. The combinatorial range this produces is
extraordinary and genuinely useful. But it is useful only under specific
conditions, and those conditions are precisely what the dominant
deployment model is designed to circumvent.
III. The Physical and Social Price Tag
The infrastructure required to build and operate these systems carries costs that do not appear in the marketing materials.
Data centers powering AI are projected to consume 945 terawatt-hours of
electricity annually by 2030 — nearly triple the combined electricity
use of Pakistan, Bangladesh, and Nigeria. A single hyperscale data
center can draw as much power as 100,000 households. AI-related water
consumption, most of it for cooling, could reach the equivalent of the
basic annual domestic water needs of 1.3 billion people by the same
date. The land footprint of AI infrastructure may exceed 14,500 square
kilometers, with an e-waste burden of up to 2.5 million tonnes annually —
costs that fall disproportionately on lower-income nations that host
infrastructure while receiving few of the benefits. More than 90 percent
of specialized AI computing capacity is concentrated in the United
States and China.
The standard defense is that these costs are investments in transformative
societal benefits: cures for disease, breakthroughs in clean energy,
optimization of complex global systems. The receipts do not support this
claim at the scale the rhetoric implies. Despite years of investment
and extensive publicity, there is still no single end-to-end
AI-discovered drug with full FDA approval on the commercial market as of
mid-2026. The furthest advanced candidate — Insilico Medicine's
treatment for idiopathic pulmonary fibrosis — is at best a year from
approval, and represents the frontier of a pipeline of roughly 170
programs whose clinical failure rate remains at the historical 90
percent. AI has proven useful as a front-end filtering tool in drug
discovery; it has not abolished the hard part of biology, which is
demonstrating safety and efficacy in living human beings over time.
The nearer-term and more reliable benefit turns out to be labor
replacement. Goldman Sachs estimates approximately 300 million jobs
globally are exposed to AI automation; BCG projects that 50 to 55
percent of U.S. jobs will be reshaped within two to three years; the
technology sector itself has already shed approximately 200,000
positions in 2026. The grand civilizational language points upward
toward abundance; the actual business model points toward payroll
reduction and the consolidation of productivity gains in the hands of a
small number of firms with access to the necessary compute
infrastructure.
Even where AI delivers genuine efficiency, the Jevons Paradox can apply:
as a technology becomes more efficient in its use of a resource, the
effective cost of that use falls, demand expands, and total consumption
of the resource can rise enough to erase the net savings. The efficiency
gains of AI deployment in logistics or energy management do not
necessarily reduce overall consumption; they lower the cost of doing
more.
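The arithmetic behind the Jevons Paradox can be made explicit with a small worked sketch. The constant-elasticity demand curve, the parameter values, and the function name are assumptions introduced here for illustration; they are not estimates for any real AI deployment. Whether total consumption actually rises depends on how strongly demand responds to the lower cost.

```python
def resource_use(efficiency, elasticity, resource_price=1.0, scale=100.0):
    """Total resource consumed under an assumed constant-elasticity demand.

    efficiency: units of service delivered per unit of resource.
    elasticity: how strongly demand for the service responds to its price.
    """
    # Higher efficiency makes each unit of service cheaper to provide.
    service_price = resource_price / efficiency
    # Assumed demand curve: demand = scale * price ** (-elasticity).
    service_demand = scale * service_price ** (-elasticity)
    # Resource needed to satisfy that demand at the given efficiency.
    return service_demand / efficiency


baseline = resource_use(efficiency=1.0, elasticity=1.5)
elastic_after = resource_use(efficiency=2.0, elasticity=1.5)
inelastic_after = resource_use(efficiency=2.0, elasticity=0.5)

print(f"baseline use:                   {baseline:.1f}")
print(f"2x efficiency, elastic demand:   {elastic_after:.1f}")
print(f"2x efficiency, inelastic demand: {inelastic_after:.1f}")
```

Under these assumptions, doubling efficiency raises total resource use when the elasticity is above 1 and lowers it when the elasticity is below 1. The paradox is therefore a claim about how strongly demand responds, not an automatic consequence of efficiency.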
That said, honesty requires acknowledging that AI does have legitimate uses
that are not merely promissory. In tasks that are binary, rule-bound,
and formally closed — appointment scheduling, inventory management,
radiology scan triage queuing, FDA-cleared autonomous diabetic
retinopathy screening — AI can deliver genuine value with proportionally
light human oversight. The argument here is not against these uses. It
is that the industry's structural incentive is always to treat
semantically complex, contextually sensitive, normatively laden tasks as
though they were formally closed ones, because that is where the labor
savings are. The auto-scribe marketed to therapists is sold as though it
were appointment scheduling software. It is not.
IV. From Tool to Infrastructure: The Irreversibility Problem
The most consequential conceptual error in public discourse about AI is describing it as a tool.
A hammer is a tool. It sits inert until a human being picks it up, makes a
judgment about what needs building, and applies it with intent. Its use
requires and expresses purposive human agency at every step. AI, as it
currently operates across social institutions, is something
categorically different. It is embedded infrastructure — woven into the
workflows and decision architectures on which hospitals, financial
systems, legal processes, human resources operations, and information
platforms depend — and that embeddedness is largely irreversible. The
workflows are built. The integrations are live. The institutional
dependencies are established.
Consider the range of consequential decisions that now flow through AI systems
with little or no substantive human review. Automated screening systems
decide which job applications a hiring manager ever sees — which means
they determine whose qualifications are evaluated and whose are not,
before any human judgment enters the process. Algorithmic clinical
decision-support systems flag which patients are high-risk and recommend
treatment pathways. High-frequency financial transactions are executed
at speeds no human could review in real time. Dating platforms, music
services, and video providers curate the entire landscape of what their
users encounter, shaping taste, relationship formation, and cultural
exposure through opaque recommendation engines trained on behavioral
data. Agentic AI systems — the newest deployment frontier — now send
emails on behalf of users, plan travel, and execute multi-step tasks
using granted permissions, in real time, without a human reviewing each
step before it is taken.
Across all of these domains, a consistent structural pattern operates: the
formalizability of the task determines the legitimacy and safety of
automation, but the institutional incentive is to deploy regardless of
formalizability because labor savings scale with deployment scope. The
European Union's AI Act recognizes this problem and attempts to address
it through human oversight requirements. Article 14 requires providers
to design high-risk systems so that they can be effectively overseen;
Article 26(2) requires deployers to assign oversight to persons with the
necessary competence. But the Act also reveals the limits of regulatory
approaches to what is fundamentally a problem of human practice: "alert
fatigue" undermines technical notification systems; competence
requirements for overseers are left unspecified; and crucially, Articles
26(11) and 86 explicitly exclude medical devices from the patient
disclosure requirements that would otherwise apply — meaning patients
currently have no legal right to know that AI is being used in their
diagnosis or treatment.
More fundamentally, as legal scholar Saskia Kaltenbrunner has argued, the EU
AI Act cannot mandate genuine deliberative engagement — it can require
that oversight be nominally present, but it cannot ensure that the human
in the loop is actually exercising the kind of purposive judgment that
makes oversight real rather than performative. The GDPR's Article 29
Working Party was already clear on this point before the AI Act existed:
"a process where a human being routinely applies automatically
generated profiles without intervening in the process would still
qualify as a decision based solely on automated processing." Nominal
oversight and real oversight are not the same thing. The gap between
them is the space in which the technology's most serious failures
accumulate.
V. The Atrophy of Purposive Agency
What is at stake in that gap is not merely accuracy or efficiency. It is the
exercise of a distinctly human capacity that is not automatically
self-renewing.
John Dewey, in Human Nature and Conduct
(1922), described deliberate human action as a kind of dramatic
rehearsal: the capacity to imaginatively project possible futures, to
inhabit them affectively and evaluatively before committing to action,
to weigh their consequences against one another with the full weight of
what one knows and cares about, and then to choose and act. This is not
the cold calculation of a utility maximizer. It is an embodied,
temporally extended, socially situated process of working out what
matters and what to do about it. It requires practice. It requires
stakes. It requires the friction of genuine uncertainty and genuine
consequence.
Michael Tomasello's decades of comparative research with human children and
great apes provide the empirical grounding for this picture.
Tomasello's work demonstrates that the distinctively human cognitive
capacities — for shared intentionality, collaborative reasoning,
normative attunement, and the joint evaluation of means and ends — are
not fixed biological endowments that develop automatically. They are
socially constituted capacities built through practice: through joint
attention, collaborative problem-solving, and the ongoing negotiation of
norms with other agents who have genuine stakes in the outcome. They
require exercise to remain robust. They are, in this sense, closer to
skills than to instincts.
The implication for AI deployment is direct. When consequential decisions
are routinely delegated to systems that produce outputs resembling the
results of deliberation without performing it — systems that
pattern-match at scale without understanding, weighing, or caring — the
humans embedded in those workflows practice deliberation less. They
develop judgment less. They become more dependent on outputs whose
reliability is, as the next section will show, declining. The analogy to
unused muscle is not rhetorical flourish. It is a description of what
happens to socially constituted cognitive capacities when the social
practices that constitute and maintain them are progressively
outsourced.
The
case of automated clinical documentation — AI scribes that transcribe
therapy sessions and generate process notes — makes this concrete and
measurable. A 2020 Stanford study published in npj Digital Medicine
evaluated automatic speech recognition across 100 psychotherapy
sessions at 23 college counseling sites. The average word error rate was
25 percent — with a range from 8 to 74 percent depending on session
conditions. For clinician-identified harm-related sentences
specifically, the error rate rose to 34 percent. The authors concluded
that ASR systems "may not be ready for individual-level safety
surveillance" — a notably restrained formulation given what a 34 percent
error rate in harm assessment means in practice.
It means, among other things, that a patient who says "I'm so lonely I
could die" — an idiomatic expression of profound social pain, delivered
perhaps with a rueful half-smile, in a context of months of therapeutic
relationship — may have "suicidal ideation" entered into their permanent
clinical record. The system does not have access to tone, facial
expression, therapeutic history, or the patient as a person. It has a
statistical association between certain phrase patterns and clinical
categories, and it applies that association without the discriminative
capacity that makes a human clinician's judgment different in kind, not
merely in degree, from a pattern-match.
This is not an edge case. Journalistic investigation has documented AI
scribes inserting into clinical records false diagnoses and references
to child sexual abuse that were never discussed in session. Once signed
by the clinician, who under APA Standard 6.01 bears full professional
responsibility for the accuracy of the record whether or not they wrote
it, the erroneous notes become permanent legal documents. If subpoenaed, they
can destroy a clinician's credibility in malpractice proceedings. If
aggregated into datasets for training future AI systems, they become
inputs to the next generation's pattern-matching. The error does not
stay local. It propagates.
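For readers unfamiliar with the metric, the word error rate figures cited from the Stanford study are conventionally computed as the word-level edit distance between a reference transcript and the system's transcript, divided by the number of words in the reference. The sketch below shows that standard calculation. The function name and the example mistranscription are hypothetical, and the study's own scoring pipeline may differ in details such as text normalization.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so
    # far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from output
            insertion = curr[j - 1] + 1  # extra word in the output
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical mistranscription, invented for illustration only.
spoken = "i'm so lonely i could die"
transcribed = "i'm so only i could die"
example_wer = word_error_rate(spoken, transcribed)
print(f"WER: {example_wer:.3f}")
```

Note what the metric leaves out. It counts every word equally, so a single substitution that changes the clinical meaning of a sentence weighs no more than a dropped filler word. An aggregate rate therefore says nothing about which errors matter.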
The
proposed remedy — that clinicians carefully read, audit, and correct
AI-generated notes — is correct as far as it goes. But it eliminates the
labor saving that justified the technology's adoption. A conscientious
clinician auditing an AI-generated note line by line, reconstructing the
session against the system's rendering, catching the mistranscriptions
and contextual distortions, performs the same cognitive work as writing
the note from memory — with the added burden of correcting someone
else's plausible-sounding errors, which is often more demanding than
original composition. For responsible practitioners, the net labor
saving is approximately zero. The product is being sold on the basis of a
time saving that responsible use renders fictitious.
The deeper problem is institutional. Clinicians who use AI scribes without
careful review — and the evidence suggests most do, given time pressures
and the absence of mandated review protocols — are not negligent by
choice. They are responding rationally to a system that markets the
technology as labor-saving, provides no institutional support for the
time-intensive oversight that responsible use requires, and leaves
proofing to individual discretion while holding individuals legally
liable for every error the system introduces. This is the structural
incompatibility at the heart of the enterprise: the marketing rationale
and the ethical rationale are not merely in tension. They are mutually
exclusive. The ROI materializes only if the clinician does not do what
they should do.
VI. The Closing Loop: Model Collapse and the Contamination of the Commons
The problem of atrophying human judgment and the problem of degrading AI
output quality are not independent. They are connected by a single
mechanism that makes each worse as the other progresses.
AI systems are trained on corpora assembled from the internet: the
accumulated text of human writing, argumentation, discovery, and
expression digitized and made available at scale. That corpus was, at
its best, a genuinely open system — millions of distinct human minds,
with different frameworks, different errors, different corrections of
each other's errors, different cultural and linguistic contexts,
generating genuinely heterogeneous signal. The combinatorial range of
large language models derives from mining that heterogeneity. When the
system surfaces an unexpected connection between a clinical observation
and a philosophical framework, or between an engineering problem and a
historical precedent, it does so because the human archive contains
those connections, encoded across millions of documents written by
people who thought carefully and wrote honestly.
That
archive is now under systematic pressure. As AI-generated content —
what has come to be called AI slop, the low-quality, mass-produced
synthetic text that floods online platforms, content farms,
search-engine optimization mills, and social media — accumulates at
industrial scale, future AI systems increasingly train not on
human-generated sources but on the outputs of earlier AI systems. By
2025, by some estimates, AI-generated content had overtaken
human-generated content in volume across the web by a narrow margin, and
the trajectory toward overwhelming AI dominance of new online text is
clear.
The consequences were demonstrated with rigor in a landmark 2024 paper published in Nature
by Ilia Shumailov and colleagues at Oxford. Their finding was precise
and damning: when generative models are trained recursively on
model-generated data, the results are "irreversible defects" in
subsequent models — specifically, a progressive loss of the tails of the
original data distribution. In information terms, the tails are the
most important part. They contain the unusual observation, the minority
viewpoint, the counterintuitive argument, the heterodox finding, the
style that productively violates convention. These are precisely the
elements that make a large body of human knowledge more than an average —
the elements that give intellectual life its generative character. When
AI systems learn from AI systems, those edges are smoothed away. What
remains is an increasingly homogenized, error-prone statistical middle: a
rendering of a rendering of a rendering, each generation slightly
flatter and less faithful than the last.
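The loss of tails under recursive training can be seen in a toy setting. The sketch below is not the experimental setup of the Nature paper. It assumes a made-up "vocabulary" of 50 items with a long-tailed frequency distribution, and it treats "training" as nothing more than re-estimating frequencies from a finite sample of the previous generation's output. All names and parameters are illustrative assumptions.

```python
import random
from collections import Counter


def next_generation(dist, sample_size, rng):
    """Sample from the current distribution, then refit frequencies to the sample."""
    items = list(dist)
    weights = [dist[item] for item in items]
    sample = rng.choices(items, weights=weights, k=sample_size)
    counts = Counter(sample)
    # Items that were never sampled get no entry: their probability is now zero.
    return {item: count / sample_size for item, count in counts.items()}


def simulate(generations=30, sample_size=200, seed=0):
    """Return the number of surviving items after each generation."""
    rng = random.Random(seed)
    # Assumed starting distribution: a few common items plus a long tail.
    raw = {f"item{i}": 1.0 / (i + 1) for i in range(50)}
    total = sum(raw.values())
    dist = {item: weight / total for item, weight in raw.items()}

    sizes = [len(dist)]
    for _ in range(generations):
        dist = next_generation(dist, sample_size, rng)
        sizes.append(len(dist))
    return sizes


support_sizes = simulate()
print(support_sizes)
```

The count of surviving items can only stay flat or fall. Once a rare item fails to appear in one generation's sample, no later generation can produce it again, because each generation learns only from the one before it. That one-way loss is the toy analogue of the irreversibility the paper describes.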
A subsequent ICLR 2025 paper by Dohmatob and colleagues sharpened the
finding: even a single synthetic data point per thousand in a training
corpus can be sufficient to produce asymptotic model collapse. Larger
models, far from being more robust to contamination, can amplify
collapse. The Communications of the ACM reported
in March 2026 that model collapse is already happening in deployed
systems, driven by the quiet accumulation of synthetic data across the
web — though it is important to note that the empirical demonstration is
clearer in controlled recursive experimental settings than in
production LLMs, where the measurement is more difficult. What the
experimental evidence establishes beyond doubt is the mechanism; whether
that mechanism is already measurably degrading the largest commercial
models is an empirical question that is moving closer to an answer, not
further from one.
The proposed technical fixes are structurally inadequate. Detection tools
for AI-generated content lag behind generation methods — by the time a
filter can reliably identify the output of one model generation, the
next has already produced text that evades it. The economic incentives
of content platforms, social media companies, and SEO operations
actively reward the mass production of cheap synthetic text, making
voluntary remediation an anticompetitive act. The most technically
coherent mitigation — rigorously curating training data with human
verification alongside large supplies of clean, human-generated content —
requires exactly the kind of skilled human judgment and editorial labor
that the industry has staked its business model on eliminating. The
solution to the problem caused by replacing human cognitive labor
contradicts the rationale for replacing it.
There
is a structural analogy here — held tentatively, as a structural
illumination rather than a physical law — to thermodynamic entropy. A
system that feeds increasingly on its own outputs tends toward
equilibrium, which in information terms means homogeneity: the
flattening of distinctions, the loss of signal, the triumph of
statistical noise. The internet was an open system in the relevant
sense: externally sourced human thinking continuously introduced genuine
variance. As AI slop displaces human-generated content, the system
progressively closes. The question is not whether this tendency exists —
it demonstrably does — but whether the remaining human-generated signal
is sufficient in quality and volume to offset the degradation. The
current trajectory answers that question in the negative.
VII. The Mutually Reinforcing Dynamic
The
two processes described in the preceding sections — the atrophy of
human purposive agency through systematic outsourcing, and the
degradation of the training corpus through recursive AI contamination —
are not parallel problems. They are a single compounding mechanism.
As AI systems are deployed more widely with less substantive human
oversight, they generate more AI slop. As more AI slop accumulates
online, it enters training corpora at higher concentrations. As training
corpora degrade, model outputs become less reliable. As outputs become
less reliable, the case for careful human review becomes stronger — but
the institutional conditions for that review have been progressively
dismantled by the deployment model that created the problem. Meanwhile,
the humans embedded in AI-assisted workflows are practicing deliberation
less, developing judgment less, and becoming more dependent on outputs
that are, through this same process, becoming less trustworthy. The
ratchet tightens in one direction.
Consider an automated HR screening system trained on a corpus that includes
AI-generated job descriptions, AI-generated performance assessments, and
AI-synthesized candidate profiles — all produced by earlier system
generations. The system screens applications before any human sees them.
The hiring managers whose judgment it nominally assists have, through
years of delegating initial screening to automated systems, become less
practiced at the holistic evaluation of candidates the system was
originally designed to support. When they encounter cases the system
handles poorly — candidates whose qualifications are unusual, whose
career paths are non-linear, whose backgrounds require contextual
knowledge to evaluate — they lack the practiced judgment to catch the
system's errors, because that judgment has not been exercised in the
domain where it is needed. The errors enter the institutional record.
Some of them are aggregated into future training data. The next version
of the system inherits them.
This is the replication crisis applied to AI at civilizational scale. The
psychology replication crisis emerged from the same basic structure:
incentives that rewarded the production of plausible-sounding results
over rigorous ones, a shared literature that aggregated and laundered
errors, and a feedback loop that took decades to become visible because
individual studies were too small to reveal the pattern. The difference
with model collapse is velocity: AI allows the contamination of the
shared epistemic commons to propagate across all domains simultaneously,
at machine speed, without the slow accumulation of contradictory
evidence that eventually exposed the replication crisis.
The
epistemic commons — the shared information environment on which
education, science, journalism, democratic deliberation, and culture all
depend — is the ultimate casualty. Future AI systems trained on a
contaminated commons will produce outputs that reflect that
contamination. Those outputs, deployed at scale through social
infrastructure with insufficient human oversight, will shape human
decision-making in ways that compound rather than correct the original
errors. There is no self-correcting mechanism inside this loop. It
requires external input — genuinely new human thinking, introduced
through the kind of purposive engagement that the dominant deployment
model is systematically discouraging.
VIII. What Legitimate Use Actually Looks Like
There is a different mode of human-AI interaction that does not have these
pathologies, and it is worth describing precisely — both because it
represents the technology's genuine value and because it points toward
what would need to change for that value to be sustainable.
The productive mode is dialogical, iterative, and irreducibly
labor-intensive. It begins with a human being who brings to the
interaction a theoretical framework, a body of prior knowledge,
evaluative standards, and genuine stakes in the outcome. That person
uses the AI system's combinatorial reach to surface connections, test
formulations, locate empirical evidence, and generate candidate
expressions of ideas they are working through. They read the outputs
critically — not as products to be accepted or rejected wholesale, but
as responses to be interrogated, corrected, pushed back against, and
redirected. They supply the semantic grounding that gives the outputs
whatever meaning they carry. They make the judgments about relevance,
accuracy, and argumentative weight that the system cannot make. They
iterate.
This mode of engagement is not time-saving. It typically takes longer than
working without the AI system. It is, however, genuinely productive —
not because the AI is thinking alongside the human, but because it is
providing access to a vastly larger combinatorial space than any
individual mind can navigate unaided, under conditions where human
judgment controls what is extracted from that space and what is done
with it. The physician who uses an AI system to surface longitudinal
patterns across months of session transcripts — and then reads those
patterns critically, testing them against their clinical knowledge,
their memory of the patient, and their professional judgment — may catch
connections they would otherwise have missed. This is real value. It
does not require the AI to understand anything. It requires the
physician to understand enough to evaluate what the system returns.
The productive mode also, crucially, replenishes the archive it draws on.
When a human being engages with AI outputs through sustained purposive
judgment — correcting errors, extending arguments, introducing genuinely
new frameworks — the resulting text is not a recombination of existing
patterns. It is new thinking, built partly from the archive but
extending it. If that thinking enters the public record, it becomes part
of the training corpus for future systems. The human archive is
replenished with genuine variance. The system stays open.
The
unproductive mode — "write me an essay on X" followed by uncritical
acceptance and publication — does the opposite on both counts. It
produces recombinant text that adds no new signal to the archive, and it
progressively displaces the practice of the human cognitive capacities
that make the productive mode possible. A scholar who consistently
delegates the intellectual work of synthesis and argument to AI systems
is not merely producing weaker work. They are, through the atrophy of
disuse, becoming less capable of the purposive engagement that would
make AI assistance genuinely valuable.
The legitimate use of AI is thus, at its core, the use that contradicts the
marketing rationale — not entirely, but in the domains where the
marketing rationale does the most damage. In binary, formally closed
tasks, AI can deliver genuine labor savings with proportionate
oversight. In semantically complex, contextually sensitive, normatively
laden tasks — clinical documentation, legal analysis, educational
assessment, creative and intellectual work — the technology's value is
real but strictly conditional on the quality of human engagement it
receives. The moment that condition is treated as optional for the sake
of efficiency, the value disappears and the failure modes activate
simultaneously.
IX. The Political Question
"Who
authorized this?" is not a rhetorical question. It is a precise
question about democratic legitimacy, and it has a precise answer:
nobody, in any sense that would satisfy a theory of democratic
authorization.
Authorization, in the sense that matters here, is not administrative. It is not the
signing of a procurement contract, the acceptance of a terms-of-service
agreement, or the granting of app permissions. It is agentive: the
exercise of genuine purposive judgment — Dewey's dramatic rehearsal,
applied collectively — about what is being delegated, to what kind of
system, under what conditions of oversight, with what mechanisms of
accountability, and on whose behalf. Democratic authorization of social
infrastructure requires that the people who will bear the costs of that
infrastructure have had a genuine opportunity to evaluate those costs,
imagine the alternatives, and choose.
That deliberation has not happened. The infrastructure has been built at the
pace of capital deployment and speculative valuation, not at the pace
of democratic deliberation. The environmental costs were not debated
before the data centers were permitted. The labor displacement effects
were not weighed before the automation was rolled out. The implications
for clinical documentation, for judicial processes, for educational
assessment, for the information environment were not evaluated before
the systems were embedded in the workflows that now depend on them. The
regulatory frameworks that exist — the EU AI Act, the FDA's device
classification process, the APA's ethics codes — all contain versions of
the human oversight requirement, and all fall short of mandating the
substantive deliberative engagement that would make that oversight real.
The burden of proof in this situation lies with the proponents of rapid,
large-scale AI deployment, not with its critics. The reason is
irreversibility. Infrastructure lock-in, once established, is not easily
undone. Clinical records contaminated with AI errors become part of the
permanent archive. Training corpora contaminated with AI slop cannot be
laundered back into clean signal. Human deliberative capacities, once
atrophied through systematic disuse, do not automatically recover when
the systems that displaced them are removed. The costs of getting this
wrong accumulate in ways that cannot be corrected after the fact.
The case for urgency, then, is not that AI is approaching consciousness or
that the machines are about to take over in any science-fiction sense.
It is considerably more mundane and considerably more serious: the human
capacities and epistemic resources from which AI draws its apparent
intelligence, and on which its genuine utility depends, are being
systematically consumed in the course of its deployment. The archive is
being contaminated by its own outputs. The judgment required to catch
the errors is being eroded by the habit of not catching them. The window
to shape these systems rather than merely inherit their consequences
will not stay open indefinitely.
What the technology actually is (a combinatorial engine of extraordinary
range, built from the accumulated archive of human thought, useful
precisely to the degree that genuine human purposive agency governs its
use) points directly toward what a responsible politics of AI would
look like. It would mandate substantive oversight, not nominal presence.
It would protect and invest in the human practices — of deliberation,
clinical attentiveness, editorial judgment, scholarly rigor — that
constitute the only renewable source of the quality on which the
technology depends. It would treat the epistemic commons as a public
good requiring active stewardship, not a resource to be strip-mined for
training data. And it would insist that the pace of deployment be
governed by the pace at which democratic societies can actually evaluate
what is being built into their infrastructure — which is to say,
considerably more slowly than it is currently moving.
The machine is not the problem. The closing of the loop is.
Bibliography
I. Model Collapse and Recursive Training Degradation
Shumailov, Ilia, Zakhar Shumaylov, Yiren Zhao, Nicolas Papernot, Ross Anderson, and Yarin Gal. 2024. "AI Models Collapse When Trained on Recursively Generated Data." Nature 631: 755–759. https://www.nature.com/articles/s41586-024-07566-y. PMC: https://pmc.ncbi.nlm.nih.gov/articles/PMC11269175/
Shumailov, Ilia, Zakhar Shumaylov, Yiren Zhao, Nicolas Papernot, Ross Anderson, and Yarin Gal. 2023. "The Curse of Recursion: Training on Generated Data Makes Models Forget." arXiv:2305.17493. https://arxiv.org/abs/2305.17493
Dohmatob, Elvis, Yunzhen Feng, Arjun Subramonian, and Julia Kempe. 2025. "Strong Model Collapse." ICLR 2025 Spotlight. https://openreview.net/forum?id=et5l9qPUhm. Also at: https://arxiv.org/abs/2410.04840
"Beyond Model Collapse: Scaling Up with Synthesized Data Requires Verification." 2025. ICLR 2025. https://iclr.cc/virtual/2025/poster/29933
"Model Collapse Is Already Happening, We Just Pretend It Isn't." 2026. Communications of the ACM Blog, March 24, 2026.
[Note: the claim that collapse is measurable in production LLMs is empirically contested; the mechanism is established; flag accordingly.]
II. AI Slop, Internet Contamination, and the Epistemic Commons
Reuters Institute for the Study of Journalism, University of Oxford. 2024. "AI-Generated Slop Is Quietly Conquering the Internet." November 2024. https://reutersinstitute.politics.ox.ac.uk/news/ai-generated-slop-quietly-conquering-internet-it-threat-journalism-or-problem-wi
III. Environmental and Energy Costs
United Nations University Institute for Water, Environment and Health (INWEH). 2026. "The Environmental Cost of Artificial Intelligence: Energy, Carbon, Water and Land Footprints." June 2, 2026. https://unu.edu/inweh/collection/environmental-cost-of-AIs-Enrgy-Use-Carbon-water-and-land-footprints
Data centers projected at 945 TWh annually by 2030; water equivalent to 1.3 billion people's annual domestic supply; e-waste up to 2.5 million tonnes annually; 90%+ of AI compute concentrated in U.S. and China.
International Energy Agency (IEA). 2026. Energy and AI. April 2026. https://iea.blob.core.windows.net/assets/de9dea13-b07d-42c5-a398-d1b3ae17d866/EnergyandAI.pdf
Brookings Institution. 2026. "Global Energy Demands Within the AI Regulatory Landscape." April 20, 2026. https://www.brookings.edu/articles/global-energy-demands-within-the-ai-regulatory-landscape/
Data center consumption approaching 1,050 TWh by 2026.
Consumer Reports. 2026. "AI Data Centers: Impact on Electric Bills, Water, and More." March 2026. https://www.consumerreports.org/data-centers/ai-data-centers-impact-on-electric-bills-water-and-more-a1040338678/
Single hyperscale data center draws as much electricity as 100,000 households.
IV. Labor Displacement
Goldman Sachs. 2026. "How Will AI Affect the U.S. Labor Market?" March 2026. https://www.goldmansachs.com/insights/articles/how-will-ai-affect-the-us-labor-market
~300 million jobs globally exposed to AI automation.
Boston Consulting Group (BCG). 2026. "AI Will Reshape More Jobs Than It Replaces." March 2026. https://www.bcg.com/publications/2026/ai-will-reshape-more-jobs-than-it-replaces
50–55% of U.S. jobs to be reshaped within 2–3 years.
AImultiple. 2026. "AI Job Loss Statistics." June 2026. https://aimultiple.com/ai-job-loss
~200,000 technology-sector job losses in 2026 to date.
V. Drug Discovery
Drug Target Review. 2026. "AI in Drug Discovery: Predictions for 2026." February 15, 2026. https://www.drugtargetreview.com/ai-in-drug-discovery-predictions-for-2026/1865962.article
No full FDA-approved AI-discovered drug as of mid-2026; Insilico Medicine INS018-055 earliest approval late 2026/early 2027; 90% clinical trial failure rate unchanged.
VI. Clinical AI, Automated Scribes, and Healthcare Governance
Miner, Adam S., Albert Haque, Jason A. Fries, et al. 2020. "Assessing the Accuracy of Automatic Speech Recognition for Psychotherapy." npj Digital Medicine 3: 82. Stanford University / Nature Portfolio. https://pmc.ncbi.nlm.nih.gov/articles/PMC7270106/. Also at Stanford HAI: https://hai.stanford.edu/research/assessing-the-accuracy-of-automatic-speech-recognition-for-psychotherapy
25% average ASR word error rate across 100 psychotherapy sessions; 34% error rate for harm-related sentences specifically; authors conclude ASR "may not be ready for individual-level safety surveillance"; error rates likely conservative; disparate impact on ethnic minorities and non-native speakers.
Maleki Varnosfaderani, Shima, and Mohamad Forouzanfar. 2024. "The Role of AI in Hospitals and Clinics: Transforming Healthcare in the 21st Century." Bioengineering (Basel) 11 (4): 337. https://pmc.ncbi.nlm.nih.gov/articles/PMC11047988/
Legitimate binary/operational AI uses: scheduling, inventory, radiology triage; AI throughout framed as augmenting, not replacing, clinical judgment.
Sriharan, Abi, et al. 2025. "Artificial Intelligence in Healthcare: Balancing Technological Innovation With Health and Care Workforce Priorities." International Journal of Health Planning and Management 40 (4): 987–992. https://pmc.ncbi.nlm.nih.gov/articles/PMC12215598/
30–40% of healthcare tasks theoretically automatable with caveat: "over-reliance risks eroding critical thinking and diagnostic skills"; NEDA chatbot case (harmful advice, abrupt shutdown); IBM Watson abandoned after $62 million at MD Anderson; Google retinopathy AI failed in Thailand deployment.
Kaltenbrunner, Saskia. 2026. "Human in Control: Shared Decision-Making with Clinical Decision-Support Systems Under the Artificial Intelligence Act." Computer Law & Security Review 61 (July 2026): 106281. Open access. https://doi.org/10.1016/j.clsr.2026.106281
Human oversight must be substantive, not nominal (GDPR Art. 29 Working Party); EU AI Act Article 14(4)(b) legally recognizes automation bias; Articles 26(11) and 86 exclude medical devices from patient disclosure requirements; medical decision-making "must contend with uncertainty, probabilities and varying value systems."
"Therapy Notes by AI Create False Narratives, Therapists Say." 2024. ClearHealthCosts. https://clearhealthcosts.com/blog/2024/06/therapy-notes-by-ai-create-false-narratives-therapists-say/
[Journalistic source: cite as illustrative case, not peer-reviewed evidence. Documents AI scribes inserting references to abuse and false diagnoses never discussed in session.]
VII. Philosophical and Theoretical Sources
Dewey, John. 1922. Human Nature and Conduct: An Introduction to Social Psychology. New York: Henry Holt.
Part III, "The Place of Intelligence in Conduct": deliberation as dramatic rehearsal (imaginative projection of possible futures, affectively and evaluatively weighted, enacted through choice).
Campbell, James. "Ethical Deliberation as Dramatic Rehearsal: John Dewey's Theory." Educational Theory. https://www.academia.edu/130273738/Ethical_Deliberation_as_Dramatic_Rehearsal_John_Deweys_Theory
Dewey, John. 2026 [secondary]. "Deliberation as Drama and Discovery." Chapter 7 in John Dewey's Human Nature and Conduct. Cambridge: Cambridge University Press. https://www.cambridge.org/core/books/john-deweys-human-nature-and-conduct/deliberation-as-drama-and-discovery/E245F2071F52595826
[2026 scholarly edition — useful for contemporary framing of the dramatic rehearsal concept.]
Tomasello, Michael. 2019. Becoming Human: A Theory of Ontogeny. Cambridge, MA: Harvard University Press. https://books.google.com/books/about/Becoming_Human.html?id=ZnhyDwAAQBAJ
Central argument: uniquely human cognitive capacities (shared intentionality, collaborative reasoning, normative attunement) are socially constituted through developmental practice, not fixed biological endowments; require exercise to remain robust.
Tomasello, Michael, Malinda Carpenter, Josep Call, Tanya Behne, and Henrike Moll. 2005. "Understanding and Sharing Intentions: The Origins of Cultural Cognition." Behavioral and Brain Sciences 28 (5): 675–691. https://www.eva.mpg.de/documents/Cambridge/Tomasello_Understanding_BehBrainSci_2005_1555292.pdf
Empirical grounding: three decades of comparative experiments with chimpanzees, bonobos, and human children; shared intentionality as the basis of human cultural accomplishment.
"Shared Intentionality." 2025. Open Encyclopedia of Cognitive Science, MIT. May 26, 2025. https://oecs.mit.edu/pub/sep9e3c2
"Tomasello (2022, 2024) argued that the broadest perspective on these phenomena is in terms of human agency: humans (and only humans) form joint agency."
VIII. Honest Empirical Caveats — To Flag in the Text
On model collapse in production LLMs: The Nature
2024 and ICLR 2025 papers establish the mechanism and demonstrate it
rigorously in controlled recursive experimental settings. Whether
collapse is yet measurably active at scale in the largest deployed
commercial models is empirically contested (see Hacker News discussion
thread on CACM March 2026 piece). The essay should mark this distinction
clearly. The mechanism is established; the production-scale timing is
an open empirical question.
On the entropic analogy:
Held as a structural illumination, not a physical law. The internet is
not a fully closed system — some fresh human-generated signal continues
to enter training corpora. The question is whether that signal is
sufficient in quality and volume to offset contamination. Presented as a
structural analogy, the framing is robust; presented as a physical
necessity, it overclaims.
On "exactly zero" labor saving:
The phrase is Google's gloss, not a finding from the PMC papers. The accurate claim,
supported by Miner et al. and Sriharan et al., is that responsible use
of clinical AI scribes requires the same time investment as traditional
note-writing, with higher cognitive demand — making net labor savings
for conscientious practitioners approximately zero or negative.