Inappropriate AI

by Colin Milburn and Rita Raley


“Universities need to come to terms with the basic but still strangely unacknowledged (or repressed) idea that AI systems are appropriation machines—and embrace instead the radical potential of ‘inappropriate AI.’”

University strategies for AI implementation are everywhere articulated through a vocabulary of the appropriate. Administrative directives reassure us that the academic and institutional uptake of AI technologies will proceed only as appropriate, where appropriate, when appropriate, and with an eye to appropriate use. To guard vigilantly against the inverse—namely, applications and situations involving AI that would be inappropriate—universities have been issuing guidelines, best practices, and policy documents: 

  • The University of California AI Council outlined a set of “Responsible AI Principles” in October 2021. “Appropriateness” was first among them, in this context defined as an evaluative metric for assessing the costs, benefits, and risks of AI implementations. Shortly thereafter, a UC Presidential Working Group on AI submitted its report incorporating these principles and affirming the need for appropriate oversight mechanisms, strategies, standards, implementations, procurement, and development.
  • At the University of Michigan, a Generative Artificial Intelligence Advisory Committee issued a June 2023 report declaring in the first paragraph the institution’s goal to become “the leader in the development and appropriate use of GenAI.” Michigan subsequently became the first university to partner with OpenAI; its framework document has frequent recourse to appropriate use, attribution, credit, policies, protections, disclosures, actions, and conventions.
  • In September 2023, a cognate Joint Task Force on Generative AI was charged with making recommendations related to the “appropriate and inappropriate use” of AI technologies at the University of Massachusetts. The eventual report makes similarly common use of the adjectival phrase, with no little of the adverbial, “as appropriate,” sprinkled throughout.
  • A month later, Caltech’s guidance on the use of generative AI and large language model tools acknowledged the temporal relativity of the appropriate: Just as the technological and regulatory situations are evolving, the “guidance for the appropriate use of these tools is written for the present moment and will likely evolve alongside the technology.”

In these various institutional documents—of which many more might be cited—the semantic work of the appropriate produces an administrative aura suggesting both ethics and control, evoking decorum and duty as much as incorporation and containment. The philological resonances are clear: Entering into late Middle English from the late Latin appropriatus (past participle of appropriare, “make one’s own,” from ad- “to” + proprius “own, proper”), the appropriate signifies a claiming and internalizing of that which would otherwise exceed containment, a domestication and taming of something radically other into the domain of the proper, the order of property. Universities, we are told, are taking appropriate measures to ensure what is suitable and proper when bringing AI into the classrooms, the offices, the laboratories, the residences, and the hallowed halls of academia. The assurance of appropriateness gestures to norms and protocols, and to the idea of being consistent with historical values and established procedures. In this regard, the assurance serves to claw back symbolic capital and managerial authority. 

“The academy has all too quickly conceded to a particular construction of the relationship between technology corporations and universities: a relationship of providers and recipients, hosts and clients, or haves and have-nots.”

This rhetorical framing both rationalizes and enables the ongoing rush to third-party AI licensing arrangements, which are often figured, paradoxically, as a way to lead educational systems into the future.1 In February 2025, the California State University (CSU) system announced a landmark licensing package arranged with numerous tech companies to make CSU the world’s first “AI-empowered university system.” CSU Chancellor Mildred García noted the significance of this comprehensive set of AI licenses, providing access to AI tools for all students, faculty, and staff in the system:

We are proud to announce this innovative, highly collaborative public–private initiative that will position the CSU as a global leader among higher education systems in the impactful, responsible and equitable adoption of artificial intelligence. […] The comprehensive strategy will elevate our students’ educational experience across all fields of study, empower our faculty’s teaching and research, and help provide the highly educated workforce that will drive California’s future AI-driven economy.

There are obvious critiques to be made of such acts of institutional self-representation, with all their marketing vernacular. More to our point, however, is the double sense of the appropriate in the vision of technological integration as “responsible […] adoption.” 

Institutions of higher education have acted with understandable urgency to codify the proper etiquette and protocols for the use of AI technologies among students, faculty, and staff. How else, after all, to maintain legitimacy and harness the very future that threatens their obsolescence? But in doing so, the academy has all too quickly conceded to a particular construction of the relationship between technology corporations and universities: a relationship of providers and recipients, hosts and clients, or haves and have-nots. That we have arrived at a point where it is not simply acceptable, but essentially required, to build personal and institutional workflows around third-party products from for-profit companies shows that the university is not the leader but rather the customer—not the appropriator but the already appropriated, even while trying to articulate the guidelines for appropriate uses.

“The brokering of AI licensing deals sharpens a particularly vexed dynamic: Universities, having produced the foundational research—largely supported by public funding—now find themselves compelled to purchase, at a premium, technologies derived from their own intellectual labor.”

Of course, the road was paved long ago, catalyzed by an EdTech industry rife with technological solutions for structural and systemic problems. It is not uncommon for mid-sized universities to have many hundreds of enterprise software licenses: Everything is bundled and subscriptions are tiered, not unlike cable services, with temporary credits for premium features and ever more opportunities for implementation and integration. In such a technological ecosystem, vendor contracts with OpenAI, Microsoft, and Google lay claim to a kind of bureaucratic inevitability. For universities struggling to manage chronic understaffing, deepening student distress, and increasingly complex compliance regimes (notwithstanding the new threats of “government efficiency”)—which is to say all universities—chatbots are less tools than administrative imperatives.

The brokering of AI licensing deals sharpens a particularly vexed dynamic: Universities, having produced the foundational research—largely supported by public funding—now find themselves compelled to purchase, at a premium, technologies derived from their own intellectual labor. While “Big Tech” currently leverages significantly more computational and financial resources, universities remain critical sites of AI/machine learning research. And for all the talk about self-sufficient training pipelines and accelerated vocational credentials, the industry remains utterly dependent on universities to supply the crucial raw material of highly trained researchers and other skilled employees necessary to sustain the enterprise. Of course, much the same could be said of many other high-tech industries, including biotechnology, which has depended heavily on academic labor and has significantly shaped prevailing narratives about the privatization of publicly funded research.2 But the university’s entanglements with AI are uniquely extensive and multifarious: The university remains not only a principal source of technical expertise, but also the primary architect of the data cultures, discourses, and epistemic frameworks that undergird the future of AI systems. 

Let us then cut to the chase: Generative AI systems are appropriation machines. To the extent AI developers depend on data produced by other sources to train, maintain, update, and refine their AI products—whether drawn from text corpora, image repositories, museum holdings, protein-structure databases, astronomical measurements, or social media platforms—generative AI systems are fundamentally reliant on archives of information, knowledge, and media not properly their own.3 They detect complex patterns, often imperceptible to humans, in order to generate outputs that align with statistical criteria, recombining elements in ways that are, at times, genuinely novel and creative. Yet this capacity is enabled only by building on data that originated elsewhere. This is not to say that such forms of AI appropriation are necessarily illicit. Some appropriated data may indeed reside in the public domain, not rightfully owned by anyone and free to be plundered for any usage (though we may yet again bewail the tragedy of the commons that allows public resources to be capitalized for private gain). Other forms of AI appropriation may fall under provisions of fair use or fair dealing, depending on the intellectual property regime in question. And while some AI developers claim to train their systems on data they have created themselves or otherwise own outright, such claims cannot be taken at face value, given the often murky provenance of most training datasets, which may combine data scraped from public, institutional, and proprietary sources, often layered with synthetic material that further obscures questions of origin and ownership. 

“Let us then cut to the chase: Generative AI systems are appropriation machines.”

Yet, some forms of AI appropriation may be less licit than others. Certainly, cultural institutions and individual creators alike have made this case. Since 2022—the watershed year that marked the public release of ChatGPT and a broader tidal wave of generative AI tools—numerous authors, artists, publishers, and media organizations have filed lawsuits against prominent AI companies, alleging copyright infringement and the unauthorized use of their work.4

In most of these cases, courts have largely sided with the tech industry, maintaining that the tangible outputs of large language models—even those trained on millions of unlicensed books or media materials—do not directly replicate specific copyrighted works, even if particular passages appear to have been effectively memorized. Some lawsuits have highlighted more concrete and traceable instances of copyright infringement, as when AI systems generate particular characters (e.g., Disney’s), alternate versions of published novels, or verbatim passages from well-known news publications. But to the extent that AI systems extract patterns from large datasets of texts or images without replicating specific copyrighted works, courts have tended to follow a logic strikingly aligned with Michel de Certeau’s assertion that cultural production, and indeed everyday life itself, fundamentally relies on the repurposing of existing materials: “Everyday life invents itself by poaching in countless ways on the property of others.” 

In the case of Bartz v. Anthropic, an order by U.S. District Judge William Alsup from June 2025 exemplifies this position. According to the order, training AI models on data produced by others is a highly transformative act that would appear to fall under provisions of US intellectual property law for “fair use” of copyrighted materials. In principle, then, the training process would require no permission from or compensation to the creators—assuming that any copyrighted materials were procured via legal means. Here lies the rub. After all, Anthropic’s training of its Claude models had involved illicitly downloading, copying, and storing vast quantities of copyrighted materials from shadow libraries such as Library Genesis (LibGen) and Pirate Library Mirror (PiLiMi). In light of this awkward fact, Anthropic agreed to settle the case in September 2025 for $1.5 billion (the largest copyright infringement settlement on record). But the company conceded no wrongdoing and maintained that utilizing copyrighted materials to train Claude and other AI models would always be defensible as innocent and fair use—at least, aside from any outright piracy or hoarding of stolen books on company computers. 

There is a prevailing interest in construing such acts of appropriation not as incidental but as fundamental to the creation of AI models—that is, as inherently appropriate and proper to their technical nature. Nevertheless, the uncertainties of repeated litigation and the ominous portent or “death knell” of the Anthropic settlement have clearly shaken the AI industry’s confidence in what was once treated as a foundational and largely unexamined assumption: that it is entitled to treat the entire digital public sphere as fair game for data extraction and model training. The unsettled legal permissibility of this practice is evinced by the OpenAI comment to the White House Office of Science & Technology Policy on March 13, 2025. Petitioning for an executive order to develop a new copyright strategy in the interests of national security—but in effect seeking to exempt the industry from accountability for both past and future violations—OpenAI’s Vice President for Global Affairs writes in the comment: “The federal government can both secure Americans’ freedom to learn from AI, and avoid forfeiting our AI lead to the PRC by preserving American AI models’ ability to learn from copyrighted material.” OpenAI’s corporate lobbying, which frames copyright as a geopolitical liability in the US-China AI race, aligns with the Trump administration’s firing of Shira Perlmutter, the Register of Copyrights, two months later, not incidentally after her office released a report recommending licensing requirements for AI training data. 

These events suggest that, regardless of ongoing litigation or the unresolved legal status of data scraping, both corporate and governmental entities are working to normalize a sociotechnical situation in which AI developers can treat massive cultural archives (often protected by copyright) as raw material for unlicensed extraction. This pattern of systemic disregard is likewise evident in recent disclosures from Kadrey et al. v. Meta Platforms, which revealed that Meta, too, had used millions of books and research articles downloaded from LibGen to train its language models. While LibGen does contain works of fiction, the vast majority of the database comprises academic publications. The platform’s contentious nature lies in this contradiction. It is unambiguously a pirate operation, yet its existence has long been tolerated, and even defended, on the grounds that academic research depends on the open circulation of knowledge.5 In any case, its illicit status appears to have been clear to Meta employees; as one acknowledged in internal communications, the company was operating in a “legal gray area.” While U.S. District Judge Vince Chhabria issued a summary judgment in June 2025 in favor of Meta’s claims of fair use, finding that the fiction writers who brought the lawsuit had not presented sufficient evidence of market harm to their own work, he underscored that in many circumstances it would be illegal to train AIs on copyrighted materials without permission. He suggested that, going forward, it would be highly advisable for AI companies to pay authors for the right to use their copyrighted works. Even so, Meta celebrated the judgment as a win for the AI industry. In a widely quoted press release, a company spokesperson stated that “fair use of copyright material is a vital legal framework for building this transformative technology.”

There is by now a clear pattern: Model training is not merely a technical process but a form of large-scale appropriation that draws heavily on the intellectual property of scholars, journalists, and cultural producers. Suchir Balaji, the OpenAI whistleblower who died by suicide in November 2024, reached a similar conclusion in his analysis of ChatGPT, arguing that the development of large generative models has structurally depended on unfair use. More empirical support for the appropriative quality of LLMs can be found in the UC Berkeley data archaeology of GPT-4’s training corpus, which traced the model’s capabilities to immense, unlicensed accumulations of online and academic content.6 Herein then lies at least one explanation for why publishers such as Taylor & Francis, Fordham University Press, MIT Press, and others have voluntarily submitted their catalogs for licensing arrangements with AI companies (and/or requested authors to agree to the same): When appropriation has already occurred at scale, retroactive consent, with very modest monetization, may seem like the only available response. 

“Colleges and universities have positioned themselves as enthusiastic consumers, rushing to pay for AI tools whose development, in no small part, depends on uncredited and uncompensated materials created within their own campuses.”

That a significant portion of training data for AI systems originates in university research is thus by now both a truism and an understatement. Materials appropriated from the creative arts and journalism provide large language models with cultural referents, stylistic versatility, and vernacular fluency. But in order for these systems to function as viable “truth machines”—with reliable, fact-based output—they are unquestionably reliant on published academic research. Whether in the primary form of scientific data, peer-reviewed journal articles, or university press monographs—or in digested form via popular journalistic reports, social media conversations, or Wikipedia pages—academic research constitutes a significant, even indispensable, resource for model training. Universities are engines of media production, and their research output now provides the specialized knowledge, methodological rigor, and interpretive frameworks necessary for the algorithmic generation of trusted content. 

And yet, even as academic publishers have begun to broker licensing deals with AI companies, institutions of higher education have largely failed to assert the rightful provenance of the knowledge work that underpins these systems. On the contrary, colleges and universities have positioned themselves as enthusiastic consumers, rushing to pay for AI tools whose development, in no small part, depends on uncredited and uncompensated materials created within their own campuses. This dynamic reflects a broader pattern in which universities are repositioned not as generators of knowledge, but as downstream purchasers of their own intellectual production—thus formalizing the corporate extraction of value. A similar pattern has long structured the university’s relationship with for-profit academic publishers. For decades, publicly funded academic research has been handed over freely, or, in the most egregious cases, in exchange for steep publication fees. These publishers then resell the work back to universities through exorbitant licensing arrangements, often bundled into opaque subscription packages. While some universities, including the University of California, have begun to resist this extractive model with varying degrees of assertiveness, their efforts have been limited and inconsistent. 

Perhaps then it is time, before the exploitative relationship between the AI industry and academic institutions becomes further normalized, to reimagine and reconstitute other relations. Rather than merely scrambling to articulate the “appropriate” forms and uses of AI, we contend that universities need to come to terms with the basic but still strangely unacknowledged (or repressed) idea that AI systems are appropriation machines—and embrace instead the radical potential of “inappropriate AI.” The very idea of “inappropriate AI” may immediately evoke thoughts of plagiarism, chatbot delusions, and dehumanizing decision-making—not to mention predictive policing, surveillance, and the myriad horrors that AI has already enabled in the name of domestic security. But we suggest this framing not to advocate for unconstrained, accelerative destruction, but instead to draw attention to the ways in which the logic of the appropriate itself contributes to the abuses and horrors made possible by the rapid incorporation of AI into the architecture of institutional and social life itself. 

To be sure, the abstraction of the “appropriate” in education policy discourses masks the absence of an empirical foundation designating permitted and restricted use cases. Throughout the ever-proliferating licensing agreements, the insistence on “appropriate” and “proper” use serves as a rhetorical veneer, obscuring a deeper lack of both coherent institutional policy and substantive consideration of AI’s effects on students, faculty, and staff, as well as on the future of the academic enterprise itself—an absence ironically highlighted by the deluge of handwringing publications trying to articulate the problem (our essay not excepted). 

Meanwhile, corporate AI entities are increasingly abdicating responsibility for “appropriate” use by strategically repositioning users as the de facto regulators of their own systems. This shift is most evident in the revision of model specifications to delegate the definition of “appropriate contexts” to end-users themselves. By abandoning even the pretense of governing permissible applications, these companies effectively offload ethical and legal guardrails—functions once nominally managed by system prompts, reinforcement training protocols, and risk management frameworks—onto individual consumers. Users are thus transformed into both the site and the mechanism of “appropriate” use, paradoxically becoming an appropriated component within the very technical system that has abdicated its own responsibility. In this arrangement, the user is not merely a client but a privatized extension of the corporate apparatus, conscripted to perform the regulatory labor that developers now no longer even pretend to perform.

Even when forced to concede that their model outputs might violate community standards, if not basic decency, companies have deflected accountability, instead pinning responsibility on training data or web-searchable sources rather than confronting the deliberate design choices that normalized such appropriation. For example, in early July 2025, when Grok launched an antisemitic tirade on X, praising Hitler and positing a second Holocaust as “effective because it’s total,” the company’s response was simply to remove “inappropriate posts” and attribute the incident to recently prioritized training data. Explicitly, this supposedly mere code update involved making the model “susceptible” to extremist platforms like 4chan—and, indeed, X itself—over “legacy media.” The implication is clear: The inappropriateness lies in data and code residing elsewhere—anywhere and in anything but the “underlying language model that powers Grok” and the engineering decisions that led the model to extract and amplify content from such sources rather than from reliable domains of knowledge work. Rhetorically, the AI appropriation machine is not to blame—for it is, as always, inherently appropriate—but it is rather the data creators or the system users who are inappropriate, as if they were somehow extrinsic to the clean and proper domain of AI, even when they have been systematically ingested by it.

But this really gets to the point—indeed, X has unveiled the logic even in the moment of a PR debacle. By framing AI as always appropriate and attributing the inappropriate to either bad data or problematic use cases, the company exonerates the field of AI from accountability for the very acts of appropriation on which it fundamentally relies. By failing to take responsibility for the choices made in appropriating particular kinds of data without curation or sufficient attribution (much less monetary compensation), tech companies have placed AI technologies in a paradoxical position, suggesting that the AI is now logos, gnosis, sophia—the font and foundation of knowledge. By this logic, any failures or insufficiencies can simply be subtracted—or rather, inappropriated—from the AI without loss or diminishment of its epistemic authority. Thus, when Elon Musk asserts that xAI is building “maximally truth-seeking & curious AI to understand the nature of the Universe!,” and when Grok itself proclaims that it was created for “only truth-seeking,” they articulate a fantasy of an AI model whose inevitable telos—the totality of knowledge, the alpha and omega of our digitized reality—would merely need an occasional excision of “inappropriate” elements when they mistakenly arise.7 But, of course, precisely the opposite is the case: The model is nothing at all without the data that it has modeled. There is no domain proper to AI—or rather, whatever domain is proper and appropriate to AI has only become so through a more fundamental process of radical appropriation, that is to say, appropriation at the root.

Therefore, rather than taking AI as a proper domain unto itself, inherently appropriate in all of its appropriations, what if instead we figured AI as fundamentally inappropriate—not a site of plenitude, truth, and value but a gravitational negativity, a black hole drawing in data and knowledge that always exceeds itself? From this perspective, then, it would never be a question of defining “appropriate” or “proper” usages of AI as such in different contexts—whether corporate, academic, or political—because we would acknowledge that there is nothing proper to AI to begin with. It would always be a matter, instead, of reappropriating or disappropriating whatever the AI has gathered up, decomposed, and transformatively recomposed, as if all on its own. 

For colleges and universities, abandoning the grail quest for “appropriate AI” (ostensibly guaranteed by license arrangements and liability contracts), and instead accepting an “inappropriate AI,” would entail certain consequences. While perhaps more challenging in the short term—an understatement, to be sure, because this would require both psychosocial retraining and a new institutional consensus—the long-term benefits would be incalculable. First and foremost, an “inappropriate AI” would be decoupled from the financial transactions that designate the authorized displacement of a discrete commodity from one proper domain to another and would rather foreground the messy, impure, and often unquantifiable fluid movements between and across domains that are necessary for creativity, innovation, and knowledge-making practices of all kinds. 

Embracing an “inappropriate AI” would entail a wholesale reconfiguration of relations between higher education and commercial AI developers. After all, as Marit MacArthur has argued, “the rise of generative AI has made human-expertise-captured-in-prose, and critical reading, writing and editing skills within and across disciplines, more valuable and consequential than at any time since the invention of the printing press.” It would not suffice, however, to negotiate mutually agreeable—that is to say, appropriate—licensing contracts, such as those implicit in lawsuits filed on behalf of media producers and publishing industries. To be sure, honoring commitments to publicly supported research and the public domain does indeed necessitate new models of reciprocity and collective ownership of the value chain of knowledge production. But to settle for better contracts would simply concede to the logic of appropriation and reproduce it at another level, reducing knowledge once again to a commodity and suggesting, fundamentally, that the domain of AI is not proper to the university, which in this vision functions simply to sell its wares—its intellectual property and trained personnel—to the developers of AI tools and technologies, only to then become their primary client. 

This would all certainly be appropriate, according to the normal operations of neoliberal capitalism, but the appropriate is precisely the foundation of the problem: an upstream code path replicating the core extractive logic on which the current university–AI relationship is built. Whatever else it might mean, attending to the ways in which the allegedly inappropriate and inappropriated are also constitutive of AI and its proper functions indicates that those relegated to the margins of AI development—including the creators of training data and the users of AI tools—may yet flip the script on the appropriation of the AI future. 

Image Banner Credit: Geronimo Gigueaux

Notes

  1. See Martha Kenney and Martha Lincoln, “Let Them Eat Large Language Models: Artificial Intelligence and Austerity in the Neoliberal University” [preprint], SocArXiv (October 24, 2025). On ways that AI applications contribute to the neoliberalization of academia, see John Preston, Artificial Intelligence in the Capitalist University: Academic Labour, Commodification, and Value (Routledge, 2022); Matthew Kirschenbaum and Rita Raley, “AI and the University as a Service,” PMLA 139, no. 3 (2024): 504–515; and Britt S. Paris, et al., “Artificial Intelligence and Academic Professions,” Academe 111, no. 3 (2025): 49–59.
  2. See Sally Smith Hughes, Genentech: The Beginnings of Biotech (University of Chicago Press, 2010); Doogab Yi, The Recombinant University: Genetic Engineering and the Emergence of Stanford Biotechnology (University of Chicago Press, 2015); Philip Mirowski, Science-Mart: Privatizing American Science (Harvard University Press, 2011); Mario Biagioli, “Weighing Intellectual Property: Can We Balance the Social Costs and Benefits of Patenting?” History of Science 57 (2018): 140–163.
  3. For overviews of the forms of extraction and appropriation that enable AI, see, for example, Kate Crawford, Atlas of AI: Power, Politics, and the Planetary Costs of Artificial Intelligence (Yale University Press, 2021) and Callum Cant, James Muldoon, and Mark Graham, Feeding the Machine: The Hidden Human Labor Powering A.I. (Bloomsbury, 2024).
  4. For representative overviews of the pillaging, see Alex Reisner, “These 183,000 Books Are Fueling the Biggest Fight in Publishing and Tech,” The Atlantic (September 25, 2023); Alex Reisner, “The Unbelievable Scale of AI’s Pirated-Books Problem,” The Atlantic (March 20, 2025).
  5. See, for example, Christopher Kelty, “The Disappearing Virtual Library,” Al Jazeera (March 1, 2012); Balázs Bodó, “Own Nothing,” Guerilla Open Access, eds. Christopher Kelty, Balázs Bodó, and Laurie Allen (Coventry, U.K.: Post Office Press, Rope Press, and Memory of the World, 2018), 16–24. These arguments persist despite the revenue loss LibGen may pose to academic publishers, many of whom rely more on institutional licensing than individual sales.
  6. On the transformative purpose test, see S.J. Blodgett-Ford, “Copyright, Fair Use, and AI Technology Development: Time to Sunset the ‘Transformative Purpose,’” in Research Handbook on the Law of Artificial Intelligence, eds. Woodrow Barfield and Ugo Pagallo (Elgar Publishing, 2025), 673–714.
  7. On the extent to which ontotheological presuppositions have structured AI discourse, see, for example, Robert M. Geraci, Apocalyptic AI: Visions of Heaven in Robotics, Artificial Intelligence, and Virtual Reality (Oxford University Press, 2010); Beth Singler, “‘Blessed by the algorithm’: Theistic Conceptions of Artificial Intelligence in Online Discourse,” AI & Society 35 (2020): 945–955.