Authors suspect that all of their books have already been purloined by the AI companies … at least in the case of Anthropic, it’s apparently as many as 7 million digital book files.
An even larger fear seems to be that at some point generative AI is going to be good enough to write books that the public will purchase … its ability to do so will stem from the books that have already been ingested.
So-called “artificial intelligence” seems like automated plagiarism to me.
The printing press was automated plagiarism too. Scribes knew how to copy books without mechanical assistance. Same complaints about dictionaries and thesauruses - why should authors get help finding synonyms?
AI is a tool. Some will misuse it with bad results. Some will use it well. If you think AI ruined a book, read a different book and vote with your wallet for the uses you want to see.
I get the concern about copyright and compensation - those are real issues worth addressing. But the blanket “AI is plagiarism” take ignores that we’ve had this panic with every new tool that changed how people create things.
The anti-AI hysteria matches the pro-AI hype right now.
The publishing industry already churns out a lot of chaff that the public will purchase. In some cases, the AI-generated books will be an improvement.
I think we can distinguish between careful, thoughtful, well-crafted, and original texts of genuine merit and the kind of formulaic writing that might as well be cranked out by a machine. The latter might not be plagiarism in fact, but it is very nearly so in spirit.
PS - I wholeheartedly agree that creators of all kinds—be they wordsmiths, visual artists, performers, or whatever—should absolutely be compensated in some way for the content they made that the AI labs have sucked into their models.
I quoted a few lines from Mr. McIlroy’s blog post hoping to stimulate interest in the entire piece that he wrote. As usual, early replies share pre-conceived opinions triggered mostly by the brief quotes. There is so much more to the issues raised by so-called artificial intelligence. If you can’t be bothered to read the whole thing, I wish you wouldn’t bother replying.
I’m sorry, @karlnyhus. My response matched what you gave me.
You appeared to be making an emotional claim, so I pushed back with an analogy.
If you want to engage with the article, feel free to highlight something substantive you were hoping we’d discuss. It’s a very long article. Were you hoping for a deeper dive into the legal framework, the economics, the technical distinctions, or the industry trends?
Your “automated plagiarism” opener didn’t give me the impression that’s the discussion you were really after. My apologies.
I assumed the quotes you excerpted captured the article’s thesis and expressed the point you wished to make: that you consider AI to be automated plagiarism. It wasn’t clear to me that I needed to read the entire article before addressing the claims made in the quoted material or your own claim. If you think an article you’ve linked to is worth reading in its entirety, either because it adds nuance to the quotes or because it raises other points worth discussing, you need only say so—you’re a regular and valued contributor to this forum, and I think many of us would try to make time for the article based on your recommendation.
Yes, my response was prompted by the quotes: I read them as the starting point for a discussion, not as a call to read the linked article. And I do read widely about AI—both as a technology of considerable promise and as a disruptive force in our lives as individuals, as creators, as citizens, and as members of a community: I’d like to think that my thoughts on the topic are more than pre-conceived opinions.
Every time I’ve tried to use AI to assist in reading a book, by listing the characters and giving a synopsis, I get about 50% garbage. The trouble is I don’t know which 50% is correct and which is wrong. I figure that if it can’t really help me with reading a book, it won’t be able to write a decent one either. And if you look behind the curtain, there is no intelligence there capable of writing a decent book.
Sadly, as always, legislation fails to keep up with tech. These companies are being pursued by the owners of the content they’ve used without permission, but thanks to the funding they receive, they can afford very good lawyers.
I think AI is different from previous tools: it isn’t being used to copy published works, but it could be used to replace them. At a bare minimum, it’s profiting from someone else’s copyright.
In the meantime, how something is written makes a difference.
The latest frontier models are most definitely capable of generating CliffsNotes at scale. Now even more people will be able to talk about books they haven’t read at cocktail parties.
In all seriousness, I’m much less concerned about AI being used to write books than I am about AI obviating the need to directly engage with a text in the first place. That being said, I’m not too proud to admit that I’ve found AI helpful in navigating my way through a dense, complex text, or in putting it into context with other work that addresses the same topic.
Here’s a screenshot from that Washington Post (WaPo) article, which I share for the irony:
(Irony aside, WaPo appears to use Together.ai for this. It is not at all clear how the models Together.ai provides were trained. Maybe WaPo trained its own models from the ground up on its own content, and thus there are no ethical conflicts with this criticism of Anthropic?)
I’ve had my books and articles plagiarized by Anthropic, among other LLMs. I don’t mean a sentence here or there, but chunks of text, sometimes consecutive pages. I know that entire books and articles were used in training various LLMs.
Partly that’s because in some obscure niches, my stuff is the main source.
I have removed a lot of content from various digital places because my work is lifted without attribution, and sometimes used in ways that don’t make sense, because an LLM can’t understand what it “reads,” it just does rapid, sophisticated pattern-matching.
The lifting without citation, and the lifting without context, are what really annoy me. Generally speaking, outside of textbooks and academic promotion and tenure, scholarly writing doesn’t make money.
But scholars are paid in citations, much like the Web runs on links.
Thanks for sharing. I don’t think that limiting licensing to RAG is going to work long-term, but it’s still reasonable to pursue for a while. I agree with what I understood to be part of the conclusion: that the court decision allowing training on pirated books won’t be reversed.
My favorite pullouts from the article:
A fundamental mistake that publishers are making is imagining that the information and knowledge contained within the books they publish has more value to advanced LLMs than does the data that underlies all of the interpretations and conclusions reached by often-expert, but oh-so-human authors.
Could AI be, in fact, a validation of traditional publishing? Unless we move into an entirely post-literate world, the careful reasoning and expression in long-form periodicals and books has no substitute.
Certain books will continue to make people wise; that will not stop being valuable, even if the publishers stop making money from them.
Totally agree. I hope that in future AI companies are made to pay for everyone’s work that they have taken. Governments have been appalling in addressing this because of the benefits of AI.
I have two books on Bible study and asked AI to synthesise their methods into one. It did it without asking me for any content. How is this possible without it having access to the complete text of those books? I’m hoping the bubble will burst, that AI will be seen for what it is, namely profiting from other people’s work, and that it will be made to pay just remuneration.