Authors using a new tool to search a list of 183,000 books used to train AI are furious to find their works on the list.
Curious if the AI company actually bought those books or if they just came across them by pirating.
Oh, they’re 100% pirated. Sorry this isn’t open, but the preview should give you enough information. The database is available elsewhere, IIRC. https://www.theatlantic.com/technology/archive/2023/09/books3-database-generative-ai-training-copyright-infringement/675363/
Ok, so it’s been stealing art, and now it’s coming for authors. At what point do we hold the coalition who started this shit culpable for numerous counts of plagiarism?
This is no different than every other capitalist enterprise. The whole system works on taking a public resource, claiming private ownership of it, and then selling it back to the public for profit.
First it was farmland, then coal and minerals, oil, seafood, and now ideas. It’s how the system works and is the whole reason people have been trying to stop it for the past 150 years.
The people making the laws are there because they and/or their parents and/or grandparents did the exact same thing. As despicable and corrupt as it is, you won’t change it by complaining, and no one is going to make a law to stop it.
Does this fall under the fair-use part of copyright law?
The training argument is probably going to come up dry by the time the court works its way through expert testimony, as the underlying argument for training as infringement is insane.
But where OpenAI is probably in hot water is that torrenting 100k books in the first place runs afoul of existing copyright legislation.
Everyone is debating the training in these suits, but the real meat and potatoes is going to be the initial infringement of obtaining the books, not how they were subsequently used.
Fair use is any copying of copyrighted material done for a limited and “transformative” purpose, such as to comment upon, criticize, or parody a copyrighted work.
I don’t see why it should.
The creation of the AI model is transformative. The model does not contain a literal copy of the copyrighted work.
There’s an idea by Barath Raghavan about an AI dividend, where companies pay each netizen a share for the data they use to train these models.
I am into this idea if companies can’t even do a simple opt-in mechanism.