Authors using a new tool to search a list of 183,000 books used to train AI are furious to find their works on the list.

8 points

Curious if the AI company actually bought those books or if they just came across them by pirating.

4 points

Oh, they’re 100% pirated. Sorry, this article is paywalled, but the preview should give you enough information. The database is available elsewhere, IIRC. https://www.theatlantic.com/technology/archive/2023/09/books3-database-generative-ai-training-copyright-infringement/675363/

11 points

OK, so it’s been stealing art, and now it’s coming for authors. At what point do we hold the coalition who started this shit culpable for numerous counts of plagiarism?

4 points

TIL “culpable” is an English word too. Culpable means guilty in Spanish, and I thought you were a Spanish speaker doing Spanglish. Now I know you’re just a man of culture.

49 points

This is no different than every other capitalist enterprise. The whole system works on taking a public resource, claiming private ownership of it, and then selling it back to the public for profit.

First it was farmland, then coal and minerals, oil, seafood, and now ideas. It’s how the system works and is the whole reason people have been trying to stop it for the past 150 years.

The people making the laws are there because they and/or their parents and/or grandparents did the exact same thing. As despicable and corrupt as it is, you won’t change it by complaining, and no one is going to make a law to stop it.

13 points

God damned right. Every “new” thing tends to be stolen. In more recent history, it’s stolen from other capital, or from innovation with a free license, rather than artwork. Publishers might actually be able to make a problem out of this.

3 points

Does this fall under the fair-use part of copyright law?

4 points

The training argument is probably going to come up dry by the time the court works its way through expert testimony, as the underlying argument for training as infringement is insane.

But where OpenAI is probably in hot water is that torrenting 100k books in the first place runs afoul of existing copyright legislation.

Everyone is debating the training in these suits, but the real meat and potatoes is going to be the initial infringement of obtaining the books, not how they were subsequently used.

4 points
Deleted by creator
8 points

It hasn’t been tested in court yet but I don’t see why it shouldn’t.

3 points

Fair use is any copying of copyrighted material done for a limited and “transformative” purpose, such as to comment upon, criticize, or parody a copyrighted work.

I don’t see why it should.

7 points

The creation of the AI model is transformative. The model does not contain a literal copy of the copyrighted work.

7 points

There’s an idea by Barath Raghavan about an AI dividend, where companies pay each netizen a share for the data they use to train these models.

I am into this idea if companies can’t even manage a simple opt-in mechanism.

