Facebook "Secretly Trained Its AI on a Notorious Piracy Database, Newly Unredacted Court Docs Reveal"

posted 9 days ago

blakestacey@awful.systems

techtakes@awful.systems

15 comments

Kate Knibbs reports in Wired magazine:

Against the company’s wishes, a court unredacted information alleging that Meta used Library Genesis (LibGen), a notorious so-called shadow library of pirated books that originated in Russia, to help train its generative AI language models. […] In his order, Chhabria referenced an internal quote from a Meta employee, included in the documents, in which they speculated, “If there is media coverage suggesting we have used a dataset we know to be pirated, such as LibGen, this may undermine our negotiating position with regulators on these issues.” […] These newly unredacted documents reveal exchanges between Meta employees unearthed in the discovery process, like a Meta engineer telling a colleague that they hesitated to access LibGen data because “torrenting from a [Meta-owned] corporate laptop doesn’t feel right 😃”. They also allege that internal discussions about using LibGen data were escalated to Meta CEO Mark Zuckerberg (referred to as “MZ” in the memo handed over during discovery) and that Meta’s AI team was “approved to use” the pirated material.

jaschop@awful.systems

31 points

8 days ago

Did they seed at least?

froztbyte@awful.systems

24 points

8 days ago

it’s facebook, they probably issued a takedown request for all their logged peers

jaschop@awful.systems

6 points

8 days ago

The pivot-to-ai writeup is out, they did seed! I assume it’s documented then.

Multinational corporations can act ethically after all.

froztbyte@awful.systems

5 points

8 days ago

> Multinational corporations can act ethically after all.

I wouldn’t go that far

David Gerard@awful.systems

3 points

7 days ago

It’s clear that they didn’t stop uploads of the torrents. It hasn’t been established in the documents we’ve seen so far that they actually had downloaders in turn. But they did clearly make the works available for upload.

monk@lemmy.unboiled.info

31 points

9 days ago

Nice! Now simply fine them to pay significant royalty to every author in there, say, a millicent per word of everything they’ve generated before they get caught.

JeeBaiChow@lemmy.world

9 points

9 days ago

We should just start a meme movement that makes up an imaginary yet believable fact, like the lemmings jumping off a cliff thing, wait for the AIs to repeat it, and lobby for royalties. Do one for each of the major AI platforms: OpenAI, Reddit, Meta, Apple, Google, etc. We would eventually find out which public forums are training which bots.

trolololol@lemmy.world

2 points

7 days ago

You don’t need that, all of them use everything
