When corporations scrape academic papers, it's justified. When individuals do it, it's inexcusable. (lemmy.ml)

posted 13 days ago

TheImpressiveX@lemmy.ml

piracy@lemmy.dbzer0.com

75 comments

FaceDeer@fedia.io

24 points

13 days ago

AI models don’t actually contain the text they were trained on, except in very rare cases where they’ve been overfit on a particular text. That is considered a training error, and a lot of work has gone into preventing it; it usually happens when a great many identical copies of the same data appear in the training set. An AI model is far too small to hold its training data: there’s no way the data can be compressed that much.
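
To put rough numbers on the size argument, here is a back-of-envelope sketch. Every figure in it (parameter count, weight precision, token count, bytes per token) is an illustrative assumption, not a measurement of any particular model, and the function name is made up for this example.

```python
# Back-of-envelope: how hard would the weights have to compress the
# training text in order to contain all of it verbatim?
# All figures are illustrative assumptions, not data about a real model.

def required_compression_ratio(n_params, bytes_per_param, n_tokens, bytes_per_token):
    """Bytes of training text that each byte of model weights would have to hold."""
    model_bytes = n_params * bytes_per_param
    text_bytes = n_tokens * bytes_per_token
    return text_bytes / model_bytes

ratio = required_compression_ratio(
    n_params=7e9,        # hypothetical 7-billion-parameter model
    bytes_per_param=2,   # assuming 16-bit weights
    n_tokens=2e12,       # hypothetical 2 trillion training tokens
    bytes_per_token=4,   # assuming roughly 4 bytes of text per token
)
print(f"The weights would need to compress the text about {ratio:.0f}x")
```

Under these assumptions the weights would have to squeeze in several hundred bytes of text per byte of model, which is the sense in which the model is "too small". Different assumed sizes change the exact ratio but not the general picture.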

EmbarrassedDrum@lemmy.dbzer0.com

8 points

13 days ago

thanks! that actually makes a lot of sense.

welp guess I was wrong. so back to .edu scraping!
