Reddit Signs AI Content Licensing Deal Ahead of IPO

What could go wrong with training your AI based on the posts of the most racist and misogynistic people on the internet?

Damn, Facebook is owned by Reddit now?

It’s not 4chan… but someone did train one of those once.

My god they’ll create a super redditor

That varies by subreddit, which might actually help in training LLMs to recognize the difference.

Just think, in 1000 years your body will be long dead but you’ll be forced to live on as a poster! Death is not an option. 😌

You know how artists can poison their images for AI? We need a way to poison content on Reddit.

Shitposts would do it. That’ll turn the AIs into morons who spout “rizz” and “skibidi” instead of anything useful.

I think that’s just called posting on Reddit.

I would say most of the content is already poison.

If you’re talking about Glaze or Nightshade, those techniques are not proven to be particularly effective. Lots of people want them to work but that doesn’t make it so.

I am waiting for an LLM trained on 4chan; it would be pure gold.

I don’t want STDs.

Not good enough; it only trained on /pol/, it seems.

Fair. The rest of the site is a lot more normal. More being a relative term, of course.

Well… We all knew that was coming. If you still have an account and haven’t done so, now’s a good time to purge it!

Why? How does it harm you in any meaningful way?

Even if it’s just another scheme to further concentrate wealth (and it is at least that), that harms everyone but the 0.1%.

I draw plenty of benefit from AI tools. There are open source models that anyone can run.

Better yet, use an overwrite script to help turn their training models to jelly.

That’s what I just did with my account of 10 years. I had all my comments overwritten with gibberish and purged them a few days later. I’ll send them a final GDPR (DSGVO) request and delete the account afterwards.

Unless you live in the EU or California, odds are that just deletes the public data; I’m sure Reddit retains it and would sell it.

By forcing them to actually be open about it (this will not happen).

Reddit account data has been training AI for over a decade. If you ever used it, you’re already in a training set.

That will remove your account from public view, but will it remove it from the data they use for AI training?

If not, you’re just enhancing the value of their proprietary data.

Why wouldn’t they enhance it themselves, like Twitter has been doing for months? Once they make signing in mandatory and implement per-user rate limits, the information will disappear from the internet and will only be available to people who are paying in some way.

I did it a few months ago, but then again, if I were working at Reddit and in charge of preparing the dataset to feed to the LLM, I’d give it access to both a recent snapshot and one from before July 2023 (or whenever shit hit the fan and we all came to Lemmy), since most edits would have been made in protest. And the AI can figure out which ones by itself.

I’d be very surprised if comments weren’t versioned in some way, so even if you delete or rewrite that data, it’s probably still there and a part of training data.

They said years ago that they only kept one previous version, which is why everyone overwrote and then deleted their stuff.

It’s possible that reddit changed that, but honestly? That requires a level of foresight that I believe is entirely beyond spez. He didn’t foresee AI products; he literally paid for all the bandwidth they used to harvest the data. He didn’t foresee changes to API pricing, he didn’t foresee the protests, how long they’d last, or how many people would just walk away.

Hell, in the previous big “closed subs” protest they’d never even considered a moderator rebellion: once the mods took the subs private, the admins were accidentally locked out as well, so they had to negotiate to get them reopened while they worked on backdoor changes that wouldn’t break reddit.

I just don’t see them having the foresight to add in preservation code, nor to allocate the database and storage space to keep up with it. I think if you overwrote and then deleted your stuff, reddit doesn’t have it anymore. Of course, it’s still out there, in Google’s cache and the Internet Archive and all the other snapshots and preservation schemes and the data already harvested for the various AIs, but at least it’s no longer in reddit’s control, and they won’t be able to profit from it.

This is the official technology community of Lemmy.ml for all news related to creation and use of technology, and to facilitate civil, meaningful discussion around it.

Ask in DM before posting product reviews or ads. All such posts otherwise are subject to removal.

Rules:

1: All Lemmy rules apply

2: Do not post low effort posts

3: NEVER post naziped*gore stuff

4: Always post article URLs or their archived version URLs as sources, NOT screenshots. Help the blind users.

5: personal rants of Big Tech CEOs like Elon Musk are unwelcome (does not include posts about their companies affecting wide range of people)

6: no advertisement posts unless verified as legitimate and non-exploitative/non-consumerist

7: crypto related posts, unless essential, are disallowed

Reddit Signs AI Content Licensing Deal Ahead of IPO (www.bloomberg.com)

Technology

!technology@lemmy.ml
