8 points

what could go wrong with training your ai based on the posts of the most racist and misogynistic people on the internet?

11 points

Damn Facebook is owned by Reddit now?

11 points

It’s not 4chan… but someone did train one of those once.

12 points

My god they’ll create a super redditor

7 points

That varies by subreddit, which might actually help in training LLMs to recognize the difference.

10 points

Just think, in 1000 years your body will be long dead but you’ll be forced to live on as a poster! Death is not an option. 😌

31 points

You know how artists can poison their images for AI… We need a way to poison content on Reddit

11 points

Shit posts would do it. That’ll turn the AIs into morons who spurt out “rizz” and “skibidi” instead of anything useful

27 points

i think that’s just called posting on reddit

56 points

I would say most of the content is already poison.

13 points

If you’re talking about Glaze or Nightshade, those techniques are not proven to be particularly effective. Lots of people want them to work but that doesn’t make it so.

31 points

I am waiting for an LLM trained on 4chan. It would be pure gold.

-4 points

Fuck 4chan

2 points

I don’t want STDs.

1 point

Oh no.

12 points
6 points

Not good enough; it was only trained on /pol/, it seems.

6 points

Fair. The rest of the site is a lot more normal. More being a relative term, of course.

7 points
2 points

She got the spirit

90 points

Well… We all knew that was coming. If you still have an account and haven’t done so yet, now’s a good time to purge it!

-21 points

Why? How does it harm you in any meaningful way?

23 points

Even if it’s just another scheme to further concentrate wealth (and it is at least that), that harms everyone but the 0.1%.

-23 points

I draw plenty of benefit from AI tools. There are open source models that anyone can run.

12 points

Better yet, use an overwrite script to help turn their training models to jelly

4 points

That’s what I just did with my account of 10 years. I had all comments overwritten with gibberish and purged them a few days later. I’ll send them a final GDPR (DSGVO) request and delete the account afterwards.

37 points

Unless you live in the EU or California, odds are that just deletes the public data. I’m sure Reddit retains it and would sell it.

11 points
Deleted by creator
3 points

by forcing them to actually be open about it (this will not happen)

18 points

Reddit account data has been training AI for over a decade. If you ever used it, you’re already in a training set

12 points

That will remove your account from public view, but will it remove it from the data they use for AI training?

If not, you’re just enhancing the value of their proprietary data.

2 points

Why wouldn’t they enhance it themselves, like Twitter has been doing for months? Once they make signing in mandatory and implement per-user rate limits, the information will disappear from the internet and will only be available to people who are paying in some way.

2 points

I did it a few months ago. But then again, if I were working at reddit and in charge of preparing the dataset to feed to the LLM, I’d give it access to both a recent snapshot and one from before July 2023 (or whenever shit hit the fan and we all came to Lemmy). Most edits would have been made in protest, and the AI can figure out which ones by itself.

8 points

I’d be very surprised if comments weren’t versioned in some way, so even if you delete or rewrite that data, it’s probably still there and a part of training data.

3 points

They said years ago that they only kept one previous version, which is why everyone overwrote and then deleted their stuff.

It’s possible that reddit changed that, but honestly? That requires a level of foresight that I believe is entirely beyond spez. He didn’t foresee AI products, he literally paid all the bandwidth for them to harvest the data, he didn’t foresee changes to API pricing, he didn’t foresee the protests, how long they’d last, or how many people just walked away.

Hell, in the previous big “closed subs” protest they’d never even considered a moderator rebellion: once the mods took the subs private, the admins were accidentally locked out as well - they had to negotiate to get them re-opened while they worked on backdoor changes that wouldn’t break reddit.

I just don’t see them having the foresight to add in preservation code, nor to allocate the database and storage space to keep up with it. I think if you overwrote and then deleted your stuff, reddit doesn’t have it anymore. Of course, it’s still out there, in Google’s cache and the Internet Archive and all the other snapshots and preservation schemes and the data already harvested for the various AIs, but at least it’s no longer under reddit’s control, and they won’t be able to profit from it.


Technology

!technology@lemmy.ml

This is the official technology community of Lemmy.ml for all news related to creation and use of technology, and to facilitate civil, meaningful discussion around it.


Ask in DM before posting product reviews or ads. All such posts otherwise are subject to removal.


Rules:

1: All Lemmy rules apply

2: Do not post low effort posts

3: NEVER post naziped*gore stuff

4: Always post article URLs or their archived version URLs as sources, NOT screenshots. Help the blind users.

5: personal rants of Big Tech CEOs like Elon Musk are unwelcome (does not include posts about their companies affecting wide range of people)

6: no advertisement posts unless verified as legitimate and non-exploitative/non-consumerist

7: crypto related posts, unless essential, are disallowed
