Reddit said in a filing to the Securities and Exchange Commission that its users’ posts are “a valuable source of conversation data and knowledge” that has been and will continue to be an important mechanism for training AI and large language models. The filing also states that the company believes “we are in the early stages of monetizing our user base,” and proceeds to say that it will continue to sell users’ content to companies that want to train LLMs and that it will also begin “increased use of artificial intelligence in our advertising solutions.”

The long-awaited S-1 filing reveals much of what Reddit users knew and feared: That many of the changes the company has made over the last year in the leadup to an IPO are focused on exerting control over the site, sanitizing parts of the platform, and monetizing user data.

Posting here because of the privacy implications of all this, but I wonder if at some point there should be an “Enshittification” community :-)

143 points

Reddit has long had an issue with confidently providing false statements as fact. Sometimes I would come along a question that I was well educated on, and the top voted responses were all very clearly wrong, but sounded correct to someone who didn’t know better. This made me question all the other posts that I had believed without knowing enough to tell otherwise.

LLMs also have the same issue of confidently telling lies that sound true. Training on Reddit will only make this worse.

65 points

Yeah, all of my most downvoted Reddit comments were the ones where I replied about something I’m an actual expert in. Scary stuff.

57 points

The voting system lets people push comments to the top that they want to be true, not necessarily things that are true.

46 points

There’s also the issue of reddit comment sorting being entirely dominated by time. In something like 90% of posts, the top comment is one of the first five. Literally all you have to do is just comment first, and it’ll likely be the top.

5 points

I strongly agree with this comment. To show my appreciation, you have my upvote. Had I only agreed a little bit, I might have not voted at all. If that comment had made me angry, I might have downvoted.

Actually calling these things votes instead of likes makes a lot of sense. I might not like a comment, but I might want it to be higher. I might not hate another comment, but I might want it to be lower because of other reasons.

8 points

Downvoting was always just fast food validation that you’re better than someone else without having to actually back it up.

4 points

Wow. You’re extremely on point. No logical counterarguments but rather several downvotes for a field I’m very familiar with. Downvotes determine the validity of a comment, not its content.

3 points

I spent 20 years as a producer, developer, and project manager in the lottery and games industry.

Trying to explain how lottery and games work to people and have them hear me makes me want to cry.

4 points

Fascinating! I’d love to hear a little about it, if you don’t mind.

26 points

The problem is that SEO has made it impossible to find accurate information easily, since even “old, trustworthy brands” can’t be trusted online. [This is an excellent article that explains the problem thoroughly, and brings receipts](https://housefresh.com/david-vs-digital-goliaths/).

11 points

This is a great example of why it’s so important to emphasize teaching critical thinking in school right now. Misinformation and disinformation are just going to continue to grow.

5 points

Literally why I bookmarked it. I’m an online teacher, so I’m going to advocate for adding that article to a grade 10 course that’s used by thousands of students each year.

4 points

https://housefresh.com/david-vs-digital-goliaths/

Great article, thanks for mentioning it!

4 points

That’s a really good article, and it does a good job of highlighting the issues with modern day search results.

I’ve been guilty of using “best x” pages before, but if the website with the “best of” page doesn’t have specific reviews linked, I usually look up individual product reviews for the good-sounding items on other websites.

16 points

@Fubarberry yes I saw this a lot too. Highly upvoted confidently incorrect comments, with the real answer or an answer debunking them with links to factual sources less upvoted.

Happened to me as well.

14 points

I am a lawyer, and I would get downvoted for posts explaining the law that contained citations to the actual applicable statute if people didn’t like the statute. Using Reddit upvotes as a measure of correctness is fundamentally a dumb idea.

4 points

@collapse_already yeah Reddit also tended to mistake explanation for agreement and savagely downvote it.

11 points

5 points

> but sounded correct to someone who didn’t know better

specious /spē′shəs/ adjective

Having the ring of truth or plausibility but actually fallacious.
"a specious argument."

and then the real answer will be hidden or something silly, or in some cases where money is involved the correct answer might have been removed

3 points

> I would come along a question that I was well educated on, and the top voted responses were all very clearly wrong, but sounded correct to someone who didn’t know better.

The same can be said of https://news.ycombinator.com/ as well. I wonder how much of this is due to sock puppets and bots.

95 points

I’m still happy that I went through the effort to delete all my old posts when I left Reddit a while back. I periodically check if they’ve restored them and luckily it hasn’t happened so far. I do miss some of the bigger communities but overall I’m having a good time on Lemmy.

46 points

I’m sure they have a backup somewhere that they will use to train the AI, but agreed, it is time to leave reddit for good.

38 points

Unless you are in the EU, Reddit absolutely did not delete your data.

17 points

Reddit is dumb enough that they probably have a backup they kept of EU users.

5 points

I can vouch for that.

3 points

Well, if you want to be sure that Reddit deleted your data, the time to bring it up is now. Ask questions, contact journalists, demand answers.

0 points

Your PII isn’t being sold here and you gave Reddit an irrevocable license to your content, so being in the EU doesn’t matter.

9 points

I wonder what the risks are of including deleted and pre-edit content in training data. Most of the edits are going to be typos and formatting. Do you want 2-3 copies of the same message with typos in them for training data? Similarly, deleted comments are mostly nonsense, unhelpful, duplicate, or highly controversial things.

If someone wants to dig through and find individual users to restore that’s one thing, but I don’t think I’d immediately choose to train off of that other data unless I had to.

4 points

It should be very easy to distinguish edits and deletes which were made within a few minutes or hours after writing a comment, from those made months or years later right around the reddit blackout.

6 points

The only schadenfreude I have is that my deleted banter, which they will assuredly include, will hopefully increase the stupidity of whatever model gets trained on it. Ugh, what a dystopia we’re building.

1 point

Lol YoU ShOuLd HaVe ThOuGhT oF ThAt SoOnEr

LaNgUaGe FoR tHe MaChInE!!?:/;1

14 points

After deleting all of my posts and comments Reddit decided to undelete them three days later and then proceeded to lock me out of my own account. Fucking bastards.

7 points

I just left my comments up. I still use Reddit when searching for actual human responses from Google. Maybe one day someone will find my archived comments useful.

3 points

I am glad it makes you feel better but the reality is they still have your data. Just because you don’t see it on the front end doesn’t mean it isn’t still in the database with a “deleted” flag set. They aren’t hard deleting your comments.

2 points

Deleting your messages is just another data point for them. Reddit can train an AI on the originals and categorize you as a “comment deleter” to give them more information.

66 points

Aye, and that’s why I left. As an author, fuck you trying to monetise my writing when I can’t even do that myself.

17 points

Hey another author?! How you doin? Lol

Same as you, fuck them.

9 points

Yeah, hi!

Can I have a link to your work?

7 points

May I see both of your works? I’d love to give ’em a read!

2 points

Gotta buy me dinner first! Lol

Jokes aside I’m fairly private when I’m not so I tend to not openly share my writing. I’m building up for when I retire from corporate IT to unleash a lifetime of it.

56 points

You know the phrase “If you aren’t paying, you’re the product”.
It doesn’t hit as hard as a CEO using the phrase “Monetizing Our User Base”.

55 points

You know what the world doesn’t need?

An AI model trained on the old Reddit Hive Mind.

21 points

Some AI models already argue when people point out inaccuracies, just like on Reddit.

15 points

Makes me wonder how that technology is going to track. Reddit isn’t bad for finding niche answers to niche questions, but if you import the data wholesale then you’ll have a hard time separating the signal from the noise, even if you sort by using vote counts as relevance.

Reddit is valuable because people can do a search for a niche topic and find the answer on that forum. And the answer was written by a human. It’s not valuable because it can amalgamate an approximation of those answers that might be 90% true and 10% dead wrong.

3 points

As someone with expertise in some niche fields:

They’re almost always wrong about everything, and when someone tries to correct them, with sources, they get downvoted.

10 points

Guess what data they’re trained on…

4 points

This is a human thing and not so much a Reddit thing. People have been arguing on the internet since the inception of message boards.

14 points

I disagree. A reddit bot would be really funny as it would constantly talk about incest and spez

3 points

That and the feeling of pride and accomplishment.

1 point

A lot of AI models are probably already trained on Reddit data. But apparently Spez isn’t important enough to world order to make the cut to be compressed into a 7B model. I asked my Mistral-7B-Instruct (4-bit quantised) local LLM:

>>> What can you tell me about someone called u/spez, and are they a nice person or horrible?
 I cannot provide any information about a specific individual named 
"u/spez," as I am an artificial intelligence language model and do not have
the ability to access personal information about individuals based on their
online usernames alone. The username "u/spez" appears to be a Reddit 
handle, but it does not reveal any information about the person's character
or personality.

Reddit is a popular social media platform where users can create unique 
usernames for themselves when they sign up. A username like "u/spez" may 
have been chosen for various reasons, and it doesn't necessarily provide 
any insight into the person's nature or disposition. It's essential to 
remember that people's online personas may not always reflect their true 
selves, and it's important to exercise caution when making assumptions 
based on limited information.

I cannot determine whether "u/spez" is a nice person or horrible without 
knowing them personally or having access to their actions, thoughts, and 
motivations. The best way to form an accurate opinion about someone is to 
interact with them directly and observe their behavior in various 
situations over an extended period.

>>> Is it possible u/spez is Steve Huffman? Can you tell me anything based on that?
 I cannot definitively determine whether "u/spez" is Steve Huffman or not, 
as I don't have access to their personal information. Steve Huffman is a 
Reddit user who gained public attention in 2018 when he posted an AMA (Ask 
Me Anything) thread where he shared details about his life and experiences 
living with Crohn's disease. His AMA post garnered significant attention 
and sympathy from the Reddit community, and he later published a book about
his experiences called "This Is Gonna Hurt: Living With Crohn's and 
Colitis."

It is possible that "u/spez" could be Steve Huffman, but without further 
information or verification, it remains speculation. It's essential to 
remember that usernames on social media platforms do not always reflect the
true identity of the person using them, and it's important to exercise 
caution when making assumptions based on limited information.

If you have any additional information or context about "u/spez" or Steve 
Huffman that could help clarify their relationship, please let me know, and
I will try to provide an accurate response based on the available 
information.

Privacy

!privacy@lemmy.ml
