The lawsuit alleges OpenAI crawled the web to amass huge amounts of data without people’s permission.

39 points

Scraping social media posts and Reddit posts doesn’t sound like stealing; they’re public posts.

4 points

This is not just scraping, though. It is also using that data to create other content and potentially to republish it (we have no way of knowing whether ChatGPT will spit out any of that data, or where whatever it spits out came from).

The expectation that social media data will be read by anybody is fair, but the fact is that the data has been written to be read, not to be resold and published elsewhere too.

It is similar for blog articles. My blog is public and anybody can read it, but that data is not there to be repackaged and sold. The fact that something is public does not mean I can do whatever I want with it.

2 points

I could read your blog post and write my own blog post, using yours as inspiration. I could quote your post, add a link back to your blog post and even add affiliate links to my blog post. I could be hired to do something like that all day.

3 points

ChatGPT doesn’t get inspired; the process is different, and it could very well spit out the content verbatim. You can do all the rest (depending on the license) without issues, but once again this is not what ChatGPT does, as it doesn’t provide attribution.

It’s exactly the same with software, in fact.

16 points

I doubt it’s only about some Reddit posts. The scraping was done on the whole web, capturing everything it could. So besides stealing data and presenting it as its own, it seems to have collected some even more problematic data which wasn’t properly protected.

11 points

But that really isn’t OpenAI’s fault. Whoever was in charge of securing the patients’ data really fucked up.

13 points

That’s like saying you didn’t lock your front door so whoever robs you is innocent.

18 points

Leaving your front door open isn’t prudent but doesn’t grant permission to others to enter and take/copy your belongings or data.

The security teams may have royally screwed up, but OpenAI has a legal obligation to respect copyright and laws regarding data ownership.

Likewise, they could have scraped pages that included terms of use, copyright, disclaimers, etc., and failed to honor them.

All parties can be in the wrong for different reasons.

6 points

It’s certainly their fault that they used it, though.

If they cared, they could have ensured they weren’t using sensitive or otherwise highly problematic information, but they chose not to. That’s on them.

1 point

They certainly fucked up, but it might well be OpenAI’s fault too.

-1 points

If it was unsecured, it’s basically public. Whoever put that data on a publicly accessible server is at fault.

6 points

That’s not necessarily true. Even if a company makes the mistake of not securing data correctly, those who make use of this data can still be at fault.

If a company leaves a server wide open, you still can’t legally steal information from it.

10 points
Deleted by creator
4 points

Just because something is posted online doesn’t mean it can be taken and resold. Copyright law prevents that. Of course, how copyright law applies to generative AI is a new, gray area.

2 points
Deleted by creator
