OpenAI Being Sued for "Stealing" People's Content Online

16 points

2 years ago

It’s wild to see people in the piracy community of all places have an issue with someone benefiting from data they got online for free.

arinot@lemmy.world

12 points

2 years ago

It really isn’t that bonkers. A lot of thought in software goes into licensing. See the GPL, Creative Commons, and all that stuff, which is all about how things can be profited from and the responsibilities around that. Benefiting from free data is one thing. Privately profiting at the expense of others, or not sharing the capability and advances that came from it, is another. I’m willing to bet there are GPL violations via the training sets.

Is it even possible to attach licenses to text posts on social media?

Altair@vlemmy.net

28 points

2 years ago

The key difference is that they’re making (a lot of) money off of the stolen work, and in a way that’s only possible for the already filthy rich.

I wouldn’t mind it personally if it was FOSS though, like their name suggests.

whoisearth@lemmy.ca

19 points

2 years ago

FWIW even if it was FOSS I’d still care. For me it’s more about intent. If your business model/livelihood relies on stealing from people there’s a problem. That’s as true on a business level as it is an individual one.

Doesn’t mean I have an answer as sometimes it’s extremely complex. The easy analogy is how we pirate TV shows and movies. Netflix originally proved this could be mitigated by providing the material cheaply and easily. People don’t want to steal (on average).

Botree@lemmy.world

7 points

2 years ago

I find people in general are much more willing to part with their money than the big corps think. I’ll even go to the extent to say that we enjoy doing so. Just look at Twitch – tonnes of money are thrown at streamers because it’s fun and convenient, or at TikTok vendors selling useless stuff on live streaming. We just don’t like to be lied to and treated like cash cows.

14 points

2 years ago

They’re using people’s content without authorization, and for all their talk of an open information ideology or something like that, they are closed source and they are using it to make money. I don’t think that should be illegal, but it is certainly a dick move.

Briongloid@aussie.zone

21 points

2 years ago

Many of us are sharing without reward and have strong ethical beliefs regarding for-profit distribution of material versus non-profit sharing.

DankMemeMachine@lemmy.world

23 points

2 years ago

The difference is that they are profiting from other people’s work and property. I don’t profit from watching a movie or playing a game for free; I just save some money.

Holodeck_Moriarty@lemm.ee

3 points

2 years ago

You do if you make games or movies and those things give you inspiration.

This is just how learning is done though, whether it’s AI or human.

DankMemeMachine@lemmy.world

2 points

2 years ago

Absolutely not comparable. Inspiration and an amalgamation of everything an LLM consumes are completely different things.

Geograph6@lemmy.dbzer0.com

20 points

2 years ago

People talk about OpenAI as if it’s some utopian saviour that’s going to revolutionise society, when in reality it’s a large corporation flooding the internet with terrible low-quality content using machine learning models that have existed for years. And the fields it is “automating” are creative ones that specifically require a human touch, like art and writing. Large language models and image generation aren’t going to improve anything. They’re not “AI” and they never will be. Hopefully when AI does exist and does start automating everything we’ll have a better economic system though :D

fiasco@possumpat.io

-13 points

2 years ago

The thing that amazes me the most about AI Discourse is, we all learned in Theory of Computation that general AI is impossible. My best guess is that people with a CS degree who believe in AI slept through all their classes.

IllNess@infosec.pub

3 points

2 years ago

It’s all buzzword exaggerations. It’s marketing.

Remember when hoverboards were for things that actually hover instead of some motorized bullshit on two wheels? Yeah, same bullshit.

leonardo_arachoo@lemm.ee

9 points

2 years ago

“we all learned in Theory of Computation that general AI is impossible.”

I strongly suspect it is you who has misunderstood your CS courses. Can you provide some concrete evidence for why general AI is impossible?

fiasco@possumpat.io

-8 points

2 years ago

Evidence, not really, but that’s kind of meaningless here since we’re talking theory of computation. It’s a direct consequence of the undecidability of the halting problem. Mathematical analysis of loops cannot be done because loops, in general, don’t take on any particular value; if they did, then the halting problem would be decidable. Given that writing a computer program requires an exact specification, which cannot be provided for the general analysis of computer programs, general AI trips and falls at the very first hurdle: being able to write other computer programs. Which should be a simple task, compared to the other things people expect of it.

Yes, there’s more complexity here (what about compiler optimization or Rust’s borrow checker?), which I don’t care to get into at the moment; suffice it to say, those only operate under certain special conditions. To posit general AI, you need to think bigger than basic block instruction reordering.

This stuff should all be obvious, but here we are.

qfe0@lemmy.dbzer0.com

9 points

2 years ago

The existence of natural intelligence is the proof that artificial intelligence is possible.

argv_minus_one@beehaw.org

7 points

2 years ago

We can simulate all manner of physics using a computer, but we can’t simulate a brain using a computer? I’m having a real hard time believing that. Brains aren’t magic.

fiasco@possumpat.io

-3 points

2 years ago

Computer numerical simulation is a different kind of shell game from AI. The only reason it’s done is because most differential equations aren’t solvable in the ordinary sense, so instead they’re discretized and approximated. Zeno’s paradox for the modern world. Since the discretization doesn’t work out, they’re then hacked to make the results look right. This is also why they always want more flops, because they believe that, if you just discretize finely enough, you’ll eventually reach infinity (or infinitesimal).

This also should not fill you with hope for general AI.

Treemaster099@pawb.social

18 points

2 years ago

Good. Technology always makes strides before the law can catch up. The issue with this is that multi million dollar companies use these gaps in the law to get away with legally gray and morally black actions all in the name of profits.

Edit: This video is the best way to educate yourself on why AI art and writing are bad when they steal from people, like most AI programs currently do. I know it’s long, but it’s broken up into chapters if you can’t watch the whole thing.

PlebsicleMcGee@feddit.uk

13 points

2 years ago

Totally agree. I don’t care that my data was used for training, but I do care that it’s used for profit in a way that only a company with big budget lawyers can manage

CoderKat@lemm.ee

3 points

2 years ago

But if we’re drawing the line at “did it for profit”, how much technological advancement will happen? I suspect most advancement is profit driven. Obviously people should be paid for any work they actually put in, but we’re talking about content on the internet that you willingly create for fun and the fact it’s used by someone else for profit is a side thing.

And quite frankly, there’s no way to pay you for this. No company is gonna pay you to use your social media comments to train their AI and even if they did, your share would likely be pennies at best. The only people who would get paid would be companies like reddit and Twitter, which would just write into their terms of service that they’re allowed to do that (and I mean, they already use your data for targeting ads and it’s of course visible to anyone on the internet).

So it’s really a choice between helping train AI (which could be viewed as a net benefit for society, depending on how you view those AIs) vs simply not helping train them.

Also, if we’re requiring payment, only the super big AI companies can afford to frankly pay anything at all. Training an AI is already so expensive that it’s hard enough for small players to enter this business without having to pay for training data too (and at insane prices, if Twitter and Reddit are any indication).

Programmer Belch@lemmy.dbzer0.com

8 points

2 years ago

Hundreds of projects on GitHub are supported by donations; innovation happens even without profit incentives. It may slow down the pace of AI development, but I am willing to wait another decade for AIs if it protects user data and lets regulation catch up.

Johem@lemmy.world

2 points

2 years ago

Reddit is currently trying to monetize their user comments and other content by charging for API access. That creates a system where only the corporations profit, and the users generating the content are not only unpaid but expected to pay directly or are monetized by ads. And if the users want to use the technology trained on their content, they also have to pay for it.

Sure seems like a great deal for corporations, with users getting fleeced as much as possible.

archomrade [he/him]@midwest.social

11 points

2 years ago

I’m honestly at a loss for why people are so up in arms about OAI using this practice and not Google or Facebook or Microsoft, etc. It really seems we’re applying a double standard just because people are a bit pissed at OpenAI for a variety of reasons, or maybe just vaguely mad at the monetary scale of “tech giants”.

My 2 cents: I don’t think content posted on the open internet (especially content produced by users on a free platform being claimed not by those individuals but by the platforms themselves) should be litigated over when that information isn’t even being reproduced but is being used in derivative works. I think it’s conceptually similar to an individual reading a library of books to become a writer and charging for the content they produce.

I would think a piracy community would be against platforms claiming ownership over user generated content at all.

Treemaster099@pawb.social

1 point

2 years ago

https://youtu.be/9xJCzKdPyCo

This video can answer just about any question you ask. It’s long, but it’s split up into chapters so you can see what questions he’s answering in that chapter. I do recommend you watch the whole thing if you can. There’s a lot of information that I found very insightful and thought provoking

archomrade [he/him]@midwest.social

1 point

2 years ago

Couple things:

While I appreciate this gentleman’s copyright experience, I do have a couple comments:

His analysis seems primarily focused on a legal perspective. While I don’t doubt there is legal precedent for protection under copyright law, my personal opinion is that copyright is a capitalist conception that is dependent on an economic reality I fundamentally disagree with. Copyright is meant to protect the livelihoods of artists, but I don’t think anyone’s livelihood should be dependent on having to sell labor. More often, copyright is used to protect the financial interests of large businesses, not individual artists. The current litigation is between large media companies and OAI, and any settlement isn’t likely to remunerate much more than a couple dollars to individual artists, and we can’t turn back the clock to before AI could displace the jobs of artists, either.

I’m not a lawyer, but his legal argument is a little iffy to me… Unless I misunderstood something, he’s resting his case on a distinction between human inspiration (i.e. creative inspiration on derivative works) and how AI functions practically (i.e. AI has no subjective “experience” so it cannot bring its own “hand” to a derivative work). I don’t see this as a concrete argument, but even if I did, it is still no different than individual artists creating derivative works and crossing the line into copyright infringement. I don’t see how this argument can be blanket applied to the use of AI, rather than to individual cases of someone using AI on a project that draws too much from an existing work.

The line is even less clear when discussing LLMs as opposed to T2I or I2I models, which I believe is what is being discussed in the lawsuit against OAI. Unlike images from DeviantArt and Instagram, text datasets from sources like reddit, Wikipedia, and Twitter aren’t protected under copyright like visual media. The legal argument against the use of training data drawn from public sources is even less clear, and is even further removed from protecting the individual users; it is instead a question of protecting social media sites with a questionable legal claim to begin with. This is the point I’d expect this particular community would take issue with: I don’t think reddit or Twitter should be able to claim ownership over their users’ content, nor do I think anyone should be able to revoke consent over fair use just because it threatens our status quo capitalist system.

AI isn’t going away anytime soon, and litigating over the ownership of the training data is only going to serve to solidify the dominant hold over our economy by a handful of large tech giants. I would rather see large AI models be nationalized, or otherwise be protected from monopolization.

redditsucks@lemmy.world

-1 points

2 years ago

Hope it goes through and sets a precedent.
