cross-posted from: https://lemmy.intai.tech/post/43759
cross-posted from: https://lemmy.world/post/949452
OpenAI’s ChatGPT and Sam Altman are in massive trouble. OpenAI is getting sued in the US for illegally using content from the internet to train its LLMs (large language models).
I can’t speak for others, but I don’t consider posts I made on a website I don’t own to be my property. If anything, it’s amusing to think of my idiotic rants making up a tiny fraction of an AI’s “knowledge”.
I don’t see how this is any different than humans copying or being inspired by something. While I hate seeing companies profiting off of the commons while giving nothing of value back, how do you prove that an AI model is using your work in any meaningful or substantial way? What would make me really mad is if this dumb shit leads to even harsher copyright laws. We need less copyright not more.
Curious to see if this goes anywhere.
IANAL, but I think it’s going to come down to the terms of service of the sites the data was scraped from. If the terms say the stuff you post can be shared with third parties, then the plaintiffs might not have a leg to stand on. Where it gets sketchy is if someone posted someone else’s work: the original author had no say in it being shared with a third party, BUT is that the fault of the third party or of the service provider that shared it?
Also, if I were exposed to copyrighted material through some unauthorised person distributing it, can I not summarize the information? I guess I don’t know enough about fair use to answer that.
The wording in the article says they are being sued for stealing data. This seems like a stretch, but I guess I’ll wait for more details of the case.
I agree with the terms of service bit, but the hard part is going through the ToS for so many different sites. It’s sort of like how some open source projects can’t re-license their code base because it is impossible to get in contact with all the people who have contributed to the project over the years. Online platforms already have certain protections against liability for illegal content their users post. We will have to see if that is extended to these large language models. When it comes to fair use, there is no such thing until a court says so. Fair use must be proven in court, each and every time. There are no clear-cut guidelines in the letter of the law on what is and isn’t fair use, so that can swing either way. Just my two cents on the matter. Also, IANAL.
If this lawsuit is ruled in favor of the plaintiff, it might lead to lawsuits against those who have collected and used private data more maliciously, from advertisement-targeting services to ALPR services that reveal to law enforcement your driving habits.
It’s wild to see people in the piracy community of all places have an issue with someone benefiting from data they got online for free.
Key difference is that they’re making (a lot of) money off of the stolen work, and in a way that’s only possible for the already filthy rich.
Wouldn’t mind it personally if it was FOSS though, like their name suggests.
FWIW even if it was FOSS I’d still care. For me it’s more about intent. If your business model/livelihood relies on stealing from people there’s a problem. That’s as true on a business level as it is an individual one.
Doesn’t mean I have an answer as sometimes it’s extremely complex. The easy analogy is how we pirate TV shows and movies. Netflix originally proved this could be mitigated by providing the material cheaply and easily. People don’t want to steal (on average).
I find people in general are much more willing to part with their money than the big corps think. I’ll even go so far as to say that we enjoy doing so. Just look at Twitch: tonnes of money are thrown at streamers because it’s fun and convenient. Or look at TikTok vendors selling useless stuff on live streams. We just don’t like to be lied to and treated like cash cows.
The difference is that they are profiting from other people’s work and property. I don’t profit from watching a movie or playing a game for free, I just save some money.
You do if you make games or movies and those things give you inspiration.
This is just how learning is done though, whether it’s AI or human.
Absolutely not comparable. Inspiration and an amalgamation of everything an LLM consumes are completely different things.
It really isn’t that bonkers. A lot of thinking in software is about licensing. See the GPL and Creative Commons and all that stuff that’s all about how things can be profited from and the responsibilities around it. Benefiting from free data is one thing. Privately profiting at others’ expense, or not sharing the capabilities and advances that came from it, is another. Willing to bet there are GPL violations via the training sets.
Is it even possible to attach licenses to text posts on social media?