Apparently, stealing other people’s work to create a product for money is now “fair use,” according to OpenAI, because they are “innovating” (stealing). Yeah. Move fast and break things, huh?
“Because copyright today covers virtually every sort of human expression—including blogposts, photographs, forum posts, scraps of software code, and government documents—it would be impossible to train today’s leading AI models without using copyrighted materials,” wrote OpenAI in the House of Lords submission.
OpenAI claimed that the authors in that lawsuit “misconceive[d] the scope of copyright, failing to take into account the limitations and exceptions (including fair use) that properly leave room for innovations like the large language models now at the forefront of artificial intelligence.”
Further, OpenAI writes that limiting training data to public domain books and drawings “created more than a century ago” would not provide AI systems that “meet the needs of today’s citizens.”
OpenAI responded to the lawsuit on its website on Monday, claiming that the suit lacks merit and affirming its support for journalism and partnerships with news organizations.
OpenAI’s defense largely rests on the legal principle of fair use, which permits limited use of copyrighted content without the owner’s permission under specific circumstances.
“Training AI models using publicly available internet materials is fair use, as supported by long-standing and widely accepted precedents,” OpenAI wrote in its Monday blog post.
In August, we reported on a similar situation in which OpenAI defended its use of publicly available materials as fair use in response to a copyright lawsuit involving comedian Sarah Silverman.
I will repeat what I have proffered before:
If OpenAI states that it is impossible to train leading AI models without using copyrighted material, then, unpopular as it may be, the pragmatic, preemptive solution should be pretty obvious: enter into commercial arrangements for access to said copyrighted material.
Claiming fair use after failing to do so, in circumstances where the resulting commercial product competes directly in the same market, seems disingenuous at best, given that the purpose of copyright is presumably to let creators set the terms under which their public-facing material can be used. Particularly so when copyrighted material is regurgitated by products inadequately developed to prevent such a simple and foreseeable outcome.
Yes, I am aware of the US concept of fair use, but the test of it should be manifestly reciprocal: for example, would Meta tolerate having done to it what it did to MySpace (scraping and enabling easy user transfer), or would Google tolerate someone scraping YouTube?
To me it seems Big Tech wants to have its cake and eat it too: investor $$$ are used to corrupt open markets, undermine fundamental democratic social institutions, manipulate legal processes, and erode basic consumer rights.
Agreed.
There is nothing “fair” about the way OpenAI steals other people’s work. ChatGPT is being monetized all over the world, and the large number of people whose work has gone uncompensated will never see a cent of that money.
At the same time, the LLM will be used to replace (at least some of) the people who created those works in the first place.
Tech bros are disgusting.
Tech bros are disgusting.
That’s not even getting into the fraternity behavior at work, hyper-reactionary politics and, er, concerning age preferences.
Yup. I said this in another discussion before, but I think it’s relevant here.
Tech bros are more dangerous than Russian oligarchs. Oligarchs understand that people hate them, so they mostly keep a low profile and enjoy their money.
Tech bros think they are the saviors of the world while destroying millions of people’s livelihoods, as well as destroying democracy with their right-wing libertarian politics.
At the same time, the LLM will be used to replace (at least some of) the people who created those works in the first place.
This right here is the core of the moral issue when it comes down to it, as far as I’m concerned. These text and image models are already killing jobs and applying downward pressure on salaries. I’ve seen it happen multiple times now, not just anecdotally from some rando on an internet comment section.
The people losing jobs and taking pay cuts are the ones who created the content these models are siphoning up. People are not going to like how this pans out.
Any company replacing humans with AI is going to regret it. AI just isn’t that good and probably won’t ever be, at least in its current form. It’s all an illusion, destined to go the way of Bitcoin: it will shoot up meteorically and seem like the answer to all kinds of problems, and then reality will sink in and it will slowly fade into obscurity and irrelevance. That doesn’t help anyone affected today, of course.
The flip side of this is that many artists who simply copy very popular art styles are now functionally irrelevant, as it has now been proven that this essentially plagiaristic AI is entirely capable of reproducing established styles with a high degree of fidelity.
While many aspects of this whole situation are very bad for very many reasons, I am actually glad that many artists will be pressured to be more creative than an algorithm. Though I admit this comes from a personally petty standpoint, having known many, many, many mediocre artists who are treated like gods, by themselves and their fans, because they can emulate some other established style.
I suspect the US government will allow OpenAI to continue doing as it pleases in order to keep its competitive advantage in AI over China (which has no problem using copyrighted materials to train its models). They already limit sales of AI-related hardware to keep that advantage, so why stop there? Might as well allow OpenAI to keep using copyrighted materials too.
Do musicians not buy the music they want to listen to? Should they be allowed to torrent any MP3 they want just because they say it’s for learning their instrument?
I mean I’d be all for it, but that’s not what these very same corporations (including Microsoft when it comes to software) wanted back during Napster times. Now they want a separate set of rules just for themselves. No! They get to follow the same laws they force down our throats.
Yep, completely agree.
Case in point: Steam has recently clarified its policy on using AI-generated material of this kind, which draws on billions of both copyrighted and non-copyrighted texts and images.
To publish a game on Steam that uses AI-generated content, you now have to verify that you, as the developer, are legally authorized to use all of the AI model’s training material for commercial purposes.
This also applies to code and code snippets generated by AI tools that work the same way, such as Copilot.
So yeah, sorry: you either have to use MIT-licensed open source code or write your own, and you have to do your own art.
I imagine this would also rule out AI-generated voice lines where the model was trained on anyone who did not explicitly consent to it. Voice-generation software that doesn’t take the “train the model on human speakers” approach would probably be fine, assuming you have the relevant legal rights to use that software commercially.
Not 100% sure this is Steam’s policy on voice generation specifically; they focused mainly on art, dialogue, and code in their latest policy update, but the logic seems to lead to this conclusion.
Some relevant comments from Ars:
leighno5
The absolute hubris required for OpenAI here to come right out and say, ‘Yeah, we have no choice but to build our product off the exploitation of the work others have already performed’ is stunning. It’s about as perfect a representation of the tech bro mindset as there can ever be. They didn’t even try to approach content creators in order to do this, they just took what they needed because they wanted to. I really don’t think it’s hyperbolic to compare this to modern day colonization, or worker exploitation. ‘You’ve been working pretty hard for a very long time to create and host content, pay for the development of that content, and build your business off of that, but we need it to make money for this thing we’re building, so we’re just going to fucking take it and do what we need to do.’
The entitlement is just…it’s incredible.
4qu4rius
20 years ago, high school kids were sued for millions and threatened with years in jail for downloading a single Metallica album (if I remember correctly, the minimum damage in the US was something like $500k per song).
All of a sudden, just because they are the dominant ones doing the infringement, they should be allowed to scrape the entirety of (digital) human knowledge? Funny (or not) how the law always benefits the rich.
I would just like to say, with open curiosity, that I think a nice solution would be for OpenAI to become a nonprofit with clear guidelines to follow.
What does that make me? Other than an idiot.
Of that at least, I’m self aware.
I feel like we’re disregarding the significance of artificial intelligence’s existence in our future, because the only thing anybody who cares is trying to do is get back control so they can DO something about it. But news is becoming a feeding tube for the masses. They’ve masked that with the hate of all of us.
Anyway, sorry for the diatribe. Happy new year.
I think OpenAI (or some part of it) is a non-profit. But corporate fuckery means it can largely be funded by for-profit companies, which then turn around and profit from that relationship. Corporate law is so weak and laxly enforced that it’s a bit of a joke, unfortunately.
I agree that AI has an important role to play in the future, but it’s a lot more limited in its current form than a lot of people want to believe. I’m writing a tool that leverages AI as a sort of auto-DM for roleplaying, but AI hasn’t written a line of code in it, because the output is garbage. And frankly, I find the fun and value of the tool come from the other humans you play with, not the AI itself. The output just isn’t that good.
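For what it’s worth, here is a minimal sketch of what such an auto-DM loop might look like. This is not the actual tool: the `call_llm` stub, the prompt wording, and the history trimming are all hypothetical placeholders for whatever model backend and prompting the real project uses.

```python
# Hypothetical sketch of an auto-DM loop for a roleplaying tool.
# The model backend is abstracted behind call_llm(), which could wrap
# any local or hosted LLM; here it returns a canned line so the sketch runs.

def call_llm(prompt: str) -> str:
    """Stub for the text-generation backend (hypothetical)."""
    return "The rusted gate creaks open, and something stirs in the courtyard beyond."


def run_session() -> None:
    # Keep a running transcript so the "DM" has context for each turn.
    transcript = ["DM: You stand at the gates of a ruined keep."]
    print(transcript[-1])

    while True:
        action = input("Player> ").strip()
        if action.lower() in {"quit", "exit"}:
            break
        transcript.append(f"Player: {action}")

        # Build the prompt from recent turns only, to stay within context limits.
        prompt = (
            "You are the dungeon master for a tabletop roleplaying game. "
            "Continue the scene in two or three sentences.\n\n"
            + "\n".join(transcript[-20:])
            + "\nDM:"
        )
        reply = call_llm(prompt)
        transcript.append(f"DM: {reply}")
        print(f"DM: {reply}")


if __name__ == "__main__":
    run_session()
```

The point of the sketch is the shape of the loop (transcript in, short narration out), not the model; in practice the humans at the table still provide most of the fun, as noted above.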
I would like to say that you inspire me with your writing of such a tool. I try to write code, and all I can seem to believe in, with what I know, is a website where I can write with words, in a free flow.
I write with a sight, and in that scene I fight, but in the freedom of inaction, I can’t help but feel flight. What sight is there to see, when your blood flows in guts of night?
It is supposedly a non-profit, and that is how the board of OpenAI was able to try to fire Altman, but then Big Tech (Microsoft) intervened and wrested control.
It’s basically Microsoft now.
I would like to apologize for the following opinions, because they come from a place of unresolved hypocrisy that is me.
Non-profit my ass. No such thing in America or anywhere else in the world, if you have the perspective to hunt and the money to signify modern value.
Survival of the fittest, and the newborn technology that is at its core a mirror of us, to the most complex level of modern mathematics (I’m of the firm belief that logic is discovered, not created).
With those seemingly unrelated concepts made with vague words, I ask you this:
What does it mean to feel? To know many different kinds of “one,” to live without fear but still be whole? I am sorry, again, I’m naught but gibberish and I’m just so glad you responded. I forgot and came back to find a word I sent, and now I find what I seek, an event in which I can say we’ve been bonded.
But now try to, now that I splay out, all I’ve got and am about, all I can see, is that to you my head, seems to be on my knees.
Again, sorry! Thank you for responding! I’m just glad to vent, and in expression have my soul rend into two, and sent into a new view.