Apparently, stealing other people's work to create a product for money is now "fair use" according to OpenAI, because they are "innovating" (stealing). Yeah. Move fast and break things, huh?
"Because copyright today covers virtually every sort of human expression - including blogposts, photographs, forum posts, scraps of software code, and government documents - it would be impossible to train today's leading AI models without using copyrighted materials," wrote OpenAI in the House of Lords submission.
OpenAI claimed that the authors in that lawsuit "misconceive[d] the scope of copyright, failing to take into account the limitations and exceptions (including fair use) that properly leave room for innovations like the large language models now at the forefront of artificial intelligence."
I wish I could upvote this more than once.
What people always seem to miss is that a human doesn't need billions of examples to be able to produce something that's kind of "eh, close enough". Artists don't look at billions of paintings. They look at a few, but do so deeply, absorbing not just the most likely distribution of brushstrokes, but why the painting looks the way it does. For a basis of comparison, I did an art and design course last year and looked at about 300 artworks in total (course requirement was 50-100). The research component on my design-related degree course is one page a week per module (so basically one example from the field the module is about, plus some analysis). The real bulk of the work humans do isn't looking at billions of examples: it's looking at a few, and then practicing the skill and developing a process that allows them to convey the thing they're trying to express.
If the AI models were really doing exactly the same thing humans do, the models could be trained without any copyright infringement at all, because all of the public domain and creative commons content, plus maybe licencing a little more, would be more than enough.
Exactly! You can glean so much from a single work, not just about the work itself but about who created it, what ideas they were trying to express, and what that tells us about the world they live in and how they see that world.
This doesn't even touch the fact that I'm learning to draw not by looking at other drawings but by looking at exactly what I'm trying to draw. I know at a base level, a drawing is a series of shapes made by hand, whether it's through a digital medium or traditional pen/pencil and paper. But the skill isn't being able to replicate other drawings, it's being able to convert something I can see into a drawing. If I'm drawing someone sitting in a wheelchair, then I'll get the pose of them sitting in the wheelchair, but I can add details I want to emphasise or remove details I don't want. There's so much that goes into creative work and I'm tired of arguing with people who have no idea what it takes to produce creative works.
It seems that most of the people who think what humans and AIs do is the same thing are not actually creatives themselves. Their level of understanding of what it takes to draw goes no further than "well anyone can draw, children do it all the time". They have the same respect for writing, of course, equating the ability to string words together to write an email with the process it takes to write a brilliant novel or script. They don't get it, and to an extent, that's fine - not everybody needs to understand everything. But they should at least have the decency to listen to the people that do get it.
Well, that's not me. I'm a creative, and I see deep parallels between how LLMs work and how my own mind works.
When people say that the "model is learning from its training data", it means just that, not that it is human, and not that it learns exactly like humans. It doesn't make sense to judge boats on how well they simulate human swimming patterns, just how well they perform their task.
Every human has the benefit, as a baby, of training on the things around them and being trained by those around them, building a foundation for all later skills. Generative models rely on many text and image pairs to describe things to them because they lack the ability to poke, prod, rotate, and disassemble for themselves.
For example, when a model takes in a thousand images of circles, it doesn't "learn" a thousand circles. It learns what a circle is GENERALLY like, the concept of it. That representation, along with random noise, is how you create images with them. The same happens for every concept the model trains on. Everything from "cat" to more complex things like color relationships and reflections or lighting. Machines are not human, but they can learn despite that.
In general I agree with you, but AI doesn't learn the concept of what a circle is. AI reproduces the most fitting representation of what we call a circle. But there is no understanding of the concept of a circle. This may sound like nitpicking, but I think it's important to make the distinction.
That is why current models aren't regarded as actual intelligence, although people already call them that...
It makes sense to judge how closely LLMs mimic human learning when people are using it as a defense of AI companies scraping copyrighted content, and claiming that banning AI scraping is as nonsensical as banning human learning.
But when it's pointed out that LLMs don't learn very similarly to humans, and require scraping far more material than a human does, suddenly AIs shouldn't be judged by human standards? I don't know if it's intentional on your part, but that's a pretty classic example of a motte-and-bailey fallacy. You can't have it both ways.
What you count as "one" example is arbitrary. In terms of pixels, you're looking at millions right now.
The ability to train faster using fewer examples in real time, similar to what an intelligent human brain can do, is definitely a goal of AI research. But right now, we may be seeing from AI what a below average human brain could accomplish with hundreds of lifetimes to study.
If the AI models were really doing exactly the same thing humans do, the models could be trained without any copyright infringement at all, because all of the public domain and creative commons content, plus maybe licencing a little more, would be more than enough.
I mean, no, if you only ever look at public domain stuff you literally wouldn't know the state of the art, which is historically happening for profit. Even the most untrained artist "doing their own thing" watches Disney/Pixar movies and listens to copyrighted music.
If we're going by the number of pixels being viewed, then you have to use the same measure for both humans and AIs - and because AIs have to look at billions of images while humans do not, the AI still requires far more pixels than a human does.
And humans don't require the most modern art in order to learn to draw at all. Sure, if they want to compete with modern artists, they would need to look at modern artists (for which educational fair use exists, and again the quantity of art being used by the human for this purpose is massively lower than what an AI uses - a human does not need to consume billions of artworks from modern artists in order to learn what the current trends are). But a human could learn to draw, paint, sculpt, etc purely by only looking at public domain and creative commons works, because the process for drawing, say, the human figure (with the right number of fingers!) has not changed in hundreds of years. A human can also just... go outside and draw things they see themselves, because the sky above them and the tree across the street aren't copyrighted. And in fact, I'd argue that a good artist should go out and find real things to draw.
OpenAI's argument is literally that their AI cannot learn without using copyrighted materials in vast quantities - too vast for them to simply compensate all the creators. So it genuinely is not comparable to a human, because humans can, in fact, learn without using copyrighted material. If OpenAI's argument is actually that their AI can't compete commercially with modern art without using copyrighted works, then they should be honest about that - but then they'd be showing their hand, wouldn't they?
Sure, if they want to compete with modern artists, they would need to look at modern artists
Which is the literal goal of Dall-E, SD, etc.
But a human could learn to draw, paint, sculpt, etc purely by only looking at public domain and creative commons works
They could definitely learn some amount of skill, I agree. I'd be very interested to see the best that an AI could achieve using only PD and CC content. It would be interesting. But you'd agree that it would look very different from modern art, just as an alien who has only been consuming earth media from 100+ years ago would be unable to relate to us.
the sky above them and the tree across the street aren't copyrighted.
Yeah, I'd consider that PD/CC content that such an AI would easily have access to. But obviously the real sky is something entirely different from what is depicted in Starry Night, Star Wars, or H.P. Lovecraft's description of the cosmos.
OpenAI's argument is literally that their AI cannot learn without using copyrighted materials in vast quantities
Yeah, I'd consider that a strong claim on their part; what they really mean is, it's the easiest way to make progress in AI, and we wouldn't be anywhere close to where we are without it.
And you could argue "convenient that it both saves them money, and generates money for them to do it this way", but I'd also point out that the alternative is they keep the trained models closed source, never using them publicly until they advance the tech far enough that they've literally figured out how to build/simulate a human brain that is able to learn as quickly and human-like as you're describing. And then we find ourselves in a world where one or two corporations have this incredible proprietary ability that no one else has.
Personally, I'd rather live in the world where the information about how to do all of this isn't kept for one or two corporations to profit from. I would rather live in the version where they publish their work publicly, early, and often, show that it works, and people are able to reproduce it, open source it, train their own models, and advance the technology in a space where anyone can use it.
You could hypothesize a middle ground where they do the research, but aren't allowed to profit from it without licensing every bit of data they train on. But the reality of AI research is that it only happens to the extent that it generates revenue. It's been that way for the entire history of AI. Douglas Hofstadter has been asking deep, important questions about AI as it relates to consciousness for like 60 years (e.g. GEB, I Am a Strange Loop), but there's a reason he didn't discover LLMs and tech companies did. That's not to say his writings are meaningless, in fact I think they're more important than ever before, but he just wasn't ever going to get to this point with a small team of grad students, a research grant, and some public domain datasets.
So, it's hard to disagree with OpenAI there: AI definitely wouldn't be where it is without them doing what they've done. And I'm a firm believer that unless we figure our shit out with energy generation soon, the earth will be an uninhabitable wasteland. We're playing a game of climb the Kardashev scale, we opted for the "burn all the fossil fuels as fast as possible" strategy, and now we're at the point where either we spend enough energy fast enough to figure out the tech needed to survive this, or we suffocate on the fumes. The clock is ticking, and AI may be our best bet at saving the human race that doesn't involve an inordinate number of people dying.
It isn't wrong to use copyrighted works for training. Let me quote an article by the EFF here:
and
What you want would swing the doors open for corporate interference like hindering competition, stifling unwanted speech, and monopolization like nothing we've seen before. There are very good reasons people have these rights, and we shouldn't be trying to change this. Ultimately, it's apparent to me, you are in favor of these things. That you believe artists deserve a monopoly on ideas and non-specific expression, to the detriment of anyone else. If I'm wrong, please explain to me how.
If we're going by the number of pixels being viewed, then you have to use the same measure for both humans and AIs - and because AIs have to look at billions of images while humans do not, the AI still requires far more pixels than a human does.
Humans benefit from years of evolutionary development and corporeal bodies to explore and interact with their world before they're ever expected to produce complex art. AI needs huge datasets to understand patterns to make up for this disadvantage. Nobody pops out of the womb with fully formed fine motor skills, pattern recognition, understanding of cause and effect, shapes, comparison, counting, vocabulary related to art, and spatial reasoning. Datasets are huge and filled with image-caption pairs to teach models all of this from scratch. AI isn't human, and we shouldn't judge it against them, just like we don't judge boats on their rowing ability.
And humans don't require the most modern art in order to learn to draw at all. Sure, if they want to compete with modern artists, they would need to look at modern artists (for which educational fair use exists, and again the quantity of art being used by the human for this purpose is massively lower than what an AI uses - a human does not need to consume billions of artworks from modern artists in order to learn what the current trends are). But a human could learn to draw, paint, sculpt, etc purely by only looking at public domain and creative commons works, because the process for drawing, say, the human figure (with the right number of fingers!) has not changed in hundreds of years. A human can also just... go outside and draw things they see themselves, because the sky above them and the tree across the street aren't copyrighted. And in fact, I'd argue that a good artist should go out and find real things to draw.
AI doesn't require the most modern art in order to learn to make images either, but its range of expression would be limited, just like a human's in this situation. You can see this in cave paintings and early sculptures. An AI wouldn't be limited to that same degree, but it would still be limited.
It took us 100,000 years to get from cave drawings to Leonardo da Vinci. This is just another step for artists, like the camera obscura was in the past. It's important to remember that early man was as smart as we are; they just lacked the interconnectivity to exchange ideas that we have.
When you look at one painting, is that the equivalent of one instance of the painting in the training data? There is an infinite amount of information in the painting, and each time you look you process more of that information.
I'd say any given painting you look at in a museum, you process at least a hundred mental images of aspects of it. A painting on your wall could be seen ten thousand times easily.