-11 points
*

Amazing how every new generation of technology has a generation of users of the previous technology who do whatever they can to stop its advancement. This technology takes human creativity and output to a whole new level. It will advance medicine and science in ways that are difficult to even imagine, and it will provide personalized educational tutoring to every student regardless of income. Yet these people are worried about the technicality of what the AI is trained on, and often don't even understand enough about AI to make an argument about it. If people like this win, whatever country's legal system they win in will not see the benefits that AI can bring. That society is shooting itself in the foot.

Your favorite musician listened to music that inspired them when they made their songs. Listening to other people’s music taught them how to make music. They paid for the music (or somebody did via licensing fees or it was freely available for some other reason) when they listened to it in the first place. When they sold records, they didn’t have to pay the artist of every song they ever listened to. That would be ludicrous. An AI shouldn’t have to pay you because it read your book and millions like it to learn how to read and write.

31 points

You’re humanizing the software too much. Comparing software to human behavior is just plain wrong. GPT can’t even reason properly yet. I can’t see this as anything other than a more advanced collage process.

OpenAI used intellectual property without the consent of the owners. Majorly fucked.

If ‘anybody’ does anything similar to tracing, copy&pasting or even sampling a fraction of another person’s imagery or written work, that anybody is violating copyright.

7 points
*

If ‘anybody’ does anything similar to tracing, copy&pasting or even sampling a fraction of another person’s imagery or written work, that anybody is violating copyright.

Ok, but tracing is literally a part of the human learning process. If you trace a work and sell it as your own that’s bad. If you trace a work to learn about the style and let that influence your future works that is what every artist already does.

The artistic process isn’t copyrighted, only the final result. The exact same standards can apply to AI generated work as already do to anything human generated.

2 points

I don't know the specifics of the lawsuit, but I imagine this would parallel piracy.

In a way you could say that OpenAI has pirated content directly from multiple intellectual properties. OpenAI has distributed software which emulates skills and knowledge. Remember, this is a tool, not an individual.

-8 points

You’re mystifying and mythologising humans too much. The learning process is very equivalent.

-6 points

Well, there's still a shit ton we don't understand about humans.

We do, however, understand everything about machine learning.

3 points

amazing

2 points

sampling a fraction of another person’s imagery or written work.

So citing is a copyright violation? A scientific discussion on a specific text is a copyright violation? This makes no sense. It would mean your work couldn’t build on anything else, and that’s plain stupid.

Also, to your first point about reasoning and the advanced collage process: you are right and wrong. Yes, an LLM doesn't have the ability to use all the information a human has or be as precise, and therefore it can't reason the same way a human can. BUT, and that is a huge caveat, the inherent goal of AI, and in its simplest form neural networks, was to replicate human thinking. If you look at the brain and then at AIs, you will see how close the process is. It's usually giving the AI an input, the AI tries to give the desired output, then the AI gets told what the output should have looked like, and then it backpropagates to reinforce its process. This is already pretty advanced and human-like (even look at how the brain is made up and then how AI models are made up; it's basically the same concept).
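The input, output, correction, backpropagation loop described above can be sketched as a toy example. This is a deliberately tiny, hypothetical one-parameter model; real networks have many layers and millions of parameters, but the training loop has the same shape:

```python
# Toy version of the loop described above: give the model an input,
# compare its output to the desired answer, and nudge the weights
# in the direction that shrinks the error (the essence of backpropagation).
weight, bias = 0.0, 0.0
examples = [(1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]  # target relation: y = 2x + 1

for _ in range(2000):                  # training loop
    for x, target in examples:
        predicted = weight * x + bias  # forward pass
        error = predicted - target     # how wrong were we?
        weight -= 0.01 * error * x     # backpropagate: adjust parameters
        bias -= 0.01 * error           # to reduce the error next time

print(round(weight, 2), round(bias, 2))  # approaches 2.0 and 1.0
```

After enough passes the parameters converge toward the underlying pattern in the examples, without the examples being stored anywhere in the model.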

Now you would be right to say "well, in its simplest form an LLM like GPT is just predicting which character or word comes next", and you would be partially right. But in that process it incorporates all of the "knowledge" it got from its training sessions, plus a few valuable tricks to improve. The truth is, the differences between a human brain and an AI are marginal, and it mostly boils down to efficiency and training time.

And to say that LLMs are just “an advanced collage process” is like saying “a car is just an advanced horse”. You’re not technically wrong but the description is really misleading if you look into the details.

And for detail's sake, this is the paper for Llama 2, the latest big LLM from Facebook, which is said to be the current standard for LLM development:

https://arxiv.org/pdf/2307.09288.pdf

5 points

I don’t think that Sarah Silverman and the others are saying that the tech shouldn’t exist. They’re saying that the input to train them needs to be negotiated as a society. And the businesses also care about the input to train them because it affects the performance of the LLMs. If we do allow licensing, watermarking, data cleanup, synthetic data, etc. in a way that is transparent, I think it’s good for the industry and it’s good for the people.

3 points

I don't need to negotiate with Sarah Silverman if I'm handed her book by a friend, and neither should an AI.

1 point

Except the AI owner does. It’s like sampling music for a remix or integrating that sample into a new work. Yes, you do not need to negotiate with Sarah Silverman if you are handed a book by a friend. However if you use material from that book in a work it needs to be cited. If you create an IP based off that work, Sarah Silverman deserves compensation because you used material from her work.

No different with AI. If the AI used intellectual property from an author in its learning algorithm, then if that intellectual property shows up in the AI's output, the original author is due compensation under certain circumstances.

3 points

But you do need to negotiate with Sarah Silverman if you take that book, rearrange the chapters, and then try to sell it for profit. Obviously that's an extreme example, but it's the argument they're making.

1 point

An LLM isn’t human and shouldn’t be treated the same as a human. It’s as foolish as corporate personhood.

23 points
*
Deleted by creator
31 points
*

No that’s not how it works. It stores learned information like “word x is more likely to follow word y than word a” or “people from country x are more likely to consume food a than b”. That is what is distributed when the AI model is shared. To learn that, it just reads books zillions of times and updates its table of likelihoods. Just like an artist might listen to a Lil Wayne album hundreds of times and each time they learn a little bit more about his rhyme style or how beats work or whatever. It’s more complicated than that, but that’s a layperson’s explanation of how it works. The book isn’t stored in there somewhere. The book’s contents aren’t transferred to other parties.
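A minimal sketch of that "table of likelihoods" idea (illustrative only; a real model encodes this in neural network weights, not a literal lookup table): count which word follows which, then throw the source text away. What gets shared is the statistics, not the book:

```python
from collections import Counter

# Crude sketch of "word x is more likely to follow word y":
# tally (previous word, next word) pairs from some text.
text = "the cat sat on the mat and the cat slept"
words = text.split()
pairs = Counter(zip(words, words[1:]))  # counts of each word pair

# The "model" retains only likelihood counts, not the original text:
print(pairs[("the", "cat")])  # → 2
print(pairs[("the", "mat")])  # → 1
```

Distributing `pairs` tells you "cat" follows "the" more often than "mat" does, but you cannot reconstruct the sentence from it once the counts come from enough text.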

7 points
*

The learning model is artificial, vs a human that is sentient. If a human learns from a piece of work, that’s fine if they emulate styles in their own work. However, sample that work, and the original artist is due compensation. This was a huge deal in the late 80s with electronic music sampling earlier musical works, and there are several cases of copyright that back original owners’ claim of royalties due to them.

The lawsuits allege that the models used copyrighted work to learn. If that is so, writers are due compensation for their copyrighted work.

This isn’t litigation against the technology. It’s litigation around what a machine can freely use in its learning model. Had ChatGPT, Meta, etc., used works in the public domain this wouldn’t be an issue. Yet it looks as if they did not.

EDIT

And before someone mentions that the books may have been bought and then used in the model, it may not matter. The Birthday Song is a perfect example of copyright that caused several restaurant chains to use other tunes, up until the copyright was overturned in 2016. Every time the AI uses the copied work in its output, it may be subject to copyright.

2 points

When you download Vicuna or Stable Diffusion XL, they’re a handful of gigabytes. But when you go download LAION-5B, it’s 240TB. So where did that data go if it’s being copy/pasted and regurgitated in its entirety?
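A rough back-of-envelope check of that point. The ~5 GB model size and the ~5 billion image count are approximations I'm assuming for illustration, not exact figures; only the 240 TB dataset size comes from the comment above:

```python
# If a downloaded model "contained" its training set, each training
# item would have to fit in a handful of bytes.
model_bytes = 5 * 10**9        # assumed: a ~5 GB downloaded model
dataset_bytes = 240 * 10**12   # LAION-5B, roughly 240 TB
num_images = 5 * 10**9         # assumed: ~5 billion image-text pairs

print(model_bytes / num_images)     # → 1.0 byte per image
print(dataset_bytes / model_bytes)  # → 48000.0 (apparent "compression" ratio)
```

One byte per image obviously cannot hold a copy of the image, which is the point being made: the data was not copy/pasted into the model.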

2 points

Exactly! If it were just outputting exact data, they wouldn't care about making new works and would just pivot to being the world's greatest source of compression.

Though there is some work researchers have done to heavily modify these models to overfit and do exactly this.

5 points

It's less about copying the work; it's more about looking at patterns that appear in a work.

To give a very rudimentary example: if I wanted a word and the first letter was Q, what would the second letter be?

Of course, statistically, the next letter is u, and it's not common for words starting with Q to have a different letter after that. ML/AI is like taking these small situations but having a ridiculous number of parameters to come up with something based on several internal models. These parameters generally have some context, of course.

It's like if you were told to read a book thoroughly and then afterwards were told to reproduce the same book. You probably couldn't make it 1:1, but you could probably get the general gist of the story. The difference between you and the machine is that the machine has read a lot of books and contextually knows the patterns, so it can generate something similar faster and more accurately, but not an exact copy of the original.
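The Q-followed-by-u pattern above can be checked directly by counting. The word list here is a tiny, hypothetical sample chosen for illustration; a real model would derive the same statistic from a huge corpus:

```python
from collections import Counter

# Which letter follows "q" in a small sample vocabulary?
words = ["queen", "quick", "quote", "quilt", "qatari", "iraqi"]
after_q = Counter(w[w.index("q") + 1] for w in words if "q" in w)

print(after_q.most_common(1))  # → [('u', 4)]
```

The counts capture the pattern ("u usually follows q") without storing the words themselves, which is the distinction the comment is drawing.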

-2 points

This technology takes human creativity and output to a whole new level,

No, it doesn’t. There’s nothing “human” or “creative” about the output of AI.

-1 points

Amazing how every generation of technology has an asshole billionaire or two stealing shit to be the first in line to try and monopolize society’s progress.

3 points

It's a bit more than that if the AI is told to make something "in the style of" someone.

3 points

I mean, people have been doing new works in the style of other artists for a while as well.

4 points

Yeah, but again, they can't crank out a new one every 5 minutes, and it would actually overwhelm the courts, as it's very easy for those works to be too similar. Take the guy who tried to sue Disney by writing a book based on Finding Nemo when he found out they were making a story like that. He was shady and tried to play timeline games, but he did not need to make a story just like it.

1 point

Yeah, and a person could make something in the style of someone else. And it would only be copyright infringement if the work does not meaningfully change the original and give credit to the original artist.

How is this any different?

1 point

Mainly because it's just too easy. We should limit time periods for IP, but while it's in force it should not be able to be used by AI, in my view. Keep IP to 20 years and let AI have it at that point.

5 points

I take it we don’t use the phrase “good writers borrow, great writers steal” in this day and age…

10 points

Wait till they find out photographers spend their whole careers trying to emulate the style of previous generations. Or that Adobe has been implementing AI-driven content creation into Photoshop and Lightroom for years now, and we’ve been pretending we don’t notice because it makes our jobs easier.

26 points

At the crux of the authors' lawsuit is the argument that OpenAI is ruthlessly mining their material to create "derivative works" that will "replace the very writings it copied."

The authors shoot down OpenAI’s excuse that “substantial similarity is a mandatory feature of all copyright-infringement claims,” calling it “flat wrong.”

Goodbye Star Wars, Avatar, Tarantino’s entire filmography, every slasher film since 1974…

10 points

OpenAI is trying to argue that the whole work has to be similar to infringe, but that's never been true. You can write a novel and infringe on page 302, and that's a copyright infringement. OpenAI is trying to change the meaning of copyright, because otherwise the output of their model is oozing with various infringements.

0 points
*

I can quote work that's already been published; that's allowable, and I don't have to get the author's consent to do it. I don't have to get consent because I'm not passing the work off as my own, I am quoting it with reference.

So if I ask the AI to produce something in the style of Stephen King no copyright is violated because it’s all original work.

If I ask the AI to quote Stephen King (and it actually does it) then it’s a quote and it’s not claiming the work is its own.

Under the current interpretation of copyright law (and current law is broken beyond belief, but that’s a completely different issue) a copyright breach has not occurred in either scenario.

The only argument I can see working is that if the AI can actually quote Stephen King, that would prove the works of Stephen King are in its data set. But that doesn't really prove anything other than that his works are in the data set; it doesn't definitively prove OpenAI didn't pay for them.

5 points
*

You can quote a work under fair use, and whether it's legal depends on your intent. You have to be quoting it for such uses as "commentary, criticism, news reporting, and scholarly reports."

There is no cheat code here. There is no loophole that LLMs can slide on through. The output of LLMs is illegal. The training of LLMs without consent is probably illegal.

The industry knows that its activity is illegal, and its strategy is not to win but rather to make litigation expensive, complex, and slow through such tactics as:

  1. Diffusion of responsibility: (note the companies compiling the list of training works, gathering those works, training on those works and prompting the generation of output are all intentionally different entities). The strategy is that each entity can claim “I was only doing X, the actual infringement is when that guy over there did Y”.
  2. Diffusion of infringement: so many works are being infringed that it becomes difficult, especially on the output side, to say who has been infringed and who has standing. What’s more, even in clear cut cases like, for instance, when I give an LLM a prompt and it regurgitates some nontrivial recognizable copyrighted work, the LLM trainer will say you caused the infringement with your prompt! (see point 1)
  3. Pretending to be academic in nature so they could wrap themselves in the thick blanket of affirmative defense that fair use doctrine affords the academy, and then after the training portion of the infringement has occurred (insisting that was fair use because it was being used in an academic context) “whoopseeing” it into a commercial product.
  4. Just being super cagey about the details of the training sets that were actually used and how they were used. This kind of stuff is discoverable but you have to get to discovery first.
  5. And finally, magic brain box arguments. These are typically some variation of "all artists have influences." It is a rhetorical argument that would be blown right past in court, but it muddies the public discussion and is useful to them in that way.

Their purpose is not to win. It’s to slow everything down, and limit the number of people who are being infringed who have the resources to pursue them. The goal is that if they can get LLMs to “take over” quickly then they can become, you know, too big and too powerful to be shut down even after the inevitable adverse rulings. It’s classic “ask for forgiveness, not permission” silicon valley strategy.

Sam Altman’s goal in creeping around Washington is to try to get laws changed to carve out exceptions for exactly the types of stuff he is already doing. And it is just the same thing SBF was doing when he was creeping around Washington trying to get a law that would declare his securitized ponzi tokens to be commodities.

2 points

It doesn’t definitively prove openAI didn’t pay for the works.

But since they are a business/org that has all of those works and is using them for profit, it kind of would be provable whether OpenAI did or didn't pay the correct licenses, as they and/or the publisher/Stephen King (if he were to handle those agreements directly) would have a receipt/license document of some kind to show it. I don't agree with how copyrights are done and agree that things should enter the public domain much sooner. But a for-profit thing like OpenAI shouldn't just be allowed all these exceptions that avoid needing any level of permission, or paying the owners who ask for it. At least not while us regular people, who aren't using these sources for profit/business, also aren't allowed to just use whatever we want.

The only way that I, at least, see such open use of everything, at the level of all this data/information, being fine is in a socialist/communist system of some kind. The main reason for keeping stuff like entertainment/information/art/etc at a creator level is to have money to live in modern society, where basic and crucial needs (food/housing/healthcare/etc) cost money. So for the average author/writer/artist/inventor, a for-profit company just being able to take their shit directly impacts their ability to live.

It is a highly predatory level of capitalism and should not have exceptions. It is just setting up a different version of the shit that also needs to be stopped in the entertainment/technology industries, where the actual creators/performers/etc are fucked by the studios/labels/corps by not being paid anywhere near the value being brought in, and may not even have control over their work. So all of the companies and the capitalist system are why a private entity/business/org shouldn't just be allowed to pull this shit.

17 points

Uh, yeah, a massive corporation sucking up all intellectual property to milk it is not the own you think it is.

12 points

But this is literally people trying to strengthen copyright and its scope. The corporation is, out of pure convenience, using copyright as it exists currently with the current freedoms applied to artists.

-5 points
*

Listen, it's pretty simple. Copyright was made to protect creators on initial introduction to market. In modern times it's good if an artist has one lifetime of royalties, i.e. their own lifetime, so that they can at least make a little something, because for the small artist that little something means food on their plate.

But a company, sitting on a Smaug’s hill worth of intellectual property, “forever less a day”? Now that’s bonkers.

But you, scraping my artwork to resell for pennies on the dollar via some stock material portal? Can I maybe crawl up your colon with sharp objects and kindling to set up a fire? Pretty please? Oh pretty please!

Also, if your AI copies my writing style, I will personally find you, rip open your skull AND EAT YOUR BRAINS WITH A SPOON!!! Got it, devboy?

You won't be Mr Hotshot with pointy objects and a fire up your ass, as well as less than half a brain… even though I just took a couple of bites.

Chew on that one.

EDIT: the creative writer is doomed, I tells ya! DOOOOOOMED!

9 points
*

AI training isn't only for mega-corporations. We can already train our own open-source models, so we shouldn't applaud someone trying to erode our rights, or let people put up barriers that will keep out all but the ultra-wealthy. We need to be careful not to weaken fair use and hand corporations a monopoly on a public technology by making it prohibitively expensive for regular people to keep developing our own models. Mega-corporations already have their own datasets, and the money to buy more. They can also make users sign predatory ToS allowing them exclusive access to user data, effectively selling our own data back to us. Regular people, who could have had access to a corporate-independent tool for creativity, education, entertainment, and social mobility, would instead be left worse off, with fewer rights than where they started.

1 point
*

Speaking of slasher films, does anybody know of any movies that have terrible everything except a really good plot?

2 points

The Godfather Part III

16 points
*

This actually reminds me of a sci-fi series I read where, in the future, they use an AI to scan any new work to see what intellectual property owned by the big corporations may have been used as an influence, in order to halt the production of any new media not tied to a pre-existing IP, including 100% of independent and fan-made works.

Which is one of the contributing factors towards the apocalypse. So 500 years later, after the apocalypse has been reversed and human colonies are enjoying post-scarcity, one of the biggest fads is rediscovering the 20th century: now that all the copyrights have expired, people can datamine the ruins of Earth to find all the media that couldn't be properly preserved heading into Armageddon thanks to copyright trolling.

It's referred to in-universe as "Twencen".

The series is called FreeRIDErs, if anyone is curious. Unfortunately the series may never have a conclusion (untimely death of a co-creator), but most of its story arcs were finished, so there's still a good chunk of meat to chew through, and I highly recommend it.

19 points

seethe

Very concerning word use from you.

The issue art faces isn’t that there’s not enough throughput, but rather there’s not enough time, both to make them and enjoy them.

-23 points

Headline is stupid.

Millennial journalism has fucking got to stop with these clown word choices…

29 points

Honestly it’s refreshing to not see the word “slammed” for once…

-2 points

Haha… This person gets it.

1 point

Let the boys be boys

15 points

That’s always been the case, though, imo. People had to make time for art. They had to go to galleries, see plays and listen to music. To me it’s about the fair promotion of art, and the ability for the art enjoyer to find art that they themselves enjoy rather than what some business model requires of them, and the ability for art creators to find a niche and to be able to work on their art as much as they would want to.

23 points

I don’t care what works a neural network gets trained on. How else are we supposed to make one?

Should I care more about modern eternal copyright bullshit? I'd feel more nuance about it if everything a few decades old were public domain, like it's fucking supposed to be. Then there'd be plenty of slightly-outdated content to shovel into these statistical analysis engines. But there's not. So fuck it: show the model absolutely everything, and the impact of each work becomes vanishingly small.

Models don't get bigger as you add more stuff. Training only twiddles the numbers in each layer. There are two-gigabyte networks that have been trained on hundreds of millions of images. If those images were stored verbatim in the network, each would get barely a dozen bytes. And the network gets better as that number goes down.
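The "barely a dozen bytes" figure checks out with simple arithmetic (taking "hundreds of millions" as roughly 150 million, an assumed round number):

```python
# Bytes available per training image if the network "stored" them all.
network_bytes = 2 * 10**9        # a two-gigabyte network
training_images = 150 * 10**6    # assumed: ~150 million images

bytes_per_image = network_bytes / training_images
print(round(bytes_per_image, 1))  # → 13.3
```

For comparison, even a heavily compressed thumbnail needs thousands of bytes, so verbatim storage at this budget is impossible.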

The entire point is to force the distillation of high-level concepts from raw data. We’ve tried doing it the smart way and we suck at it. “AI winter” and “good old-fashioned AI” were half a century of fumbling toward the acceptance that we don’t understand how intelligence works. This brute-force approach isn’t chosen for cost or ease or simplicity. This is the only approach that works.

-5 points
*
Deleted by creator
13 points

Right, copyright law.

1 point
*
Deleted by creator
3 points

Models don’t get bigger as you add more stuff.

They will get less coherent and/or “forget” the earlier data if you don’t increase the parameters with the training set.

There are two-gigabyte networks that have been trained on hundreds of millions of images

You can take a huge TIFF image, put it through JPEG with the quality cranked all the way down, and get a tiny file out the other side which is still a recognizable derivative of the original. LLMs are extremely lossy compression of their training set.
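A crude sketch of that lossy-compression idea, using a toy signal and naive quantization rather than actual JPEG: the stored version is far smaller, and the reconstruction is recognizably derived from the original but not identical:

```python
# Lossy "compression" by quantizing a signal down to a few levels.
original = [0.0, 0.1, 0.5, 0.9, 1.0, 0.9, 0.5, 0.1, 0.0]

levels = 2                                          # keep only 2 levels
compressed = [round(x * levels) for x in original]  # small integers
restored = [q / levels for q in compressed]         # lossy reconstruction

print(compressed)  # → [0, 0, 1, 2, 2, 2, 1, 0, 0]
print(restored)    # → [0.0, 0.0, 0.5, 1.0, 1.0, 1.0, 0.5, 0.0, 0.0]
```

The fine detail (0.1 vs 0.0, 0.9 vs 1.0) is gone for good; only the overall shape survives, which is the sense in which the compression is "extremely lossy."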

4 points

which is still a recognizable derivative of the original

Not in twelve bytes.

Deep models are a statistical distillation of a metric shitload of data. Smaller models with more training on more data don’t get worse, they get more abstract - and in adversarial uses they often kick big networks’ asses.


Technology

!technology@lemmy.ml
