Google says AI systems should be able to mine publishers’ work unless companies opt out, turning copyright law on its head

db0@lemmy.dbzer0.com

116 points

1 year ago

I agree with Google, only I’d go a step further and say any AI model trained on public data should likewise be public for all and have its data sources public as well. Can’t have it both ways, Google.

Domi@lemmy.secnd.me

49 points

1 year ago

To be fair, Google releases a lot of models as open source: https://huggingface.co/google

Using public content to create public models is also fine in my book.

But since it’s Google I’m also sure they are doing a lot of shady stuff behind closed doors.

gjghkk@lemmy.dbzer0.com

1 point

1 year ago

I hope that too, but I’m less optimistic. We live in a capitalistic world.

FaceDeer@kbin.social

58 points

1 year ago

Copyright law already allows generative AI systems to scrape the internet. You need to change the law to forbid something; it isn’t forbidden by default. Currently, if something is published publicly then it can be read and learned from by anyone (or anything) that can see it. Copyright law only prevents making copies of it, which a large language model does not do when trained on it.

maynarkh@feddit.nl

35 points

1 year ago

A lot of licensing prevents or constrains creating derivative works and monetizing them. The question is, for example: if you train an AI on GPL code, does the output of the model constitute a derivative work?

If yes, GitHub Copilot is illegal, as it produces code that would have to comply with multiple conflicting license requirements. If no, I can write some simple AI that is “trained” to regurgitate its output on a prompt, run a leaked copy of Windows through it, then go around selling Binbows, and MSFT can’t do anything about it.

The truth is mostly somewhere between the two. This is just piracy, which has always been a gray area because it is difficult to prosecute: previously because the perpetrators were many and hard to find, now because the perpetrators are billion-dollar companies with expensive legal teams.

FaceDeer@kbin.social

22 points

1 year ago

The question is for example if you train an AI on GPL code, does the output of the model constitute a derivative work?

This question is completely independent of whether the code was generated by an AI or a human. You compare code A with code B, and if the judge and jury agree that code A is a derivative work of code B then you win the case. If the two bodies of work don’t have sufficient similarities then they aren’t derivative.

If no, I can write some simple AI that is “trained” to regurgitate its output on a prompt

You’ve reinvented copy-and-paste, not an “AI.” AIs are deliberately designed to not copy-and-paste. What would be the point of one that did? Nobody wants that.

Filtering the code through something you call an AI isn’t going to have any impact on whether you get sued. If the resulting code looks like copyrighted code, then you’re in trouble. If it doesn’t look like copyrighted code then you’re fine.

maynarkh@feddit.nl

13 points

1 year ago

AIs are deliberately designed to not copy-and-paste.

AI is a marketing term, not a technical one. You can call anything “AI”, but it’s usually predictive models that get called that.

AIs are deliberately designed to not copy-and-paste. What would be the point of one that did? Nobody wants that.

For example, if the powers that be decided that licenses don’t apply once you feed material through an “AI”, and failed to define AI, you could say you wrote this awesome OS using an AI that you trained exclusively on Microsoft proprietary code. Their licenses and copyright don’t apply to AI training data, so you could sell the new code your AI just created.

It doesn’t even have to be 100% identical to Windows source code. What if it’s just 80%? 50%? 20%? 5%? Where is the bar where the author can claim “that’s my code!”?

Just to compare, the people who set out to reimplement the Win32 APIs for use on Linux (the thing that has now made it onto macOS as well) deliberately would not accept help from anyone who had ever seen any Microsoft source code, for fear of being sued. The bar was that high when it was a small FOSS organization doing it. It was 0%, proven beyond a doubt.

Now that Microsoft is the author, it’s not a problem when GitHub Copilot spits out GPL code word for word, ironically together with its license.

10 points

1 year ago

If the resulting code looks like copyrighted code, then you’re in trouble. If it doesn’t look like copyrighted code then you’re fine.

^^ Very much this.

Loads of people are treating the process of AI creating works as either violating copyright or not. But that is not how copyright works. It applies to the output of a process, not the process itself. If someone ends up writing something that happens to be a copy of something they read before, that is a violation of copyright law. If someone uses various works and creates something new and unique, then that is not a violation. It does not, at this point in time at least, matter whether that someone is a real person or an AI.

AI can violate copyright on one work and not on another. Each case is independent and would need to be judged separately. But AI can produce so much content so quickly that it creates a real problem for case-by-case analysis of copyright infringement. So it is quite likely the laws will need to change to account for this, and will likely need to treat AI works differently from human-created works. That is a very hard thing to actually deal with.

Now, one could also argue the model itself is a violation of copyright. But that, IMO, is a stretch: a model is nothing like the original work, and copyright law does not cover this case. It would need to be taken to court to really decide whether this is allowed or not.

Personally, I don’t think the conversation should be about what the laws currently allow, since they were not designed for this. It should be about what the laws should allow, so we can steer the conversation towards a better future. Lots of artists are expressing their distaste for AI models being trained on their works; if enough people do this, laws can be crafted to back up this view.

AbsolutelyNotABot@feddit.it

6 points

1 year ago

then go around selling Binbows and MSFT can’t do anything about it

I think this already happens. A very practical example: the Windows GUI has been copied by many Linux distros. And Windows 11 clearly references Apple’s macOS GUI, with a sprinkling of Google’s Material Design.

Should Apple and Google be able to sue Microsoft because it “copied” their work? Should Google be able to sue Apple because they “copied” the notification drop-down in iOS?

As you say, it’s really a grey area, because the only reason we consider AI code “regurgitated” and human code “inspired” is that we give humans more credit for their intellectual abilities.

Boinketh@lemm.ee

23 points

1 year ago

Deleted by creator

Even_Adder@lemmy.dbzer0.com

5 points

1 year ago

You should read this.

lostmypasswordanew@feddit.de

10 points

1 year ago

An AI model is a derivative work of its training data and thus a copyright violation if the training data is copyrighted.

BlameThePeacock@lemmy.ca

17 points

1 year ago

A human is a derivative work of its training data, thus a copyright violation if the training data is copyrighted.

The difference between a human and AI is getting much smaller all the time. The training process is essentially the same at this point: show them a bunch of examples, then have them practice and provide feedback.

If that human is trained to draw on Disney art, then goes on to create similar-style art for sale, that isn’t copyright infringement. Nor should it be.

Phanatik@kbin.social

15 points

1 year ago

This is stupid and I’ll tell you why.
As humans, we have a perception filter. This filter is unique to every individual because it’s fed by our experiences and emotions. Artists make great use of this by producing art which leverages their view of the world. It’s why Van Gogh or Picasso is interesting: they had a unique view of the world that is shown through their work.
These bots do not have perception filters. They’re designed to break down whatever they’re trained on into numbers and decipher how the style is constructed so they can replicate it. They have no intention or purpose behind any of their decisions beyond straight replication.
You would be correct if a human’s only goal was to replicate Van Gogh’s style, but that’s not every artist. With these art bots, that’s the only goal they will ever have.

I have to repeat this every time there’s a discussion on LLMs or art bots:
The imitation of intelligence does not equate to actual intelligence.

10 points

1 year ago

A human does not copy previous work exactly the way these algorithms do. What’s this shit take?

lostmypasswordanew@feddit.de

8 points

1 year ago

Humans and AI are not the same and an equivalence should never be drawn.

conciselyverbose@kbin.social

8 points

1 year ago

Derivative works are only copyright violations when they replicate substantial portions of the original without changes.

The entirety of human civilization is derivative works. Derivative works aren’t infringement.

lostmypasswordanew@feddit.de

8 points

1 year ago

That’s just not true

5 points

1 year ago

It is not a derivative work, the model does not contain any recognizable part of the original material that it was trained on.

frog 🐸@beehaw.org

14 points

1 year ago

Except when it produces exact copies of existing works, or when it includes a recognisable signature or watermark?

ConsciousCode@beehaw.org

47 points

1 year ago

To be honest I’m fine with it in isolation, copyright is bullshit and the internet is a quasi-socialist utopia where information (an infinitely-copyable resource which thus has infinite supply and 0 value under capitalist economics) is free and humanity can collaborate as a species. The problem becomes that companies like Google are parasites that take and don’t give back, or even make life actively worse for everyone else. The demand for compensation isn’t so much because people deserve compensation for IP per se, it’s an implicit understanding of the inherent unfairness of Google claiming ownership of other people’s information while hoarding it and the wealth it generates with no compensation for the people who actually made that wealth. “If you’re going to steal from us, at least pay us a fraction of the wealth like a normal capitalist”.

If they made the models open source then it’d at least be debatable, though still suss since there’s a huge push for companies to replace all cognitive labor with AI whether or not it’s even ready for that (which itself is only a problem insofar as people need to work to live, professionally created media is art insofar as humans make it for a purpose but corporations only care about it as media/content so AI fits the bill perfectly). Corporations are artificial metaintelligences with misaligned terminal goals so this is a match made in superhell. There’s a nonzero chance corporations might actually replace all human employees and even shareholders and just become their own version of skynet.

Really what I’m saying is we should eat the rich, burn down the googleplex, and take back the means of production.

superkret@feddit.de

15 points

1 year ago

Deleted by creator

ConsciousCode@beehaw.org

6 points

1 year ago

That’s fair, also congratulations. Idk if I would count that towards contributing to the internet though, since it’s all within their walled garden on their own terms. It’s helpful for people, but only insofar as it helps Google. 10 years ago I might have been less critical, since they were still in their “don’t be evil” phase and creating open source projects like Android left and right, something they’re evidently regretting now and trying to lock down using proprietary core apps. It’s also worth noting Google’s AI employees authored “Attention Is All You Need”, the paper which laid the groundwork for modern Transformer-based LLMs, though that’s an architecture and not a full model or code.

Ubermeisters@lemmy.zip

10 points

1 year ago

Okay so I took back the means of production but it says it’s a subscription basis now

ConsciousCode@beehaw.org

10 points

1 year ago

That’s late-stage capitalism for you – even revolution comes with a subscription fee

SpaceCowboy@lemmy.ca

5 points

1 year ago

Probably shoulda read the Revolution TOS before clicking “I Agree”.

cambriakilgannon@beehaw.org

10 points

1 year ago

Or, if it was some non-profit doing the work for the good of everyone :')

ConsciousCode@beehaw.org

9 points

1 year ago

If only there were some kind of open AI research lab lmao. In all seriousness, Anthropic is pretty close to that, though it appears to be a public benefit corporation rather than a nonprofit. Luckily the open source community in general is really picking up the slack even without a centralized organization. I wouldn’t be surprised if we get something like the Linux Foundation eventually.

andresil@lemm.ee

40 points

1 year ago

Copyright law is gaslighting at this point. Piracy being extremely illegal but then this kind of shit being allowed by default is insane.

We really are living under the boot of the ruling classes.

FaceDeer@kbin.social

7 points

1 year ago

If you want “this kind of stuff” (by which I assume you mean the training of AI) to not be allowed by default, then you are basically asking for a world in which the only legal generative AIs belong to giant well-established copyright holders like Adobe and Getty. That path leads deeper underneath the boots of those ruling classes, not out from under them.

andresil@lemm.ee

8 points

1 year ago

I don’t think it should be allowed to be trained off any of this stuff for entertainment/art/etc. at all. Like the dream future of AI was all the shitty boring stuff handled for us so we could sit back, chill and focus on arts, real scientific research, general individual betterment etc.

Instead we have these companies trying to get them doing all the art and interesting things whilst we all either have no job, money, or good standard of living, or the dangerous / shitty jobs.

FaceDeer@kbin.social

1 point

1 year ago

So to avoid being “under the boot of the ruling classes” you want the government to be in charge of deciding what is and is not the correct way to produce our entertainment and art?

I use Stable Diffusion to generate illustrations for tabletop roleplaying game adventures that I run for my friends. I use ChatGPT to brainstorm ideas for those adventures and come up with dialogue or descriptive text. How big a fine would I be facing under these laws?
