402 points
*

It’s so ridiculous: when corporations steal everyone’s work for their own profit, no one bats an eye, but when a group of individuals does the same to make education and knowledge free for everyone, it’s somehow illegal, unethical, immoral and whatnot.

94 points

Using publicly available data to train isn’t stealing.

Daily reminder that the ones pushing this narrative are literally corporations like OpenAI. If you can’t use copyrighted materials freely to train on, it brings up the cost in such a way that only a handful of companies can afford the data.

They want to kill the open-source scene and are manipulating you to do so. Don’t build their moat for them.

56 points
*

And using publicly available data to train gets you a shitty chatbot…

Hell, even using copyrighted data to train isn’t that great.

Like, what do you even think they’re doing here for your conspiracy?

You think OpenAI is saying they should pay for the data? They’re trying to use it for free.

Was this a meta joke and you had a chatbot write your comment?

25 points

Was this a meta joke and you had a chatbot write your comment?

if someone said this to me I’d cry

19 points
*

The point being made was that publicly available data includes a whole lot of copyrighted data to begin with, and it’s pretty much impossible to filter it out. Prime example: the Eiffel Tower in Paris is not copyright protected, but the lights on it are, so you can only use pictures of the Eiffel Tower taken during the day, and only if the picture itself isn’t copyright protected by the original photographer. Copyright law has all these complex caveats and exceptions that make it impossible to tell at a glance whether or not something is protected.

This in turn means that if AI cannot legally train on copyrighted materials it finds online without paying huge sums of money, then effectively only mega corporations that can pay copyright fines as a cost of doing business will be able to afford training decent AI.

The only other option to produce any AI of this type is a very narrow, curated set of known materials with a public use license, but that is not going to get you anything competent on its own.

EDIT: In case it isn’t clear, I am clarifying what I understood from Grimy@lemmy.world’s comment, not adding to it.

5 points

Hey man, that’s damn hurtful

3 points

I’m not sure if someone else has brought this up, but I could see OpenAI and other early adopters pushing for tighter controls of training data as a means to be the only players in town. You can’t build your own competing AI because you won’t have the same amount of data as us and we’ll corner the market.

1 point

If the data has to be paid for, OpenAI will gladly do it with a smile on their face. It guarantees them a monopoly and ownership of the economy.

Paying more but having no competition except google is a good deal for them.

-2 points
Deleted by creator
-4 points
*

Maybe Grimy does have concerns, but they’ve never used the words “open source” outside of talking about AI.

And yes, I checked all of their comments :)

43 points

OpenAI is definitely not the one arguing that they have stolen data to train their AIs, and Disney will be fine whether AI requires owning the rights to training materials or not. Small artists, the ones protesting the most against it, will not. They are already seeing jobs and commission opportunities declining due to it.

Being publicly available in some form is not permission to use and reproduce those works however you feel like. Only the real owners have the right to decide. We on the internet have always been a bit blasé about it, sometimes deservedly, but as we get to a point where we are driving away the very same artists that we enjoy and get inspired by, maybe we should be a bit more understanding about their position.

-1 points
*

That’s basically my main point: Disney doesn’t need the data, and Getty doesn’t either. AI isn’t going away and the jobs will be lost no matter what.

Putting a price tag in the high millions for any kind of generative model only benefits the big players.

I feel for the artists. It was already a very competitive domain that didn’t really pay well and it’s now much worse but if they aren’t a household name, they aren’t getting a dime out of any new laws.

I’m not ready to give the economy to Microsoft, Google, Getty and Adobe so GRRM can get a fat payday.

27 points

That depends on what your definition of “publicly available” is. If you’re scraping New York Times articles and pulling art off Tumblr then yeah, it’s exactly stealing in the same way scihub is. Only difference is, scihub isn’t boiling the oceans in an attempt to make rich people even richer.

3 points
*
Deleted by creator
21 points

We have a mechanism for people to make their work publicly visible while reserving certain rights for themselves.

Are you saying that creators cannot (or ought not be able to) reserve the right to ML training for themselves? What if they want to selectively permit that right to FOSS or non-profits?

11 points

That’s exactly what they’re saying. The AI proponents believe that copyright shouldn’t be respected and they should be able to ignore any licensing because “it’s hard to find data otherwise”

-4 points

Essentially yes. There isn’t a happy solution where FOSS gets the best images and remains competitive. The amount of data needed is outside what can be donated. Any open source work will be so low in quality as to be unusable.

It also won’t be up to them. The platforms where the images are posted will be selling and brokering. No individual is getting a call unless they are a household name.

None of the artists are getting paid either way so yeah, I’m thinking of society in general first.

14 points
*

They want to kill the open-source scene

Yeah, by using the argument you just gave as an excuse to “launder” copyleft works in the training data into permissively-licensed output.

Including even a single copyleft work in the training data ought to force every output of the system to be copyleft. Or if it doesn’t, then the alternative is that the output shouldn’t be legal to use at all.

1 point

100% agree, making all outputs copyleft is a great solution. We get to keep the economic and cultural boom that AI brings while keeping the big companies in check.

13 points

Scientific research papers are generally public too, in that you can always reach out to the researcher and they’ll provide the papers for free. It’s just the “corporate” journals that need their profit off of other people’s work…

9 points
*

The point is the entire concept of AI training off people’s work to make profit for others is wrong without the permission of and compensation for the creator regardless if it’s corporate or open source.

7 points

I think I’ve decided to not publish anything that I want to keep ownership of, just in case. There’s an entire planet’s worth of countries, which will all have their own sets of laws. It takes waay too long to polish something, only to just give it away for free haha. Someone else is free to do that work if it is that easy. No skin off my back.

I think it’s similar to many other hand-made crafts/items. Most people will buy their clothes from stores, but there are definitely still people who make beautiful clothing from hand better than machines could.

Don’t even get me started on stuff like knitting. It already costs the creator a crap ton of money just for the materials. It takes a crap ton of time to make those, too. Despite the costs, many people just expect those knitted pieces for practically free. The people who expect that pricing are also free to go with machine-produced crafts/items instead.

It comes down to what people want, and what they’re willing to pay, imo. Some people will find value in something physically being put together by another human, and other people will find value in having more for less. Neither is “wrong” necessarily, so long as no one is literally ripped off. (With over 8 billion people, it’s bound to happen at least once. I feel bad for whoever that is.)

That being said, we’ll never be able to honestly say that the specific skills and techniques that are currently required are the exact same. It would be like calling a photographer amazing at realism painting because their photo looks like real life. Photographers and painters both have their place, but they are not the exact same.

I think that’s also part of what’s frustrating so many artists. Coding AI is not the same as using the colour wheel, choosing materials, working fine motor control, etc. It’s not learning about shadows, contrast, focal points, etc. I can definitely understand people not wanting those aspects to be brushed off, especially since it usually takes most of a lifetime to achieve. A music generator and a violin may both make great music, but they are not the same, and they require different technical skills.

I’ll never buy AI art if I have any say in the matter. I’ll support handmade stuff first, every time.

3 points

I love that the people who push this kind of rhetoric often consider themselves left wing, it’s just so silly.

‘every word you ever utter must be considered private property and no other human may benefit from it without payments!’

I mean yes I know you’re going to say socialism is about workers getting fair pay but come on, this is just pure rent seeking. We’re a global community of people, if this comment helps train an ai that can help other people better live their lives, better access medicine and education or other services then I think that’s a wonderful thing.

And yes of course it should be open source and free to all people, that’s why these pushes to make sure only corporations can afford ai are so infuriating

7 points
*

All of the AI fear mongering is fuelled by mega corps who fear that AI in some sort will eat into their profits and they can’t make money off of it.

Image generation also had similar outcry because open source models smoked all the commercial ones.

3 points

Yeah, just wait until they see the ai design tools that allow anyone to casually describe the spare part or upgrade they want and it’ll be designed and printed at home or local fab shop.

Lot of once fairly safe monopolies are going to start looking very shaky, and then things like natural language cookery toolarms disrupting even more…

We’ve only barely started to see what the tech we have now is able to do, yes a million shitty chat bots / img gen apps are cashing in on the hype but when we start seeing some killer apps emerge it’s when people won’t be able to ignore it any longer

4 points

Too bad

If you can’t afford to pay the authors of the data required for your project to work, then that sucks for you, but doesn’t give you the right to take anything you want and violate copyright.

Making a data agnostic model and releasing the source is fine, but a released, trained model owes royalties to its training data.

4 points

True, Big Tech loves monopoly power. It’s hard to see how there can be an AI monopoly without expanding intellectual property rights.

It would mean a nice windfall profit for intellectual property owners. I doubt they worry about open source or competition but only think as far as lobbying to be given free money. It’s weird how many people here, who are probably not all rich, support giving extra money to owners, merely for owning things. That’s how it goes when you grow up on Ayn Rand, I guess.

2 points

This is the hardest thing to explain to people. Just convert it into a person with unlimited memory.

OpenAI is sending said person to view every piece of human work; the person learns and makes connections, then makes art or reports based on what you tell or ask them.

Sci-Hub is doing the same thing but you can ask it for a specific book and they will write it down word for word for you, an exact copy.

Both morally should be free to do so. But we have laws that say the sci-hub human is illegally selling the work of others. Whereas the open ai human has to be given so many specific instructions to reproduce a human work that it’s practically like handing it a book and it handing the book back to you.

0 points

What data is public?

23 points
*

Cue the Max Headroom episode where the blanks (disconnected people) are chased by the censors because the blanks steal cable so their children can watch the educational shows and learn to read, and they are forced to use clandestine printing presses to teach them.

15 points

what’s this? an anti-corporate message that sneers at cable TV companies??? CANCEL THAT SHOW!!!

that show was so amazingly prescient: the theme of the first episode was how advertising literally kills its viewers and the news covers things up. No wonder they didn’t get renewed. ;)

3 points
-7 points
*
Deleted by creator
7 points

Because it’s easy to get these chatbots to output direct copyrighted text…

Even ones the company never paid for, not even just a subscription for a single human to view the articles they’re reproducing. Like, think of it as buying a movie, then burning a copy for anyone who asks.

Which reproducing word for word for people who didn’t pay is still a whole nother issue. So this is more like torrenting a movie, then seeding it.

-3 points

It’s not that easy, don’t believe the articles being broadcasted every day. They are heavily cherry picked.

Also, if someone is creating copyrighted works, it is on that person to be responsible if they release or sell them, not the tool they used. Just because the tool can be good (learns well and responds well when asked to make a clone of something) doesn’t mean that is the only thing it does or must do. It is following instructions, which were to make a thing. The one giving the instructions is the issue, and the intent of that person when they distribute is the issue.

If I draw a perfect clone of Donald Duck in the privacy of my home after looking at hundreds of Donald Duck images online, there is nothing wrong with that. If I go on Etsy and start selling them without a license, they will come after ME. Not because I drew it, but because I am selling it and violating a copyright. They won’t go after the pencil or ink manufacturer. And they won’t go after Adobe if I drew it on a computer with Photoshop.

3 points

Because humans have more rights than tools. You are free to look at copyrighted text and pictures, memorize them and describe them to others. It doesn’t mean you can use a camera to take and share pictures of it.

Acting like every right that AIs have must be identical to humans’, and if not that means the erosion of human rights, is a fundamentally flawed argument.

135 points

I pirated 90% of the texts I used to write my thesis at university, because those books would have cost me hundreds of euros that I didn’t have.

Fuck you, capitalism.

18 points

I pirated texts for my thesis even when I had access to them through my university. A lot of journals are just too annoying to use.

9 points

unfathomably based

8 points

He has me so inspired imma go pirate a bunch of textbooks just because I can. I don’t even need them.

119 points

What really breaks the suspension of disbelief in this reality of ours is that fucking advertising is the most privacy invasive activity in the world. Seriously, even George Orwell would call bullshit on that.

46 points
*

The amount of advertisements you have to consume whether you consent or not is wild. Billboards on roads, bus banners, marquees: you have no choice unless you don’t leave your house, and then you’re still subject to ads, just ones you sort of consented to by buying TV or Internet service.

22 points

Road billboards are always a trip when I visit the US. Not only do they have everything on them from Jesus to abortion to guns, they are also incredibly distracting physically, especially at night.


Sign right on the merge of a major highway: “Car accident? Call our injury lawyer hotline.”

6 points
*

Agreed. I hate ads passionately. I’ve been able to eliminate every source of ads from inside my house except websites, but I immediately back out of any site that won’t do simple or reading view.

Every moment of my attention taken by some stupid billboard or hearing tvs at a gas station I had to stop at is a moment I could have been thinking about something better. Or nothing, which sometimes would be nice.

2 points
*

Same. I go out of my way to learn how to get around ads. I often wonder, especially now with autism and sensory issues being more highlighted, when we will start realizing that ads in this overwhelming capacity are absolutely a health issue. Sensory issues are real, and an ad shouldn’t have to give people seizures before it’s considered unhealthy.

1 point

Why do people hook their TVs up to the internet? I’ll never get it.

11 points
*
Deleted by creator
9 points

Ads know your profile better than yourself. It’s telling you you’re a cheap bastard who won’t actually buy popcorn at the movies, making the theater run at a loss.

/S

117 points

Make the AI folks use public domain training data or nothing and maybe we’ll see the “life of the author + 75 years” bullshit get scaled back to something reasonable.

70 points

Exactly this. I can’t believe how many comments I’ve read accusing the AI critics of holding back progress with regressive copyright ideas. No, the regressive ideas are already there, codified as law, holding the rest of us back. Holding AI companies accountable for their copyright violations will force them to either push to reform the copyright system completely, or to change their practices for the better (free software, free datasets, non-commercial uses, real non-profit orgs for the advancement of the technology). Either way we have a lot to gain by forcing them to improve the situation. Giving AI companies a free pass on the copyright system will waste what is probably the best opportunity we have ever had to improve the copyright system.


They let the Mouse die finally, maybe there is hope for change.

24 points

The Mouse isn’t dead, he is risen anew. Freed from the shackles of his creators, he is now more powerful than he could ever have hoped to be before. The mighty tremble beneath the footsteps of old Steamboat Willie. He is a living sign of a new era, one in which it is possible to strike back against his old captors.

17 points

Tbf that number was originally like 20+ years and then Disney lobbied several times to expand it

17 points

19 years. It wasn’t life of the author either. It was 19 years after creation date plus an option to renew for another 19 at the end of that period. It was sensible. That’s why we don’t do it anymore.

2 points

Wow, I really really like this take. These corporate bitches want to eat their cake and have it, too.

96 points
*

AFAIK the individual researchers who get their work pirated and put on Sci-Hub don’t seem to particularly mind.

Check out this blog post critical of Sci-Hub and how it appeals to academic faculty:

By freeing published scholarship from the chains of toll access and copyright protection and making them freely available to all, it can feel like you are helping a Robin Hood figure rob from the rich and give to the poor.

It goes on to explain potential security issues, but it doesn’t even try to attack the concept of freely providing academic papers to begin with.

I’m starting to think the term “piracy” is morally neutral. The act can be either positive or negative depending on the context. Unfortunately, the law does not seem to flow from morality, or even the consent of the supposed victims of this piracy.

55 points

AFAIK the individual researchers who get their work pirated and put on Sci-Hub don’t seem to particularly mind.

Why would they?

They don’t get paid when people pay for articles.

Back before everyone left Twitter, the easiest way to get a paywalled study was to hit up one of the authors; they can legally give a copy to anyone, and they make no money from paywalls.

11 points

Also, no researcher would even exist if grad students had to pay for the papers they read and cite. A lot of people are not fortunate enough to have access to these publications through their uni. Heck, even when I had it, I’d still go to Sci-Hub just for the sake of convenience.

Like a lot of services nowadays, they offer a mediocre service and still charge for it.

10 points
*

That’s still the easiest way. Email them, don’t tweet them.

2 points

It still works. The journal websites always include author contact info, just e-mail them.

-4 points

legally

Not necessarily. They often do not own the copyright, so then it depends on fair use exceptions. The real owners have gone after authors, which may be the reason they don’t make their articles downloadable by default.

4 points
*

The asking makes it legal if I recall correctly.

They can’t host a site with all their articles/papers/research, but if anyone asks for a single copy, they can provide it at their discretion.

And since they don’t make any money either way, most provide it and are happy to do so.

2 points

Even if not legally, then morally… At least, in my opinion. We’re talking about the creator giving their stuff to somebody else, and this isn’t some exceptional case like sharing state secrets or something.

33 points

Academics don’t care because they don’t get paid for them anyway. A lot of the time you have to pay to have your paper published. Then companies like Elsevier just sit back and make money.

15 points

I follow a few researchers with interesting youtube channels, and they often mention that if you ask them or their colleagues for a publication of theirs, chances are they’ll be glad to send it to you.

A lot of them love sharing their work, and don’t care at all for science journal paywalls.

3 points

Other than being happy for the attention and curious about what extra things you can find in their field, they get quoted, and that pushes their reputation a little higher. Locking up works heavily limits that, and the only reason behind it is a promise of basic quality control when accepting works, which isn’t ideal either: there are many shady publications. Other than that, it’s cash from ordinary consumers and subscription money from institutes for works these companies took hold of and that maybe don’t have physical editions anymore, simply because, return to fig. 1, researchers depend on being published and quoted.

2 points

Sure, that’s a motivation too, but they were also talking about random people who’d find a reference and were curious about their work, not just other researchers who may quote them. It’s not all about h-index.

When a guy literally makes, among other things, regular paleontology news reports and whole videos of his own university course material during summer breaks, and puts all that to youtube it’s safe to assume he just likes popularizing his subject.

11 points

Don’t mind? Hell, we want people to read that shit. We don’t profit at all if it’s paywalled; it hurts us and hurts science in general. This is 100% the wishes of for-profit scientific journals.

10 points

I’m starting to think the term “piracy” is morally neutral. The act can be either positive or negative depending on the context. Unfortunately, the law does not seem to flow from morality, or even the consent of the supposed victims of this piracy.

The morals of piracy also depend on the economic system you’re under. If you have UBI, the “support artists” argument is far less strong, because we’re all paying taxes to support the UBI system that enables people to become skilled artists without worrying about starving or homelessness - as has already happened to a lesser degree before our welfare systems were kneecapped over the last 4 decades.

But that’s just the art angle, a tonne of the early-stage (i.e. risky and expensive) scientific advancements had significant sums of government funding poured into them, yet corporations keep the rights to the inventions they derive from our government funded research. We’re paying for a lot of this stuff, so maybe we should stop pretending that someone else ‘owns’ these abstract idea implementations and come up with a better system.


When you publish something in an academic journal, the journal owns the work. The journal also sells that work and it’s how it makes its money.

10 points

Yes it is, and that’s the problem. I work my butt off to identify mechanisms to reduce musculoskeletal injury risk, and then to maintain my employment, I have to hand the rights to that work to a private organization that profits from it. To make matters worse, I then do the work to ensure the quality of other publications for the journal through the peer review process and am not compensated for it.

4 points

the journal owns the work.

Fortunately, open access has made some inroads. It is not universally true anymore. The situation is still pretty bad, though.


I know for the law journal that I used to edit the journal owned the final form of the edited and stylised work and granted the author a license to freely use it in perpetuity with attribution as to the original publication.

So the author was free to share free copies as long as it was in the original form with the journal’s name and logo on the first page, or manuscript forms as long as the original publication info was cited. My journal sold electronic and print formats and had some licensing deals with legal research companies. But we also hosted free electronic copies for anyone that wanted to download an article.

For my journal, the significant costs were paid by a foundation and the university that it was a part of. The sales were just to buy, like, coffee for the office and stuff, and to help offset costs. I know especially in medicine and physical sciences there’s a lot more money involved in this stuff.


Technology

!technology@lemmy.world

This is a most excellent place for technology news and articles.

