148 points

The biggest problem with AI is that they’re illegally harvesting everything they can possibly get their hands on to feed it, they’re forcing it into places where people have explicitly said they don’t want it, and they’re sucking up massive amounts of energy and water to create it, undoing everyone else’s progress in reducing energy use, and raising prices for everyone else at the same time.

Oh, and it also hallucinates.

29 points

Eh I’m fine with the illegal harvesting of data. It forces the courts to revisit the question of what copyright really is and hopefully erodes the stranglehold that copyright has on modern society.

Let the companies fight each other over whether it’s okay to pirate every video on YouTube. I’m waiting.

72 points

So far, the result seems to be “it’s okay when they do it”

0 points

Yeah… Nothing to see here, people, go home, work harder, exercise, and don’t forget to eat your vegetables. Of course, family first and god bless you.

33 points

I would agree with you if the same companies challenging copyright (which protects the intellectual and creative work of “normies”) weren’t also aggressively wielding copyright against the same people they are stealing from.

With the amount of corporate power tightly integrated with the governmental bodies in the US (and now with DOGE dismantling oversight), I fear that whatever comes out of this is that humans own nothing and corporations own everything. The death of free, independent thought and creativity.

Everything you do, say and create is instantly marketable, sellable by the major corporations and you get nothing in return.

The world needs something a lot more drastic than a copyright reform at this point.

1 point

It’s seldom the same companies, though; there are two camps fighting each other, like Godzilla vs Mothra.

12 points

AI scrapers illegally harvesting data are destroying smaller and open source projects. Copyright law is not the only victim.

https://thelibre.news/foss-infrastructure-is-under-attack-by-ai-companies/

0 points

That article is overblown. People need to configure their websites to be more robust against traffic spikes, news at 11.

Disrespecting robots.txt is bad netiquette, but honestly this sort of gentleman’s agreement is always prone to cheating. At the end of the day, when you put something on the net for people to access, you have to assume anyone (or anything) can try to access it.
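For what it’s worth, robots.txt really is just a published request; honoring it is entirely up to the crawler. A minimal sketch of what a compliant crawler does, using Python’s standard library (the bot name, rules, and URLs here are made up for illustration):

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one AI crawler is barred from /archive/,
# everyone else is allowed everywhere.
robots_txt = """\
User-agent: ExampleAIBot
Disallow: /archive/

User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# A polite crawler checks before fetching; nothing enforces this,
# which is exactly the "gentleman's agreement" problem.
blocked = parser.can_fetch("ExampleAIBot", "https://example.org/archive/page1")
allowed = parser.can_fetch("SomeOtherBot", "https://example.org/archive/page1")
print(f"ExampleAIBot may fetch /archive/: {blocked}")
print(f"SomeOtherBot may fetch /archive/: {allowed}")
```

A scraper that simply never calls `can_fetch` faces no technical barrier at all, which is the point being made above.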

-1 points

In this case they just need to publish the code as a torrent. You wouldn’t set up a crawler if all the data was already there in a torrent swarm.

13 points

Oh, and it also hallucinates.

Oh, and people believe the hallucinations.

11 points

They’re not illegally harvesting anything. Copyright law is all about distribution. As much as everyone loves to think that copying something without permission is breaking the law, the truth is that it’s not. It’s only when you distribute said copy that you’re breaking the law (aka violating copyright).

All those old-school notices (e.g. “FBI Warning”) are 100% bullshit. Same for the warning the NFL spits out before games. You absolutely can record it! You just can’t share it (or show it to more than a handful of people, but that’s a different set of laws regarding broadcasting).

I download AI (image generation) models all the time. They range in size from 2GB to 12GB. You cannot fit the petabytes of data they used to train the model into that space. No compression algorithm is that good.

The same is true for LLMs, RVC (audio) models and similar models/checkpoints. I mean, think about it: if AI were illegally distributing millions of copyrighted works to end users, they’d have to be including it all in those files somehow.

Instead of thinking of an AI model like a collection of copyrighted works think of it more like a rough sketch of a mashup of copyrighted works. Like if you asked a person to make a Godzilla-themed My Little Pony and what you got was that person’s interpretation of what Godzilla combined with MLP would look like. Every artist would draw it differently. Every author would describe it differently. Every voice actor would voice it differently.

Those differences are the equivalent of the random seed provided to AI models. If you throw something at a random number generator enough times you could, in theory, get the works of Shakespeare. Especially if you ask it to write something just like Shakespeare. However, that doesn’t mean the AI model literally copied his works. It’s just making its best guess (it’s literally guessing! That’s how it works!).
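The size argument above can be made concrete with back-of-envelope arithmetic. Assuming a hypothetical 2 PB training set and the 12 GB model file mentioned above (both figures are illustrative, not measurements of any specific model), the implied lossless compression ratio is wildly beyond what real compressors achieve (good text compressors manage single-digit ratios):

```python
# Back-of-envelope check: could petabytes of training data fit in a model file?
PB = 1024 ** 5   # bytes in a pebibyte
GB = 1024 ** 3   # bytes in a gibibyte

training_data = 2 * PB      # hypothetical training-set size
model_file = 12 * GB        # upper end of the model sizes mentioned above

ratio = training_data / model_file
print(f"Required lossless compression ratio: {ratio:,.0f}:1")
```

A ratio in the hundreds of thousands to one is implausible for lossless storage, which supports the point that model weights are not a literal copy of the training corpus.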

10 points

The problem with being like… super pedantic about definitions is that you often miss the forest for the trees.

Illegal or not, it seems pretty obvious to me that people saying “illegal” in this thread and others probably mean “unethical”… which is pretty clearly true.

7 points

I wasn’t being pedantic. It’s a very fucking important distinction.

If you want to say “unethical”, you say that. Law is an orthogonal concept to ethics, as anyone who’s studied the history of racism and sexism would understand.

Furthermore, it’s not clear that what Meta did actually was unethical. Ethics is all about how human behavior impacts other humans (or other animals). If a behavior has a direct negative impact, that’s considered unethical. If it has no impact, or a positive one, it’s ethical behavior.

What impact did OpenAI, Meta, et al. have when they downloaded these copyrighted works? They were not read by humans; they were read by machines.

From an ethics standpoint that behavior is moot. It’s the ethical equivalent of trying to measure the environmental impact of a bit traveling across a wire. You can go deep down the rabbit hole and calculate the damage caused by mining copper and laying cables but that’s largely a waste of time because it completely loses the narrative that copying a billion books/images/whatever into a machine somehow negatively impacts humans.

It is not the copying of this information that matters. It’s the impact of the technologies they’re creating with it!

That’s why I think it’s very important to point out that copyright violation isn’t the problem in these threads. It’s a path that leads nowhere.

7 points

The issue I see is that they are using the copyrighted data, then making money off that data.

0 points

…in the same way that someone who’s read a lot of books can make money by writing their own.

3 points

This is an interesting argument that I’ve never heard before. Isn’t the question more about whether AI-generated art counts as a “derivative work”, though? I don’t use AI at all, but from what I’ve read they can generate work that includes watermarks from the source data; would that not strongly imply that these are derivative works?

0 points

If you studied loads of classic art then started making your own would that be a derivative work? Because that’s how AI works.

The presence of watermarks in output images is just a side effect of the prompt and its similarity to training data. If you ask for a picture of an Olympic swimmer wearing a purple bathing suit and it turns out that only a hundred or so images in the training data match that sort of image (and most of them included a watermark), you can end up with a kinda-sorta similar watermark in the output.

It is absolutely 100% evidence that they used watermarked images in their training. Is that a problem, though? I wouldn’t think so since they’re not distributing those exact images. Just images that are “kinda sorta” similar.

If you try to get an AI to output an image that matches someone else’s image nearly exactly… is that the fault of the AI, or of the end user who specifically asked for something that would violate another’s copyright (with a derivative work)?

8 points

I see the claim that “AI is using up massive amounts of water” being proclaimed everywhere lately, but I do not understand it. Do you have a source?

My understanding is this probably stems from people misunderstanding data center cooling systems. Most of these systems are closed loop, so everything is reused. It makes no sense to “burn off” water for cooling.

11 points

Data centers are mainly air-cooled, and two innovations contribute to the water waste.

The first one was “free cooling”, where instead of using a heat-exchanger loop you just blow (filtered) outside air directly over the servers and out again, meaning you don’t have to “get rid” of waste heat; you just blow it right out.

The second one was increasing the moisture content of the air on the way in, with what are basically giant carburettors in the air stream. The wetter the air, the more heat it can take from the servers.

So basically we now have data centers designed like cloud machines.

Edit: Also, apparently the water they use becomes contaminated, and they use mainly potable water. here’s a paper on it
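The reason evaporative designs consume water is latent heat: vaporizing a kilogram of water absorbs roughly 2.26 MJ, far more than merely warming it by a degree. A rough sketch of what that means for a hypothetical 10 MW facility (both the facility size and the round-number constants are illustrative approximations, not data about any real datacenter):

```python
# Heat absorbed by evaporating water vs. merely warming it.
LATENT_HEAT = 2.26e6      # J per kg to evaporate water (approximate)
SPECIFIC_HEAT = 4186      # J per kg per K to warm liquid water

facility_heat = 10e6      # hypothetical 10 MW of waste heat, in J per second

# Water evaporated per second to carry that heat away
kg_per_s = facility_heat / LATENT_HEAT
litres_per_day = kg_per_s * 86400   # 1 kg of water is about 1 litre

print(f"{kg_per_s:.1f} kg/s evaporated, roughly {litres_per_day / 1000:,.0f} m^3 per day")
```

The water leaves as vapor rather than returning to a closed loop, which is why evaporative cooling genuinely consumes it.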

2 points

Also, the energy for those datacenters has to come from somewhere, and non-renewable options (gas, oil, nuclear generation) also use a lot of water as part of the generation process itself (they all rely on using the fuel to generate steam to drive turbines, which generate the electricity) and for cooling.

7 points

Oh, and it also hallucinates.

This is arguably a feature, depending on how you use it. I’m absolutely not an AI acolyte; it’s highly problematic at every step. Resource usage. Training using illegally obtained information. This wouldn’t necessarily be an issue if people who aren’t tech broligarchs weren’t routinely getting their lives destroyed for this, and if the people creating the material being used for training weren’t also being fucked… just capitalism things, I guess. Attempts by capitalists to cut workers out of the cost/profit equation.

If you’re using AI to make music, images or video… you’re depending on those hallucinations.
I run a Stable Diffusion model on my laptop. It’s kinda neat. I don’t make things for a profit, and now that I’ve played with it a bit I’ll likely delete it soon. I think there’s room for people to locally host their own models, preferably trained with legally acquired data, to be used as a tool to assist with the creative process. The current monetisation model for AI is fuckin criminal…

-6 points

Tell that to the man who was accused by Gen AI of having murdered his children.

9 points

Ok? If you read what I said, you’ll see that I’m not talking about using ChatGPT as an information source. I strongly believe that using LLMs as a search tool is incredibly stupid, for exactly this kind of reason: they are so very confident even when relaying inaccurate or completely fictional information.

What I was trying to say, and I get that I may not have communicated it very well, was that Generative Machine Learning Algorithms might find a niche as creative-process assistant tools. Not as a way to search for publicly available information on your neighbour or boss or partner. Not as a way to search for case law while researching the defence of your client in a lawsuit. And it should never be relied on to give accurate information about what colour the sky is, or the best ways to make a custard using gasoline.

Does that clarify things a bit? Or do you want to carry on using an LLM in a way that has been shown to be unreliable, at best, as some sort of gotcha…when I wasn’t talking about that as a viable use case?

2 points

It varies massively depending on the ML.

For example, things like voice generation or object recognition can absolutely be done with entirely legit training datasets. Literally pay a bunch of people to read some texts and you can train a voice-generation engine with it, and the work in object recognition is mainly tagging what’s in the images, on top of a ton of easily made images of things; a researcher can literally go around taking photos to make their dataset.

Image generation, on the other hand, not so much. You can only go so far with the plain photos a researcher can take on the street, and those models tend to rely a lot on the artistic work of people who never authorized the use of their work to train them. And LLMs clearly cannot be done without scraping billions of pieces of actual work from billions of people.

Of course, what we tend to talk about here when we say “AI” is LLMs, which are IMHO the worst of the bunch.

4 points

We spend energy on the most useless shit; why are people suddenly using it as an argument against AI? Have you ever seen anyone complaining about Pixar wasting energy to render their movies? Or about 3D studios rendering TV ads?

2 points

Well, the harvesting isn’t illegal (yet), and I think it probably shouldn’t be.

It’s scraping, and it’s hard to make that part illegal without collateral damage.

But that doesn’t mean we should do nothing about these AI fuckers.

In the words of Cory Doctorow:

Web-scraping is good, actually.

Scraping against the wishes of the scraped is good, actually.

Scraping when the scrapee suffers as a result of your scraping is good, actually.

Scraping to train machine-learning models is good, actually.

Scraping to violate the public’s privacy is bad, actually.

Scraping to alienate creative workers’ labor is bad, actually.

We absolutely can have the benefits of scraping without letting AI companies destroy our jobs and our privacy. We just have to stop letting them define the debate.

-1 points

And also it’s using machines to catch up to living creation and evolution, badly.

A bit similar to how the Soviet system was trying to catch up to the in no way virtuous, but living and vibrant, Western societies.

That’s expensive, and that’s bad, and that’s inefficient. The only subjective advantage is that power is all it requires.

-5 points

I don’t care much about them harvesting all that data. What I do care about is that, despite essentially feeding all human knowledge into LLMs, they are still basically useless.

33 points

That it’s controlled by a few is only a problem if you use it… my issue with it starts before that.

My biggest gripe with AI is the same problem I have with anything crypto: its out-of-control power consumption relative to the problem it solves or the purpose it serves. And, by extension, the fact that nobody with any kind of real political power is addressing this.

Here we are using recycled bags, banning straws, putting explosive refrigerant in fridges and using LED lights in everything, all in the name of the environment, while at the same time in some datacenter they are burning kWh by the bucketload generating pictures of cats in space suits.

3 points

Here we are using recycled bags, banning straws, putting explosive refrigerant in fridges and using LED lights in everything, all in the name of the environment, while at the same time in some datacenter they are burning kWh by the bucketload generating pictures of cats in space suits.

That’s, #1, fashion and not really about the environment and, #2, fashion promoted because it’s cheaper for the industry.

And yes, power saved somewhere will just be spent elsewhere, because it’s cheaper: saving means reduced demand for power (or demand growing not as fast as it otherwise would).

1 point

Here we are using recycled bags, banning straws, putting explosive refrigerant in fridges and using LED lights in everything

lol, sucker. none of that does shit and industry was already destroying the planet just fine before ai came along.

5 points

Dare I assume you are aware we have “industry” because we consume?

-4 points

yes. we are cancer. i live on as little as possible but i don’t delude myself into thinking my actions have any effect on the whole.

i spent nearly 20 years not using paper towels until i realized how pointless it was. now i throw my trash out the window. we’re all fucked. if we want to change things, there’s only one tool that will fix it. until people realize that, i really don’t fucking care any more.

-1 points

My biggest gripe with AI is the same problem I have with anything crypto: It’s out of control power consumption relative to the problem it solves or purpose it serves.

Don’t throw all crypto under the bus. Only Bitcoin and other proof-of-work protocols are power hungry. Second- and third-generation crypto mostly uses proof of stake and ZK-rollups for security, which is much more energy efficient.
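The efficiency difference is structural: proof of work races to find hashes, while proof of stake only has to pick a block proposer in proportion to stake. A toy sketch of stake-weighted selection (the validators and stakes are made up; real protocols such as Ethereum’s use verifiable on-chain randomness, not Python’s `random`):

```python
import random

# Toy stake-weighted validator selection. The point: choosing a proposer
# costs one random draw, not trillions of hash computations.
stakes = {"alice": 32.0, "bob": 64.0, "carol": 4.0}  # hypothetical validators

def pick_proposer(stakes, rng):
    total = sum(stakes.values())
    r = rng.uniform(0, total)
    for name, stake in stakes.items():
        r -= stake
        if r <= 0:
            return name
    return name  # guard against floating-point edge cases

rng = random.Random(42)
picks = [pick_proposer(stakes, rng) for _ in range(10_000)]
# Selection frequency should be roughly proportional to stake
print({v: picks.count(v) / len(picks) for v in stakes})
```

Each validator is chosen roughly in proportion to its share of the total stake, so security comes from capital at risk rather than from burned electricity.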

5 points

Sure, but despite all the crypto bros’ assurances to the contrary, the only real-world applications for it are buying drugs, paying ransoms and getting scammed. Which means that any non-zero amount of energy is too much energy.

4 points

I’m aware of this, but it’s still mostly just something for people to speculate on. Something people buy, sit on, and then hopefully sell at a profit.

Bitcoin was supposed to be a decentralized alternative to money, but the number of people actually, legitimately, buying things with crypto is negligible. And honestly, even if it did serve its actual purpose, the cumulative power consumption would still be a point of debate.

1 point

Yes, most people buy, sit on, and then hopefully sell at a profit.

However, there are a large number of devs building useful things (supply chain, money transfer, digital identity). Most as good as, but not yet better than incumbent solutions.

My main challenge is the energy misconception. The entire Ethereum network runs on the energy equivalent of a single wind turbine.

1 point

And honestly, even if it did serve its actual purpose, the cumulative power consumption would still be a point of debate.

Yeah, but at that point you’d have to consider it against how much power the traditional banking system uses.

11 points

The AI business is owned by a tiny group of technobros who have no concern for what they have to do to get the results they want (“fuck the copyright, especially fuck the natural resources”), who want to be personally seen as the saviours of humanity (despite not being the ones who invented and implemented the actual tech), and who, like all bigwig biz boys, want all the money.

I don’t have a problem with AI tech in principle, but I hate the current business direction and what the AI business encourages people to do and use the tech for.

1 point

Well, I’m on board for “fuck intellectual property”. If OpenAI doesn’t publish the weights, then all their datacenters get visited by the killdozer.

17 points

Two intrinsic problems with the current implementations of AI are that they are insanely resource-intensive and require huge training sets. Neither of those is directly a problem of ownership or control, though both favor larger players with more money.

10 points

And a third intrinsic problem is that the current models, even with infinite training data, have been argued to never approach human language capability, per papers written by OpenAI in 2020 and DeepMind in 2022, and also a Stanford paper which proposes that AI simply has no emergent behavior, only convergent behavior.

So yeah. Lots of problems.
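The scaling-law papers being referenced model loss as a power law in model size with an irreducible floor, which is roughly the shape behind the diminishing-returns claim. A sketch with made-up constants (the specific numbers below are illustrative, not fitted values from any paper):

```python
# Toy power-law scaling curve: loss(N) = L_inf + a * N ** (-alpha).
# Loss keeps falling as parameters grow, but flattens toward the
# irreducible floor L_inf; all constants here are invented.
L_inf, a, alpha = 1.7, 8.0, 0.08

def loss(n_params):
    return L_inf + a * n_params ** (-alpha)

for n in (1e8, 1e9, 1e10, 1e11):
    print(f"{n:.0e} params -> loss {loss(n):.3f}")
```

Each 10x increase in parameters buys a smaller loss improvement, and the curve never crosses the floor, which is the flattening behavior the comment is pointing at.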

2 points

While I completely agree with you, that is the one thing that could change with just one breakthrough from any of the groups working on exactly that problem.

It’s what happens after that that’s really scary, probably. Perhaps we all go into some utopian AI driven future, but I highly doubt that’s even possible.

1 point

If gigantic amounts of capital weren’t available, then the focus would be on improving the models so they don’t need GPU farms running off nuclear reactors plus the sum total of all posts on the Internet ever.

11 points

For some reason the megacorps have got LLMs on the brain, and they’re the worst “AI” I’ve seen. There are other types of AI that are actually impressive, but the “writes a thing that looks like it might be the answer” machine is way less useful than they think it is.

3 points

Most LLMs for chat, pictures and clips are magical and amazing, for about 4 to 8 hours of fiddling. Then they lose all entertainment value.

As for practical use, the things can’t do math, so they’re useless at work. I write better emails on my own, so I can’t imagine being so lazy and socially inept that I’d need help writing an email asking for tech support or outlining an audit report. Sometimes the web summaries save me from clicking a result, but I usually click anyway because the things are so prone to very convincing hallucinations. So yeah, utterly useless in their current state.

I usually get some angsty reply when I say this from some techbro-AI-cultist-singularity-head who starts whinging about how it’s reshaped their entire life, but in some deep niche way that is completely irrelevant to the average working adult.

I have also talked to way too many delusional maniacs who are literally planning for the day an Artificial Super Intelligence is created and the whole world becomes like Star Trek and they personally will become wealthy and have all their needs met. They think this is going to happen within the next 5 years.

4 points

The delusional maniacs are going to be surprised when they ask the Super AI “how do we solve global warming?” and the answer is “build lots of solar, wind, and storage, and change infrastructure in cities to support walking, biking, and public transportation”.

2 points

Which is the answer they will get right before sending the AI back for “repairs.”

As we saw with Grok already, several times.

They absolutely adore AI; it makes them feel in touch with the world and able to feel validated, since all it is is a validation machine. They don’t care if it’s right or accurate or even remotely neutral; they want a biased fantasy-crafting system that paints terrible pictures of Donald Trump all ripped and oiled riding on a tank, and they want the AI to say “Look what you made! What a good boy! You did SO good!”


Technology

!technology@lemmy.world
