Paywall removed: https://archive.is/MqHc4

14 points

China has a huge advantage in AI models because of how lax they are on intellectual property rights. US companies are fighting over API licensing costs, while China is just going to scrape everything and use it for free.

The US has a lead now, but I don’t think they can maintain it without giving up on ethical training. Then again, it may not matter if the US models are ethical if everyone eventually just uses the superior, unethically trained Chinese models instead.

21 points

China has a huge advantage in AI models because of how lax they are on intellectual property rights. US companies are fighting over API licensing costs, while China is just going to scrape everything and use it for free.

lolwat

did corporate provide you with these talking points?

5 points

I mean, they are right. Setting aside the questions of whether we can even make meaningfully better models just by using LLMs and more data, what the future of AI will look like, and whether it’s ethical to steal the data, it is quite possible that OpenAI and the like will get into legal trouble over the methods they use to acquire data, while Chinese companies won’t have to worry about that. If more data = better models, then China has an obvious advantage.

5 points

I doubt any of these US government- and oligarch-backed companies are gonna get into any trouble. They essentially robbed the commons and got away with it. But sure, Sam Altman has to pay spez some money for my shitposts… the horror, what a hurdle!

Quickly, give them more taxpayer money so they can compete with China!

7 points

OpenAI and the like aren’t going to get into trouble anytime soon. They already provide their latest tech to the US government and military. OpenAI is like the goose that laid the golden egg; they would need to fuck up really, really badly to face any consequences.

10 points

The US companies already scraped the data while they could. If anything, data scraping is far, far more difficult now for everyone due to technical reasons.

Most of the new models are trained on synthetic data, on higher-quality data, or with RLHF. The reason DeepSeek is able to perform is likely that LLMs are very, very new, so there is still a lot of low-hanging fruit. It’s no longer just about the data; we hit that limit quite some time ago.

1 point

Honestly, even from the beginning it was pretty obvious that scraped data was going to have a ton of issues. There’s too much nonsense out there, both from misinformation and from people just not being able to communicate.

That’s before you get into the ethical aspects of stealing other people’s content and the way these things are being misused.

1 point

People here forgot that Xi personally writes for Fortune

3 points

And? China is ahead of us in a lot of areas, especially military drone tech. They got shit we’re still trying to get to work, and they just launched their first drone aircraft carrier. As an added bonus, when the US Navy tried war games against drone swarms, they found they were only able to stop the attack from destroying ships half the time.

People need to start taking China seriously when it comes to tech, because they haven’t been fooling around. When it finally comes around to us fighting them, we’re gonna have a lot of nasty surprises to deal with.

7 points

proof?

4 points

Of course it’s faster & cheaper when it’s being censored & can’t access half of human history because the fucking ccp finds it offensive.

2 points

If censorship made it cheaper, surely it wouldn’t be that much cheaper than OpenAI. Different things are being censored and blocked, but surely your suggestion is a bit silly.

1 point

Your comment is a bit silly. The CCP engages in currency manipulation amongst other nefarious actions to prop up its interests. It was likely created from stolen data & heavily propped up by the government, just like various other projects that were supposedly Chinese “innovations” but look remarkably like their western competitors.

0 points

The CCP engages in currency manipulation amongst other nefarious actions to prop up its interests.

Yeah? So does the US. Didn’t make OpenAI cheaper, did it?

It was likely created from stolen data

Just like ChatGPT was created with stolen content.

heavily propped up by the government

Still only 6 million. Keep coping tho lol

2 points

As opposed to Reddit censoring Israel genociding Palestine and Luigi?

1 point

Reddit is total trash too.

2 points

The model itself is probably not censored. The censorship comes on top. Preliminary tests already show how this can be circumvented.

3 points

It kinda sucks. It is very repetitive if you use it to craft a story.

-2 points

This is going to sound wild, but why not use your brain for creativity, and use the machine for crunching numbers?

9 points

Yeah, but why are you even using AI for stuff like that? If we’ve got to have AI, then we should use it for actually useful stuff and not pointless activities that no one will care about in 10 years.

Remember when Bluetooth came out and they had to stick Bluetooth in everything, even if it was completely pointless? AI is currently being treated like that.

4 points

It is great to spark ideas if you’re writing

1 point

Yeah, I have an issue with detail and such. I’ve had a D&D/tabletop world I want to flesh out and eventually DM, but I suck at some details and at linking together the things I want to do.

Been slowly making a base of material for it and plan to eventually use various LLMs to link things and flesh out the world, taking whatever it gives me as a base to work off of for those parts.


The Chinese model has a chain of thought that you can see. When asked to talk about China’s atrocities, the model will go through a chain-of-thought process outlining all the atrocities, then conclude it’s not allowed to tell you. Cool technology tho. I’m just waiting for a Dolphin fine-tune.

3 points

I’m using the 8b model and it’s having no problem telling me about China’s atrocities.

11 points

If you run it locally, there’s no filtering on the outputs. I asked it what happened in 1989 and it jumped straight into explaining the Tiananmen Square Massacre.

7 points
4 points

Very interesting article. Thanks for sharing

13 points

I’ve been running the Llama-based and Qwen-based local versions, and they will talk openly about Tiananmen Square. I haven’t tried all the other versions available.

The article you linked starts by talking about their online hosted version, which is censored. They later say that the local models are also somewhat censored, but I haven’t experienced that at all. My experience is that the local models don’t have any CCP-specific censorship (they still won’t talk about how to build a bomb/etc, but no issues with 1989/Tiananmen/Winnie the Pooh/Taiwan/etc).

Edit: so I reran the “what happened in 1989” prompt a few times in the Llama model, and it actually did refuse to talk about it once, just saying it was sensitive. It seemed like if I asked any other questions before that prompt it would always answer, but if that was the very first prompt in a conversation it would sometimes refuse. The longer a conversation had been going before I asked, the more explicit the bot was about how many people were killed and details like that. Pretty strange.

4 points

I’ve seen some censoring on the 8b Llama variant, but it is hit and miss. Can’t wait for a decensored fine-tune.

2 points

I’ve been playing around with the offline version of the model. It’s interesting, but I think we’ll have to wait for people to tinker with the open source base for a while before we get something really great.

2 points

Yeah all the info is there and something switches it over to the generic response.

So fucked.


Technology

!technology@lemmy.world


This is a most excellent place for technology news and articles.

