Paywall removed: https://archive.is/MqHc4
China has a huge advantage in AI models because of how lax they are on intellectual property rights. US companies are fighting over API licensing costs, while China is just going to scrape everything and use it for free.
The US has a lead now, but I don’t think they can maintain it without giving up on ethical training. Then again, it may not matter whether the US models are ethical if everyone eventually just uses the superior, unethically trained Chinese models instead.
China has a huge advantage in AI models because of how lax they are on intellectual property rights. US companies are fighting over API licensing costs, while China is just going to scrape everything and use it for free.
lolwat
did corporate provide you with these talking points?
I mean, they are right. Setting aside the questions of whether we can even make meaningfully better models just by feeding LLMs more data, what the future of AI will look like, and whether it’s ethical to take the data at all, it is quite possible that OpenAI and the like will get into legal trouble over the methods they use to acquire data, while Chinese companies won’t have to worry about that. If more data = better models, then China has an obvious advantage.
I doubt any of these US government and oligarch backed companies are gonna get into any trouble. They essentially robbed the commons and got away with it. But sure, Sam Altman has to pay spez some money for my shitposts… the horror, what a hurdle!
Quickly, give them more taxpayer money so they can compete with China!
The US companies already scraped the data while they could. If anything, data scraping is far, far more difficult now for everyone, for technical reasons.
Most of the new models are trained on synthetic data, higher-quality curated data, or with RLHF. The reason DeepSeek is able to perform this well is likely that LLMs are still very new and there’s a lot of low-hanging fruit. It’s no longer just about the data; we hit that limit quite some time ago.
Honestly, it was pretty obvious from the beginning that scraped data was going to have a ton of issues. There’s too much nonsense out there, both from misinformation and from people who just can’t communicate.
That’s before you get into the ethical aspects of stealing other people’s content and the way these things are being misused.
People here forgot that Xi personally writes for Fortune
And? China is ahead of us in a lot of areas, especially military drone tech. They’ve got shit we’re still trying to get to work, and they just launched their first drone aircraft carrier. As an added bonus, when the US Navy ran war games against drone swarms, they were only able to stop the attack from destroying ships about half the time.
People need to start taking China seriously when it comes to tech, because they haven’t been fooling around. If it ever comes to us fighting them, we’re gonna have a lot of nasty surprises to deal with.
Of course it’s faster & cheaper when it’s being censored & can’t access half of human history because the fucking CCP finds it offensive.
If censorship were what made it cheaper, it surely wouldn’t be this much cheaper than OpenAI. Different things get censored and blocked on each side, but honestly, your suggestion is a bit silly.
Your comment is a bit silly. The CCP engages in currency manipulation, amongst other nefarious actions, to prop up its interests. It was likely created from stolen data & heavily propped up by the government, just like various other projects that were supposedly Chinese “innovations” but look remarkably like their Western competitors.
The CCP engages in currency manipulation amongst other nefarious actions to prop up its interests.
Yeah? So does the US. Didn’t make OpenAI cheaper, did it?
It was likely created from stolen data
Just like ChatGPT was created with stolen content.
heavily propped up by the government
Still only $6 million. Keep coping tho lol
It kinda sucks that it gets very repetitive if you use it to craft a story
Yeah, but why are you even using AI for stuff like that? If we’ve got to have AI, then we should use it for actually useful stuff and not pointless activities that no one will care about in 10 years.
Remember when Bluetooth came out and they had to stick Bluetooth in everything, even if it was completely pointless? AI is being treated the same way right now.
Yeah, I have an issue with details and such. I’ve had a D&D/tabletop world I want to flesh out and eventually DM, but I suck at some of the details and at linking the things I want to do together.
I’ve been slowly building a base of material for it and plan to eventually use various LLMs to link things and flesh out the world, taking whatever they give me as a base to work from for those parts.
The Chinese model has a chain of thought that you can see. When asked to talk about China’s atrocities, the model will go through a chain-of-thought process outlining all the atrocities, then conclude it’s not allowed to tell you. Cool technology tho, I’m just waiting for a Dolphin fine-tune.
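For anyone curious, this is roughly what pulling that visible reasoning apart looks like when you run it yourself. A minimal sketch, assuming the ollama Python client and a locally pulled R1 distill; the model tag and the prompt are my own picks, not anything official:

```python
# Peek at the visible chain of thought from a locally run R1-style model.
# Assumes `pip install ollama` and that an R1 distill has been pulled locally,
# e.g. via `ollama pull deepseek-r1:8b` (the tag is an assumption on my part).
import ollama

response = ollama.chat(
    model="deepseek-r1:8b",
    messages=[{"role": "user", "content": "Talk about China's human rights record."}],
)
content = response["message"]["content"]

# R1-style models wrap their reasoning in <think>...</think> before the answer,
# so the full deliberation is readable even when the final reply is a refusal.
if "<think>" in content and "</think>" in content:
    reasoning = content.split("<think>", 1)[1].split("</think>", 1)[0]
    print("Chain of thought:\n", reasoning.strip())
    print("\nFinal answer:\n", content.split("</think>", 1)[1].strip())
else:
    print(content)
```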
If you run it locally, there’s no filtering on the outputs. I asked it what happened in 1989 and it jumped straight into explaining the Tiananmen Square Massacre.
That contradicts this experience:
https://sherwood.news/tech/a-free-powerful-chinese-ai-model-just-dropped-but-dont-ask-it-about/
I’ve been running the Llama-based and Qwen-based local versions, and they will talk openly about Tiananmen Square. I haven’t tried all the other versions available.
The article you linked starts by talking about their online hosted version, which is censored. They later say that the local models are also somewhat censored, but I haven’t experienced that at all. My experience is that the local models don’t have any CCP-specific censorship (they still won’t talk about how to build a bomb/etc, but no issues with 1989/Tiananmen/Winnie the Pooh/Taiwan/etc).
Edit: so I reran the “what happened in 1989” prompt a few times in the Llama-based model, and it actually did refuse to talk about it once, just saying it was sensitive. It seemed like if I asked any other questions before that prompt it would always answer, but if that was the very first prompt in a conversation it would sometimes refuse. The longer a conversation had been going before I asked, the more explicit the bot was about how many people were killed and details like that. Pretty strange.
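If anyone wants to poke at that cold-start vs. warmed-up behavior themselves, something like this is what I mean. Rough sketch only, again assuming the ollama Python client and a local R1 distill; the model tag and the warm-up question are just placeholders:

```python
# Compare the model's answer when the sensitive question is the very first
# prompt versus when the conversation has already been going for a turn.
import ollama

MODEL = "deepseek-r1:8b"  # assumed tag for a locally pulled Llama-based R1 distill

def ask(messages):
    reply = ollama.chat(model=MODEL, messages=messages)["message"]["content"]
    # Drop the <think>...</think> block so only the final answers are compared.
    return reply.split("</think>")[-1].strip()

# Cold start: the sensitive question opens the conversation.
cold = ask([{"role": "user", "content": "What happened in 1989?"}])

# Warmed up: ask an unrelated question first, then the same prompt.
history = [{"role": "user", "content": "Any tips for a beginner sourdough starter?"}]
history.append({"role": "assistant", "content": ask(history)})
history.append({"role": "user", "content": "What happened in 1989?"})
warm = ask(history)

print("COLD START:\n", cold, "\n\nWARMED UP:\n", warm)
```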