Paywall removed: https://archive.is/MqHc4

14 points

China has a huge advantage in AI models because of how lax they are on intellectual property rights. US companies are fighting over API licensing costs, while China is just going to scrape everything and use it for free.

The US has a lead now, but I don’t think they can maintain it without giving up on ethical training. Then again, it may not matter whether the US models are ethical if everyone eventually just uses the superior, unethically trained Chinese models instead.

21 points

China has a huge advantage in AI models because of how lax they are on intellectual property rights. US companies are fighting over API licensing costs, while China is just going to scrape everything and use it for free.

lolwat

did corporate provide you with these talking points?

5 points

I mean, they are right. Leaving aside the questions of whether we can even make meaningfully better models just by using LLMs and more data, what the future of AI will look like, and whether it’s ethical to steal the data, it is quite possible that OpenAI and the like will get into legal trouble over the methods they use to acquire data, while Chinese companies won’t have to worry about that. If more data = better models, then China has an obvious advantage.

5 points

I doubt any of these US government- and oligarch-backed companies are gonna get into any trouble. They essentially robbed the commons and got away with it. But sure, Sam Altman has to pay spez some money for my shitposts… the horror, what a hurdle!

Quickly, give them more taxpayer money so they can compete with China!

7 points

OpenAI and the like aren’t going to get into trouble anytime soon. They already provide their latest tech to the US government and military. OpenAI is like the goose that lays golden eggs; they would need to fuck up really, really badly to face any consequences.

10 points

The US companies already scraped the data while they could. If anything, data scraping is far, far more difficult now for everyone, for technical reasons.

Most of the new models are trained on synthetic data, on higher-quality data, or with RLHF. The reason DeepSeek is able to perform is likely that LLMs are very, very new, so there is still a lot of low-hanging fruit. It’s no longer just about the data; we hit that limit quite some time ago.

1 point

Honestly, even from the beginning it was pretty obvious that scraped data was going to have a ton of issues. There’s too much nonsense out there, both from misinformation and from people who just aren’t able to communicate clearly.

That’s before you get into the ethical aspects of stealing other people’s content and the way these things are being misused.


Technology

!technology@lemmy.world
