Lemdro.id

Local All Communities Log in Sign up

Local All Communities

1.3K

OpenAI hard work got stolen...(ponder.cat)

posted 10 days ago

by

in

microblogmemes@lemmy.world

Sort:

Hot Top Controversial New Old

[ +- ]

Grimy@lemmy.world

124 points

9 days ago

*

The courts ruled you can’t copyright the direct outputs of AI. It’s literally one of the few things they have decided. It’s common practice to use it to create synthetic data for new models. Laughable that OpenAI even brings it up.

In the space of a year or two, we went from altman saying he was going to capture half the world’s wealth straight to open source violently destroying his moat.

report

reply

[ +- ]

brucethemoose@lemmy.world

58 points

9 days ago

*

Everyone in the open LLM community knew this was coming.

We didn’t know the exact timing, but OpenAI is completely stagnant, and it was coming this year or the next.

I don’t think the world still understands how screwed OpenAI is. It isn’t just that their moat is gone, it’s that, even with all that money, their models (for the size\investment) are objectively bad.

report

reply

[ +- ]

Dkarma@lemmy.world

28 points

9 days ago

Yeah it went from hey the monopoly justifies the cost. To Oh shit they did it for how much? Real fast.

I suspect china is fudging the training timeline tho…

report

reply

[ +- ]

brucethemoose@lemmy.world

12 points

9 days ago

*

I had suspicious before, but I knew they were screwed when Qwen 2.5 came out. 32Bs and 72Bs nipping at their heels… O3 was a joke in comparison.

And they probably aren’t fudging anything. Base Deepseek isn’t like crazy or anything, and the way they finetuned it to R1 is public. Researchers are trying to replicate it now.

report

reply

[ +- ]

UnderpantsWeevil@lemmy.world

10 points

9 days ago

I suspect china is fudging the training timeline tho…

I’m more prone to believe OpenAI is just a clunky POS. DeepSeek released a model that’s operating on theories kicking around the LLM community for years. Now Alibaba is claiming they’ve got a better model, too.

Altman insisting he needed $1T in new physical infrastructure to get to the next iteration of his product should have been a red flag for everyone.

They’re trying to brute force a solution to a problem that more elegate coding accomplishes better.

report

reply

[ +- ]

brucethemoose@lemmy.world

4 points

9 days ago

*

Also, the thing the Chinese govt did probably do is give Deepseek training data.

For all the memes about the NSA, the US govt isn’t really in that position, as whatever the US govt has pales in comparison to Microsoft or Google.

report

reply

[ +- ]

jj4211@lemmy.world

5 points

9 days ago

And even as OpenAI struggles, they set fire to a lot of money in the process: https://www.notebookcheck.net/GPT-5-development-hits-major-setbacks-as-OpenAI-runs-out-of-training-data.937049.0.html

report

reply

[ +- ]

74 points

10 days ago

*

I am not crazy! I know they copied our data! I knew it was OpenAI material. One after Magna Carta. As if I could ever make such a mistake. Never. Never! I just – I just couldn’t prove it. They – they covered their tracks, they got that idiot at the copy shop to lie for them. You think this is something? You think this is bad? This? This chicanery? They’ve done worse. Are you telling me that a model just happens to form like that? No! They orchestrated it! Deepseek!

report

reply

[ +- ]

goldteeth@lemmy.dbzer0.com

30 points

9 days ago

He hallucinated through a sunroof! And he gets to be a Large Language Model? What a sick joke!

report

reply

[ +- ]

turmoil@feddit.org

25 points

9 days ago

report

reply

[ +- ]

brucethemoose@lemmy.world

16 points

9 days ago

*

Literally thought this was real for a sec. It could be I guess.

Even better that completely fake tweet screenshots are a thing. Let Twitter burn.

report

reply

[ +- ]

naeap@sopuli.xyz

4 points

9 days ago

Let X burn - ironically like the KKK did
Twitter will always have a place in my heart. Let it rest in peace

report

reply

Show more comments

Show more comments

[ +- ]

itsathursday@lemmy.world

12 points

10 days ago

Ok buddy

report

reply

[ +- ]

Fermion@feddit.nl

3 points

9 days ago

If anyone else is confused like I was, this is an adaptation of a line from “Better Call Saul.”

report

reply

[ +- ]

kuerbiskernoel@feddit.org

1 point

9 days ago

I thought it was a parody on Trump’s speech style

report

reply

[ +- ]

brucethemoose@lemmy.world

66 points

9 days ago

*

The OpenAI “don’t train on our output” clause is a meme in the open LLM research community.

EVERYONE does it, implicitly or sometimes openly, with chatml formatting and OpenAI specific slop leaking into base models. They’ve been doing it forever, and the consensus seems to be that it’s not enforceable.

OpenAI probably does it too, but incredibly, they’re so obsessively closed and opaque is hard to tell.

So as usual, OpenAI is full of shit here, and don’t believe a word that comes out of Altman’s mouth. Not one.

report

reply

[ +- ]

FatCrab@lemmy.one

18 points

9 days ago

Yup. Not only is there no IP right associated with generated content, even if there was, utilizing that content for training purposes doesn’t really in and of itself reflect an act of copying (which is of course their position as well), so that clause is some funny shit.

report

reply

[ +- ]

🇦🇺𝕄𝕦𝕟𝕥𝕖𝕕𝕔𝕣𝕠𝕔𝕕𝕚𝕝𝕖@lemm.ee

55 points

9 days ago

Copyright is dead. For good and for bad.

When I rule the world copyright lasts as long as ur warranty and when that expirees it must be published under agplv3.

report

reply

[ +- ]

T00l_shed@lemmy.world

42 points

9 days ago

Copyright is dead for us chumps, YOU try and steal? Straight to jail

report

reply

[ +- ]

reev@sh.itjust.works

9 points

9 days ago

report

reply

[ +- ]

Dkarma@lemmy.world

-7 points

9 days ago

Lol this is funny since you actually can do exactly what openai did…and it is perfectly legal.

report

reply

[ +- ]

T00l_shed@lemmy.world

8 points

9 days ago

Please go ahead and use Disney IP and let me know how this works.

report

reply

[ +- ]

Mojave@lemmy.world

31 points

9 days ago

Copyright was never good.

This post was brought to you by the Copyleft gang

report

reply

[ +- ]

Knock_Knock_Lemmy_In@lemmy.world

14 points

9 days ago

Copyright encourages new work, but optimal protection is 15 years, not life+70.

report

reply

[ +- ]

brucethemoose@lemmy.world

12 points

9 days ago

Thanks, Disney…

report

reply

[ +- ]

MisterFrog@lemmy.world

5 points

9 days ago

I could live with life of the author and spouse (not children), but only if they haven’t sold the rights.

If they’ve sold the rights, 15 years.

report

reply

Show more comments

Show more comments

[ +- ]

merc@sh.itjust.works

1 point

9 days ago

Copyright sometimes encourages new work, it often discourages it. Some of humanity’s greatest art comes from a time before copyright. Some of the earliest English stories come from a tradition of essentially “fanfic”. People took the Arthurian legends and wrote another story in that setting using those same characters. That would be illegal under copyright.

There may be an optimal copyright term that both encourages artists to pursue it, but then allows remixes while the art is still relevant. But, we don’t know if that optimal copyright term is 14 years, 3 months or 2 centuries. (Personally, I doubt it’s 2 centuries, but the current copyright terms are headed in that direction.)

Also, different types of works should probably have different copyright terms. Computer code is not going to be relevant in a century, it’s probably loses half its relevance in something like 5 years. OTOH, a novel might take 5+ years to write, and might remain relevant for decades, even centuries. Music can be relevant for centuries, but it also often fits in a short zeitgeist, and being able to remix it while still in that cultural moment would be beneficial to everyone.

Maybe the real solution is some kind of universal basic income and no copyright terms at all. That way, you don’t need a day job to pursue a passion that doesn’t make you money. There’s also nothing to stop you from finding patrons who will provide additional support because they love your work.

It all comes down to what the goal of copyright is. Are we trying to make art as a career possible? Do we want copyright behemoths like Disney? Do we want people to express themselves freely? Should cultural works be freely available to people while they’re relevant?

report

reply

[ +- ]

Dkarma@lemmy.world

0 points

9 days ago

Copyright never applied to training…it’s a moot point.

report

reply

[ +- ]

TheEighthDoctor@lemmy.zip

1 point

9 days ago

I hope so

report

reply

[ +- ]

RedditWanderer@lemmy.world

52 points

9 days ago

Watch them suddenly try to ban this chinese code under the same stuff they didn’t want to go after tiktok for.

report

reply

[ +- ]

✺roguetrick✺@lemmy.world

20 points

9 days ago

At this point regulatory capture is expected in the states. But we don’t have a corrupt government. Not at all.

report

reply

[ +- ]

Sabata@ani.social

25 points

9 days ago

Oh no, don’t make me torrent my illegal and unregulated Ai like a cool cyberpunk hacker.

report

reply

[ +- ]

brucethemoose@lemmy.world

16 points

9 days ago

*

Deepseek R1 runs with open source code from an American company, specifically Huggingface.

They have their own secret sauce inference code, sure, but they also documented it on a high level in the paper, so a US company can recreate it if they want.

There’s nothing they can do, short of a hitler esque “all open models are banned, you must use these select American APIs by law.” That would be like telling the US “everyone must use Bing and the Bing API for all search queries, anything else is illegal.”

report

reply

[ +- ]

RedditWanderer@lemmy.world

10 points

9 days ago

Ah well if it’s only hitler-esque stuff then I guess we’re safe? /s

report

reply

[ +- ]

SaharaMaleikuhm@feddit.org

4 points

9 days ago

Clearly that will get US back to number 1, right? Just lock it all down, what could go wrong?

report

reply

[ +- ]

brucethemoose@lemmy.world

1 point

9 days ago

Or gasp advocate for open source development in the US?

Unthinkable, right?

report

reply

[ +- ]

Grimy@lemmy.world

8 points

9 days ago

*

They are already talking about it.

U.S. officials are looking at the national security implications of the Chinese artificial intelligence app DeepSeek, White House press secretary Karoline Leavitt said on Tuesday, while President Donald Trump’s crypto czar said it was possible that intellectual property theft could have been at play.

https://archive.ph/t37xU

report

reply

[ +- ]

Naia@lemmy.blahaj.zone

8 points

9 days ago

They might try, but if their goal was to destabilizing western dominance for LLMs making it completely open source was the best way.

This isn’t like TikTok. They have a server that hosts it, but anyone can take their model and run it and there are going to be a lot of us companies besides the big Ai ones looking at it. Even the big Ai ones will likely try to adapt the stuff they’ve spent to long brute forcing to get improvement.

The thing is, it’s less about the actual model and more about the method. It does not take anywhere close to as many resources to train models like deepseek compared to what companies in the US have been doing. It means that there is no longer going to be just a small group hording the tech and charging absurd amounts for it.

Running the model can be no more taxing than playing a modern video game, except the load is not constant.

The cat is out of the bag. They could theoretically ban the direct models released from the research team, but retrained variants are going to be hard to differentiate from scratch models. And the original model is all over the place and have had people hacking away at it.

Blocking access to their hosted service right now would just be petty, but I do expect that from the current administration…

report

reply

[ +- ]

brucethemoose@lemmy.world

2 points

9 days ago

*

Running the model can be no more taxing than playing a modern video game, except the load is not constant.

This is not true, Deepseek R1 is huge. There’s a lot of confusion between the smaller distillations based on Qwen 2.5 (some that can run on consumer GPUs), and the “full” Deepseek R1 based on Deepseekv3

Your point mostly stands, but the “full” model is hundreds of gigabytes, and the paper mentioned something like a bank of 370 GPUs being optimal for hosting. It’s very efficient because its only like 30B active, which is bonkers, but still.

report

reply

Microblog Memes

!microblogmemes@lemmy.world

A place to share screenshots of Microblog posts, whether from Mastodon, tumblr, ~~Twitter~~ X, KBin, Threads or elsewhere.

Created as an evolution of White People Twitter and other tweet-capture subreddits.

Rules:

Please put at least one word relevant to the post in the post title.
Be nice.
No advertising, brand promotion or guerilla marketing.
Posters are encouraged to link to the toot or tweet etc in the description of posts.

Related communities:

!whitepeopletwitter@sh.itjust.works
!curatedtumblr@sh.itjust.works

Community stats

12K
Monthly active users
2.1K
Posts
91K
Comments

Community moderators

Ready! Player 31@lemmy.world
aeronmelon@lemmy.world

modlog legal instances join-lemmy.org

lemmy-ui-next v0.11.0 (github)lemmy v0.19.8 (github)