Lol. Lmao even. "DeepSeek R1 reproduced for $30: Berkeley researchers replicate DeepSeek R1 for $30—casting doubt on H100 claims and controversy"(techstartups.com)

posted 6 days ago

Snot Flickerman@lemmy.blahaj.zone

techtakes@awful.systems

77 comments

hide report

Sam “wrong side of FOSS history” Altman must be pissing himself.

Direct Nitter Link:

https://nitter.lucabased.xyz/jiayi_pirate/status/1882839370505621655

Sort:

Hot Top Controversial New Old

[ +- ]

SGforce@lemmy.ca

103 points

6 days ago

They finetuned 1.5-3b models. This is a non-story

permalink

report

[ +- ]

Yozul@beehaw.org

16 points

5 days ago

The headline is dumb, but the research isn’t. According to the actual contents of the article, that $30 is still 27 times cheaper than what it costs OpenAI to make a similar sized model which also performs worse. That’s still a big deal even if the people reporting on it made a stupid title for their article about it.

permalink

report

parent

[ +- ]

fmstrat@lemmy.nowsci.com

15 points

5 days ago

I feel like the author here doesnt know what the definition of “breakthrough” is.

permalink

report

parent

[ +- ]

MrPoopyButthole@lemmy.dbzer0.com

2 points

6 days ago

Yup and it’s not even testing general reasoning. They didn’t have money for that.

permalink

report

parent

[ +- ]

self@awful.systems

28 points

5 days ago

fuck almighty have these DeepSeek threads been attracting a lot of LLM “experts”

permalink

report

parent

[ +- ]

swlabr@awful.systems

22 points

5 days ago

LLM experts aka poop sommeliers

report

[ +- ]

27 points

6 days ago

Is General reasoning in the room with us now?

permalink

report

parent

[ +- ]

bitofhope@awful.systems

7 points

4 days ago

There’s no way to know since they didn’t have the money to test.

permalink

report

parent

[ +- ]

V0ldek@awful.systems

15 points

5 days ago

It’s actually “Reasonings General”, common misconception

permalink

report

parent

[ +- ]

o7___o7@awful.systems

7 points

5 days ago

Is that the guy who sells insurance with Shaq?

permalink

report

parent

[ +- ]

froztbyte@awful.systems

13 points

5 days ago

I heard someone say Private Reasoning was around the corner. Think they’re related?

permalink

report

parent

Show more comments

[ +- ]

BlueMonday1984@awful.systems

61 points

6 days ago

To reference a previous sidenote, DeepSeek gives corps and randos a means to shove an LLM into their shit for dirt-cheap, so I expect they’re gonna blow up in popularity.

permalink

report

[ +- ]

fallowseed@lemmy.world

33 points

6 days ago

open source behaving like open source? couldn’t be the evil scary chinese!

permalink

report

[ +- ]

vrighter@discuss.tchncs.de

30 points

5 days ago

open weights is not open source. If it were, then nobody would have to work on trying to reproduce it. They could just run the build script.

permalink

report

parent

[ +- ]

David Gerard@awful.systemsM

6 points

5 days ago

unfortunately, nobody cares cos they’re all thieves

permalink

report

parent

[ +- ]

Evinceo@awful.systems

4 points

4 days ago

OSI is gonna mandate that we call it open source now, didn’t ya hear?

permalink

report

parent

[ +- ]

reallykindasorta@slrpnk.net

18 points

6 days ago

Non-techie requesting a laymen explanation if anyone has time!

After reading a couple of”what makes nvidias h100 chips so special” articles I’m gathering that they were supposed to have a significant amount more computational capability than their competitors (which I’m taking to mean more computations per second). So the question with deepseek and similar is something like ‘how are they able to get the same results with less computations?’ and the answer is speculated to be more efficient code/instructions for the AI model so it can make the same conclusions with less computations overall, potentially reducing the need for special jacked up cpus to run it?

permalink

report

[ +- ]

mountainriver@awful.systems

16 points

5 days ago

Good question!

The guesses and rumours that you have got as replies makes me lean towards “apparently no one knows”.

And because it’s slop machines (also referred to as “AI”, there is always a high probability of some sort of scam.

permalink

report

parent

[ +- ]

froztbyte@awful.systems

10 points

4 days ago

pretty much my take as well. I haven’t seen any actual information from a primary source, just lots of hearsay and “what we think happened” analyst shit (e.g. that analyst group in the twitter screenshot has names but no citation/links)

and doubly yep on the “everyone could just be lying” bit

permalink

report

parent

[ +- ]

manicdave@feddit.uk

4 points

4 days ago

The article sort of demonstrates it. Instead of needing inordinate amounts of data and memory to increase it’s chance of one-shotting the countdown game. It only needs to know enough to prove itself wrong and roll the dice again.

permalink

report

parent

[ +- ]

justOnePersistentKbinPlease@fedia.io

8 points

6 days ago

From a technical POV, from having read into it a little:

Deepseek devs worked in a very low level language called Assembly. This language is unlike relatively newer languages like C in that it provides no guardrails at all and is basically CPU instructions in extreme shorthand. An “if” statement would be something like BEQ 1000, where it goes to a specific memory location(in this case address 1000 if two CPU registers are equal.)

The advantage of using it is that it is considerably faster than C. However, it also means that the code is mostly locked to that specific hardware. If you add more memory or change CPUs you have to refactor. This is one of the reasons the language was largely replaced with C and other languages.

Edit: to expound on this: “modern” languages are even slower, but more flexible in terms of hardware. This would be languages like Python, Java and C#

permalink

report

parent

[ +- ]

V0ldek@awful.systems

25 points

5 days ago

This is a really weird comment. Assembly is not faster than C, that’s a nonsensical statement, C compiles down to assembly. LLVM’s optimizations will most likely outperform or directly match whatever hand-crafted assembly you write. Why would BEQ 1000 be “considerably faster” than if (x == y) goto L_1000;? This collapses even further if you consider any application larger than a few hundred lines of code, any sensible compiler is going to beat you on optimizations if you try to write hand-crafted assembly. Try loading up assembly code and manually performing intraprocedural optimizations, lol, there’s a reason every compiled language goes through an intermediate representation.

Saying that C# is slower than C is also nonsensical, especially now that C# has built-in PGO it’s very likely it could outperform an application written in C. C#'s JIT compiler is not somehow slower because it’s flexible in terms of hardware, if anything that’s what makes it fast. For example you can write a vectorized loop that will be JIT-compiled to the ideal fastest instruction set available on the CPU running the program, whereas in C or assembly you’d have to manually write a version for each. There’s no reason to think that manual implementation would be faster than what the JIT comes up with at runtime, though, especially with PGO.

It’s kinda like you’re saying that a V12 engine is faster than a Ferrari and that they are both faster than a spaceship because the spaceship doesn’t have wheels.

I know you’re trying to explain this to a non-technical person but what you said is so terribly misleading I cannot see educational value in it.

permalink

report

parent

[ +- ]

froztbyte@awful.systems

14 points

5 days ago

and one doesn’t program GPUs with assembly (in the sense as it’s used with CPUs)

permalink

report

parent

[ +- ]

iltg@sh.itjust.works

-5 points

4 days ago

your statement is so extreme it gets nonsensical too.

compilers will usually produce higher optimized asm than writing it yourself, but there is room to improve usually. it’s not impossible that deepseek team obtained some performance gains hand-writing some hot sections directly in assembly. llvm must “play it safe” because doesn’t know your use case, you do and can avoid all safety checks (stack canaries, overflow checks) or cleanups (eg, make memory arenas rather than realloc). you can tell LLVM to not do those, but it may happen in the whole binary and not be desirable

claiming c# gets faster than C because of jit is just ridicolous: you need yo compile just in time! the runtime cost of jitting + the resulting code would be faster than something plainly compiled? even if c# could obtain same optimization levels (and it can’t: oop and .net runtime) you still pay the jit cost, which plainly compiled code doesn’t pay. also what are you on with PGO, as if this buzzword suddenly makes everything as fast as C?? the example they give is “devirtualization” of interfaces. seems like C just doesn’t have interfaces and can just do direct calls? how would optimizing up to C level make it faster than C?

you just come off as a bit entitled and captured in MS bullshit claims

permalink

report

parent

Show more comments

[ +- ]

justOnePersistentKbinPlease@fedia.io

-8 points

5 days ago

I have have crafted assembly instructions and have made it faster than the same C code.

Particular to if statements, C will do things push and pull values from the stack which takes a small but occasionally noticeable amount of cycles.

permalink

report

parent

Show more comments

[ +- ]

froztbyte@awful.systems

20 points

5 days ago

for anyone reading this comment hoping for an actual eli5, the “technical POV” here is nonsense bullshit. you don’t program GPUs with assembly.

the rest of the comment is the poster filling in bad comparisons with worse details

permalink

report

parent

[ +- ]

Pup Biru@aussie.zone

8 points

4 days ago

literally looks like LLM-generated generic slop: confidently incorrect without even a shred of thought

permalink

report

parent

[ +- ]

justOnePersistentKbinPlease@fedia.io

-5 points

5 days ago

For anyone reading this comment, that person doesnt know anything about assembly or C.

permalink

report

parent

Show more comments

[ +- ]

fartsparkles@lemmy.world

9 points

6 days ago

I’m sure that non techie person understood every word of this.

permalink

report

parent

[ +- ]

blakestacey@awful.systems

22 points

6 days ago

And I’m sure that your snide remark will both tell them what to simplify and explain how to do so.

Enjoy your free trip to the egress.

permalink

report

parent

[ +- ]

msage@programming.dev

-1 points

5 days ago

Putting Python, the slowest popular language, alongside Java and C# really irks me bad.

The real benefit of R1 is Mixture of Experts - the model is separated into smaller sections, that are trained and used independently, meaning you don’t need the entire model to be active all the time, just parts of it.

Meaning it uses less resources during training and general usage. For example instead of 670 billion parameters all the time, it can use 30 billion for specific question, and you can get away with using 2% of the hardware used by competition.

permalink

report

parent

[ +- ]

UndercoverUlrikHD@programming.dev

-5 points

5 days ago

Putting Python, the slowest popular language, alongside Java and C# really irks me bad.

I wouldn’t call python the slowest language when the context is machine learning. It’s essentially C.

permalink

report

parent

Show more comments

[ +- ]

justOnePersistentKbinPlease@fedia.io

-5 points

5 days ago

I used them as they are well known modern languages that the average person might have heard about.

permalink

report

parent

[ +- ]

fallowseed@lemmy.world

0 points

6 days ago

i read that that the chinese made alterations to the cards, as well-- they dismantled them to access the chips themselves and were able to do more precise micromanagement that cuda doesn’t support, for instance… basically they took the training wheels off and used a more fine-tuned and hands-on approach that gave them some serious advantages

permalink

report

parent

[ +- ]

froztbyte@awful.systems

10 points

5 days ago

got a source for that?

permalink

report

parent

[ +- ]