22 points

i think you’re missing the point that “Deepseek was made for only $6M” has been the trending headline for the past while, with the specific point of comparison being the massive costs of developing ChatGPT, Copilot, Gemini, et al.

to stretch your metaphor, it’s like someone rolling up with their car, claiming it only costs $20 (unlike all the other cars that cost $20,000), when come to find out that number is just how much it costs to fill the gas tank up once

7 points

Now I’m imagining GPUs being traded like old cars.

*slaps GPU* This GPU? Perfectly fine. Second-hand, yes, but only used to train one model, by an old lady. Will run the upcoming Monster Hunter Wilds perfectly fine.

7 points

DeepSeek-V3 costs only 2.788M GPU hours for its full training. Assuming the rental price of the H800 GPU is $2 per GPU hour, our total training costs amount to only $5.576M. Note that the aforementioned costs include only the official training of DeepSeek-V3, excluding the costs associated with prior research and ablation experiments on architectures, algorithms, or data.

Emphasis mine. DeepSeek was very upfront that this $6M was training only. No other company includes R&D and salaries when they report model training costs, because those aren’t training costs.
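
For anyone who wants to check the arithmetic in that quote, here is a quick sketch. The GPU-hour count and the $2 rental rate come straight from the quoted passage; the function and variable names are just mine for illustration, and this covers only the rented-compute figure, not hardware purchases, salaries, or prior research.

```python
# Figures taken from the quoted DeepSeek-V3 passage.
GPU_HOURS = 2_788_000            # 2.788M H800 GPU hours for the full training run
RENTAL_USD_PER_GPU_HOUR = 2.0    # rental price the quote itself assumes


def training_cost_usd(gpu_hours: float, usd_per_gpu_hour: float) -> float:
    """Rented-compute cost only: GPU hours times hourly rental price."""
    return gpu_hours * usd_per_gpu_hour


cost = training_cost_usd(GPU_HOURS, RENTAL_USD_PER_GPU_HOUR)
print(f"${cost / 1e6:.3f}M")  # $5.576M
```

So the headline number is just hours times an assumed hourly rate, which is why it says nothing about capex either way.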

11 points

consider this paragraph from the Wall Street Journal:

DeepSeek said training one of its latest models cost $5.6 million, compared with the $100 million to $1 billion range cited last year by Dario Amodei, chief executive of the AI developer Anthropic, as the cost of building a model.

you’re arguing to me that they technically didn’t lie – but it’s pretty clear that some people walked away with a false impression of the cost of their product relative to their competitors’ products, and they financially benefitted from people believing in this false impression.

2 points

Okay I mean, I hate to somehow come to the defense of a slop company? But WSJ saying nonsense is really not their fault, like even that particular quote clearly says “DeepSeek said training one” cost $5.6M. That’s just a true statement. No one in their right mind includes the capital expenditure in that, the same way when you say “it took us 100h to train a model” that doesn’t include building a data center in those 100h.

Besides whether they actually lied or not, it’s still immensely funny to me that they could’ve just told a blatant lie nobody fact-checked and it shook the market to the fucking core, wiping off like billions in valuation. Very real market based on very real fundamentals run by very serious adults.

-1 points

but it’s pretty clear that some people walked away with a false impression of the cost of their product relative to their competitors’ products

Ask yourself why that may be, as you are the one who posted a link to a WSJ article that is repeating an absurd $100M-$1B figure from a guy who has a vested interest in making the barrier to entry into the field seem as high as possible to increase the valuation of his company. Did WSJ make an attempt to verify the accuracy of these statements? Did it push for further clarification? Did it compare those statements to figures that have been made public by Meta and OpenAI? No on all counts. Yet somehow “deepseek lied” because it explicitly stated their costs didn’t include capex, salaries, or R&D, but the media couldn’t be bothered to read to the end of the paragraph.

0 points

No, it’s not. OpenAI doesn’t spend all that money on R&D; they spent the majority of it on the actual training (hardware, electricity).

And that’s (supposedly) only $6M for Deepseek.

So where is the lie?

6 points

shot:

majority of it on the actual training (hardware, …)

chaser:

And that’s (supposedly) only $6M for Deepseek.

citation:

After experimentation with models with clusters of thousands of GPUs, High Flyer made an investment in 10,000 A100 GPUs in 2021 before any export restrictions. That paid off. As High-Flyer improved, they realized that it was time to spin off “DeepSeek” in May 2023 with the goal of pursuing further AI capabilities with more focus.

So where is the lie?

your post is asking a lot of questions already answered by your posting

-5 points

SemiAnalysis is “confident”

They did not answer anything, only alluded.

Just because they bought GPUs like everyone else doesn’t mean they could not train it cheaper.


TechTakes

!techtakes@awful.systems

Big brain tech dude got yet another clueless take over at HackerNews etc? Here’s the place to vent. Orange site, VC foolishness, all welcome.

This is not debate club. Unless it’s amusing debate.

For actually-good tech, you want our NotAwfulTech community
