LLM inference can be batched, amortizing the cost of each forward pass across requests. If you have too few concurrent customers, you can’t fill the optimal batch size.
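For intuition, here is a minimal sketch of what batching means in practice, using Hugging Face transformers (the "gpt2" model and the padding setup are just stand-ins; real serving stacks batch at a lower level): one padded forward pass serves several prompts at once, so the per-request cost drops.

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("gpt2")
    tokenizer.pad_token = tokenizer.eos_token   # gpt2 has no pad token by default
    tokenizer.padding_side = "left"             # left-pad so generation lines up
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    prompts = [
        "The capital of France is",
        "Batching works because",
        "GPUs spend most of their time",
    ]

    # One padded forward pass serves all three requests at once, so the
    # fixed cost of streaming weights through the GPU is amortized.
    inputs = tokenizer(prompts, return_tensors="pt", padding=True)
    with torch.no_grad():
        out = model.generate(**inputs, max_new_tokens=20,
                             pad_token_id=tokenizer.eos_token_id)
    print(tokenizer.batch_decode(out, skip_special_tokens=True))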
That said, the optimal batch size on today’s hardware is not big (<20). I would be very surprised if they couldn’t fill it within any few-seconds window.
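To make the "few-seconds window" concrete, here is a toy request collector (the names WINDOW, MAX_BATCH, and the queue are illustrative, not any real serving API): block until the first request arrives, then keep accepting requests until the batch is full or the window closes, and hand the whole batch to one forward pass.

    import queue
    import time

    WINDOW = 2.0      # seconds to wait while filling a batch
    MAX_BATCH = 20    # the "optimal batch size" ceiling discussed above

    requests: "queue.Queue[str]" = queue.Queue()

    def collect_batch() -> list[str]:
        batch = [requests.get()]          # block until at least one request
        deadline = time.monotonic() + WINDOW
        while len(batch) < MAX_BATCH:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                break                     # window closed; ship a partial batch
            try:
                batch.append(requests.get(timeout=remaining))
            except queue.Empty:
                break                     # no more arrivals within the window
        return batch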
I would swear that in an earlier version of this message, the optimal batch size was estimated to be as large as twenty.
This sounds like an attempt to demand that others disprove the assertion that they’re losing money, in a discussion of an article about Sam saying they’re losing money.
What? I’m not doubting what he said. Just surprised. Look at this. I really hope Sam IPOs his company so I can short it.