48 points

Exactly how I plan to deploy LLMs on my desktop 😹

14 points

You should be able to fit a model like LLaMA 2 in 64GB of RAM, but output will be pretty slow if it’s CPU-only. GPUs are a lot faster, but you’d need at least 48GB of VRAM, for example two 3090s.
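
Not the commenter’s exact setup, but a minimal sketch of that CPU-vs-GPU trade-off using llama-cpp-python with a quantized GGUF build of LLaMA 2; the model path, context size, and quant level below are placeholder assumptions.

```python
# Sketch: running a quantized LLaMA 2 GGUF with llama-cpp-python.
# Assumes the llama-cpp-python package is installed and a quantized model file
# is on disk (a Q4 70B GGUF is roughly 40GB, so it fits in 64GB of system RAM).
from llama_cpp import Llama

MODEL_PATH = "models/llama-2-70b-chat.Q4_K_M.gguf"  # placeholder path

# n_gpu_layers=0 keeps everything on the CPU (works with enough RAM, but slow);
# with ~48GB of VRAM you could set n_gpu_layers=-1 to offload every layer.
llm = Llama(model_path=MODEL_PATH, n_ctx=2048, n_gpu_layers=0)

out = llm("Q: How do I deploy an LLM on my desktop? A:", max_tokens=64)
print(out["choices"][0]["text"])
```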

6 points

Amazon had some promotion in the summer and they had a cheap 3060, so I grabbed that, and for Stable Diffusion it was more than enough. So I thought, oh… I’ll try out LLaMA as well. After two days of dicking around trying to load a whack of models, I spent a couple bucks and spooled up a RunPod instance. It was more affordable than I thought, definitely cheaper than buying another video card.

4 points

As far as I know, Stable Diffusion is a far smaller model than LLaMA. The fact that a model as large as LLaMA can even run on consumer hardware is a big achievement.

2 points

*laughs in top-of-the-line 2012 hardware* 😭

3 points

I need it just for the initial load of transformers-based models, so I can then run them in 8-bit. It’s ideal for that situation.
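
For context, a hedged sketch of that load-then-run-in-8-bit approach using Hugging Face transformers with bitsandbytes; the model ID is a placeholder, not necessarily what the commenter uses, and their exact workflow may differ.

```python
# Sketch: loading a transformers model for 8-bit inference with bitsandbytes.
# Assumes `transformers`, `accelerate`, and `bitsandbytes` are installed and a
# CUDA GPU is available; the model ID below is a placeholder assumption.
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "meta-llama/Llama-2-7b-chat-hf"  # placeholder model

# Quantize weights to 8-bit as they load, keeping the steady-state memory
# footprint far below a full fp16/fp32 copy of the model.
quant_config = BitsAndBytesConfig(load_in_8bit=True)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quant_config,
    device_map="auto",  # let accelerate place layers on GPU/CPU as available
)

inputs = tokenizer("Hello, world", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=32)[0]))
```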

2 points

That does make a lot of sense

2 points

Same. I’m patient
