The Rule (lemmy.ml)

posted 4 months ago

roon@lemmy.ml

196@lemmy.blahaj.zone

josefo@leminal.space

3 points

4 months ago

Are there other options that use less RAM?

Pumpkin Escobar@lemmy.world

8 points

4 months ago

There’s quantization, which basically compresses the model by storing each weight in a smaller data type. That cuts memory requirements by half or even more.
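
To make the idea concrete, here is a minimal sketch of symmetric 8-bit quantization on a toy list of weights. It is an illustration only, not how any particular library does it: the helper names `quantize_int8` and `dequantize_int8` are made up for this example, and real schemes usually work per block or per channel and often go down to 4 bits.

```python
from array import array

def quantize_int8(weights):
    """Map floats to int8 using one shared scale (hypothetical helper)."""
    max_abs = max(abs(w) for w in weights) or 1.0  # avoid a zero scale
    scale = max_abs / 127.0
    # Round each weight to the nearest step and clamp to the int8 range.
    q = array("b", (max(-127, min(127, round(w / scale))) for w in weights))
    return q, scale

def dequantize_int8(q, scale):
    """Recover approximate floats from the stored int8 values."""
    return [v * scale for v in q]

weights = [0.12, -0.5, 0.33, 0.9, -0.07]
q, scale = quantize_int8(weights)
restored = dequantize_int8(q, scale)

fp16_bytes = 2 * len(weights)      # 2 bytes per weight in fp16
int8_bytes = q.itemsize * len(q)   # 1 byte per weight after quantization
print(fp16_bytes, int8_bytes)
print([round(r, 3) for r in restored])
```

The restored values are close to the originals but not identical, which is the trade-off: half the bytes of fp16 in exchange for a small rounding error per weight.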

There’s also airllm, which loads part of the model into RAM, runs those calculations, unloads that part, loads the next part, and so on. It’s a nice option, but with all that loading and unloading the performance is never going to be great, especially on a huge model like Llama 405B.
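
The load, compute, unload loop can be sketched in a few lines. This is not airllm's API, just a toy under stated assumptions: each "layer" is a made-up affine step (`w * x + b`) stored in its own JSON file, and `save_layers` and `run_streamed` are hypothetical helpers.

```python
import json
import os
import tempfile

def save_layers(layers, directory):
    """Write each toy layer to its own file, as a stand-in for model shards."""
    paths = []
    for i, layer in enumerate(layers):
        path = os.path.join(directory, f"layer_{i}.json")
        with open(path, "w") as f:
            json.dump(layer, f)
        paths.append(path)
    return paths

def run_streamed(paths, x):
    """Keep only one layer in memory at a time while pushing x through."""
    for path in paths:
        with open(path) as f:
            layer = json.load(f)                      # load this part
        x = [layer["w"] * v + layer["b"] for v in x]  # run its calculation
        del layer                                     # drop it before the next load
    return x

layers = [{"w": 2.0, "b": 1.0}, {"w": 0.5, "b": -1.0}, {"w": 3.0, "b": 0.0}]
with tempfile.TemporaryDirectory() as d:
    paths = save_layers(layers, d)
    streamed = run_streamed(paths, [1.0, 2.0])
print(streamed)
```

Peak memory is roughly one layer plus the activations, but every forward pass rereads every layer from disk, which is where the slowdown comes from.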

Then there are some neat projects that distribute models across multiple computers, like exo and petals. They’re more targeted at a p2p-style random collection of computers. I’ve run petals in a small cluster and it works reasonably well.

AdrianTheFrog@lemmy.world

1 point

4 months ago

Yes, but 200 GB is probably already with 4-bit quantization; the weights in fp16 would be more like 800 GB. IDK if it’s even possible to quantize more, and if it is, you’re probably better off going with a smaller model anyway.
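
Those figures can be checked with back-of-the-envelope arithmetic. The sketch below assumes 405 billion parameters and counts only the weights; it ignores activations, the KV cache, and the extra scale data that real quantization formats store. `weight_memory_gb` is a hypothetical helper for this example.

```python
def weight_memory_gb(n_params, bits_per_weight):
    """Size of the raw weights in GB (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9

N_PARAMS = 405e9  # assumed parameter count for a 405B model

for bits in (16, 8, 4, 2):
    print(f"{bits:>2}-bit: {weight_memory_gb(N_PARAMS, bits):.1f} GB")
```

That gives about 810 GB at 16 bits and about 203 GB at 4 bits, which matches the rough 800 GB and 200 GB numbers above.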

theneverfox@pawb.social

5 points

4 months ago

Why, of course! People on here saying it’s impossible, smh

Let me introduce you to the wonderful world of thrashing. What is thrashing? It’s what happens when you run out of RAM. Luckily, most computers these days have something like swap space: they just treat your SSD as extra-slow extra RAM.

Your computer still locks up when it genuinely doesn’t have enough RAM, though. It unloads some RAM onto disk, puts what it needs right now back into RAM, executes a bit of processing, and then the program tells it that it actually needs some of what just got shelved on disk. And it does this super fast, so it’s dropping the thing it needs hundreds of times a second. Technology is truly remarkable.

Depending on how the software handles it, it might just crash… or it might just take literal hours.
