cross-posted from: https://lemmy.dbzer0.com/post/36841328

Hello, everyone! I wanted to share my experience of successfully running LLaMA on an Android device. The model that performed best for me was llama3.2:1b on a mid-range phone with around 8 GB of RAM; I was also able to get it up and running on a lower-end phone with 4 GB of RAM. Several other models worked quite well too, including qwen2.5:0.5b, qwen2.5:1.5b, qwen2.5:3b, smallthinker, tinyllama, deepseek-r1:1.5b, and gemma2:2b. I hope this helps anyone looking to experiment with these models on mobile devices!
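
For a rough sense of why these particular sizes work, here is my own back-of-the-envelope math (assuming the ~4-bit quantizations Ollama typically ships by default; these are ballpark figures, not official numbers):

    # Roughly 0.5–0.6 bytes per parameter at ~4-bit quantization:
    #   1B params  ≈ 0.6–0.8 GB for the weights, plus context and runtime overhead
    #   3B params  ≈ 1.8–2.2 GB for the weights, plus overhead
    # This is why the 0.5B–1.5B models are comfortable on a 4 GB phone,
    # while 3B-class models really want 6–8 GB.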


Step 1: Install Termux

  1. Download and install Termux from the Google Play Store or F-Droid.

Step 2: Set Up proot-distro and Install Debian

  1. Open Termux and update the package list:

    pkg update && pkg upgrade
    
  2. Install proot-distro:

    pkg install proot-distro
    
  3. Install Debian using proot-distro:

    proot-distro install debian
    
  4. Log in to the Debian environment:

    proot-distro login debian
    

    You will need to log in to Debian every time you want to run Ollama; in other words, repeat this step and the steps below each time you want to run a model (except step 3 and the first half of step 4, which only need to be done once). A one-command shortcut for later runs is sketched just below.
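
Once everything below is installed, proot-distro can also run a command directly inside Debian, which saves some typing on later runs. This is just a convenience sketch (it assumes Ollama is already installed via steps 3–4 and uses the model from step 5):

    # Run from Termux, not from inside Debian; assumes Ollama is already installed
    proot-distro login debian -- bash -c "ollama serve > /tmp/ollama.log 2>&1 & sleep 3; ollama run llama3.2:1b"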


Step 3: Install Dependencies

  1. Update the package list in Debian:

    apt update && apt upgrade
    
  2. Install curl:

    apt install curl
    

Step 4: Install Ollama

  1. Run the following command to download and install Ollama:

    curl -fsSL https://ollama.com/install.sh | sh
    
  2. Start the Ollama server:

    ollama serve &
    

    After you run this command, press Ctrl + C (or just press Enter) to get your prompt back; thanks to the trailing &, the server keeps running in the background.
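
If you want to confirm the server actually came up before pulling a model, Ollama listens on port 11434 by default, so a quick check from the same Debian shell looks like this:

    # Should print "Ollama is running" once the server is up
    curl http://127.0.0.1:11434

    # Prints the installed Ollama version
    ollama --version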


Step 5: Download and Run the llama3.2:1b Model

  1. Use the following command to download and run the llama3.2:1b model:

    ollama run llama3.2:1b
    
    This fetches the lightweight 1-billion-parameter version of Llama 3.2 and starts an interactive chat with it.
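
The command above drops you into an interactive chat (type /bye to exit). A couple of optional extras that can be handy; the prompt text here is just an example:

    # One-off prompt without entering the interactive chat
    ollama run llama3.2:1b "Explain in one sentence what proot does"

    # See which models you have downloaded, or remove one to free up storage
    ollama list
    ollama rm llama3.2:1b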

Running LLaMA and other similar models on Android devices is definitely achievable, even with mid-range hardware. The performance varies depending on the model size and your device’s specifications, but with some experimentation, you can find a setup that works well for your needs. I’ll make sure to keep this post updated if there are any new developments or additional tips that could help improve the experience. If you have any questions or suggestions, feel free to share them below!

– llama

8 points

Warning: Llama is not libre. llama.com/llama3_3/license

Options here (check that the license column is green): wikipedia.org/wiki/List_of_large_language_models

1 point

And what’s the purpose of running it locally? Just curious. Is there anything really libre or better?

Is there any difference between LLaMA (or any other libre model) and ChatGPT (the first and most popular one I know of)?

4 points

Most open/local models require a fraction of the resources of ChatGPT, but they are usually not AS good in a general sense. They are often good enough, though, and can sometimes surpass ChatGPT in specific domains.

1 point

Do you know of anything libre? I’m curious to try something. Better if self-hosted(?)

According to a YouTuber, DeepSeek (the Chinese open-source one) is better than ChatGPT: when he tried a simple request to make a Tetris game, ChatGPT produced a broken game while DeepSeek’s worked.

Idk why lol

1 point

They’re probably referring to the 671B-parameter version of DeepSeek. You can indeed self-host it. But unless you’ve got a server rack full of data-center-class GPUs, you’ll probably set your house on fire before it generates a single token.

If you want a fully open-source model, I recommend Qwen 2.5 or maybe DeepSeek V2. There’s also OLMo 2, but I haven’t really tested it.

Mistral Small 24B also just came out and is Apache-licensed. That is something I’m testing now.

2 points

For me the biggest benefits are:

  • Your queries don’t ever leave your computer
  • You don’t have to trust a third party with your data
  • You know exactly what you’re running
  • You can tweak most models to your liking
  • You can upload sensitive information to it and not worry about it
  • It works entirely offline
  • You can run several models
2 points

The biggest problem:

  • I don’t have enough RAM/GPU to run it on a server

But it looks interesting

-7 points

You’ll only fry your phone with this. Very bad idea.

3 points

Not true. If you load a model that is beyond your phone’s hardware capabilities, it simply won’t open. Stop spreading FUD.

0 points

@llama@lemmy.dbzer0.com Depends on the inference engine. Some of them will try to load the model until it blows up and runs out of memory, which can cause its own problems. But it won’t overheat the phone, no. If you DO use a model that the phone can run, then, like any intense computation, it can cause the phone to heat up. Best not to run a long inference prompt while the phone is in your pocket, I think.

1 point

Thanks for your comment. That for sure is something to look out for. It is really important to know what you’re running and what possible limitations there could be. Not what the original comment said, though.

-3 points

That’s not how it works. Your phone can easily overheat if you use it too much, even if your device can handle it. Smartphones don’t have cooling like PCs and laptops (except some ROG phones and the like). If you don’t want to fry your processor, only run LLMs on high-end gaming PCs with all-in-one water cooling.

3 points

This is so horrifically wrong, I don’t even know where to start.

The short version is that phone and computer makers aren’t stupid: they will kill things or shut down when overheating happens. If you were a phone maker, why tf would you allow someone to fry their own phone?

My laptop has shut itself off when I was trying to compile code while playing video games and watching Twitch. My Android phone has killed apps when I try to do too much as well.

0 points

This is all very nuanced and there isn’t a clear-cut answer. It really depends on what you’re running, for how long you’re running it, your device specs, etc. The LLMs I mentioned in the post did just fine and did not cause any overheating if not used for extended periods of time. You absolutely can run a SMALL LLM and not fry your processor if you don’t overdo it. Even then, I find it extremely unlikely that you’re going to cause permanent damage to your hardware components.

Of course that is something to be mindful of, but that’s not what the person in the original comment said. It does run, but you need to be aware of the limitations and potential consequences. That goes without saying, though.

Just don’t overdo it. Or do, but the worst thing that will happen is your phone getting hella hot and shutting down.
