How to Run Your Own Local LLM on a Raspberry Pi

Raspberry Pi Computer

Since OpenAI’s mind-blowing release of ChatGPT in late 2022, Large Language Models (LLMs) have captured the world’s imagination by demonstrating remarkable capabilities, from writing essays to answering complex questions.

However, you don’t need to rely on companies like OpenAI, Google, or Anthropic, or share potentially sensitive personal data with them, to take advantage of the power of LLMs. With just an affordable Raspberry Pi, you can set up your own local AI chat-based assistant. This guide shows you how.

What You’ll Need

To set up your own LLM on a Raspberry Pi, there are a few essential components you’ll need:

  • Raspberry Pi: Since LLMs are resource-intensive, it’s best to use the most powerful Raspberry Pi available for optimal performance. At the time of writing this article, the Raspberry Pi 5 with 8 GB of RAM is the recommended choice.
  • microSD Card with Raspberry Pi OS: For maximum performance, consider using the lite version of Raspberry Pi OS, as a graphical user interface isn’t necessary to run an LLM (you can interact with it remotely using a terminal and SSH). However, if you’re using your Raspberry Pi for other tasks or as your primary computer, you can use the regular version of Raspberry Pi OS. Our guide on how to set up Raspberry Pi OS on a Raspberry Pi can help you get started.
  • Additional components: Apart from the Raspberry Pi and a fast microSD card, you’ll need a reliable power supply (the official one is recommended), a keyboard, mouse, and monitor for initial setup (optional if you’re using SSH), and an internet connection for downloading necessary software and models.

With these components in hand, you’re ready to start setting up your own LLM on your Raspberry Pi.
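
If you chose the Lite version of Raspberry Pi OS and plan to work headlessly, you can connect from another computer over SSH once the Pi is on your network. Here’s a minimal example, assuming the default raspberrypi.local hostname and a user account named pi (substitute the hostname or IP address and the username you chose during setup):

ssh pi@raspberrypi.local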

Install Ollama

The first step in setting up your own LLM on a Raspberry Pi is to install the necessary software. Currently, the two most popular choices for running LLMs locally are llama.cpp and Ollama.

  • llama.cpp is a lightweight C/C++ implementation of inference for Meta’s LLaMA (Large Language Model Meta AI) models that can run on a wide range of hardware, including the Raspberry Pi. It was developed by Georgi Gerganov and released in March 2023.
  • Ollama, on the other hand, is built around llama.cpp, offering several user-friendly features. It automatically handles templating chat requests to the format each model expects, and it loads and unloads models on demand based on the client’s request. Ollama also manages downloading and caching models, including quantized models, so you can request them by name.
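
For instance, because Ollama lets you request cached models by name, fetching a specific quantized build takes a single command once Ollama is installed (which we cover next). The tag below is only an example; tags change over time, so check the Ollama library page for current names:

ollama pull mistral:7b-instruct-q4_0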

For this guide, we’ll be using Ollama due to its ease of use and extra features.

To install Ollama on your Raspberry Pi, open a terminal window on your Raspberry Pi. If you’re using SSH, connect to your Raspberry Pi using your preferred SSH client. Then, enter the following command in the terminal:

curl -fsSL https://ollama.com/install.sh | sh

This command downloads and executes the installation script from the official Ollama website. The script will automatically install the required dependencies and set up Ollama on your Raspberry Pi.
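
Once the script finishes, you can verify that everything is in place. The first command below prints the installed version, and the second checks the background service that the installer registers on most Linux systems, Raspberry Pi OS included:

ollama --version
systemctl status ollama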

Ollama Installation Finished

Download and Run an LLM

With Ollama installed, it’s time to download a large language model. If you’re using a Raspberry Pi with 8 GB of RAM, you can run models with up to about 7 billion parameters (the learned weights a model uses to generate its outputs).

Some popular choices include Mistral (7B), Gemma (7B or 2B), Llama 2 uncensored (7B), or Microsoft’s Phi-3 (3.8B). You can view all supported models on the Ollama library page.
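
If your Raspberry Pi has less than 8 GB of RAM, a smaller model is the safer choice. For example, you can fetch Gemma’s 2-billion-parameter variant ahead of time with ollama pull (the tag was current at the time of writing; check the library page if it has changed):

ollama pull gemma:2b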

For this guide, we’ll be using Microsoft’s Phi-3 model. Despite its small size, Phi-3 is an efficient and surprisingly capable model. To install it, simply run the following command in the terminal:

ollama run phi3

This command will download and install the Phi-3 model, and it will also automatically start an interactive chat session with the model.

Ollama Phi3 Download

Using a Local LLM on Your Raspberry Pi

After downloading and installing the Phi-3 model, you’ll see a prompt in the terminal that looks like this:

>>> Send a message (/? for help)

This means that the LLM is running and waiting for your input. To start interacting with the model, type your message and press Enter.
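
You aren’t limited to the interactive session, either. ollama run also accepts a prompt directly as a command-line argument and prints a single response, which is handy for scripting. The question below is just a placeholder; ask anything you like:

ollama run phi3 "Explain what a large language model is in two sentences."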

Ollama Answering A Question

Here are some tips for crafting effective prompts, with a combined example after the list:

  1. Be specific: Provide clear and detailed instructions or questions to help the LLM understand what you’re looking for.
  2. Set the context: Give the LLM some background information or a scenario to help it generate more relevant responses.
  3. Define roles: Specify the role the LLM should assume in its response, such as a storyteller, a teacher, or a technical expert.
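
Putting all three tips together, a prompt might look something like this (an invented example):

You are a patient teacher. I’m new to programming and want to learn Python. Explain what a variable is, using a simple real-world analogy.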

To end the LLM session, press Ctrl + D or enter the /bye command. If you wish to start another session later, just open a new terminal and run the ollama run phi3 command. Since the model is already downloaded, it will start up quickly without needing to download again.
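
As you experiment with different models, keep an eye on your microSD card’s free space. You can see which models are downloaded and delete any you no longer need with Ollama’s built-in management commands:

ollama list
ollama rm phi3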

Keep in mind that the Raspberry Pi 5’s performance has its limits, and it can only output a few tokens per second. For better performance, consider running Ollama on a more powerful computer with a dedicated graphics card.
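
It’s also worth knowing that Ollama runs a local REST API (on port 11434 by default), so scripts and other programs on your Raspberry Pi can query the model without the interactive prompt. Here’s a minimal sketch using curl, with a placeholder prompt:

curl http://localhost:11434/api/generate -d '{"model": "phi3", "prompt": "Why is the sky blue?", "stream": false}'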

Image credit: Unsplash. Screenshots by David Morelo.
