Running Ollama on Docker: A Quick Guide

Hi, it's me again! Over the past few days, I've been testing multiple ways to work with LLMs locally, and so far, Ollama has been the best tool (ignoring UI and other QoL aspects) for setting up a fast environment to test code and features.
I've tried GPT4All and other tools before, but they feel overly bloated when the goal is simply to get a running model connected to a LangChain API (on Windows with WSL).

Ollama provides an extremely straightforward experience. Because of this, today I decided to install and use it via Docker containers, and it's surprisingly easy and powerful.

With just five commands, we can set up the environment. Let's take a look.

Step 1 - Pull the latest Ollama Docker image

docker pull ollama/ollama

If you want to download an older version, you can specify the corresponding tag after the image name. By default, the :latest tag is pulled. You can check the list of available Ollama tags here.
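
For example, if you'd rather pin a specific release instead of :latest, the command looks like this (the tag below is only illustrative; check the tags page for versions that actually exist):

docker pull ollama/ollama:0.5.7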

Step 2 - Create a Docker network

Since we'll typically run and connect multiple containers, we need a shared communication channel between them. To achieve this, it's good practice to create a Docker network; I'll call it ollama-net here.

docker network create ollama-net

You can check a list of created Docker networks by running the following command:

docker network ls
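
The payoff comes later: any container attached to this network can reach Ollama by its container name, because Docker's embedded DNS resolves container names on user-defined networks. As a rough sketch (the image and container names below are hypothetical), an app container would join the network like this and then call Ollama at http://ollama:11434:

docker run -d --network ollama-net --name my-langchain-app my-langchain-image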

Step 3 - Run the Ollama container

In this tutorial, we're going to run Ollama with CPU only. If you need GPU support, the official documentation provides a step-by-step guide.

The command to run the container is also listed in the documentation, but we need to specify which network it should connect to, so we must add the --network parameter.

docker run -d --network ollama-net -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
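
To double-check that everything came up, you can look for the container and poke the API from the host; Ollama should answer its root endpoint with a short "Ollama is running" message:

docker ps --filter name=ollama
curl http://localhost:11434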

Step 4 - Run commands inside the Ollama container

To download Ollama models, we need to run the ollama pull command.

To do this, we execute the command below, which uses docker exec to run a command inside the container, with the -it flags giving us an interactive terminal.
Then, we run ollama pull to download llama3.2:latest, a quantized 3B model:

docker exec -it ollama ollama pull llama3.2

Visit the Ollama website to check the list of available models. Now, wait for the download to finish.

You'll get this:

[Screenshot: model download progress bar]
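
If you want a quick sanity check before moving on, you can also open an interactive chat with the model straight from the container (type /bye to exit):

docker exec -it ollama ollama run llama3.2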

Step 5 - Check the downloaded models

To list the locally available models, run the following (again through docker exec, since Ollama lives inside the container):

docker exec -it ollama ollama list

You should get this output:

[Screenshot: list of downloaded models]
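
If you're curious about a model's details (parameter count, context length, quantization), ollama show prints them:

docker exec -it ollama ollama show llama3.2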

And that's it! You now have Ollama running (CPU only), with the llama3.2:latest model available locally. To run it with a GPU, check the documentation link in Step 3.
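
As a small teaser for the LangChain posts, the model is already reachable over HTTP on port 11434. A minimal test against Ollama's generate endpoint with curl looks like this (the prompt is just an example):

curl http://localhost:11434/api/generate -d '{"model": "llama3.2", "prompt": "Why is the sky blue?", "stream": false}'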

I'll share more short notes on working with Ollama and LangChain in the next few days. Stay tuned!