Getting started with local AI development using Docker Model Runner

Large language models (LLMs) like GPT-4, LLaMA, and Mistral have reshaped how we build software. These models are massive—trained on giant datasets and packed with billions of parameters—and they can do everything from generating articles to writing code.
With the open-source world making powerful LLMs more accessible, more developers are eager to run these models on their own machines. Running locally can improve privacy, reduce latency, and keep costs down.
But how can you get an LLM up and running on your local machine quickly and easily?
Meet Docker Model Runner
Docker Desktop 4.40 introduced Docker Model Runner, a tool designed to make running LLMs locally a whole lot easier. It's currently in beta and works only on Docker Desktop for Mac with Apple Silicon, but it’s a strong step toward making local AI development more accessible.
Docker Model Runner doesn’t containerize the model itself. Instead, it runs the model directly on your machine and uses Docker to wrap a convenient API around it. That means you don’t have to worry about model environments or setting up APIs yourself.
So What Exactly Is Docker Model Runner?
Model Runner is a new Docker Desktop feature (again, Mac-only and specifically for Apple Silicon machines) that helps you get open-source LLMs up and running locally—without a bunch of manual setup.
The magic here is that the model execution happens on your machine while Docker just acts as the wrapper, exposing a standardized API.
Getting started is about as simple as it gets: you pull a model and point your application at it.
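For example, pulling a model is a single command. (The `ai/llama3.2` model name below is just an illustration; browse Docker Hub's `ai` namespace for what's actually available.)

```bash
# Download a model from Docker Hub (model name is illustrative)
docker model pull ai/llama3.2
```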
The API can be accessed in a few ways: from another container via the OpenAI-compatible interface, or directly from the host using either the Docker socket or TCP. For more details on the available endpoints, check out the official documentation.
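As a rough sketch, here's what calling that API from Python could look like using the standard OpenAI client library. The endpoint paths, the TCP-enable command, and the model name are assumptions based on the docs at the time of writing, so verify them against the official documentation.

```python
from openai import OpenAI

# Minimal sketch. Assumptions: host TCP access has been enabled (the docs
# describe something like `docker desktop enable model-runner --tcp 12434`)
# and the ai/llama3.2 model has already been pulled. From inside a container,
# the base URL would instead be http://model-runner.docker.internal/engines/v1.
client = OpenAI(
    base_url="http://localhost:12434/engines/v1",
    api_key="not-needed",  # Model Runner doesn't require an API key
)

response = client.chat.completions.create(
    model="ai/llama3.2",
    messages=[{"role": "user", "content": "Give me one fun fact about whales."}],
)
print(response.choices[0].message.content)
```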
You can also run the model interactively using the `docker model run` command if you like.
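For instance, assuming you've pulled the (illustrative) `ai/llama3.2` model:

```bash
# One-shot prompt
docker model run ai/llama3.2 "Explain Docker in one sentence."

# Or omit the prompt to drop into an interactive chat session
docker model run ai/llama3.2
```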
Feel free to check out the currently available models on Docker Hub.
A Few Caveats
While Docker Model Runner is a huge help, there are a few things to keep in mind:
- You still need decent hardware: The model runs on your machine, so performance depends on what you’ve got. The good news is that Docker Hub offers models in a range of sizes that should work across a wide variety of hardware configurations.
- Size matters: Some really large models might not fit in your available VRAM. Docker Model Runner will dynamically load and unload the model from RAM as necessary, but it’s still something to be aware of.
- Watch those ports: Like any Docker app, you’ll need to manage networking and avoid conflicts.
What Can You Use It For?
Docker Model Runner is perfect for:
- Prototyping AI features: Build and test locally before pushing to production.
- Keeping things private: Since everything runs on your machine, no data leaves your environment.
- Learning and experimenting: Great for developers who want to explore LLMs without deep ML expertise.
- Offline-ready apps: Ideal if you’re building tools that need to work without the internet.
Take Docker Model Runner For a Spin
If you'd like to see Docker Model Runner in action, Docker has provided a GitHub repo with examples in Go, Python, and Node.
All you need to do is clone the repo and run the setup script:
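Here's a minimal sketch of those steps, assuming the examples live in Docker's `hello-genai` repository; double-check the repo URL and script name against the link above.

```bash
# Assumed repo URL and script name -- verify against the linked repo
git clone https://github.com/docker/hello-genai.git
cd hello-genai
./run.sh   # pulls the model and starts the Go, Python, and Node apps
```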
You can access each different example app at the following URLs:
- http://localhost:8080 for the GenAI Application in Go
- http://localhost:8081 for the GenAI Application in Python
- http://localhost:8082 for the GenAI Application in Node
You can use the `docker model` commands to interact with your models: for instance, `docker model ls` to see which models have been downloaded, `docker model rm` to remove a model, and of course `docker model run` to run a model interactively.
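For example, a quick cleanup at the end of a session might look like this (model name illustrative):

```bash
docker model ls              # see which models are on disk
docker model rm ai/llama3.2  # remove one you no longer need
```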
Wrapping It Up
Docker Model Runner is a fantastic tool if you’re looking to experiment with LLMs locally without getting bogged down in complicated setup steps. It takes care of all the boring parts so you can get straight to building cool stuff.
Whether you care about privacy, want to work offline, or just want to hack around with AI, it’s definitely worth checking out. As the ecosystem grows and Docker expands support beyond Mac, it could become a go-to tool in every AI developer’s toolkit.
If you have any questions, you can find me on Bluesky (@mikegcoleman.com) or drop them in the comments below!