Building a Real-Time Voice Assistant with Local LLMs on a Raspberry Pi

Introduction

In this document, I’m sharing my journey of turning a Raspberry Pi into a powerful, real-time voice assistant. The goal was to:

  • Capture voice input through a web interface.
  • Process the text using a local LLM (like Mistral) running on the Pi.
  • Generate voice responses using Piper for text-to-speech (TTS).
  • Stream everything in real-time via WebSockets.

All of this runs offline on the Raspberry Pi — no cloud services involved. Let’s dive into how I built it step by step!

1. Setting up the Raspberry Pi

First, I set up my Raspberry Pi with the latest Raspberry Pi OS. It’s important to enable hardware interfaces and connect a USB microphone and speaker.

Steps:

  1. Update the system:
   sudo apt-get update
   sudo apt-get upgrade
  2. Enable the audio interface:
   sudo raspi-config

Navigate to System Options > Audio and select the correct output/input device.
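
To confirm the USB microphone and speaker actually show up, the ALSA utilities (already present on Raspberry Pi OS, and used again later with aplay) can list the capture and playback devices:

arecord -l
aplay -l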

2. Installing Ollama for Local LLMs

Ollama makes it easy to run local LLMs like Mistral on your Raspberry Pi. I installed it using:

curl -fsSL https://ollama.com/install.sh | sh

Once installed, I pulled the Mistral model:

ollama pull mistral

To confirm it works, I ran a quick test:

ollama run mistral

The model was ready to process text right on the Pi!
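
Ollama also exposes a local HTTP API (on port 11434 by default), which is handy if you'd rather script against the model than use the interactive CLI. A quick check from the Pi itself:

curl http://localhost:11434/api/generate -d '{"model": "mistral", "prompt": "Say hello in one sentence.", "stream": false}'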

3. Setting up Piper for Text-to-Speech (TTS)

For offline voice generation, I chose Piper — a fantastic open-source TTS engine.

  1. Install dependencies:
   sudo apt-get install wget build-essential libsndfile1
  2. Download Piper for ARM64 (Raspberry Pi):
   wget https://github.com/rhasspy/piper/releases/download/v1.0.0/piper_arm64.tar.gz
   tar -xvzf piper_arm64.tar.gz
   chmod +x piper
   sudo mv piper /usr/local/bin/
  3. Test if Piper works:
   echo "Hello, world!" | piper --model en_US --output_file output.wav
   aplay output.wav

Now the Pi could "talk" back!
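
One caveat: depending on the Piper build, the --model flag may expect the path to a downloaded .onnx voice file rather than a bare language code. The voice name below is only an example taken from the rhasspy/piper-voices collection; substitute whichever voice you actually downloaded:

echo "Hello, world!" | piper --model en_US-lessac-medium.onnx --output_file output.wav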

4. Creating the Backend (Node.js)

I built a simple Node.js server to:

  • Accept text from the client (voice input from a web app).
  • Process it using Mistral (via Ollama).
  • Convert the LLM response to speech with Piper.
  • Stream the audio back to the client.

server.js:

const express = require('express');
const { exec } = require('child_process');
const WebSocket = require('ws');

const app = express();
const PORT = 3001;

// Allow the React dev server (port 3000) to fetch the generated audio file
app.use((req, res, next) => {
  res.setHeader('Access-Control-Allow-Origin', '*');
  next();
});

// Serve files from this directory so the client can download output.wav
app.use(express.static(__dirname));

// WebSocket setup (on a separate port from the HTTP server)
const wss = new WebSocket.Server({ port: 3002 });

wss.on('connection', (ws) => {
  console.log('Client connected');

  ws.on('message', (message) => {
    // ws delivers a Buffer, so convert it to a string first
    const prompt = message.toString();
    console.log('Received:', prompt);

    // Run Mistral via the Ollama CLI (escape quotes so they don't break the shell command)
    exec(`ollama run mistral "${prompt.replace(/"/g, '\\"')}"`, (err, stdout) => {
      if (err) {
        console.error('LLM error:', err);
        ws.send('Error processing your request.');
        return;
      }

      // Convert the LLM response to speech using Piper (quotes escaped for the same reason)
      const reply = stdout.trim();
      exec(`echo "${reply.replace(/"/g, '\\"')}" | piper --model en_US --output_file output.wav`, (ttsErr) => {
        if (ttsErr) {
          console.error('Piper error:', ttsErr);
          ws.send('Error generating speech.');
          return;
        }

        // Tell the client the reply text and which audio file to fetch
        ws.send(JSON.stringify({ text: reply, audio: 'output.wav' }));
      });
    });
  });
});

app.listen(PORT, () => {
  console.log(`Server running at http://localhost:${PORT}`);
});
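
Before wiring up the frontend, the WebSocket side can be smoke-tested with wscat (an optional helper, not part of the project, installed via npm):

npm install -g wscat
wscat -c ws://localhost:3002

Typing a prompt at the wscat prompt should print back the JSON reply with the response text and the audio filename.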

5. Building the Real-Time Web Interface (React)

For the frontend, I created a simple React app to:

  • Record voice input.
  • Display real-time text responses.
  • Play the generated speech audio.

App.js:

import React, { useEffect, useRef, useState } from 'react';

function App() {
  const [text, setText] = useState('');
  const [response, setResponse] = useState('');
  const [audio, setAudio] = useState(null);

  // Keep one WebSocket connection for the lifetime of the component
  const wsRef = useRef(null);

  useEffect(() => {
    const ws = new WebSocket('ws://localhost:3002');
    wsRef.current = ws;

    ws.onmessage = (event) => {
      const data = JSON.parse(event.data);
      setResponse(data.text);

      // Fetch the generated WAV file from the Express server and play it
      fetch(`http://localhost:3001/${data.audio}`)
        .then(res => res.blob())
        .then(blob => {
          setAudio(URL.createObjectURL(blob));
        });
    };

    return () => ws.close();
  }, []);

  const handleSend = () => {
    if (wsRef.current && wsRef.current.readyState === WebSocket.OPEN) {
      wsRef.current.send(text);
    }
  };

  return (
    <div>
      <h1>Voice Assistant</h1>
      <textarea value={text} onChange={(e) => setText(e.target.value)} />
      <button onClick={handleSend}>Send</button>
      <h2>Response:</h2>
      <p>{response}</p>
      {audio && <audio controls src={audio} />}
    </div>
  );
}

export default App;
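
The component above sends typed text. To actually capture voice in the browser (the first goal in the introduction), one option is the Web Speech API; the sketch below is only an assumption about how a record button could fill the text box, and note that some browsers implement this API through an online service, so a strictly offline setup would need a local speech-to-text engine on the Pi instead.

// Sketch: browser speech recognition filling the prompt text.
// Chrome exposes the API as webkitSpeechRecognition; support varies by browser.
const SpeechRecognition = window.SpeechRecognition || window.webkitSpeechRecognition;

const handleRecord = () => {
  if (!SpeechRecognition) return;               // no support in this browser
  const recognition = new SpeechRecognition();
  recognition.lang = 'en-US';
  recognition.onresult = (event) => {
    // Take the first transcript and put it in the text box
    setText(event.results[0][0].transcript);
  };
  recognition.start();
};

// Rendered next to the Send button: <button onClick={handleRecord}>Record</button>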

6. Running the Project

Once the backend and frontend were ready, I launched both:

  • Start the backend:
  node server.js
  • Run the React app:
  npm start

I accessed the web app on my Raspberry Pi’s IP at port 3000 and spoke into the mic — and voilà! The assistant responded in real-time, all processed locally.
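
One note if the page is opened from another device on the network: App.js points at ws://localhost:3002 and http://localhost:3001, which only resolve on the Pi itself. A small, hypothetical tweak makes the client talk to whichever host served the page:

// Build backend URLs from the host the page was loaded from,
// so the app also works when opened via the Pi's IP address.
const host = window.location.hostname;
const ws = new WebSocket(`ws://${host}:3002`);
const audioUrl = (file) => `http://${host}:3001/${file}`;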

Conclusion

Building a real-time, fully offline voice assistant on a Raspberry Pi was an exciting challenge. With:

  • Ollama for running local LLMs (like Mistral)
  • Piper for high-quality text-to-speech
  • WebSockets for real-time communication
  • React for a smooth web interface

... I now have a personalized voice AI that works without relying on the cloud.