macOS AI Agent

Apr 9, 2025 - 20:23

Building a Gemini AI Assistant for macOS

I built what I believe is the first AI-powered agent for macOS:

Meet the AI-Agent, a native macOS application that integrates with Google's Gemini AI to provide a seamless assistant experience. This project is open-source, and we encourage you to test it, modify it, and contribute to its development.

SchBenedikt / ai-agent

Testing macOS AI Agent with Google Gemini Live Web API

Gemini Assistant macOS App

A native macOS application that connects to Google's Gemini AI. The app automatically accesses your camera and microphone to provide a seamless AI assistant experience.

Features

  • Audio input through your microphone
  • Visual context through your camera
  • Text responses displayed in the app
  • Audio responses played through your speakers

Setup

Prerequisites

  1. Python 3.8+
  2. A Google Gemini API key

Installation

  1. Install the required dependencies:

    pip install google-generativeai opencv-python pyaudio pillow mss PyQt5 pynput python-dotenv pyinstaller
    
  2. Set your Gemini API key as an environment variable (optional):

    export GEMINI_API_KEY="your-api-key-here"
    

    If not set as an environment variable, the app will ask for it on startup.

Building the macOS App

There are two ways to build the app:

Method 1: Using PyInstaller (Recommended)

PyInstaller creates a more reliable standalone application that better handles dependencies:

  1. Make sure PyInstaller is installed:

    pip install pyinstaller
    
  2. Run the build process:

    # First clean any previous builds

What Is the AI-Agent?

The Gemini Assistant is a macOS application designed to:

  • Capture audio input through your microphone.
  • Use your camera for visual context.
  • Provide AI-powered responses via text.

The app uses Google's Gemini AI for natural language understanding and response generation, so you can speak to it and get answers that take what your camera sees into account.

Features

  • Audio Input: Speak to the assistant using your microphone.
  • Visual Context: The app uses your camera to gather additional context.
  • Text Responses: Get responses displayed in the app.
  • Customizable: Modify the code to add new features or improve existing ones.
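Microphone input and spoken replies both go through PyAudio, which reads and writes raw PCM frames. The small stdlib sketch below shows the arithmetic behind those buffers (how long each read lasts and how many bytes it holds) and how raw PCM can be wrapped in a WAV header, for example to save a reply for debugging. The 16 kHz rate, 16-bit mono format, and 1024-frame buffer are illustrative assumptions, not values taken from the app.

```python
import io
import wave


def chunk_duration_ms(frames_per_buffer: int, sample_rate: int) -> float:
    """Milliseconds of audio in one buffer: frames / (frames per second) * 1000."""
    return 1000.0 * frames_per_buffer / sample_rate


def chunk_size_bytes(frames_per_buffer: int, channels: int = 1, sample_width: int = 2) -> int:
    """Bytes in one buffer: each frame holds one sample per channel."""
    return frames_per_buffer * channels * sample_width


def pcm_to_wav(pcm: bytes, sample_rate: int, channels: int = 1, sample_width: int = 2) -> bytes:
    """Wrap raw PCM in a WAV header so ordinary audio tools can open it."""
    out = io.BytesIO()
    with wave.open(out, "wb") as w:
        w.setnchannels(channels)
        w.setsampwidth(sample_width)
        w.setframerate(sample_rate)
        w.writeframes(pcm)
    return out.getvalue()


# Illustrative values: 16 kHz, 16-bit mono, 1024 frames per read
print(chunk_duration_ms(1024, 16_000))  # 64.0
print(chunk_size_bytes(1024))           # 2048
```

Smaller buffers lower the delay before audio reaches the model, at the cost of more reads per second.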

How It Works

The application is built using Python and integrates several libraries:

  • PyQt5: For the user interface.
  • OpenCV: For camera access and visual processing.
  • PyAudio: For capturing and playing audio.
  • Google Generative AI: For natural language processing.
  • Python-dotenv: For managing environment variables.
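The list above names the libraries but not how their data meets. One plausible way to combine microphone audio and camera context into a single request is sketched below in pure Python. `TurnBuffer`, the one-frame-per-second throttle, and the part dictionaries are hypothetical; the `{"mime_type": ..., "data": ...}` shape is modeled on the inline-data form the google-generativeai SDK accepts, and the MIME types are placeholders to check against the Gemini API docs. An OpenCV frame would first need JPEG encoding (for example with `cv2.imencode(".jpg", frame)`).

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TurnBuffer:
    """Hypothetical helper: gathers one turn's worth of microphone audio and camera context."""

    min_frame_interval: float = 1.0  # assumed throttle: accept at most one frame per second
    audio_chunks: List[bytes] = field(default_factory=list)
    frame: Optional[bytes] = None
    last_frame_time: float = float("-inf")

    def add_audio(self, chunk: bytes) -> None:
        """Append a raw PCM chunk as read from the microphone stream."""
        self.audio_chunks.append(chunk)

    def offer_frame(self, jpeg: bytes, now: float) -> bool:
        """Keep the frame only if enough time has passed since the last accepted one."""
        if now - self.last_frame_time >= self.min_frame_interval:
            self.frame = jpeg
            self.last_frame_time = now
            return True
        return False

    def to_parts(self) -> List[dict]:
        """Package buffered data as inline-data parts, then clear the buffer."""
        parts = []
        if self.audio_chunks:
            parts.append({"mime_type": "audio/pcm", "data": b"".join(self.audio_chunks)})
        if self.frame is not None:
            parts.append({"mime_type": "image/jpeg", "data": self.frame})
        self.audio_chunks = []
        self.frame = None
        return parts


buf = TurnBuffer()
buf.add_audio(b"\x00\x01")
buf.add_audio(b"\x02\x03")
buf.offer_frame(b"frame-a", now=0.0)  # accepted: first frame
buf.offer_frame(b"frame-b", now=0.5)  # dropped: under 1 s since frame-a
parts = buf.to_parts()  # audio part followed by the frame-a image part
```

Throttling frames keeps requests small: audio changes continuously, but one still image per second is usually enough visual context.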

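The app reads your Google Gemini API key from a .env file. If the file doesn't exist, the app creates one for you. A .env file keeps the key out of your source code, but the key is still stored as plain text, so don't commit the file to Git.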
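The lookup order described here (environment variable first, then .env, otherwise ask and save) can be sketched with the standard library alone. This is an illustration, not the app's code: the app itself uses python-dotenv, and `read_env_file`, `resolve_api_key`, and the file permissions chosen below are assumptions for the example.

```python
import os
from pathlib import Path
from typing import Callable, Dict

ENV_VAR = "GEMINI_API_KEY"


def read_env_file(path: Path) -> Dict[str, str]:
    """Parse simple KEY=value lines from a .env file, skipping blanks and comments."""
    values: Dict[str, str] = {}
    if not path.exists():
        return values
    for line in path.read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def resolve_api_key(env_path: Path, prompt: Callable[[str], str] = input) -> str:
    """Return the key from the environment, then the .env file; otherwise ask and save it."""
    key = os.environ.get(ENV_VAR)
    if key:
        return key
    key = read_env_file(env_path).get(ENV_VAR)
    if key:
        return key
    key = prompt("Enter your Gemini API key: ").strip()
    # Append rather than overwrite so any other settings in .env survive.
    existing = env_path.read_text() if env_path.exists() else ""
    if existing and not existing.endswith("\n"):
        existing += "\n"
    env_path.write_text(existing + f"{ENV_VAR}={key}\n")
    env_path.chmod(0o600)  # owner read/write only
    return key
```

Passing `prompt` as a parameter lets a GUI supply its own input dialog instead of the terminal `input()`.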

Getting Started

Prerequisites

  1. Python 3.8 or higher
  2. A Google Gemini API key

Installation

  1. Clone the repository:

    git clone https://github.com/SchBenedikt/ai-agent.git
    cd ai-agent

  2. Install the required dependencies:

    pip install -r requirements.txt

  3. Set your Gemini API key in the .env file:

    echo "GEMINI_API_KEY=your-api-key" > .env
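Before the first run, a quick preflight check can confirm that every dependency installed. The sketch below takes its package list from the pip command in the README excerpt above (requirements.txt may differ, and PyInstaller is left out because it is only needed for building); `missing_packages` is a hypothetical helper, not part of the repository.

```python
import importlib.util

# pip package name -> module name it installs (packages from the README's pip command)
REQUIRED_MODULES = {
    "google-generativeai": "google.generativeai",
    "opencv-python": "cv2",
    "pyaudio": "pyaudio",
    "pillow": "PIL",
    "mss": "mss",
    "PyQt5": "PyQt5",
    "pynput": "pynput",
    "python-dotenv": "dotenv",
}


def missing_packages(required=REQUIRED_MODULES) -> list:
    """Return the pip names of packages whose modules cannot be found."""
    missing = []
    for pip_name, module in required.items():
        try:
            found = importlib.util.find_spec(module) is not None
        except ModuleNotFoundError:
            # find_spec imports parent packages (e.g. "google") and raises if they are absent
            found = False
        if not found:
            missing.append(pip_name)
    return missing


if __name__ == "__main__":
    gone = missing_packages()
    if gone:
        print("Missing packages:", ", ".join(gone))
    else:
        print("All dependencies found.")
```

`find_spec` only locates modules without importing them, so the check stays fast even for heavy packages like OpenCV and PyQt5.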

Running the App

To run the app directly without building:

    python app.py

Building the App

You can build a standalone macOS application using PyInstaller:

    pyinstaller gemini.spec

The app will be created in the dist folder as Gemini Assistant.app.
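A .spec file is Python that PyInstaller executes with its build classes (`Analysis`, `PYZ`, `EXE`, `COLLECT`, `BUNDLE`) already defined, so it cannot run on its own. The repository's `gemini.spec` isn't reproduced here; the fragment below is a hypothetical minimal version for a windowed app. The part worth noting is `info_plist`: macOS blocks camera and microphone access for a bundled app whose Info.plist lacks `NSCameraUsageDescription` and `NSMicrophoneUsageDescription`. The bundle identifier and description strings are placeholders.

```python
# gemini.spec: hypothetical minimal spec, not the repository's actual file.
# PyInstaller defines Analysis, PYZ, EXE, COLLECT and BUNDLE before executing this file.
a = Analysis(["app.py"])
pyz = PYZ(a.pure)

exe = EXE(
    pyz,
    a.scripts,
    [],
    exclude_binaries=True,  # binaries go into COLLECT instead (one-folder build)
    name="Gemini Assistant",
    console=False,  # windowed app: no Terminal window
)

coll = COLLECT(exe, a.binaries, a.datas, name="Gemini Assistant")

app = BUNDLE(
    coll,
    name="Gemini Assistant.app",
    bundle_identifier="com.example.gemini-assistant",  # placeholder identifier
    info_plist={
        # Usage strings macOS shows when the app asks for camera/microphone permission
        "NSCameraUsageDescription": "Gemini Assistant uses the camera to give the AI visual context.",
        "NSMicrophoneUsageDescription": "Gemini Assistant listens to your voice to answer questions.",
    },
)
```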

Contributing

We welcome contributions! Here are some ways you can help:

  • Test the App: Run the app and report any issues.
  • Improve the Code: Add new features or optimize existing ones.
  • Documentation: Help us improve the documentation.

Feedback

We'd love to hear your thoughts! Share your feedback, suggestions, or issues in the GitHub repository.

Conclusion

The Gemini Assistant is a powerful example of how AI can be integrated into everyday applications.

I hope you find this project as useful and enjoyable as I have!

Thanks for reading,
techtech