Creating AI Memories using Rig & MongoDB

Introduction

Artificial intelligence (AI) applications that engage in conversations often benefit from memory—storing past interactions to provide more contextually aware responses. Memory in AI can be categorized into short-term and long-term memory. Short-term memory involves tracking recent exchanges within a single session, while long-term memory enables AI to recall past interactions across different sessions.

In this article, we’ll build a CLI tool using the Rig AI framework and MongoDB for retrieval-augmented generation (RAG). This tool will store summarized conversations in a database and retrieve them when needed, enabling the AI to maintain context over time.

What is Memory in AI?

Memory in AI allows systems to retain and utilize past interactions to improve responses. There are two key types of memory:

  • Short-term memory: Captures recent exchanges in an ongoing conversation. Typically stored in-memory, it resets after a session ends. An example of this might be a chat history.
  • Long-term memory: Stores relevant conversation history and facts persistently, allowing recall across sessions. Usually, this is handled using vector databases or structured databases like MongoDB.

By combining Rig's AI processing capabilities with MongoDB’s persistence, we can build an AI assistant that not only responds intelligently in the moment but also remembers past interactions.

Getting Started

Pre-requisites

Before we get started, you'll need a MongoDB instance for storing long-term memory. Sign up for
MongoDB Atlas (they provide one free cluster, which is what we'll be using!), create a database called knowledgebase, and inside it create a collection called memories.

On that collection, we need an Atlas Vector Search index named vector_index (the code we write later queries the index by this name) that looks like the following (using the JSON editor):

{
    "fields": [
        {
            "numDimensions": 1536,
            "path": "embedding",
            "similarity": "cosine",
            "type": "vector"
        }
    ]
}

You'll also want an OpenAI API key as we'll be using their embedding and chat completion models.

Creating our new project

Before we begin, ensure you have Rust installed. Then, create a new Rust project:

cargo new ai-memory-cli
cd ai-memory-cli

This project will serve as the foundation for our AI memory system.

Adding Dependencies

To integrate Rig-Core and MongoDB, we need to add the necessary dependencies. You can copy the command below for easy installation:

cargo add rig-core rig-mongodb mongodb tokio nanoid serde serde_json chrono -F rig-core/derive,tokio/macros,serde/derive

What did we just add?

  • rig-core: The core rig library required to use Rig. We add the derive feature so we can easily implement the Embed trait for structs we want to add as documents into our vector store.
  • rig-mongodb: rig's MongoDB integration.
  • mongodb: The MongoDB Rust SDK.
  • tokio: The Tokio runtime, used for ergonomic access to async Rust. We use the macros feature for access to the #[tokio::main] macro, which allows for convenient async runtime setup.
  • nanoid: A quick, compact and easy way to get random IDs without using UUIDs (via nanoid!()).
  • serde: A de/serialization library. We attach the derive feature to make it easy to implement serialization and deserialization for our structs.
  • serde_json: serde's JSON crate.
  • chrono: A crate for working with time. We use this crate in this project to get UNIX timestamps.

Taking User Input

Taking user input is quite easy. We simply create a mutable empty string, prompt the user for input (flushing stdout so the prompt actually prints instead of potentially sitting in the buffer), read a line, and return it with the trailing newline trimmed off.

use std::io::{stdin, stdout, Write};

fn take_user_input() -> String {
    let mut string = String::new();

    print!("> ");
    let _ = stdout().flush();

    stdin().read_line(&mut string).unwrap();

    // read_line keeps the trailing newline, so trim it off before returning
    string.trim().to_string()
}

From this, we can create a simple async fn main that instantiates an OpenAI client, then loops infinitely (until it either crashes or the user presses ctrl-c) and answers user questions.

use rig::completion::Prompt;
use rig::providers::openai::Client;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let openai_client = Client::from_env();

    loop {
        let query = take_user_input();

        let agent = openai_client
            .agent("gpt-4o")
            .preamble("You are a helpful agent.")
            .build();

        let response = agent.prompt(query.as_ref()).await?;

        println!("{response}");
    }
}

However, we can do much better than this. Let's add some short-term memory to our AI so it can more competently answer user questions.

Adding short-term memory

Relative to long-term memory, short-term memory is pretty easy to add. In terms of practical implementation, you can do something as simple as storing all the messages in a Vec and then sending that list of messages along with each prompt. See below:

use rig::{completion::Chat, message::Message};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let mut messages: Vec<Message> = Vec::new();

    let openai_client = Client::from_env();
    // We won't use the embedding model just yet - it comes into play once we add long-term memory
    let embedding_model = openai_client.embedding_model("text-embedding-ada-002");

    loop {
        let query = take_user_input();

        let agent = openai_client
            .agent("gpt-4o")
            .preamble("You are a helpful agent.")
            .build();

        let response = agent.chat(query.as_ref(), messages.clone()).await.unwrap();

        println!("{response}");

        messages.push(Message::user(query));
        messages.push(Message::assistant(response));
    }
}
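
One caveat with this approach: the Vec grows without bound, so a long enough session will eventually exceed the model's context window. A minimal way to cap it is sketched below (this helper is not part of the original code; its name and the limit you pass are arbitrary):

// Hypothetical helper: keep only the most recent `max_len` messages so the
// chat history sent to the model stays bounded.
fn truncate_history(messages: &mut Vec<Message>, max_len: usize) {
    if messages.len() > max_len {
        let excess = messages.len() - max_len;
        // Drop the oldest messages from the front of the history
        messages.drain(..excess);
    }
}

Calling something like truncate_history(&mut messages, 20) just before agent.chat(...) would keep roughly the last 10 turns in play.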

Adding long-term memory

Now for the hard part: adding long-term memory! There are several different ways we can do this. A naive implementation might try to summarize the conversation on every turn; however, this is extremely resource-intensive in terms of tokens (and potentially GPU or CPU usage if you're running your model locally). Typically, a system might have several different criteria for adding memory (note that this list is non-exhaustive):

  • A new topic
  • A given checkpoint in the conversation (i.e., send the conversation to another model to check whether there is anything significant to keep, and if there is, store it)
  • A given conversation chunk size

We'll be implementing our long-term memory system to summarize conversation chunks every 3 turns (so 6 messages).
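
Concretely, the chunking criterion boils down to a tiny predicate like the one sketched below (illustrative only; in the actual implementation, the check is written inline inside summarize_chunks):

// A sketch of the chunking criterion: only summarize when the total message
// count is a non-zero multiple of 6 (i.e. a whole number of 3-turn chunks).
fn should_summarize(message_count: usize) -> bool {
    message_count > 0 && message_count % 6 == 0
}

// should_summarize(6)  == true   (3 full turns have passed)
// should_summarize(8)  == false  (a chunk is still in progress)
// should_summarize(12) == true   (another 3 full turns have passed)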

Summarizing a Conversation and Adding It to the Vector Store

To efficiently store memory, we first summarize conversations before adding them to the database. This prevents excessive storage usage and allows the AI to retrieve meaningful insights instead of raw message logs. The system first identifies the key points of the discussion, then uses an LLM to distil them into concise summaries that capture the core context. These summaries are then stored in MongoDB, allowing for efficient retrieval and future use in maintaining AI memory.

To get started, we'll define a struct that holds the MongoDB object ID, the conversation ID (which will be randomly generated later on), the memory itself, and the UNIX timestamp at which the memory was created. Note that because of the way the _id field is handled in MongoDB, we create a custom deserializer that accepts either a plain string or an ObjectId object.

use nanoid::nanoid;
use chrono::Utc;

use rig::Embed;
use serde::{Deserialize, Deserializer};
use serde_json::Value;

// Shape of data that needs to be RAG'ed.
// The memory field will be used to generate embeddings.
#[derive(Embed, Clone, Deserialize, Debug)]
struct Memory {
    #[serde(rename = "_id", deserialize_with = "deserialize_object_id")]
    id: String,
    conversation_id: String,
    #[embed]
    memory: String,
    // i64 matches chrono's Utc::now().timestamp() and converts cleanly to BSON
    timestamp_created: i64,
}

impl Memory {
    fn new(conversation_id: &str, memory: String) -> Self {
        let id = nanoid!(10);

        let conversation_id = conversation_id.to_string();
        let timestamp_created = Utc::now().timestamp();

        Self {
            id,
            conversation_id,
            memory,
            timestamp_created,
        }
    }
}

fn deserialize_object_id<'de, D>(deserializer: D) -> Result<String, D::Error>
where
    D: Deserializer<'de>,
{
    let value = Value::deserialize(deserializer)?;
    match value {
        Value::String(s) => Ok(s),
        Value::Object(map) => {
            if let Some(Value::String(oid)) = map.get("$oid") {
                Ok(oid.to_string())
            } else {
                Err(serde::de::Error::custom(
                    "Expected $oid field with string value",
                ))
            }
        }
        _ => Err(serde::de::Error::custom(
            "Expected string or object with $oid field",
        )),
    }
}
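
If you want to see why the custom deserializer is needed, here's a small, self-contained check (illustrative only, not part of the tutorial code, and the values are made up) confirming that both _id shapes deserialize into Memory: the extended-JSON { "$oid": ... } form that an auto-generated MongoDB ObjectId comes back as, and the plain string form.

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn memory_accepts_both_id_shapes() {
        // Extended-JSON shape, roughly what an auto-generated ObjectId looks like
        let as_oid = serde_json::json!({
            "_id": { "$oid": "65f1c0ffee0000000000abcd" },
            "conversation_id": "abc123",
            "memory": "The user's favourite colour is green.",
            "timestamp_created": 1741816140
        });

        // Plain-string shape, like the nanoid-based IDs we generate ourselves
        let as_string = serde_json::json!({
            "_id": "xYz123AbCd",
            "conversation_id": "abc123",
            "memory": "The user's favourite colour is green.",
            "timestamp_created": 1741816140
        });

        assert!(serde_json::from_value::<Memory>(as_oid).is_ok());
        assert!(serde_json::from_value::<Memory>(as_string).is_ok());
    }
}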

We'll also create a function for connecting to MongoDB. Note that this requires our MongoDB connection string to be set as the MONGODB_CONNECTION_STRING environment variable:

use mongodb::{Client as MongoClient, Collection};
use mongodb::bson;
use mongodb::options::ClientOptions;

async fn connect_to_mongodb() -> Collection<bson::Document> {
    // Initialize MongoDB client
    let mongodb_connection_string = std::env::var("MONGODB_CONNECTION_STRING").expect("MONGODB_CONNECTION_STRING not set");
    let options = ClientOptions::parse(mongodb_connection_string)
        .await
        .expect("MongoDB connection string should be valid");

    let mongodb_client =
        MongoClient::with_options(options).expect("MongoDB client options should be valid");

    // Initialize MongoDB vector store
    let collection: Collection<bson::Document> = mongodb_client
        .database("knowledgebase")
        .collection("memories");

    collection
}

Next, we'll implement our summarize_chunks function, which will check that there are enough new messages, send them to an LLM to summarize, and then store the resulting summaries in MongoDB.

At the start of the function, we check that the total number of messages is a non-zero multiple of 6 (3 turns = 6 messages). If it isn't, we return an error stating there are not enough messages to summarize. We then grab the last 6 messages by reversing the iterator over our Vec (so it starts at the most recently pushed message), taking 6 messages, and reversing again to restore their original order.

use rig::providers::openai::EmbeddingModel;

async fn summarize_chunks(
    openai_client: &Client,
    mongo: &Collection<bson::Document>,
    messages: Vec<Message>,
    conversation_id: &str,
    embedding_model: &EmbeddingModel,
) -> Result<(), Box<dyn std::error::Error>> {
    // Ensure a reasonable number of turns have passed before trying to summarize.
    // We want to summarize every 3 turns, so the total message count must be a
    // non-zero multiple of 6 (because 3 turns = 6 messages).
    // If it doesn't meet the required criteria, just return early
    if messages.is_empty() || messages.len() % 6 != 0 {
        return Err("Not enough new messages to summarize".into());
    }

    // Grab the last 6 messages, keeping them in their original order
    let messages = messages
        .into_iter()
        .rev()
        .take(6)
        .rev()
        .collect::<Vec<Message>>();

    // rest of code goes here
    Ok(())
}

To get the memories back in a format we can easily use, we'll set up an extractor asking the LLM to summarize the inputted conversation chunk into a list of factoids, using the JSON schema provided. In this case the target type is a Vec<String>, but you could easily extend this if you wanted to.

Before embedding our memories into MongoDB, we additionally enrich the data by using Memory::new() to turn the strings into a Vec<Memory>.

    let agent = openai_client.extractor::<Vec<String>>("gpt-4o").preamble("Please summarize the inputted conversation chunk by the user into a list of factoids, using the JSON schema provided.").build();

    let messages_as_string = serde_json::to_string_pretty(&messages).unwrap();

    let res = agent.extract(&messages_as_string).await.unwrap();

    let response_as_memory_entries = res
        .into_iter()
        .map(|x| Memory::new(&conversation_id, x))
        .collect::<Vec<Memory>>();

Finally, we use EmbeddingsBuilder to produce a list of (document, embedding) pairs. We map these into our final Vec of MongoDB BSON documents, which we then insert into Mongo. Done!

use rig::embeddings::EmbeddingsBuilder;
use mongodb::bson::doc;

    let embeddings = EmbeddingsBuilder::new(embedding_model.clone())
        .documents(response_as_memory_entries)?
        .build()
        .await?;

    let mongo_documents = embeddings
        .iter()
        .map(
            |(
                Memory {
                    id,
                    conversation_id,
                    memory,
                    timestamp_created,
                },
                embedding,
            )| {
                doc! {
                    "id": id.clone(),
                    "conversation_id": conversation_id.clone(),
                    "memory": memory.clone(),
                    "timestamp_created": timestamp_created.clone(),
                    "embedding": embedding.first().vec.clone(),
                }
            },
        )
        .collect::<Vec<_>>();

    mongo.insert_many(mongo_documents).await?;

The full code for the summarize_chunks function can be found below.

async fn summarize_chunks(
    openai_client: &Client,
    mongo: &Collection<bson::Document>,
    messages: Vec<Message>,
    conversation_id: &str,
    embedding_model: &EmbeddingModel,
) -> Result<(), Box<dyn std::error::Error>> {
    // Ensure a reasonable number of turns have passed before trying to summarize.
    // We want to summarize every 3 turns, so the total message count must be a
    // non-zero multiple of 6 (because 3 turns = 6 messages).
    // If it doesn't meet the required criteria, just return early
    if messages.is_empty() || messages.len() % 6 != 0 {
        return Err("Not enough new messages to summarize".into());
    }

    // Grab the last 6 messages, keeping them in their original order
    let messages = messages
        .into_iter()
        .rev()
        .take(6)
        .rev()
        .collect::<Vec<Message>>();

    let agent = openai_client.extractor::<Vec<String>>("gpt-4o").preamble("Please summarize the inputted conversation chunk by the user into a list of factoids, using the JSON schema provided.").build();

    let messages_as_string = serde_json::to_string_pretty(&messages).unwrap();

    let res = agent.extract(&messages_as_string).await.unwrap();

    let response_as_memory_entries = res
        .into_iter()
        .map(|x| Memory::new(&conversation_id, x))
        .collect::<Vec<Memory>>();

    let embeddings = EmbeddingsBuilder::new(embedding_model.clone())
        .documents(response_as_memory_entries)?
        .build()
        .await?;

    let mongo_documents = embeddings
        .iter()
        .map(
            |(
                Memory {
                    id,
                    conversation_id,
                    memory,
                    timestamp_created,
                },
                embedding,
            )| {
                doc! {
                    "id": id.clone(),
                    "conversation_id": conversation_id.clone(),
                    "memory": memory.clone(),
                    "timestamp_created": timestamp_created.clone(),
                    "embedding": embedding.first().vec.clone(),
                }
            },
        )
        .collect::<Vec<_>>();

    mongo.insert_many(mongo_documents).await?;

    Ok(())
}

To use this function, all we need to do is call it at the end of the loop:

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // .. setup goes here

    loop {
        let query = take_user_input();

        let agent = openai_client
            .agent("gpt-4o")
            .preamble("You are a helpful agent.")
            // additional_input comes from the memory-retrieval step covered in the next section
            .append_preamble(&additional_input)
            .build();

        let response = agent.chat(query.as_ref(), messages.clone()).await.unwrap();

        println!("{response}");

        messages.push(Message::user(query));
        messages.push(Message::assistant(response));

        if summarize_chunks(
            &openai_client,
            &mongo,
            messages.clone(),
            &conversation_id,
            &embedding_model,
        )
        .await
        .is_ok()
        {
            println!("Saved new memories.");
        };
    }
}

Retrieving and Using Memories

To enable AI memory retrieval, we need to do two things:

  • Search the database for relevant past conversations based on the current query.
  • Retrieve stored summaries and use them as context for the next response.

To get started, we will create a vector index handle over our MongoDB collection. We'll then query the index for data in the shape of the struct we want to use (in this case Memory) and pull the memory text out of each result. The caller will then join these memories into a single newline-separated string for LLM ingestion.

use rig::vector_store::VectorStoreIndex;
use rig_mongodb::{MongoDbVectorIndex, SearchParams};

async fn get_relevant_memories(
    mongo: &Collection<bson::Document>,
    query: &str,
    model: &EmbeddingModel,
) -> Result<Option<Vec<String>>, Box<dyn std::error::Error>> {
    // Create a vector index on our vector store.
    // Note: a vector index called "vector_index" must exist on the MongoDB collection you are querying.
    // IMPORTANT: Reuse the same model that was used to generate the embeddings
    let index = MongoDbVectorIndex::new(
        mongo.clone(),
        model.clone(),
        "vector_index",
        SearchParams::new(),
    )
    .await?;

    // Query the index
    let results = index.top_n::<Memory>(query, 5).await?;

    let res_as_vec_strings = results
        .into_iter()
        .map(|(_score, _id, memory)| memory.memory)
        .collect::<Vec<String>>();

    if res_as_vec_strings.is_empty() {
        return Ok(None);
    }

    Ok(Some(res_as_vec_strings))
}

Once we've run the function, we declare a new variable that is either a newline-joined list of memories, or an empty string if there are no relevant memories (or the database simply doesn't have any yet).

let additional_input = if let Some(memories) =
    get_relevant_memories(&mongo, &query, &embedding_model).await?
{
    format!(
        "\n\nRelevant memories from previous conversations: {}",
        memories.join("\n")
    )
} else {
    String::new()
};

let agent = openai_client
    .agent("gpt-4o")
    .preamble("You are a helpful agent.")
    .append_preamble(&additional_input)
    .build();

Trying it out

Your final async fn main should look like this:

use chrono::Utc;
use mongodb::bson::{self, doc};
use mongodb::options::ClientOptions;
use mongodb::{Client as MongoClient, Collection};
use nanoid::nanoid;
use rig::embeddings::EmbeddingsBuilder;
use rig::providers::openai::EmbeddingModel;
use rig::vector_store::VectorStoreIndex;
use rig_mongodb::{MongoDbVectorIndex, SearchParams};
use serde_json::Value;
use std::io::{Write, stdin, stdout};

use rig::{Embed, completion::Chat, message::Message, providers::openai::Client};
use serde::{Deserialize, Deserializer};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let mut messages = Vec::new();
    let mongo = connect_to_mongodb().await;

    let conversation_id = nanoid!(6);

    let openai_client = Client::from_env();
    let embedding_model = openai_client.embedding_model("text-embedding-ada-002");

    loop {
        let query = take_user_input();

        let additional_input = if let Some(memories) =
            get_relevant_memories(&mongo, &query, &embedding_model).await?
        {
            format!(
                "\n\nRelevant memories from previous conversations: {}",
                memories.join("\n")
            )
        } else {
            String::new()
        };

        let agent = openai_client
            .agent("gpt-4o")
            .preamble("You are a helpful agent.")
            .append_preamble(&additional_input)
            .build();

        let response = agent.chat(query.as_ref(), messages.clone()).await.unwrap();

        println!("{response}");

        messages.push(Message::user(query));
        messages.push(Message::assistant(response));

        if summarize_chunks(
            &openai_client,
            &mongo,
            messages.clone(),
            &conversation_id,
            &embedding_model,
        )
        .await
        .is_ok()
        {
            println!("Saved new memories.");
        };
    }
}

To use it, make sure your OpenAI API key and MongoDB connection string are set as the OPENAI_API_KEY and MONGODB_CONNECTION_STRING environment variables, then simply run cargo run and try it out!

Conclusion

By combining Rig’s AI capabilities with MongoDB’s robust storage, we’ve created a system that allows an AI assistant to retain and recall relevant past conversations. This setup enables both short-term and long-term memory, making interactions feel more natural and context-aware.

For additional Rig resources and community engagement: