Populating a RAG with data from enterprise documents repositories for Generative AI


Mar 15, 2025 - 09:34

Ingesting data from OneDrive into Milvus for use with generative AI.

Image courtesy Milvus.io

Introduction and motivation

In a recent project we helped our customer populate a RAG vector database (Milvus in our case) with content ingested from the company’s SharePoint/OneDrive repositories. That content is then used with an LLM that best fits the customer’s requirements on the watsonx.ai platform, for instance a Granite model.

This article describes the code I prepared and tested locally on my computer. This code will be adapted for use on the customer’s platform, but as usual I like to do my research beforehand… (I’m a bit of a geek, I guess).

Local Milvus Preparation

In a previous article I wrote about the difficulties I encountered setting up a Milvus environment on my laptop. I have finally found a very straightforward and simple way to do this. Here are the steps.

To do this you need either Docker or, as in my case, Podman.

# fetch the startup script
curl -sfL https://raw.githubusercontent.com/milvus-io/milvus/master/scripts/standalone_embed.sh -o standalone_embed.sh

The next step, and all you have to do, is to run the script. I have an alias docker="podman" on my laptop, but for simplicity in this case I replaced all 'sudo docker' commands with 'sudo podman' in the downloaded standalone_embed.sh file, which I renamed standalone_podman_embed.sh.
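That replacement can be done in one line with sed, writing the result to a renamed copy so the original script stays untouched:

```shell
# replace every 'sudo docker' with 'sudo podman' and save a renamed copy
sed 's/sudo docker/sudo podman/g' standalone_embed.sh > standalone_podman_embed.sh
```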

# the original call is 'bash standalone_embed.sh start'
bash standalone_podman_embed.sh start

And that’s it: your Milvus instance should be up and running. But let’s make sure of it.
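One quick sanity check is the health endpoint that Milvus standalone exposes, by default on port 9091 (this assumes you kept the script’s default port mappings):

```shell
# returns HTTP 200 with body 'OK' when the instance is healthy
curl -sf http://localhost:9091/healthz
```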

Test Milvus database connection and prepare the main application

As I write the application in Python, I prepared a virtual environment (a best practice I recommend to everyone, and one I learnt the hard way!).
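A minimal setup, assuming Python 3 and pip are available (pymilvus is the official Milvus Python client, which we will use to test the connection):

```shell
# create and activate an isolated environment, then install the Milvus client
python3 -m venv .venv
. .venv/bin/activate
pip install pymilvus
```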