Populating a RAG with data from enterprise documents repositories for Generative AI


Mar 15, 2025 - 09:34

Ingesting data from OneDrive into Milvus for use with generative AI.

Image courtesy Milvus.io

Introduction and motivation

In a recent project we helped our customer populate a RAG vector database (Milvus in our case) with content ingested from the company’s SharePoint/OneDrive repositories. That content is then used with an LLM that best fits the customer’s requirements on the watsonx.ai platform, for instance a Granite model.

This article describes the code I prepared and tested locally on my computer. This code will be adapted for use on the customer’s platform, but as usual I like to do my research beforehand… (I’m a bit of a geek, I guess).

Local Milvus Preparation

In a previous article I wrote about the difficulties I encountered setting up a Milvus environment on my laptop. I have finally found a very straightforward and simple way to do this. Here are the steps.

To do this you need either Docker or, as in my case, Podman.

# fetch the startup script
curl -sfL https://raw.githubusercontent.com/milvus-io/milvus/master/scripts/standalone_embed.sh -o standalone_embed.sh

The next step, and all you have to do, is to run the script. I have an alias docker="podman" on my laptop, but for simplicity in this case I replaced all 'sudo docker' commands with 'sudo podman' in the downloaded standalone_embed.sh file, which I renamed standalone_podman_embed.sh.
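That replacement can be done in one line with sed, writing the result to a renamed copy so the original script stays untouched:

```shell
# replace every 'sudo docker' with 'sudo podman' and save a renamed copy
sed 's/sudo docker/sudo podman/g' standalone_embed.sh > standalone_podman_embed.sh
```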

# the original call is 'bash standalone_embed.sh start'
bash standalone_podman_embed.sh start

And that’s it: your Milvus instance should be up and running. But let’s make sure of it.
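One quick sanity check is the health endpoint that Milvus standalone exposes, by default on port 9091 (this assumes you kept the script’s default port mappings):

```shell
# returns HTTP 200 with body 'OK' when the instance is healthy
curl -sf http://localhost:9091/healthz
```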

Test Milvus database connection and prepare the main application

As I write the application in Python, I prepared a virtual environment (a best practice I recommend to everyone, and one I learnt the hard way!).
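A minimal setup, assuming Python 3 and pip are available (pymilvus is the official Milvus Python client, which we will use to test the connection):

```shell
# create and activate an isolated environment, then install the Milvus client
python3 -m venv .venv
. .venv/bin/activate
pip install pymilvus
```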