What I've Learned About Deploying ML & LLM Projects from DAGsHub's CEO

Hey guys!
I've been experimenting with building and deploying ML and LLM projects for a while now, and honestly, it’s been a journey.
Training the models always felt more straightforward, but deploying them smoothly into production turned out to be a whole new beast.
Recently, I had a really good conversation with Dean Pleban (CEO @ DAGsHub), who shared some incredibly practical insights based on his own experience helping teams go from experiments to real-world production. Here are a few valuable lessons Dean shared with me that I think could help others as well:
Data matters way more than I thought.
Initially, I focused a lot on model architectures and less on the quality of my data pipelines. Production performance heavily depends on robust data handling—things like proper data versioning, monitoring, and governance can save you a lot of headaches. This becomes way more important once your toy project turns into a collaborative project with others.
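To make the data-versioning point a bit more concrete, here's a minimal sketch of reading a pinned revision of a dataset with DVC's Python API (DAGsHub repos can act as a DVC remote). The repo URL, file path, and tag are placeholders of my own, not anything Dean specified:

```python
# Minimal sketch: read a specific, versioned revision of a dataset tracked
# with DVC. The repo URL, path, and rev below are placeholders.
import dvc.api

with dvc.api.open(
    "data/train.csv",                          # file tracked by DVC in the repo
    repo="https://dagshub.com/<user>/<repo>",  # placeholder repo URL
    rev="v1.2",                                # git tag/commit pinning the data version
) as f:
    print(f.readline())  # e.g. inspect the header of that exact data version
```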
LLMs need their own rules.
Working with large language models introduced challenges I wasn't fully prepared for—like hallucinations, biases, and resource demands. Dean suggested frameworks like RAES (Robustness, Alignment, Efficiency, Safety) to help tackle these issues, and it's something I'm actively trying out now. He also mentioned "LLM as a judge," a concept that seems to be getting a lot of attention recently.
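For anyone who hasn't seen "LLM as a judge" in practice, here's a rough sketch of the basic idea: ask a second model to grade an answer against a rubric. This is just how I'd set it up with the OpenAI chat API; the model name, prompt, and 1-5 scale are my own assumptions, not something Dean prescribed:

```python
# Rough sketch of "LLM as a judge": a second model grades an answer against
# a rubric. Model name, prompt wording, and the 1-5 scale are assumptions.
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

def judge(question: str, answer: str) -> str:
    prompt = (
        "You are a strict evaluator. Rate the answer to the question on a "
        "1-5 scale for factual correctness, then justify briefly.\n"
        f"Question: {question}\nAnswer: {answer}\nRating:"
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder judge model
        messages=[{"role": "user", "content": prompt}],
        temperature=0,        # keep the grading as deterministic as possible
    )
    return resp.choices[0].message.content

print(judge("What year did Apollo 11 land?", "1969, on July 20."))
```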
Some practical tips Dean shared with me:
- Save chain of thought output - you never know when you might need it.
- Log experiments thoroughly (parameters, hyperparameters, models used, data versioning, ...) - there's a minimal logging sketch after this list.
- Start with a Jupyter notebook, but move to production-grade tooling (all tools are mentioned in the guide below).
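Here's a minimal sketch of how I'd combine the first two tips with MLflow (which DAGsHub can host as a tracking server). The tracking URI, run name, parameter values, and the chain-of-thought file are placeholders I made up for illustration:

```python
# Minimal experiment-logging sketch with MLflow. The tracking URI, run name,
# parameter values, and chain-of-thought file are placeholders/assumptions.
import mlflow

mlflow.set_tracking_uri("https://dagshub.com/<user>/<repo>.mlflow")  # placeholder
mlflow.set_experiment("llm-prod-experiments")

with mlflow.start_run(run_name="baseline"):
    # Parameters / hyperparameters and the data version used for this run
    mlflow.log_params({
        "model": "gpt-4o-mini",  # model used (placeholder)
        "temperature": 0.2,
        "data_rev": "v1.2",      # matches the DVC revision pulled above
    })

    # Metrics from evaluation (e.g. mean score from the LLM judge)
    mlflow.log_metric("judge_score_mean", 4.1)

    # Save chain-of-thought output as an artifact so it can be inspected later
    with open("chain_of_thought.txt", "w") as f:
        f.write("...model reasoning traces go here...")
    mlflow.log_artifact("chain_of_thought.txt")
```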