How I Met Your Evaluation Strategy and Almost Shot Myself


May 18, 2025 - 10:56

It all started when I decided to run Mistral on my laptop. Just the usual: scripts, datasets, terminal. But Mistral had other plans. I was expecting strings, but 75% of the data turned out to be lists. The model stared back at me like a cat being offered new food — complete confusion. I waved my hands, threw in more code, but nothing helped. Finally, I said to hell with the lists, let’s stick to dicts. And things got better. For a while.
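If you’re hitting the same wall, the fix was roughly this (a minimal sketch; the field name "text" is a made-up placeholder, not my actual schema):

def normalize_record(record: dict) -> dict:
    # Some records store the field as a list of fragments instead of one string.
    value = record.get("text", "")
    if isinstance(value, list):
        # Join list-valued fields so the tokenizer gets the single string it expects.
        value = " ".join(str(part) for part in value)
    return {"text": value}

# Assumed usage with a Hugging Face Dataset: dataset = dataset.map(normalize_record)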
New round. Training still wasn’t working. evaluation_strategy="epoch"? Six hours wasted on one line! Turns out my version of TrainingArguments didn’t support half the parameters. Cut out everything unnecessary, left only the bare essentials. The model finally started responding. But not on the GPU. It was chugging along on the CPU. Cool, but while I was wrestling with those epochs, I forgot about the GPU.
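For reference, the trimmed-down config looked roughly like this (a sketch, not gospel: the numbers and output_dir are placeholders, and evaluation_strategy is dropped entirely because my installed transformers version simply didn’t know about it — newer releases spell it eval_strategy, older ones evaluation_strategy):

from transformers import TrainingArguments

# Bare-essentials training config; everything my transformers version
# didn't recognize has been cut out.
args = TrainingArguments(
    output_dir="./mistral-out",        # hypothetical path
    num_train_epochs=1,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    learning_rate=2e-4,
    logging_steps=10,
)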
Fine, I install accelerate. The terminal started spitting out error messages like a possessed printer. Apparently, pytorch and accelerate didn’t get along. Alright, we’re tough — fixed the versions, resolved conflicts. Now it should work, right? Wrong. Mistral in fp16 mode wanted 14 GiB of VRAM, and I only had 8. It’s like trying to park a truck in a basement. Okay, fine — let’s go 4-bit quantization.
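The 4-bit load was roughly this (a sketch, assuming bitsandbytes is installed, a CUDA GPU is visible, and the mistralai/Mistral-7B-v0.1 checkpoint — adjust to whatever you’re actually running):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Quantize to 4-bit (NF4) so a 7B model squeezes into 8 GiB of VRAM.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
)

model_name = "mistralai/Mistral-7B-v0.1"  # assumed checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map="auto",   # let accelerate place the layers on the GPU
)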
Set up LoRA. r=8, alpha=32. Training started again. But then grad_norm suddenly turned into NaN. The model fell silent, like a lady who’s just been insulted. I dove back into the code. Turns out model.half() was breaking everything. Alright, ditch .half() and rerun. And finally, the long-awaited message: Houston, we have liftoff!
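The LoRA setup, roughly (a sketch on top of the 4-bit model above; target_modules is my assumption about which projections to adapt, tune it for your case):

from peft import LoraConfig, get_peft_model

lora_config = LoraConfig(
    r=8,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],  # assumption: adapt the attention projections
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
# Note: no model.half() here — forcing fp16 on top of the quantized weights
# is exactly what turned my grad_norm into NaN.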
Numbers start flashing on the screen. train_loss — 2.14. Not bad for the first epoch. grad_norm was stable, learning_rate was gradually decreasing. I’m sitting there like a cat after a successful hunt — exhausted, but happy. GCP is resting, my laptop smells like it’s about to catch fire, and Mistral is finally alive.
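Tying it together, the launch was roughly this (a sketch: train_dataset is whatever your tokenized data prep produced, and args/model come from the snippets above):

from transformers import Trainer, DataCollatorForLanguageModeling

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),  # causal LM, no masking
)
trainer.train()  # this is where train_loss, grad_norm and learning_rate start scrolling by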
And this is just the first chapter.

#qapix #Mistral7B #MachineLearning #TrainingAI #DeepLearning #AIJourney #ModelTraining #DataScience #Debugging #GPU #LoRA #Quantization #CodingLife #GCP #TechDiaries #Programming #APIIntegration #LearningRate #DataPrep