Towards Data Science

Anatomy of a Parquet File

Parquet from scratch: A Python deep dive into a raw parquet file

Fourier Transform Applications in Literary Analysis

How mathematics and data analysis can offer a head start to analysing poetry, be...

Are You Still Using LoRA to Fine-Tune Your LLM?

A look at this year’s crop of LoRA alternatives

Mastering Hadoop, Part 2: Getting Hands-On — Setting Up...

Understanding Hadoop’s core components before installation and scaling

2026 Will Be The Year of Data + AI Observability

Observations on performance and reliability from conversations with dozens of te...

How to Switch from Data Analyst to Data Scientist

And get hired!

7 Powerful DBeaver Tips and Tricks to Improve Your SQL ...

Straight-to-the-point tips for the best SQL IDE

Mastering Hadoop, Part 1: Installation, Configuration, ...

A comprehensive guide covering Hadoop setup, HDFS commands, MapReduce, debugging...

Experiments Illustrated: Can $1 Change Behavior More Th...

A small prize for something easy vs a big prize for something difficult?

Heatmaps for Time Series 

Visualizing trends and outliers with non-linear color scales

How to Develop Complex DAX Expressions

On the importance of taking some time to thoroughly understand the needs and the...

How to Make Your LLM More Accurate with RAG & Fine-Tuning

And when to use which one

Platform-Mesh, Hub and Spoke, and Centralised | 3 Types...

Why understanding team structure is critical for data and AI

This Is How LLMs Break Down the Language

The science and art behind tokenization

Linear Regression in Time Series: Sources of Spurious R...

Why does the autocorrelation of the error term matter?

From Fuzzy to Precise: How a Morphological Feature Extr...

Mimicking human visual perception to truly understand objects

LettuceDetect: A Hallucination Detection Framework for ...

How to capitalize on ModernBERT’s extended context window to build a token-level...

Experiments Illustrated: How We Optimized Premium Listi...

Also, how georandomization can help clean up spillovers

Experiments Illustrated: How Random Assignment Saved Us...

Also, a casual intro to the multiple comparisons problem

Custom Training Pipeline for Object Detection Models

I examined several well-known object detection pipelines and designed one that b...

Comprehensive Guide to Dependency Management in Python

Master the management of virtual environments

Image Captioning, Transformer Mode On

Implementing CPTR (CaPtion TransformeR) from scratch with PyTorch

Using GPT-4 for Personal Styling

Data management, GPT context limits, and real-world challenges

When You Just Can’t Decide on a Single Action

Game Theory 101: Mixing strategies
