Logistic Regression: From Theory to Code Implementation
Imagine you are building a document filter that takes documents as input and decides whether they are fraudulent or not. You will need a model that doesn't just predict yes or no but gives you a probability, like "this document is 40% likely to be fraudulent".
Logistic Regression is perfect for this kind of problem.
In this post, we'll break down the math behind *logistic regression* step by step. No scary equations, just clear, intuitive explanations, with a little help from Python code along the way!
What is Logistic Regression?
Logistic Regression is a supervised learning algorithm used for classification. It outputs a probability between 0 and 1 and assigns a class based on a threshold (call it the threshold probability if you wish). This could mean predicting whether something belongs to one of two categories (binary classification) or to one of many discrete categories (multiclass classification). Examples include classifying emails as spam or not spam, or classifying people as sick or healthy, infected or not infected, just to name a few.
While it has "regression" in the name, logistic regression is actually about classification, not predicting a continuous number like standard linear regression does.
But regression is the basis for this classification, since the class is decided from a continuous value between 0 and 1.
Mathematical background
1. Probability: A probability is just a number between 0 and 1 that tells us how likely an event is to occur. 0 means impossible, 1 means certain, and 0.6 means a 60% chance of occurring.
2. Odds: Odds are just another way of expressing probability. The odds for (in favor) are the ratio of the probability of success, $p$, to the probability of failure, $1 - p$; the odds against are the probability of failure $(1 - p)$ divided by the probability of success $p$.
From this point on I will only be talking about odds for:
$$\text{Odds} = \frac{p}{1 - p}$$
Odds range from 0 to infinity:
- odds > 1 means success is more likely
- odds < 1 means failure is more likely
- odds = 1 means a 50-50 chance of success and failure
Odds can only take positive values, and we need a way to map these numbers to the whole real line, from $-\infty$ to $+\infty$.
Logarithms are perfect for solving this problem.
3. log(Odds) or the logit function: We take the logarithm of the odds, called the log-odds (or logit).
Formula:
$$\text{logit}(p) = \log\left(\frac{p}{1 - p}\right)$$
- if p > 0.5, logit(p) is positive
- if p < 0.5, logit(p) is negative
- if p = 0.5, logit(p) = 0
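To make this concrete, here is a tiny sketch in NumPy (the probability values are just made-up examples) that turns probabilities into odds and log-odds:

import numpy as np

p = np.array([0.1, 0.5, 0.6, 0.9])   # example probabilities (made up)
odds = p / (1 - p)                   # odds for: p / (1 - p)
log_odds = np.log(odds)              # the logit: log-odds
print("odds:", odds)                 # [0.111..., 1.0, 1.5, 9.0]
print("log-odds:", log_odds)         # negative below p = 0.5, 0 at p = 0.5, positive above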
Why are we doing all this?
Linear models like the one inside logistic regression produce output anywhere from $-\infty$ to $+\infty$, but probabilities are restricted to between 0 and 1.
So we model the log-odds (which stretch across the whole real number line) as a linear function of the inputs.
That is:
$$\log\left(\frac{p}{1 - p}\right) = w \cdot x + b$$
where $w$ and $x$ are vectors, and the log is the natural logarithm (base $e$, where $e = 2.718\ldots$).
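As a quick illustration (the weights, features and bias below are arbitrary made-up numbers), the linear part is nothing more than a dot product plus a bias:

import numpy as np

w = np.array([0.5, -1.2, 0.3])   # example weight vector (made up)
x = np.array([2.0, 1.0, 4.0])    # example feature vector (made up)
b = 0.1                          # example bias (made up)

z = np.dot(w, x) + b             # the log-odds: z = w.x + b
print(z)                         # 1.1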
If we know that $z = w \cdot x + b$ gives the log-odds, how do we get the probability $p$?
The Sigmoid function:
This function takes a number between $-\infty$ and $+\infty$ and maps it to a number between 0 and 1 (a probability).
It is obtained by solving for $p$ in the logit function.
Take $z = w \cdot x + b$; then
$$\sigma(z) = \frac{1}{1 + e^{-z}}$$
- if z is large and positive, the output will be close to 1
- if z is large and negative, the output will be close to 0
- if z = 0, the output will be 0.5
To see what it looks like:
import numpy as np
import matplotlib.pyplot as plt
def sigmoid(z):
    return 1 / (1 + np.exp(-z))
#create 100 numbers from -10 to 10
z = np.linspace(-10, 10, 100)
#Plot in a graph
plt.plot(z, sigmoid(z))
plt.title("Sigmoid Function")
plt.xlabel("z")
plt.ylabel("σ(z)")
plt.grid()
plt.show()
The Decision Boundary
Once we have the output probability $\hat{y} = \sigma(z)$, how do we decide the class?
Easy:
- If $\hat{y} \geq 0.5$, predict class 1 (positive)
- If $\hat{y} < 0.5$, predict class 0 (negative)

The decision boundary is where $\sigma(z) = 0.5$, which happens when $z = 0$. So the equation $w \cdot x + b = 0$ defines the decision boundary, a straight line (or a hyperplane in higher dimensions).
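In code, the decision rule is just a threshold on the sigmoid output. Here is a minimal sketch (the z values are arbitrary examples), reusing the sigmoid function and NumPy import from the plot above:

# Predicted probabilities for a few example scores z (made-up values)
z = np.array([-3.0, -0.2, 0.0, 2.5])
y_hat = sigmoid(z)                      # probabilities between 0 and 1
y_pred = (y_hat >= 0.5).astype(int)     # threshold at 0.5 -> class 0 or 1
print(y_hat)    # [0.047... 0.450... 0.5 0.924...]
print(y_pred)   # [0 0 1 1]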
Cost Function: Measuring How Bad Our Predictions Are.
We need a cost function to measure how bad our model's predictions are.
In linear regression, we used Mean Squared Error (MSE).
But in logistic regression, MSE doesn’t work well because of the sigmoid's non-linear nature — it causes messy, non-convex optimization.
Instead, we use Log-Loss (aka Cross-Entropy Loss)
The cross-entropy loss tells us how good our prediction $\hat{y}$ is compared to the true label $y$.
The formula is:
$$L(\hat{y}, y) = -\big[\, y \log(\hat{y}) + (1 - y)\log(1 - \hat{y}) \,\big]$$
Here $y$ is the label, a class, a category (0 or 1).
If the true label $y = 1$, we want $\hat{y}$ to be close to 1.
If $y = 0$, we want $\hat{y}$ to be close to 0.
The closer the prediction is to the truth, the smaller the loss!
For multiple examples, we just average the losses:
$$\text{Cost} = \frac{1}{m}\sum_{i=1}^{m} L\big(\hat{y}^{(i)}, y^{(i)}\big)$$
where $m$ is the number of examples.
Goal: Make the loss as small as possible by adjusting the model!
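Here is a small sketch of this loss in NumPy (the labels and predictions are made-up examples; the small eps just avoids taking the log of 0; NumPy is already imported above):

def log_loss(y_true, y_pred, eps=1e-15):
    # Clip predictions away from exactly 0 and 1 so the logs stay finite
    y_pred = np.clip(y_pred, eps, 1 - eps)
    # Average binary cross-entropy over all examples
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

y_true = np.array([1, 0, 1, 1])          # made-up true labels
y_good = np.array([0.9, 0.1, 0.8, 0.7])  # mostly correct, confident predictions
y_bad = np.array([0.2, 0.9, 0.3, 0.4])   # confident but wrong predictions
print(log_loss(y_true, y_good))   # small loss (about 0.2)
print(log_loss(y_true, y_bad))    # large loss (about 1.5)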
Optimization: Finding the Best Weights That Minimize the Loss
We want to minimize the total loss over all data points.
We'll do this using gradient descent
The gradient is defined using partial derivatives.
The gradient at any point in space tells you the direction to follow if you want to reach the highest point as fast as possible.
Gradient descent does the opposite: it moves in the fastest direction downhill.
How is this achieved?
1. Compute the gradient of the loss with respect to each parameter (w and b); for our log-loss these gradients are written out below.
2. Update the parameters a little bit in the direction opposite to the gradient (downhill!).
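For the averaged log-loss with a sigmoid output, these gradients take a simple closed form (a standard result, stated here without the derivation):
$$\frac{\partial L}{\partial w} = \frac{1}{m}\sum_{i=1}^{m}\big(\hat{y}^{(i)} - y^{(i)}\big)\, x^{(i)}, \qquad \frac{\partial L}{\partial b} = \frac{1}{m}\sum_{i=1}^{m}\big(\hat{y}^{(i)} - y^{(i)}\big)$$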
Update Rules (Gradient Descent)
After computing the gradients, we update the weights and bias like this:
- Weight update:
$$w := w - \alpha \frac{\partial L}{\partial w}$$
- Bias update:
$$b := b - \alpha \frac{\partial L}{\partial b}$$
where:
- $L$ is the loss
- $\alpha$ (alpha) is the learning rate; it controls how big the update steps are.
# Assume we have already computed the loss gradients dw and db
w = w - learning_rate * dw
b = b - learning_rate * db
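Putting the pieces together, here is a minimal from-scratch training loop. This is only a sketch, not a production implementation: the synthetic data, the learning rate and the number of iterations are arbitrary choices.

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Tiny synthetic dataset: one feature, class 1 when the feature is large (made up)
X = np.array([[0.5], [1.0], [1.5], [3.0], [3.5], [4.0]])
y = np.array([0, 0, 0, 1, 1, 1])
m, n = X.shape

w = np.zeros(n)   # initialize weights
b = 0.0           # initialize bias
alpha = 0.1       # learning rate

for _ in range(1000):
    y_hat = sigmoid(X @ w + b)      # forward pass: predicted probabilities
    dw = (X.T @ (y_hat - y)) / m    # gradient of the averaged log-loss w.r.t. w
    db = np.mean(y_hat - y)         # gradient w.r.t. b
    w -= alpha * dw                 # gradient descent updates
    b -= alpha * db

print(w, b)                          # learned parameters
print(sigmoid(X @ w + b).round(2))   # predicted probabilities for the training points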
If you have more than two categories (e.g., cat, dog, rabbit), you extend logistic regression into Softmax Regression (a.k.a. Multinomial Logistic Regression).
Softmax generalizes sigmoid to handle multi-class classification.
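As a quick taste of that generalization, here is a minimal softmax function (continuing with the NumPy import from above; the score vector is a made-up example). Instead of a single probability it returns one probability per class, and they sum to 1:

def softmax(z):
    # Subtract the max for numerical stability before exponentiating
    z = z - np.max(z)
    exp_z = np.exp(z)
    return exp_z / np.sum(exp_z)

scores = np.array([2.0, 1.0, 0.1])   # made-up scores for 3 classes (e.g. cat, dog, rabbit)
print(softmax(scores))               # [0.659... 0.242... 0.098...], sums to 1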
Now that we fully understand the math behind logistic regression, from the linear model ($w \cdot x + b$) to the sigmoid function ($\sigma(z)$) and the log-loss, it's time to bring it to life with real-world data.
We'll start by implementing logistic regression using Scikit-learn, a popular machine learning library that makes applying models incredibly easy.
After that, we'll also build the same model using TensorFlow/Keras to show how logistic regression fits naturally into deep learning workflows
Even though libraries like Scikit-learn and TensorFlow handle all the math for us under the hood — like computing the log-odds, applying the sigmoid function, and minimizing the cross-entropy loss — understanding the math gives us intuition about what the model is doing behind the scenes
We'll use the Pima Indians Diabetes Dataset, a famous dataset where the goal is to predict whether a patient has diabetes based on medical information like glucose level, BMI, age, and more
I) Implementation using Scikit-learn
Step 1: Import Libraries
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
Step 2: Load the Dataset
# Load dataset
data = pd.read_csv('diabetes.csv')
# View the first few rows
print(data.head())
Step 3: Prepare the Data
# Split into features (X) and labels (y)
X = data.drop('Outcome', axis=1) # 'Outcome' is the target
y = data['Outcome']
# Split into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Feature scaling (important for logistic regression)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
Step 4: Train the Model
# Initialize and train logistic regression model
model = LogisticRegression()
model.fit(X_train, y_train)
Step 5: Evaluate the Model
# Make predictions
y_pred = model.predict(X_test)
# Evaluate
print("Accuracy:", accuracy_score(y_test, y_pred))
print("\nConfusion Matrix:\n", confusion_matrix(y_test, y_pred))
print("\nClassification Report:\n", classification_report(y_test, y_pred))
II) Implementation using TensorFlow/Keras
Step 1: Import Libraries
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
Step 2: Build the Model
model = Sequential([
Dense(1, activation='sigmoid', input_shape=(X_train.shape[1],))
])
Explanation:
Only 1 neuron because it’s binary classification.
Sigmoid activation because we want output probabilities between 0 and 1.
Step 3: Compile the Model
model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['accuracy'])
Explanation:
Adam optimizer for efficient gradient descent.
Binary Crossentropy because it’s binary classification.
Step 4: Train the Model
history = model.fit(
    X_train,
    y_train,
    epochs=100,
    batch_size=32,
    validation_split=0.2,
    verbose=1
)
Epochs = 100: Train 100 passes through the dataset.
Batch size = 32: Process 32 samples at a time.
Step 5: Evaluate the Model
# Evaluate on the test set
loss, accuracy = model.evaluate(X_test, y_test, verbose=0)
print("Test Accuracy:", accuracy)
As we can see, both Scikit-learn and TensorFlow make it incredibly easy to build a logistic regression model.
While Scikit-learn is perfect for quick classical machine learning models, TensorFlow shines when you want to extend logistic regression into deep learning architectures later on.
Visualizing model performance
To better understand how our logistic regression model performed, let's visualize the confusion matrix and also take a look at the model's learning curves over the training process.
1. Plotting the Confusion Matrix (Scikit-learn)
Confusion matrices help you see not just accuracy, but where the model is making mistakes.
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix
# Create confusion matrix
cm = confusion_matrix(y_test, y_pred)
# Plot confusion matrix
plt.figure(figsize=(6,4))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues')
plt.title('Confusion Matrix - Scikit-learn Logistic Regression')
plt.xlabel('Predicted')
plt.ylabel('Actual')
plt.show()
2. Plotting Training History (TensorFlow/Keras)
TensorFlow gives you the history object, which tracks loss and accuracy during training.
Let’s plot how the model learned over time:
# Plot training & validation accuracy values
plt.figure(figsize=(12,5))
# Accuracy
plt.subplot(1, 2, 1)
plt.plot(history.history['accuracy'])
plt.plot(history.history['val_accuracy'])
plt.title('Model Accuracy over Epochs')
plt.ylabel('Accuracy')
plt.xlabel('Epoch')
plt.legend(['Train', 'Validation'], loc='lower right')
# Loss
plt.subplot(1, 2, 2)
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('Model Loss over Epochs')
plt.ylabel('Loss')
plt.xlabel('Epoch')
plt.legend(['Train', 'Validation'], loc='upper right')
plt.tight_layout()
plt.show()