Program to Find the Correlation Coefficient


Mar 20, 2025 - 18:02

In this blog post, we will explore how to calculate the correlation coefficient between two arrays. The correlation coefficient (here, Pearson's r) is a statistical measure of the strength and direction of the linear relationship between two variables. It ranges from -1 to +1: values near +1 indicate a strong positive relationship, values near -1 a strong negative one, and values near 0 little or no linear relationship.

Understanding the Correlation Coefficient

The correlation coefficient r can be calculated using the following formula:

r = (n * Σxy - Σx * Σy) / sqrt((n * Σx² - (Σx)²) * (n * Σy² - (Σy)²))

Where:

  • n is the number of pairs of scores,
  • Σxy is the sum of the products of paired scores,
  • Σx and Σy are the sums of the individual scores,
  • Σx² and Σy² are the sums of the squares of the scores.

Example Calculation

Let's consider the following dataset:

X Y
15 25
18 25
21 27
24 31
27 32

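From this data (n = 5):

  • ΣX = 105
  • ΣY = 140
  • ΣXY = 3000
  • ΣX² = 2295
  • ΣY² = 3964

Substituting into the formula gives r = (5 * 3000 - 105 * 140) / sqrt((5 * 2295 - 105²) * (5 * 3964 - 140²)) = 300 / sqrt(450 * 220) ≈ 0.953463.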
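
The intermediate sums for this dataset can be checked with a few lines of Python. This is a minimal sketch; the variable names are illustrative and are not taken from the program in this post.

```python
import math

# Dataset from the example above
X = [15, 18, 21, 24, 27]
Y = [25, 25, 27, 31, 32]
n = len(X)

# The five sums that appear in the formula
sum_x = sum(X)                              # Σx
sum_y = sum(Y)                              # Σy
sum_xy = sum(x * y for x, y in zip(X, Y))   # Σxy
sum_x2 = sum(x * x for x in X)              # Σx²
sum_y2 = sum(y * y for y in Y)              # Σy²

numerator = n * sum_xy - sum_x * sum_y
denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
r = numerator / denominator

print(sum_x, sum_y, sum_xy, sum_x2, sum_y2)  # 105 140 3000 2295 3964
print(numerator)                              # 300
print(f"{r:.6f}")                             # 0.953463
```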

Example Inputs and Outputs

  • Input 1:
    • X[] = {43, 21, 25, 42, 57, 59}
    • Y[] = {99, 65, 79, 75, 87, 81}
  • Output 1: 0.529809

  • Input 2:
    • X[] = {15, 18, 21, 24, 27}
    • Y[] = {25, 25, 27, 31, 32}
  • Output 2: 0.953463
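
Both outputs can be reproduced with an algebraically equivalent form of the formula that works with deviations from the mean. The sketch below is an independent cross-check rather than the program from this post, and `pearson_r` is a hypothetical helper name.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation via deviations from the mean (illustrative helper)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations
    dev_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    dev_x2 = sum((x - mean_x) ** 2 for x in xs)
    dev_y2 = sum((y - mean_y) ** 2 for y in ys)
    return dev_xy / math.sqrt(dev_x2 * dev_y2)

print(f"{pearson_r([43, 21, 25, 42, 57, 59], [99, 65, 79, 75, 87, 81]):.6f}")  # 0.529809
print(f"{pearson_r([15, 18, 21, 24, 27], [25, 25, 27, 31, 32]):.6f}")          # 0.953463
```

Getting the same values from two different forms of the formula is a quick way to catch arithmetic slips in either one.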

Python Program to Calculate the Correlation Coefficient

Here's how you can implement this in Python:

import math

# Function to calculate the correlation coefficient
def correlationCoefficient(X, Y, n):
    sum_X = 0
    sum_Y = 0
    sum_XY = 0
    squareSum_X = 0
    squareSum_Y = 0

    for i in range(n):
        # Sum of elements in X
        sum_X += X[i]
        # Sum of elements in Y
        sum_Y += Y[i]
        # Sum of X[i] * Y[i]
        sum_XY += X[i] * Y[i]
        # Sum of squares of elements in X
        squareSum_X += X[i] * X[i]
        # Sum of squares of elements in Y
        squareSum_Y += Y[i] * Y[i]

    # Calculate the correlation coefficient
    corr = (n * sum_XY - sum_X * sum_Y) / \
           math.sqrt((n * squareSum_X - sum_X ** 2) * (n * squareSum_Y - sum_Y ** 2))

    return corr

# Driver code
X = [15, 18, 21, 24, 27]
Y = [25, 25, 27, 31, 32]

# Size of the array
n = len(X)

# Calculate and print the correlation coefficient
print('{0:.6f}'.format(correlationCoefficient(X, Y, n)))

Output

When you run the code, you should see:

0.953463
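
The program above assumes that both arrays have the same length and that neither is constant. If every value in X (or in Y) is identical, the denominator is zero, and with integer inputs the division raises ZeroDivisionError. A defensive variant might look like the sketch below. The function name `safe_correlation` and the choice to raise ValueError are assumptions made for illustration.

```python
import math

def safe_correlation(X, Y):
    """Correlation coefficient with basic input checks (illustrative sketch)."""
    if len(X) != len(Y):
        raise ValueError("X and Y must have the same length")
    n = len(X)
    if n < 2:
        raise ValueError("at least two pairs are required")

    sum_X = sum(X)
    sum_Y = sum(Y)
    sum_XY = sum(x * y for x, y in zip(X, Y))
    squareSum_X = sum(x * x for x in X)
    squareSum_Y = sum(y * y for y in Y)

    # Each factor is proportional to the variance of its array;
    # zero means every value in that array is the same
    spread_X = n * squareSum_X - sum_X ** 2
    spread_Y = n * squareSum_Y - sum_Y ** 2
    if spread_X <= 0 or spread_Y <= 0:
        raise ValueError("correlation is undefined when an array is constant")

    return (n * sum_XY - sum_X * sum_Y) / math.sqrt(spread_X * spread_Y)

print(f"{safe_correlation([15, 18, 21, 24, 27], [25, 25, 27, 31, 32]):.6f}")  # 0.953463
```

Taking the length from the arrays themselves, instead of passing n separately, also removes the risk of passing a wrong n.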

Complexity

  • Time Complexity: O(n), where n is the size of the given arrays.
  • Auxiliary Space: O(1).
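
The O(1) auxiliary space follows from the formula needing only a handful of running totals. As a sketch, the same calculation can be done in a single pass over a stream of (x, y) pairs without storing them. This assumes the data arrives as an iterable of pairs, and `correlation_from_pairs` is a hypothetical name.

```python
import math

def correlation_from_pairs(pairs):
    """Single-pass correlation over an iterable of (x, y) pairs (illustrative)."""
    n = sum_x = sum_y = sum_xy = sum_x2 = sum_y2 = 0
    for x, y in pairs:
        n += 1
        sum_x += x
        sum_y += y
        sum_xy += x * y
        sum_x2 += x * x
        sum_y2 += y * y
    return (n * sum_xy - sum_x * sum_y) / math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )

# The function reads one pair at a time and keeps only six running values
stream = zip([15, 18, 21, 24, 27], [25, 25, 27, 31, 32])
print(f"{correlation_from_pairs(stream):.6f}")  # 0.953463
```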

This simple program calculates the correlation coefficient, allowing you to analyze the relationship between two sets of data. Whether you're working on statistical analysis, financial data, or any other field, understanding correlation helps you make informed decisions based on your data.

For more content, follow me at https://linktr.ee/shlokkumar2303