The Correlation Coefficient – Meaning & Calculations

In statistical research, you often need to determine whether two variables are related and, if so, how strong that relationship is.

This article will thoroughly cover what role the correlation coefficient plays in statistics.

Definition: The correlation coefficient

The correlation coefficient is a measure of the strength of the relationship between two variables. There are several types of correlation coefficients, but the most commonly used is Pearson’s (r).

Correlation coefficient | Meaning | Example
Between 0 and 1 | A positive relationship | Height and weight
0 | No linear relationship | Amount of tea drunk and intelligence
Between 0 and −1 | A negative relationship | Height above sea level and temperature¹
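
To make the three cases concrete, here is a minimal Python sketch that computes Pearson's r for three small datasets. The numbers are invented purely for illustration, and the function uses the deviation-from-the-mean form of the formula, which is an assumption on our part rather than something taken from the sources.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r from deviations around the means (illustrative sketch)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and the two sums of squared deviations
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Made-up data, not from the article
xs = [1, 2, 3, 4, 5]
rising = [1, 3, 2, 5, 4]    # tends to go up with x
flat = [2, 1, 3, 1, 2]      # no linear trend
falling = [5, 3, 4, 1, 2]   # tends to go down with x

r_rising = pearson_r(xs, rising)
r_flat = pearson_r(xs, flat)
r_falling = pearson_r(xs, falling)
print(r_rising, r_flat, r_falling)
```

The first value is positive, the second is essentially zero, and the third is negative, matching the three rows of the table.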

Understanding the correlation coefficient

This coefficient is used as an inferential statistic, meaning it can be used to test statistical hypotheses. It also works as a descriptive statistic and can be used to summarize findings in research.

Here are other names for this coefficient:

  • Pearson’s r
  • Pearson’s product-moment correlation coefficient (PPMCC)
  • Bivariate correlation

Correlation coefficient: Graphical representation

The line of best fit is a straight line drawn through the data points of a scatter plot. It helps to show the relationship between the variables.

If the correlation coefficient is zero, the line of best fit is horizontal and tells you nothing about how one variable changes with the other. If there is a perfect correlation between the two variables, every point lies on the line of best fit.²
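
As a rough illustration of the perfect-correlation case, the sketch below fits a least-squares line to made-up data that follow y = 2x + 1 exactly. The function name `best_fit_line` and the data are hypothetical. Ordinary least squares is assumed as the fitting method, which is the usual meaning of "line of best fit".

```python
def best_fit_line(xs, ys):
    """Least-squares slope and intercept for paired data (illustrative sketch)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Made-up, perfectly linear data: y = 2x + 1
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]

slope, intercept = best_fit_line(xs, ys)
# Vertical distance of each point from the fitted line
residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
print(slope, intercept, residuals)
```

Because the data are perfectly correlated, every residual is zero, meaning all points sit on the fitted line.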

[Figure: correlation coefficient: positive, zero, and negative]

When to use the correlation coefficient

You should use this coefficient in the following cases:

  • When both variables are quantitative
  • When the variables are normally distributed
  • When the data has no outliers
  • When the relationship between the variables is linear

Other types of correlation coefficients

Spearman’s rank correlation coefficient can be used for monotonic relationships, where one variable consistently rises or falls as the other increases, even if not along a straight line.

It is appropriate for measurements taken on ordinal scales. You should also use it if the variables aren’t normally distributed or if the data include outliers.³
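
One common way to obtain Spearman's coefficient is to replace each value with its rank and then compute Pearson's r on the ranks. The sketch below follows that approach. The helper names and the data (y = x³, which is monotonic but not linear) are our own assumptions for illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r using sums of x, y, xy, x^2 and y^2."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    return (n * sxy - sx * sy) / math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))

def ranks(values):
    """Rank values from 1 upward, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across any run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def spearman_rho(xs, ys):
    """Spearman's coefficient as Pearson's r of the ranks."""
    return pearson_r(ranks(xs), ranks(ys))

# Made-up monotonic, non-linear data
xs = [1, 2, 3, 4, 5]
ys = [x ** 3 for x in xs]

rho = spearman_rho(xs, ys)
r = pearson_r(xs, ys)
print(rho, r)
```

Here Spearman's coefficient is 1 because the ordering of the two variables agrees perfectly, while Pearson's r is below 1 because the points do not lie on a straight line.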

[Figure: correlation coefficient for a monotonic relationship]

Calculating the correlation coefficient

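You can calculate this value using a formula or software. Here is the formula used, where n is the number of paired observations:

\[ r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}} \]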

Here is a sample dataset to demonstrate how you can calculate the correlation coefficient:

  • Person 1: Age (x) = 20; income (y) = 1500
  • Person 2: Age (x) = 30; income (y) = 3000
  • Person 3: Age (x) = 40; income (y) = 5000
  • Person 4: Age (x) = 50; income (y) = 7500

Step 1: x and y variables

The independent variable should be on the x-axis, while the dependent variable should be on the y-axis.

  • In this case, age will be considered the x variable.
  • The person’s income will be considered the y variable.

Step 2: x² and y²

Next, square each x and y value. This will give you the following values:

  • Person 1: x² = 400; y² = 2,250,000
  • Person 2: x² = 900; y² = 9,000,000
  • Person 3: x² = 1,600; y² = 25,000,000
  • Person 4: x² = 2,500; y² = 56,250,000

Step 3: Cross product and the sum

Next, multiply each x value by its corresponding y value. This will give you the following values:

  • Person 1: xy = 30,000
  • Person 2: xy = 90,000
  • Person 3: xy = 200,000
  • Person 4: xy = 375,000
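
The squares and cross products above, together with the column totals needed in the next step, can be reproduced with a few lines of Python. This is only a sketch; the variable names are our own.

```python
# Sample dataset from the article: age (x) and income (y)
ages = [20, 30, 40, 50]
incomes = [1500, 3000, 5000, 7500]

# Step 2: squares of x and y
x_squared = [x * x for x in ages]
y_squared = [y * y for y in incomes]

# Step 3: cross products
cross_products = [x * y for x, y in zip(ages, incomes)]

# Column totals used when plugging numbers into the formula
totals = {
    "x": sum(ages),
    "y": sum(incomes),
    "xy": sum(cross_products),
    "x2": sum(x_squared),
    "y2": sum(y_squared),
}
print(totals)
```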

Step 4: Calculate the correlation coefficient

Add up the x values, the y values, the xy values, the x² values, and the y² values. You can now plug these sums into the formula.
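
For readers who want to follow the arithmetic, here is the substitution written out. It assumes the common computational form of Pearson's r built from the five sums, with n = 4 people in the sample.

```latex
\begin{align*}
\sum x &= 140, \quad \sum y = 17{,}000, \quad \sum xy = 695{,}000, \\
\sum x^2 &= 5{,}400, \quad \sum y^2 = 92{,}500{,}000, \quad n = 4 \\[4pt]
r &= \frac{n\sum xy - \sum x \sum y}
          {\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]
                 \left[n\sum y^2 - \left(\sum y\right)^2\right]}} \\[4pt]
  &= \frac{4(695{,}000) - (140)(17{,}000)}
          {\sqrt{\left[4(5{,}400) - 140^2\right]
                 \left[4(92{,}500{,}000) - 17{,}000^2\right]}} \\[4pt]
  &= \frac{400{,}000}{\sqrt{(2{,}000)(81{,}000{,}000)}} \approx 0.9938
\end{align*}
```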

  • This will give you a correlation coefficient of 0.9938.

That means age and income are strongly and positively correlated in this sample.⁴
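
If you prefer software to hand calculation, the four steps can be wrapped in a small function. The sketch below assumes the computational form of Pearson's r based on the five sums; the function name is our own.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r from the sums of x, y, xy, x^2 and y^2 (illustrative sketch)."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

# Sample dataset from the article
ages = [20, 30, 40, 50]
incomes = [1500, 3000, 5000, 7500]

r = pearson_r(ages, incomes)
print(round(r, 4))  # 0.9938
```

The result agrees with the value obtained by hand.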

Testing correlation coefficient significance

To determine whether the relationship between two variables is significant, you need the value of r and the sample size, n. Your goal is to determine whether Pearson’s correlation coefficient of the population (ρ) is significantly different from 0.

  • Null hypothesis (H₀): ρ = 0
  • Alternative hypothesis (Hₐ): ρ ≠ 0

Aside from using the following steps, you can test the hypotheses using software like Stata. As an example, we will assume you have the following figures:


Step 1: Calculate the t-value

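You can calculate the t-value using this formula, where r is the sample correlation coefficient and n is the sample size:

\[ t = \frac{r\sqrt{n - 2}}{\sqrt{1 - r^2}} \]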

By inputting the figures in our example, you will get .
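
As a sketch of this step, the function below computes the t-value from r and n using the standard test statistic for a correlation, t = r√(n − 2) / √(1 − r²). Because the figures for the example in this section are not reproduced here, the call reuses the age and income result from the calculation section (r = 0.9938, n = 4), which is our own choice of input.

```python
import math

def t_statistic(r, n):
    """t-value for testing whether a correlation differs from zero."""
    if n <= 2:
        raise ValueError("n must be greater than 2")
    if abs(r) >= 1:
        raise ValueError("r must lie strictly between -1 and 1")
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# Inputs borrowed from the age/income dataset, not from this section's example
t_value = t_statistic(0.9938, 4)
print(t_value)  # roughly 12.6
```

With only four observations, this t-value has n − 2 = 2 degrees of freedom.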

Step 2: Find the critical value of t

You have to use a t table to find the critical value of t. You will need to determine the degrees of freedom (df = n − 2) and the significance level (α).

In most cases, two-tailed tests are preferred for correlations. In our example, there are 8 degrees of freedom, and the significance level is 0.05. A t table gives a critical value of 1.86 for a one-tailed test at this level; for a two-tailed test, the critical value is 2.306.
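
A t table lookup can be sketched as a small dictionary. The excerpt below holds standard critical values at a significance level of 0.05 for a handful of degrees of freedom only; the names `CRITICAL_T_05` and `critical_t` are hypothetical, and a real analysis would use a full table or statistical software.

```python
# Excerpt of a t table for a significance level of 0.05.
# Keys are degrees of freedom; only a few rows are included.
CRITICAL_T_05 = {
    2: {"one-tailed": 2.920, "two-tailed": 4.303},
    8: {"one-tailed": 1.860, "two-tailed": 2.306},
    10: {"one-tailed": 1.812, "two-tailed": 2.228},
    20: {"one-tailed": 1.725, "two-tailed": 2.086},
    30: {"one-tailed": 1.697, "two-tailed": 2.042},
}

def critical_t(df, tails="two-tailed"):
    """Look up the critical t value at alpha = 0.05 for the given df."""
    if df not in CRITICAL_T_05:
        raise KeyError(f"df={df} is not in this excerpt of the table")
    return CRITICAL_T_05[df][tails]

one_tailed = critical_t(8, "one-tailed")
two_tailed = critical_t(8)
print(one_tailed, two_tailed)
```

For 8 degrees of freedom, the lookup returns 1.86 for a one-tailed test and 2.306 for a two-tailed test.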

Step 3: Compare t-value to critical value

For this step, you have to use the absolute value of t, meaning you should ignore any negative signs. Determine whether the t-value is higher or lower than the critical value of t.

Step 4: To reject or not to reject

If the absolute value of t is greater than the critical value, you will reject the null hypothesis. If it is less, you will fail to reject the null hypothesis.

In our example, we will reject the null hypothesis.⁵
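
Steps 3 and 4 amount to a single comparison, sketched below. The function name and the t-values passed to it are hypothetical and chosen only to show both outcomes; 2.306 is the two-tailed critical value for 8 degrees of freedom at a significance level of 0.05.

```python
def decide(t_value, critical_value):
    """Compare |t| with the critical value and return the decision."""
    if abs(t_value) > critical_value:
        return "reject the null hypothesis"
    return "fail to reject the null hypothesis"

# Hypothetical t-values, for illustration only
decision_a = decide(-2.5, 2.306)  # the sign is ignored, so |t| = 2.5
decision_b = decide(1.2, 2.306)
print(decision_a)
print(decision_b)
```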

Reporting the correlation coefficient

In a research paper, the correlation coefficient is included in the results section. If you are using APA style, you should italicize r and round the correlation coefficient to two decimal places.

The p-value should be rounded off to three decimal places. You also need to drop the leading 0 from these two values.

Here is an example:

Pearson’s correlation coefficient was computed to determine the linear relationship between marketing budgets and profits earned. There was a positive relationship between these two variables, , .⁶
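
The rounding rules above can be sketched as a small formatting helper. It assumes the usual APA pattern of reporting the degrees of freedom in parentheses after r, which is not spelled out in this article, and the input values are hypothetical rather than results from the marketing example. Plain text cannot show italics, so r and p would still need to be italicized in the final document.

```python
def format_apa(r, df, p):
    """Format a correlation result in an APA-like style (illustrative sketch)."""
    # Two decimals for r, with the leading zero dropped
    r_text = f"{r:.2f}".replace("0.", ".", 1)
    if p < 0.001:
        p_text = "p < .001"
    else:
        # Three decimals for p, with the leading zero dropped
        p_text = "p = " + f"{p:.3f}".replace("0.", ".", 1)
    return f"r({df}) = {r_text}, {p_text}"

# Hypothetical values, for illustration only
report = format_apa(0.48, 38, 0.002)
print(report)  # r(38) = .48, p = .002
```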

FAQs

The correlation coefficient is used to show how strongly two variables are related.

A correlation coefficient of 0 means the two variables have no linear relationship.

In most cases, you will only test two variables at a time.

A plus sign in front of the coefficient indicates that the variables have a positive correlation, while a minus sign shows that the variables have a negative correlation.

Sources

1 Glen, Stephanie. “Correlation Coefficient: Simple Definition, Formula, Easy Steps.” StatisticsHowTo.com. Accessed December 8, 2022. https://www.statisticshowto.com/probability-and-statistics/correlation-coefficient-formula/.

2 Guilfordschools. “Line of Best Fit Correlation Coefficient.” Accessed December 8, 2022. https://www.guilfordschools.org/pdf/Line%20of%20Best%20Fit%20Correlation%20Coefficient.pdf.

3 Amoeba. “How to choose between Pearson and Spearman correlation?” StackExchange. March 2, 2017. https://stats.stackexchange.com/questions/8071/how-to-choose-between-pearson-and-spearman-correlation.

4 Srivastav, Ashish Kumar. “Pearson Correlation Coefficient.” WallStreetMojo. Accessed December 8, 2022. https://www.wallstreetmojo.com/pearson-correlation-coefficient/.

5 Minitab Statistical Software. “Using the t-value to determine whether to reject the null hypothesis.” Accessed December 8, 2022. https://support.minitab.com/en-us/minitab/21/help-and-how-to/statistical-modeling/regression/supporting-topics/regression-models/using-the-t-value-to-determine-whether-to-reject-the-null-hypothesis/.

6 Zach. “How to Report Pearson’s r in APA Format (With Examples).” Statology. July 19, 2021. https://www.statology.org/how-to-report-pearson-correlation/.