Variability in Descriptive Statistics


Variability is a statistical measure of how spread out the values in a data set are. Researchers and statisticians in many fields use it, together with measures of central tendency, to summarize data and draw conclusions from statistical tests.

Variability – In a Nutshell

  • Variability is an essential statistical measure.
  • Measures of variability such as the variance and standard deviation show how far the values in a data set are spread out from one another or from their mean.
  • The most appropriate measure of variability depends on the level of measurement in a study and on the distribution of the data.

Definition: Variability

In descriptive statistics, variability is the extent to which the values in a data set differ from one another. It shows whether the values are tightly clustered or widely spread out.1

The most common methods of measuring variability are:

  • Range – The difference between the highest and the lowest value in a data set. The average of these two values is known as the midrange.2
  • Interquartile range – The range of the middle half of your ordered data (sorted from smallest to largest). It is the difference between the third quartile (Q3) and the first quartile (Q1).3
  • Standard deviation – It measures how far the data values are dispersed around the group’s mean and is the square root of the variance.4
  • Variance – The average of the squared deviations of the values from their mean.5

Why is variability important

Data sets with low variability lend themselves to predictive models because their values are reasonably consistent. High-variability data are harder to predict because the values are widely dispersed.

Data groups may have the same central tendency but different variability. Variability therefore supplements central tendency and other statistical measures to give a fuller summary of a data set.

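To make this concrete, here is a small Python sketch using the standard library's `statistics` module. The two data sets are made-up values chosen only for illustration: they share the same mean but have very different spread.

```python
from statistics import mean, stdev

# Hypothetical data sets: same centre, different spread
consistent = [48, 49, 50, 51, 52]
scattered = [10, 30, 50, 70, 90]

# Central tendency alone cannot tell them apart...
print(mean(consistent), mean(scattered))  # 50 50

# ...but a measure of variability can
print(round(stdev(consistent), 2), round(stdev(scattered), 2))  # 1.58 31.62
```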
Measuring variability: Range

The range is the difference between the largest and the smallest value. The formula for the range is:

Range (R) = Highest number (H) – Lowest number (L)

Calculating the range gives a quick, simple measure of variability. However, outliers in the data group may lead to misleading conclusions. Outliers are extreme values that differ markedly from the other values in a group.

For instance, in the data set below:

36, 31, 39, 42, 47, 98766.

The last value is an outlier. Outliers can affect deductions from the range because the range only considers two numbers, i.e., the largest and the smallest.

The range should therefore be used alongside other measures.

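A short Python sketch shows how strongly the single outlier in the example above inflates the range. The helper name `data_range` is hypothetical (chosen to avoid shadowing Python's built-in `range`).

```python
def data_range(values):
    """Range = highest value minus lowest value."""
    return max(values) - min(values)

with_outlier = [36, 31, 39, 42, 47, 98766]
without_outlier = [36, 31, 39, 42, 47]

print(data_range(with_outlier))     # 98735
print(data_range(without_outlier))  # 16
```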
Range calculation example

Suppose you have 7 data elements from a sample:

Data (days) 20  25  35  40  45  50  65

  • R = H – L
  • H (Highest value) = 65
  • L (Lowest value) = 20
  • R(Range) = 65 – 20

The Range is 45.

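The same calculation in Python, which also computes the midrange mentioned earlier (the average of the highest and lowest values), might look like this:

```python
days = [20, 25, 35, 40, 45, 50, 65]

highest, lowest = max(days), min(days)
r = highest - lowest                # range: H - L
midrange = (highest + lowest) / 2   # average of the two extremes

print(r, midrange)  # 45 42.5
```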
Measuring variability: Interquartile range

The interquartile range (IQR) is the range of the middle values in an ordered data set. Quartiles are used in descriptive statistics to divide an ordered data group into four equal parts.

Interquartile range calculation example

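The interquartile range is calculated as follows, using the data set (in days) from the range example, which is already ordered and has n = 7 values:

Step 1 – Find the position of Q1:

$$\text{Position of } Q_1 = \frac{n+1}{4} = \frac{7+1}{4} = 2$$

Step 2 – Find the position of Q3:

$$\text{Position of } Q_3 = \frac{3(n+1)}{4} = \frac{3(7+1)}{4} = 6$$

Q1 is therefore the 2nd element, which is 25, while Q3 is the 6th element, which is 50. The interquartile range is:

$$IQR = Q_3 - Q_1 = 50 - 25 = 25$$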

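Python's `statistics.quantiles` (Python 3.8 and later) can be used to check this. Its default `method="exclusive"` uses the same (n + 1) positioning as the steps above. Quartile conventions differ between textbooks and software packages, so other methods can give different quartiles for the same data, as the second call shows.

```python
from statistics import quantiles

days = [20, 25, 35, 40, 45, 50, 65]

# method="exclusive" places Q1 and Q3 at positions (n+1)/4 and 3(n+1)/4
q1, median, q3 = quantiles(days, n=4, method="exclusive")
print(q1, q3, q3 - q1)  # 25.0 50.0 25.0

# A different convention gives different quartiles for the same data
q1_inc, _, q3_inc = quantiles(days, n=4, method="inclusive")
print(q1_inc, q3_inc, q3_inc - q1_inc)  # 30.0 47.5 17.5
```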
Measuring variability: Standard deviation

The standard deviation measures the typical distance between the values in a data group and their mean. It is the square root of the variance.

Calculating the standard deviation involves six steps:

  1. List every score and calculate the mean.
  2. Subtract the mean from each score to find its deviation.
  3. Square each deviation.
  4. Add up the squared deviations.
  5. Divide the sum of squared deviations by n - 1 for a sample (or by N for a population).
  6. Take the square root of the result.

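The six steps translate almost line for line into Python. This is a minimal sketch for a sample (dividing by n - 1); the function name `sample_std` is hypothetical, and the result is cross-checked against the standard library's `statistics.stdev`.

```python
import math
from statistics import stdev

def sample_std(values):
    n = len(values)
    mean = sum(values) / n                   # Step 1: mean
    deviations = [x - mean for x in values]  # Step 2: deviations
    squared = [d ** 2 for d in deviations]   # Step 3: squared deviations
    sum_of_squares = sum(squared)            # Step 4: sum of squares
    variance = sum_of_squares / (n - 1)      # Step 5: divide by n - 1
    return math.sqrt(variance)               # Step 6: square root

days = [20, 25, 35, 40, 45, 50, 65]
print(round(sample_std(days), 2))  # 15.28
print(round(stdev(days), 2))       # 15.28
```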
Standard deviation with a sample

A sample is a subset of individuals or observations selected from a population for analysis. The standard deviation of a sample is calculated with the following formula:

$$s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}$$

where:

  • $s$ – the standard deviation of the sample
  • $\sum$ – the sum of
  • $x_i$ – each value
  • $\bar{x}$ – the mean of the sample
  • $n$ – the number of units in the sample

Standard deviation calculation example

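Using the same data set (in days):

20      25     35     40     45     50     65

Step 1 – Calculate the mean, the deviation of each value from the mean, and the square of each deviation. The mean is (20 + 25 + 35 + 40 + 45 + 50 + 65) / 7 = 280 / 7 = 40.

Data   Deviation from mean   Square of deviation
20     20 - 40 = -20         400
25     25 - 40 = -15         225
35     35 - 40 = -5          25
40     40 - 40 = 0           0
45     45 - 40 = 5           25
50     50 - 40 = 10          100
65     65 - 40 = 25          625

Sum of squares = 1400

Step 2 – Because the data come from a sample, use n - 1 as the divisor:

$$n - 1 = 7 - 1 = 6$$

Step 3 – Divide the sum of squares by n - 1:

$$s^2 = \frac{1400}{6} \approx 233.33$$

Step 4 – Take the square root:

$$s = \sqrt{233.33} \approx 15.28$$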

Standard deviation with a population

A statistical population in descriptive statistics refers to the pool of individuals or objects that a researcher is interested in. The standard deviation of a population is calculated as follows:

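$$\sigma = \sqrt{\frac{\sum (x_i - \mu)^2}{N}}$$

where:

  • $\sigma$ – the standard deviation of the population
  • $\sum$ – the sum of
  • $x_i$ – each unit
  • $\mu$ – the mean of the population
  • $N$ – the number of values in the population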

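In Python's `statistics` module, `pstdev` divides by N (population) while `stdev` divides by n - 1 (sample). Treating the seven example values as a complete population is an assumption made here purely for comparison:

```python
from statistics import pstdev, stdev

days = [20, 25, 35, 40, 45, 50, 65]

print(round(stdev(days), 2))   # 15.28 (sample: divide by n - 1 = 6)
print(round(pstdev(days), 2))  # 14.14 (population: divide by N = 7)
```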
Measuring variability: Variance

Variance is the average of the squared deviations from the mean of the data group (for a sample, the sum of squared deviations is divided by n - 1 rather than n). It is equal to the square of the standard deviation.

Variance with a sample

The following formula is used to calculate the variance of a sample:

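$$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$$

where:

  • $s^2$ – the variance of the sample
  • $\sum$ – the sum of
  • $x_i$ – each value
  • $\bar{x}$ – the mean of the sample
  • $n$ – the number of values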

Variance calculation example

From our previous data set:

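  • Standard deviation ≈ 15.28

Since variance is the square of the standard deviation:

  • Variance = 1400 / 6 ≈ 233.33

Squaring the rounded standard deviation instead (15.28 x 15.28 = 233.48) gives a slightly different value because of rounding.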

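A quick Python check (using `statistics.variance`, which divides by n - 1) shows the exact sample variance and the small rounding error introduced by squaring the rounded standard deviation:

```python
from statistics import stdev, variance

days = [20, 25, 35, 40, 45, 50, 65]

exact = variance(days)                     # 1400 / 6
from_rounded = round(stdev(days), 2) ** 2  # 15.28 squared

print(round(exact, 2))         # 233.33
print(round(from_rounded, 2))  # 233.48
```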
Variance with a population

You can also determine the variance of a population. The formula for finding the variance of a population is:

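$$\sigma^2 = \frac{\sum (x_i - \mu)^2}{N}$$

where:

  • $\sigma^2$ – the population variance
  • $\sum$ – the sum of
  • $x_i$ – each value
  • $\mu$ – the mean of the population
  • $N$ – the number of values present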

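The population formula can be applied directly in a few lines of Python. As before, treating the example data as a whole population is an assumption made only for illustration; the result is checked against `statistics.pvariance`.

```python
from statistics import pvariance

days = [20, 25, 35, 40, 45, 50, 65]

N = len(days)
mu = sum(days) / N                               # population mean: 40.0
sigma_sq = sum((x - mu) ** 2 for x in days) / N  # 1400 / 7

print(sigma_sq)                     # 200.0
print(sigma_sq == pvariance(days))  # True
```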
Determining the best measure of variability

The distribution and level of measurement dictate the most suitable measure.

Level of measurement:

  • The range and interquartile range are the appropriate measures for ordinal data. For interval and ratio data, the standard deviation and variance can also be used.

Distribution:

  • All of these measures can be used for normally distributed data.
  • Variance and standard deviation are used often because they take every element of a data group into account.
  • However, this also makes them highly sensitive to outliers.
  • For skewed distributions or data groups with outliers, the interquartile range is usually the best choice because it focuses on the spread of the middle half of the data.

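Using the outlier example from the range section, this Python sketch compares how the standard deviation and the interquartile range react to one extreme value. The helper `iqr` is hypothetical, and the quartiles use `statistics.quantiles` with `method="inclusive"`. That choice is an assumption: with only six values, some other quartile conventions place Q3 between the last two values, which would pull it toward the outlier.

```python
from statistics import quantiles, stdev

def iqr(values):
    # Q3 - Q1 using the "inclusive" quartile convention
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return q3 - q1

with_outlier = [36, 31, 39, 42, 47, 98766]
without_outlier = [36, 31, 39, 42, 47]

for label, data in [("without outlier", without_outlier), ("with outlier", with_outlier)]:
    print(label, round(stdev(data), 2), iqr(data))
```

The standard deviation jumps from about 6 to roughly 40,000, while the IQR only moves from 6 to 9.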
FAQs

The range is the simplest measure of variability. It is the difference between the smallest and largest values in a data set.

  • Standard deviation measures the spread of values from the mean.
  • Variance is the square of standard deviation.

An example can be seen on production lines. Computers are used to set specifications so that identical parts are produced, but small deviations still occur. Variance and other measures of variability quantify how far the parts deviate from the desired mean.

A biased estimate gives results that are systematically too high or too low, rather than being correct on average.6

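One familiar example of a biased estimator, in this article's own terms, is the sample variance computed by dividing by n instead of n - 1. The simulation below is a sketch under assumed settings (samples of size 5 drawn from a normal distribution with a true variance of 1, and a fixed random seed). On average, the divide-by-n version comes out too low, at roughly (n - 1)/n = 0.8 of the true value, while the n - 1 version lands close to 1.

```python
import random
from statistics import mean, pvariance, variance

random.seed(42)  # fixed seed so the run is repeatable
n, trials = 5, 5000

biased, unbiased = [], []
for _ in range(trials):
    sample = [random.gauss(0, 1) for _ in range(n)]  # true variance = 1
    biased.append(pvariance(sample))   # divides by n
    unbiased.append(variance(sample))  # divides by n - 1

print(round(mean(biased), 2), round(mean(unbiased), 2))
```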
Sources

1 Scott, Gordon. “Variability.” Investopedia. November 18, 2020. https://www.investopedia.com/terms/v/variability.asp.

2 MathIsFun. “Definition of Range (statistics).” Accessed November 11, 2022. https://www.mathsisfun.com/definitions/range-statistics-.html.

3 Frost, Jim. “Interquartile Range (IQR): How to Find and Use It.” Statistics By Jim. Accessed November 11, 2022. https://statisticsbyjim.com/basics/interquartile-range/.

4 CueMath. “Standard Deviation.” Accessed November 11, 2022. https://www.cuemath.com/data/standard-deviation/.

5 Glen, Stephanie. “Variance: Simple Definition, Step by Step Examples.” StatisticsHowTo.com. Accessed November 11, 2022. https://www.statisticshowto.com/probability-and-statistics/variance/.

6 Hanck, Christoph, Martin Arnold, Alexander Gerber, and Martin Schmelzer. “Omitted Variable Bias.” In Introduction to Econometrics with R. Essen, Germany: University of Duisburg-Essen. https://www.econometrics-with-r.org/6-1-omitted-variable-bias.html.