The chi-square goodness of fit test determines whether the observed proportions in a random sample are consistent with a hypothesized distribution. Statistical analysts use it to check whether the outcomes of a categorical variable follow the proportions specified by a theory or by a probability distribution. Discover how to set up the hypothesis test, the important formulas, and a worked example of applying the chi-square goodness of fit test.
Definition: Chi-square goodness of fit test
The chi-square goodness of fit test is a form of Pearson's chi-square test that determines whether an observed frequency distribution differs from an expected one. Because it is a statistical hypothesis test, you can use it to judge whether sample data are consistent with a hypothesized distribution for the full population.
For instance, a low chi-square statistic means the observed values are close to the expected values. A high chi-square statistic means the observed values differ considerably from the expectations, which is evidence against the hypothesis.
When to use the chi-square goodness of fit test
Many statistical tests rely on distributional assumptions. Unlike the Anderson-Darling and Kolmogorov-Smirnov tests[2], which are restricted to continuous distributions, the chi-square goodness of fit test can be applied to discrete distributions such as the Poisson and binomial distributions. The test evaluates how well a theoretical distribution matches the empirical distribution of your sample. For instance, if you have a set of data values and an idea of how they are distributed, you can use the chi-square goodness of fit test to check whether the data are consistent with that hypothesis.
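As a sketch of what matching a theoretical discrete distribution involves, the Python below computes the expected counts that a Poisson distribution would produce for the categories 0, 1, 2 and "3 or more". The function name `poisson_expected_counts`, the mean of 1.5 and the sample size of 200 are hypothetical choices for illustration, not values from this article.

```python
import math


def poisson_expected_counts(lam, n, max_k):
    """Expected counts for categories 0..max_k-1 plus a final 'max_k or more' bin.

    lam: hypothesized Poisson mean; n: sample size.
    The last bin collects the upper tail so the probabilities sum to 1.
    """
    probs = [math.exp(-lam) * lam**k / math.factorial(k) for k in range(max_k)]
    probs.append(1.0 - sum(probs))  # tail probability P(X >= max_k)
    return [n * p for p in probs]


# Hypothetical example: 200 observations, hypothesized mean of 1.5
expected = poisson_expected_counts(lam=1.5, n=200, max_k=3)
print([round(e, 2) for e in expected])
```

These expected counts would then be compared with the observed counts in the same four categories. If the mean were estimated from the sample rather than specified in advance, one extra degree of freedom would be lost.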
When performing a chi-square goodness of fit test, you must fulfil the following conditions:
- You can only test categorical variables. If the variable is continuous, convert it to a categorical variable through data binning that separates observations into intervals.
- The sample is randomly selected from the population.
- The expected count in each category should be at least five.
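The binning and expected-count conditions above can be checked mechanically. The sketch below assumes half-open intervals and a threshold of five; the helper names `bin_counts` and `expected_counts_ok` and the sample values are hypothetical.

```python
def bin_counts(values, edges):
    """Count how many values fall into each half-open interval [edges[i], edges[i+1]).

    Values outside the outermost edges are ignored in this sketch.
    """
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(edges) - 1):
            if edges[i] <= v < edges[i + 1]:
                counts[i] += 1
                break
    return counts


def expected_counts_ok(expected, minimum=5):
    """Check the rule that every expected count is at least `minimum`."""
    return all(e >= minimum for e in expected)


# Hypothetical continuous measurements grouped into three intervals
values = [2.1, 7.5, 12.0, 15.3, 18.9, 22.4, 25.0, 29.9, 3.3, 11.1]
counts = bin_counts(values, edges=[0, 10, 20, 30])
print(counts)                                     # [3, 4, 3]
print(expected_counts_ok([50, 50, 50, 50, 50]))   # True
print(expected_counts_ok([12.5, 3.2, 4.3]))       # False
```

When a category fails the check, a common remedy is to merge it with a neighbouring category before running the test.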
Chi-square goodness of fit test: Hypotheses
Since the chi-square goodness of fit test evaluates whether a sample could have been drawn from a population with a specified distribution, you need a hypothesis to test. The hypothesis lets you draw conclusions about the population distribution from how well the sample fits it. A chi-square goodness of fit test has two hypotheses:
Null hypothesis (H0): the difference between the expected and observed values is not significant. For example, the population follows the specified categorical proportions or distribution.
Alternative hypothesis (Ha): there is a significant difference between the expected and observed values. For example, the population does not follow the specified categorical proportions or distribution.
You can specify your hypothesis by naming the hypothesized distribution or by giving the proportions for the categorical groupings.
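When the hypothesis is stated as proportions, the expected counts are the sample total multiplied by each proportion. A minimal sketch, assuming the proportions are given as a plain list; `expected_from_proportions` is a hypothetical helper, and the three-category proportions are illustrative only.

```python
def expected_from_proportions(total, proportions):
    """Convert hypothesized category proportions into expected counts."""
    if abs(sum(proportions) - 1.0) > 1e-9:
        raise ValueError("hypothesized proportions must sum to 1")
    return [total * p for p in proportions]


# Equal shares across five days for 250 customers, as in the shop example
equal = expected_from_proportions(250, [0.2] * 5)
# Hypothetical unequal shares for three categories and 100 observations
unequal = expected_from_proportions(100, [0.5, 0.3, 0.2])
print(equal)
print(unequal)
```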
How to calculate the chi-square goodness of fit test
In the chi-square goodness of fit test, the test statistic follows a chi-square (χ²) distribution when the null hypothesis is true. The statistic is calculated as follows:

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
In the formula:
- Σ: sum over all categories
- O: observed count in a category
- E: expected count in that category
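The statistic built from these symbols, the sum of (O - E)²/E over all categories, is short to compute. The sketch below is a plain-Python illustration; the function name and the two sets of observed counts are hypothetical, chosen to show how the size of the statistic reflects the fit.

```python
def chi_square_statistic(observed, expected):
    """Pearson's chi-square statistic: sum of (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


expected = [50, 50, 50, 50, 50]
close = chi_square_statistic([49, 51, 50, 52, 48], expected)  # small deviations
far = chi_square_statistic([20, 80, 35, 65, 50], expected)    # large deviations
print(close, far)
```

Observed counts near the expected ones give a small statistic (about 0.2 here), while large deviations give a large one (45 here), which is why a high value counts as evidence against the null hypothesis.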
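Compare the χ² statistic with a chi-square distribution with n - 1 degrees of freedom[3], where n is the number of categories. If the resulting p-value is lower than your significance level, reject the null hypothesis in favour of the alternative; a significant result is evidence against H0, not proof that the alternative is true. The degrees of freedom also depend on how many parameters of the hypothesized distribution you estimate from the sample: subtract one degree of freedom for each estimated parameter. For instance, a binomial distribution with a specified probability of success uses n - 1, a Poisson distribution whose mean is estimated from the data uses n - 2, and a normal distribution whose mean and standard deviation are estimated uses n - 3.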
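The degrees-of-freedom rule and the p-value lookup can be sketched without external libraries. The code below computes the chi-square upper-tail probability from the series for the regularized lower incomplete gamma function; the helper names are hypothetical, and for very small p-values the subtraction loses precision. Where SciPy is available, `scipy.stats.chisquare(f_obs, f_exp, ddof=...)` performs the whole test, with `ddof` reducing the degrees of freedom to n - 1 - ddof.

```python
import math


def chi_square_sf(x, df):
    """P(X >= x) for a chi-square variable with `df` degrees of freedom.

    Uses the series for the regularized lower incomplete gamma P(s, z)
    with s = df / 2 and z = x / 2; the p-value is 1 - P(s, z).
    """
    if x <= 0:
        return 1.0
    s, z = df / 2.0, x / 2.0
    term = 1.0 / s  # n = 0 term of sum z^n / (s (s+1) ... (s+n))
    total = term
    n = 0
    while term > 1e-15 * total:
        n += 1
        term *= z / (s + n)
        total += term
    lower = total * math.exp(-z + s * math.log(z) - math.lgamma(s))
    return max(0.0, 1.0 - lower)


def goodness_of_fit_df(num_categories, estimated_params=0):
    """Degrees of freedom: categories - 1 - parameters estimated from the data."""
    return num_categories - 1 - estimated_params


# Five categories, no estimated parameters, statistic of 5.0 (as in the shop example)
df = goodness_of_fit_df(num_categories=5)
print(df, round(chi_square_sf(5.0, df), 3))  # 4 0.287
```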
In the example below, the shop owner expects the same number of customers to visit the shop on each weekday. The researcher then records the actual number of customers on each day of a given week and uses these counts to calculate the chi-square goodness of fit statistic.
Step 1: Actual vs. expected number of customers

| Day of the week | Number of customers (O) | Expected number of customers (E) |
|---|---|---|
| Monday | 40 | 50 |
| Tuesday | 50 | 50 |
| Wednesday | 60 | 50 |
| Thursday | 45 | 50 |
| Friday | 55 | 50 |
Step 2: Difference between observed and expected

| Day of the week | Number of customers (O) | Expected number of customers (E) | Observed - Expected (O - E) |
|---|---|---|---|
| Monday | 40 | 50 | -10 |
| Tuesday | 50 | 50 | 0 |
| Wednesday | 60 | 50 | 10 |
| Thursday | 45 | 50 | -5 |
| Friday | 55 | 50 | 5 |
Step 3: Squared difference between observed and expected

| Day of the week | Number of customers (O) | Expected number of customers (E) | O - E | (O - E)² |
|---|---|---|---|---|
| Monday | 40 | 50 | -10 | 100 |
| Tuesday | 50 | 50 | 0 | 0 |
| Wednesday | 60 | 50 | 10 | 100 |
| Thursday | 45 | 50 | -5 | 25 |
| Friday | 55 | 50 | 5 | 25 |
Step 4: Squared difference divided by the expected value

| Day of the week | Number of customers (O) | Expected number of customers (E) | O - E | (O - E)² | (O - E)²/E |
|---|---|---|---|---|---|
| Monday | 40 | 50 | -10 | 100 | 2 |
| Tuesday | 50 | 50 | 0 | 0 | 0 |
| Wednesday | 60 | 50 | 10 | 100 | 2 |
| Thursday | 45 | 50 | -5 | 25 | 0.5 |
| Friday | 55 | 50 | 5 | 25 | 0.5 |
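The table calculations can be reproduced step by step. This sketch simply recomputes each column from the counts in the shop example and sums the last column; the variable names are illustrative.

```python
days = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday"]
observed = [40, 50, 60, 45, 55]
expected = [50] * len(days)  # equal-visits hypothesis: 250 customers / 5 days

total = 0.0
for day, o, e in zip(days, observed, expected):
    diff = o - e                   # O - E
    contribution = diff ** 2 / e   # (O - E)^2 / E
    total += contribution
    print(f"{day:<10} O={o:>3} E={e:>3} O-E={diff:>4} "
          f"(O-E)^2={diff ** 2:>4} (O-E)^2/E={contribution}")

print("chi-square =", total)  # chi-square = 5.0
```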
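The chi-square statistic means little without interpretation. To decide whether to reject the null hypothesis, compare it with the appropriate chi-square distribution: choose a significance level, determine the degrees of freedom, and look up the critical value. If the chi-square statistic is greater than the critical value, reject the null hypothesis. If it is lower, the observed values are not significantly different from the expected values, and you fail to reject the null hypothesis. In the shop example, summing the last column of the table gives χ² = 2 + 0 + 2 + 0.5 + 0.5 = 5, with 5 - 1 = 4 degrees of freedom. At a 0.05 significance level the critical value is 9.488; because 5 is below it (equivalently, the p-value is about 0.29), the owner cannot reject the hypothesis that customers visit equally on each weekday.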
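The critical-value comparison can be written as a small decision rule. This sketch hard-codes upper-tail critical values of the chi-square distribution at a 0.05 significance level for 1 to 5 degrees of freedom, rounded to three decimals as in standard tables; the dictionary and the `decide` helper are hypothetical names.

```python
# Upper-tail critical values of the chi-square distribution at alpha = 0.05,
# rounded to three decimals (standard table values).
CRITICAL_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070}


def decide(statistic, df, critical_values=CRITICAL_05):
    """Reject H0 only when the statistic exceeds the critical value for `df`."""
    critical = critical_values[df]
    if statistic > critical:
        return "reject H0"
    return "fail to reject H0"


# Shop example: chi-square = 5 with 5 - 1 = 4 degrees of freedom
print(decide(5.0, df=4))  # fail to reject H0
```

A table lookup keeps the sketch dependency-free; for other significance levels or larger degrees of freedom, a statistics library's inverse chi-square function is the more general choice.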
The chi-square goodness of fit test determines whether your data are consistent with a hypothesized distribution. The test is applied to a categorical variable sampled from the population. When applying the test, start by stating the null and alternative hypotheses. Then formulate an analysis plan, collect the sample data, and interpret the results to reject or fail to reject the null hypothesis.
You can perform a chi-square goodness of fit test if the variable is categorical. If the variable is continuous, convert it to a categorical variable by dividing the observations into intervals. The sample should also be randomly selected from the population, and each category should have an expected count of at least five.
Use the chi-square goodness of fit test when you have one categorical variable and want to test a hypothesis about its distribution. If you have two categorical variables, use the chi-square test of independence instead, which tests a hypothesis about whether the two variables are related. In both tests, you formulate the null and alternative hypotheses and run the test before deciding whether to reject the null hypothesis.
[1] Frost, Jim. “Categorical Variables.” Statistics By Jim, May 5, 2017. https://statisticsbyjim.com/glossary/categorical-variables/.
[2] Stephanie. “Kolmogorov-Smirnov Goodness of Fit Test.” Statistics How To, May 16, 2022. https://www.statisticshowto.com/kolmogorov-smirnov-test/.
[3] Study.com. “Take Online Courses. Earn College Credit. Research Schools, Degrees & Careers.” Accessed February 28, 2023. https://study.com/academy/lesson/degrees-of-freedom-definition-formula-example.html.