Even well-designed and well-controlled studies end up with missing data. Missing values reduce a study’s statistical power and can lead to biased estimates and inaccurate findings. This article discusses the concerns raised by missing data, its types, and ways of dealing with it.
Definition: Missing data
Missing data, or missing values, arise when you do not have data stored for particular variables or participants. Data can be lost for numerous reasons, including incomplete data entry, device failure, and misplaced files.
Types of missing data
Missing data are errors because they do not represent the actual values of what was intended to be measured.
Consideration of the reason for the missing values is essential, as it enables you to establish the type of missing values and the necessary course of action.
Missing values fall into three categories:
MCAR | Missing completely at random (MCAR) occurs when the probability that a value is missing is unrelated to both the value itself and the observed responses. MCAR is an ideal but unrealistic assumption for many anesthesia studies. Data are regarded as MCAR when they are missing by design, because of an instrument failure, or because samples are lost in transit or are technically unacceptable. When data are MCAR, the analysis remains unbiased: the study may lose power, but the missing values do not bias the estimated parameters. |
MAR | Missing at random (MAR) is a more realistic assumption for anesthesia studies. Data are MAR when the probability that a value is missing depends on the observed responses but not on the value that would have been obtained. It is tempting to think MAR is not a concern because randomness does not produce bias, but missing data cannot simply be ignored under MAR. If a dropout variable is MAR, the probability of dropout in each case is conditionally independent of the values obtained currently and expected to be obtained in the future, given the history of values obtained before that case. |
MNAR | If the characteristics of the data do not meet those of MCAR or MAR, the data are missing not at random (MNAR). MNAR data are problematic. The only way to obtain unbiased parameter estimates is to model the missing data; the model is then used to estimate the missing values.¹ |
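The three mechanisms can be made concrete with a small simulation. The sketch below is illustrative only: the variables `age` and `pain`, the missingness rates, and the cut-offs are all assumptions invented for the example, not values from any study.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 10_000

# Hypothetical data: age is always observed, pain is the variable that may go missing.
age = rng.normal(50, 10, n)
pain = 0.05 * age + rng.normal(0, 1, n)
df = pd.DataFrame({"age": age, "pain": pain})

# MCAR: every value has the same 30% chance of being missing.
mcar_mask = rng.random(n) < 0.3

# MAR: the chance of being missing depends only on an observed variable (age).
mar_mask = rng.random(n) < np.where(df["age"] > 50, 0.5, 0.1)

# MNAR: the chance of being missing depends on the unobserved value itself (pain).
mnar_mask = rng.random(n) < np.where(df["pain"] > df["pain"].median(), 0.5, 0.1)

# Compare the mean of the values that remain with the mean of the full data.
true_mean = df["pain"].mean()
observed_means = {
    "MCAR": df.loc[~mcar_mask, "pain"].mean(),
    "MAR": df.loc[~mar_mask, "pain"].mean(),
    "MNAR": df.loc[~mnar_mask, "pain"].mean(),
}
```

In this setup the MCAR mean should stay close to the full-data mean, while the naive MAR and MNAR means drift away from it. Under MAR the drift can be corrected by conditioning on `age`; under MNAR it cannot be corrected from the observed data alone.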
How to prevent missing data
Common causes of missing values include attrition, non-response, and poorly designed study procedures. When planning a study, make it as simple as possible for participants to contribute data.
Here are tips to minimize missing values:
✓ Limit follow-ups
✓ Minimize data collected
✓ Make forms user-friendly
✓ Incorporate data validation methods
✓ Give rewards
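As one possible illustration of data validation at entry time, the sketch below checks a submitted form record before it is stored. The field names and the age range are hypothetical and would need to match your own form.

```python
# Hypothetical required fields for one form submission.
REQUIRED_FIELDS = ("participant_id", "age", "pain_score")


def validate_record(record: dict) -> list[str]:
    """Return a list of problems found in one submitted form record."""
    problems = []
    # Flag required fields that are absent or left blank.
    for field in REQUIRED_FIELDS:
        value = record.get(field)
        if value is None or str(value).strip() == "":
            problems.append(f"{field} is missing")
    # Flag implausible ages (the 0-120 range is an assumption).
    age = record.get("age")
    if isinstance(age, (int, float)) and not 0 <= age <= 120:
        problems.append("age is out of range")
    return problems


# A record with a blank pain_score is flagged before it reaches the dataset.
issues = validate_record({"participant_id": "P1", "age": 34, "pain_score": ""})
```

Rejecting or flagging a record at this point gives the participant or the data-entry clerk a chance to supply the value while it is still available.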
How to deal with missing data
Typically, you have the choice of accepting, eliminating, or reconstructing missing values to organize your data.
Decide how to handle each instance of missing values based on your assessment of its cause:
- Are these missing data due to random or non-random causes?
- Do the missing entries represent a true zero or a null (no value recorded)?
- Was the query or measurement ill-conceived?
If your data are MCAR or MAR, they can often be accepted or left unchanged. However, MNAR data may require a more involved approach.
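A first look at these questions can be sketched with pandas, assuming your data are in a DataFrame. The columns `age` and `income` and their values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data: income is missing for some participants.
df = pd.DataFrame({
    "age": [25, 31, 47, 52, 63, 70],
    "income": [30.0, np.nan, 45.0, np.nan, 52.0, np.nan],
})

# Share of missing values in each column.
missing_share = df.isna().mean()

# Compare an observed variable between rows with and without a missing income.
status = df["income"].isna().map({True: "missing", False: "observed"})
age_by_status = df.groupby(status)["age"].mean()
```

If an observed variable such as `age` differs clearly between the two groups, the data are unlikely to be MCAR. Note that a check like this cannot distinguish MAR from MNAR, because that distinction depends on the values you did not observe.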
Missing data: Acceptance
Accepting missing data is often the most prudent course: leave these cells blank. This works best for MCAR or MAR values. When you have a small sample, keep as much data as possible to maintain statistical power.
To keep your dataset consistent, recode all missing values with a single label such as “N/A.” These steps let you preserve as much of your research data as possible without altering it.
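As a minimal sketch of this approach in pandas, the example below reads a small, made-up CSV in which missing scores appear either as “N/A” or as a blank cell. The column names are assumptions.

```python
import io

import pandas as pd

# Hypothetical raw file: P2 is recorded as "N/A", P4 was left blank.
raw = io.StringIO("participant,score\nP1,4\nP2,N/A\nP3,8\nP4,\n")

# Both the "N/A" label and the blank cell are read as missing values (NaN).
df = pd.read_csv(raw, na_values=["N/A"])
n_missing = int(df["score"].isna().sum())

# Summary statistics skip missing values by default, so no rows need to be dropped.
mean_score = df["score"].mean()

# When exporting, write every missing value with the same label.
exported = df.to_csv(index=False, na_rep="N/A")
```

Here the mean is computed from the two observed scores only, and the exported file marks both missing entries consistently.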
Missing data: Deletion
Listwise or pairwise deletion can be used to eliminate missing values from analyses.
Listwise deletion
Listwise deletion eliminates every case (participant) that has missing data for any variable, so the analysis retains only participants with complete data. This strategy may result in a smaller, biased sample: if data are missing for some variables or measurements, the participants who provide them may differ from those who don’t.
As a result, your sample may not be representative of the population, which makes it biased.²
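In pandas, listwise deletion corresponds to dropping every row that contains at least one missing value. The sketch below uses a small invented dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical data: three of the four participants miss one measurement each.
df = pd.DataFrame({
    "age": [25, 31, np.nan, 52],
    "weight": [70.0, np.nan, 80.0, 65.0],
    "score": [4.0, 5.0, 6.0, np.nan],
})

# Listwise deletion: keep only rows with no missing value in any column.
complete_cases = df.dropna()
rows_lost = len(df) - len(complete_cases)
```

Only the first participant survives, even though each dropped participant was missing just one value. This shows how quickly the sample can shrink.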
Pairwise deletion
Pairwise deletion excludes a case only from the analyses that need the data point it is missing. The case’s existing values are still used in every other analysis. Pairwise deletion therefore retains more information than listwise deletion, which drops incomplete cases entirely.
Pairwise deletion is less biased for MCAR or MAR data, provided the variables related to the missingness mechanism are included as covariates. Even so, many missing observations will still degrade the analysis.³
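One place where pairwise deletion shows up in practice is a correlation matrix: pandas computes each correlation from the rows in which both variables are observed. The dataset below is invented for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value in each column.
df = pd.DataFrame({
    "age": [25, 31, np.nan, 52, 60],
    "weight": [70.0, np.nan, 80.0, 65.0, 72.0],
    "score": [4.0, 5.0, 6.0, np.nan, 7.0],
})

# Number of rows in which both variables of each pair are observed.
observed = df.notna().astype(int)
pair_counts = observed.T @ observed

# DataFrame.corr excludes missing values pair by pair.
pairwise_corr = df.corr()

# For comparison: rows that listwise deletion would keep.
listwise_n = len(df.dropna())
```

Each pair of variables is correlated over three rows here, whereas listwise deletion would leave only two rows for every analysis. The trade-off is that different entries of the matrix are based on different subsets of participants.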
Missing data: Imputation
Imputation replaces missing values with estimates, using other data to produce a complete dataset.
You have numerous imputation options. The simplest is to replace each missing value with the mean or median of its variable.
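Mean and median imputation can be sketched in a few lines of pandas. The `score` values below are made up and include one deliberately extreme value.

```python
import numpy as np
import pandas as pd

# Hypothetical scores with two missing values and one outlier (100).
df = pd.DataFrame({"score": [2.0, 4.0, np.nan, 6.0, 100.0, np.nan]})

# Replace each missing value with the mean of the observed values.
mean_imputed = df["score"].fillna(df["score"].mean())

# Replace each missing value with the median of the observed values.
median_imputed = df["score"].fillna(df["score"].median())
```

With these numbers the mean fill is 28.0 and the median fill is 5.0, which shows how a single outlier pulls the mean. Either way, filling every gap with the same value reduces the variability of the variable, so treat this as a quick baseline rather than a full solution.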
Hot-deck imputation
Hot-deck imputation replaces missing values with values from similar cases or participants in the same dataset. For each case with missing values, a “donor” is chosen based on data from other variables, and the donor’s value is used.
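There are several ways to pick a donor. The sketch below shows one simple variant, a random donor drawn from participants who share the same value of another variable. The grouping variable `sex`, the data, and the helper name `hot_deck` are assumptions for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)

# Hypothetical data: one missing score in each group.
df = pd.DataFrame({
    "sex": ["F", "F", "F", "M", "M", "M"],
    "score": [5.0, 7.0, np.nan, 2.0, np.nan, 3.0],
})


def hot_deck(group: pd.Series) -> pd.Series:
    """Fill missing values with randomly drawn observed values from the same group."""
    donors = group.dropna().to_numpy()
    filled = group.copy()
    missing = filled.isna()
    # Only impute when the group has at least one donor and one gap.
    if len(donors) > 0 and missing.any():
        filled[missing] = rng.choice(donors, size=int(missing.sum()))
    return filled


# Donors are restricted to participants of the same sex.
df["score_imputed"] = df.groupby("sex")["score"].transform(hot_deck)
```

Because each imputed value is a real observed value from a similar participant, the filled-in data stay within the range of plausible responses.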
Cold-deck imputation
In cold-deck imputation, missing values are substituted with existing values from similar cases in other datasets. The new values are derived from an independent sample.
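A minimal sketch of cold-deck imputation, assuming an external dataset that can be matched on a shared variable. Both tables, the `age_group` key, and all values are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical current study with two missing scores.
study = pd.DataFrame({
    "age_group": ["18-39", "40-64", "65+", "40-64"],
    "score": [4.0, np.nan, np.nan, 6.0],
})

# Hypothetical independent dataset, for example an earlier survey.
external = pd.DataFrame({
    "age_group": ["18-39", "40-64", "65+"],
    "score": [3.5, 5.5, 7.5],
})

# Look up the external value for each participant's age group.
donor_values = study["age_group"].map(external.set_index("age_group")["score"])

# Use the external value only where the study's own value is missing.
study["score_imputed"] = study["score"].fillna(donor_values)
```

The observed scores are left untouched, and the two gaps are filled with 5.5 and 7.5 from the external table. The result is only as trustworthy as the assumption that the external sample resembles your own.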
FAQs
What is missing data?
Missing values arise when you do not have data stored for particular variables or participants.
Why is missing data important?
Missing data matter because, depending on their type, they can bias your results. If they leave you with an unrepresentative sample, your results may not be generalizable.
How do you deal with missing data?
Typically, you have the choice of accepting, eliminating, or reconstructing missing data to organize your dataset.
Sources
1 Kang, Hyun. “The prevention and handling of the missing data.” National Library of Medicine. May 24, 2013. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3668100/.
2 Ogunbiyi, Ibrahim Abayomi. “How to Handle Missing Data in a Dataset.” FreeCodeCamp. June 24, 2022. https://www.freecodecamp.org/news/how-to-handle-missing-data-in-a-dataset/.
3 MastersInDataScience. “How to Deal with Missing Data.” Accessed December 02, 2022. https://www.mastersindatascience.org/learning/how-to-deal-with-missing-data/.