Structurally Missing Data is data that is missing for a logical reason: the value could not exist for that record, so we expect it to be missing.
For example, there are structurally missing data in the ‘Litters’ and ‘Pups/Litter’ columns for all the male dogs in the table below because we would not expect male dogs to have puppies.
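One way to make this concrete is to check that every missing value is explained by the rule that makes it structural. The snippet below is a minimal sketch, assuming hypothetical dog records (the names, field names, and values are invented for illustration and are not taken from the table) and assuming that `None` marks a missing value.

```python
# Hypothetical records for illustration only; not the data from the table.
dogs = [
    {"name": "A", "sex": "F", "litters": 2, "pups_per_litter": 5},
    {"name": "B", "sex": "M", "litters": None, "pups_per_litter": None},
    {"name": "C", "sex": "F", "litters": 1, "pups_per_litter": 4},
    {"name": "D", "sex": "M", "litters": None, "pups_per_litter": None},
]

LITTER_FIELDS = ("litters", "pups_per_litter")


def is_structurally_missing(row, field):
    """True when the value is missing AND the field cannot apply to this row."""
    return row[field] is None and row["sex"] == "M"


# Collect any missing values that the structural rule does NOT explain.
unexplained = [
    (row["name"], field)
    for row in dogs
    for field in LITTER_FIELDS
    if row[field] is None and not is_structurally_missing(row, field)
]

print(unexplained)  # [] means every gap is structural in this toy data
```

If `unexplained` were not empty, those gaps would need another explanation, such as one of the categories described next.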
Missing Not at Random (MNAR) data is missing for reasons that cannot be inferred from the data you have. These data are systematically missing, but the cause lies in the missing values themselves or in something that was never recorded, so no variable in the dataset offers a clear explanation as to why.
In the table below, the sales data for bananas is missing, but try as you might, you cannot figure out why it is missing. Bananas were stocked and sold every week that data was collected! The missing banana data is MNAR data.
Missing at Random (MAR) data is missing because of some characteristic of the person or thing being studied that was recorded in the dataset. Despite the name, the missingness is not random: this type of data is often reliably missing based on the value of another variable in the dataset.
In the table below, the bacterial cell counts for all the stool samples are ‘NaN’. If we looked into this, we might find that there were too many bacterial cells to count in all those samples. Because the missingness is predicted by the sample type, a variable we did record, the bacterial cell counts for stool samples would be MAR data.
| Sample ID | Sample Type | Bacterial Cell Counts |
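A simple way to look for this pattern is to compute the share of missing values within each level of another variable. The following is a minimal sketch, assuming hypothetical sample records (the IDs, sample types, and counts are invented for illustration) with `None` standing in for ‘NaN’; the helper name `missing_rate_by_group` is also an assumption.

```python
from collections import defaultdict

# Hypothetical samples for illustration only; not the data from the table.
samples = [
    {"id": 1, "type": "stool", "count": None},
    {"id": 2, "type": "saliva", "count": 1200},
    {"id": 3, "type": "stool", "count": None},
    {"id": 4, "type": "skin", "count": 300},
    {"id": 5, "type": "saliva", "count": 950},
]


def missing_rate_by_group(rows, group_key, value_key):
    """Return the fraction of rows with a missing value, per group."""
    totals = defaultdict(int)
    missing = defaultdict(int)
    for row in rows:
        totals[row[group_key]] += 1
        if row[value_key] is None:
            missing[row[group_key]] += 1
    return {group: missing[group] / totals[group] for group in totals}


rates = missing_rate_by_group(samples, "type", "count")
print(rates)  # {'stool': 1.0, 'saliva': 0.0, 'skin': 0.0}
```

A missing rate that differs sharply between groups, as it does here, suggests the missingness is tied to that variable rather than being completely random.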
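Missing Completely at Random (MCAR) data has no detectable underlying reason causing the values to be missing: the missingness is unrelated to the other variables in the dataset and to the missing values themselves.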
The table below has MCAR data. The # of fruits is missing for some plants, but the missing fruit data seems unrelated to the height of the plant. Short and tall plants are both missing fruit data. In addition, we are missing the height for one of our plants!
| Plant | Height (cm) | # of Fruits |
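An informal check for this kind of pattern is to compare another variable between rows where the value is missing and rows where it is present. The sketch below assumes hypothetical plant records (the plant labels, heights, and fruit counts are invented for illustration) with `None` marking a missing value. Similar group means are consistent with MCAR but do not prove it.

```python
from statistics import mean

# Hypothetical plants for illustration only; not the data from the table.
plants = [
    {"plant": "A", "height_cm": 12.0, "fruits": 3},
    {"plant": "B", "height_cm": 30.0, "fruits": None},
    {"plant": "C", "height_cm": 28.0, "fruits": 7},
    {"plant": "D", "height_cm": 10.0, "fruits": None},
    {"plant": "E", "height_cm": None, "fruits": 4},
]


def mean_height_by_fruit_missingness(rows):
    """Mean height of plants missing a fruit count vs. plants that have one.

    Rows with a missing height are skipped, since they cannot contribute
    to either mean.
    """
    with_height = [r for r in rows if r["height_cm"] is not None]
    missing = [r["height_cm"] for r in with_height if r["fruits"] is None]
    observed = [r["height_cm"] for r in with_height if r["fruits"] is not None]
    return mean(missing), mean(observed)


missing_mean, observed_mean = mean_height_by_fruit_missingness(plants)
print(missing_mean, observed_mean)  # 20.0 20.0
```

In this toy data both short and tall plants are missing fruit counts, so the two means match. With real data, a large gap between them would point toward MAR rather than MCAR.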