What is an outlier?
When analyzing data, you'll sometimes find that one value is far from the others. Such a value is called an outlier, a term that is usually not defined rigorously.
Approach to thinking about outliers
When you encounter an outlier, you may be tempted to delete it from the analyses. Before doing so, ask yourself these questions:
• Was the value entered into the computer correctly? If there was an error in data entry, fix it.
• Were there any experimental problems with that value? For example, if you noted that one tube looked funny, you have justification to exclude the value resulting from that tube without needing to perform any calculations.
• Could the outlier be caused by biological diversity? If each value comes from a different person or animal, the outlier may be a correct value. It is an outlier not because of an experimental mistake, but rather because that individual may be different from the others. This may be the most exciting finding in your data!
If you answered “no” to all three questions, you are left with two possibilities.
• The outlier was due to chance. In this case, you should keep the value in your analyses. The value came from the same distribution as the other values, so it should be included.
• The outlier was due to a mistake: bad pipetting, voltage spike, holes in filters, etc. Since including an erroneous value in your analyses will give invalid results, you should remove it. In other words, the value comes from a different population than the other values, and is misleading.
The problem, of course, is that you can never be sure which of these possibilities is correct.