Analysis checklist: Column statistics
Descriptive statistics
Normality tests
Normality tests are performed for each column of data. Each normality test reports a P value that answers this question: If you randomly sample from a Gaussian population, what is the probability of obtaining a sample that deviates from a Gaussian distribution as much as (or more than) this sample does? A small P value is evidence that your data were sampled from a nongaussian distribution. A large P value means that your data are consistent with a Gaussian distribution (but certainly does not prove that the distribution is Gaussian).

Normality tests are less useful than some people guess. With small samples, normality tests have little power to detect nongaussian distributions, and Prism won't even try to compute a normality test with fewer than seven values. With large samples, it matters less whether the data are nongaussian, because t tests and ANOVA are fairly robust to violations of the Gaussian assumption. Normality tests can help you decide when to use nonparametric tests, but the decision should not be an automatic one.
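To show how a normality P value arises, here is a minimal Python sketch using only the standard library. It swaps in the Jarque-Bera test, chosen because it is simple to compute; it is not presented as one of the tests Prism runs, and the function name `jarque_bera` is hypothetical. The statistic combines sample skewness (0 for a Gaussian) and kurtosis (3 for a Gaussian), and its P value comes from a chi-square distribution with 2 degrees of freedom. That chi-square reference is only asymptotic, so with small samples the P values are rough, which echoes the point above about low power.

```python
import math


def jarque_bera(data):
    """Return (JB statistic, approximate P value) for a Jarque-Bera normality test.

    Illustrative sketch only; not necessarily the test Prism performs.
    """
    n = len(data)
    if n < 7:
        # Mirrors the rule described above: no test with fewer than seven values.
        raise ValueError("need at least seven values")
    mean = sum(data) / n
    # Central moments with an n denominator.
    m2 = sum((x - mean) ** 2 for x in data) / n
    if m2 == 0:
        raise ValueError("all values are identical")
    m3 = sum((x - mean) ** 3 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2  # equals 3 for a Gaussian population
    jb = n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)
    # Under the Gaussian null, JB is asymptotically chi-square with 2 df,
    # whose upper-tail probability is exactly exp(-x / 2).
    return jb, math.exp(-jb / 2.0)
```

For example, the evenly spaced values 1 through 9 give a large P value, while nineteen 1s plus a single 100 give a tiny one, because the lone outlier produces strong skewness and kurtosis.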
Inferences
A one-sample t test compares the mean of each column of numbers against a hypothetical mean that you provide. The P value answers this question: If the data were sampled from a Gaussian population with a mean equal to the hypothetical value you entered, what is the chance of randomly selecting N data points and finding a mean as far from the hypothetical value as observed here (or further)? If the P value is small (usually defined as less than 0.05), then it is unlikely that the discrepancy you observed between the sample mean and the hypothetical mean is a coincidence arising from random sampling.

The nonparametric Wilcoxon signed-rank test is similar but does not assume a Gaussian distribution. It asks whether the median of each column differs from a hypothetical median you entered.
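The sketch below makes both P values concrete in Python with only the standard library. All function names are hypothetical, not Prism's API. The t test's two-tailed P value is computed from the regularized incomplete beta function, evaluated with a continued fraction in the style of Numerical Recipes. The signed-rank sketch computes an exact two-sided P value by counting all sign patterns; it assumes zero differences are dropped and that no absolute differences are tied, so Prism's handling of zeros, ties, and large samples may differ. Both tests assume independent observations, and the signed-rank test additionally assumes the distribution is symmetric about its median.

```python
import math
import statistics


def _betacf(a, b, x, max_iter=200, eps=3e-14):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the continued fraction.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd step of the continued fraction.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def _betai(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def t_two_tailed_p(t, df):
    """P(|T| >= |t|) for Student's t distribution with df degrees of freedom."""
    return _betai(df / 2.0, 0.5, df / (df + t * t))


def one_sample_t_test(data, hypothetical_mean):
    """Return (t, two-tailed P) comparing the sample mean to hypothetical_mean."""
    n = len(data)
    if n < 2:
        raise ValueError("need at least two values")
    sd = statistics.stdev(data)  # sample SD (n - 1 denominator)
    if sd == 0:
        raise ValueError("all values identical; t is undefined")
    t = (statistics.mean(data) - hypothetical_mean) / (sd / math.sqrt(n))
    return t, t_two_tailed_p(t, n - 1)


def wilcoxon_signed_rank(data, hypothetical_median):
    """Return (W+, exact two-sided P) for the one-sample signed-rank test.

    Sketch assumptions: differences of exactly zero are dropped, and tied
    absolute differences are rejected rather than given averaged ranks.
    """
    diffs = [x - hypothetical_median for x in data if x != hypothetical_median]
    n = len(diffs)
    if n == 0:
        raise ValueError("no nonzero differences")
    magnitudes = sorted(abs(d) for d in diffs)
    if len(set(magnitudes)) != n:
        raise ValueError("tied |differences| are not handled in this sketch")
    rank = {m: i + 1 for i, m in enumerate(magnitudes)}
    w_plus = sum(rank[abs(d)] for d in diffs if d > 0)
    # Under H0 each rank is equally likely to carry a + or - sign, so count
    # how many of the 2**n sign patterns give each possible rank sum.
    max_sum = n * (n + 1) // 2
    counts = [1] + [0] * max_sum
    for r in range(1, n + 1):
        for s in range(max_sum, r - 1, -1):
            counts[s] += counts[s - r]
    total = 2 ** n
    lower = sum(counts[: w_plus + 1]) / total
    upper = sum(counts[w_plus:]) / total
    return w_plus, min(1.0, 2.0 * min(lower, upper))
```

For instance, `one_sample_t_test([1, 2, 3, 4, 5], 3)` gives t = 0 and P = 1, and `wilcoxon_signed_rank([1, 2, 3, 4, 5], 0)` gives W+ = 15 with an exact two-sided P of 0.0625. With only five values, even when every value lies above the hypothetical median, the exact P cannot fall below 2/32, which illustrates how little power nonparametric tests have with tiny samples.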