Commenting on the P-value of 0.059 obtained in the example, Moore & McCabe say, "Sample size strongly influences the P-value of a test. An effect that fails to be significant at a specified level alpha in a small sample can be significant in a larger sample. In the light of the rather small samples in Example 7.20, the evidence for some effect of calcium on blood pressure is rather good."

This reasoning is circular. Increasing the sample size will tend
to result in a smaller P-value **only** if the null hypothesis is false,
which is the point at issue.
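This can be checked with a small simulation. The sketch below is not from
Moore & McCabe; it assumes a one-sided z-test with known standard deviation 1,
and the function names (`p_value_one_sided`, `median_p`) are hypothetical.
Under the null hypothesis the P-value is uniformly distributed no matter the
sample size, so its median stays near 0.5; only when an effect really exists
does a larger sample drive the typical P-value down.

```python
import math
import random

def p_value_one_sided(sample, mu0=0.0, sigma=1.0):
    """One-sided z-test P-value for H0: mean = mu0 vs H1: mean > mu0."""
    n = len(sample)
    z = (sum(sample) / n - mu0) / (sigma / math.sqrt(n))
    # Upper-tail probability of the standard normal.
    return 0.5 * math.erfc(z / math.sqrt(2))

def median_p(effect, n, reps=2000):
    """Median P-value over repeated samples of size n with the given true mean."""
    rng = random.Random(0)  # fixed seed so the sketch is reproducible
    ps = sorted(
        p_value_one_sided([rng.gauss(effect, 1.0) for _ in range(n)])
        for _ in range(reps)
    )
    return ps[reps // 2]

# Null true (effect = 0): median P-value is near 0.5 at any sample size.
print(median_p(0.0, 10), median_p(0.0, 100))
# Null false (effect = 0.3): the median P-value shrinks as n grows.
print(median_p(0.3, 10), median_p(0.3, 100))
```

So "collect more data and the P-value will drop" presupposes that the null
hypothesis is false, which is exactly what the test is supposed to assess.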

However, it is possible to justify using a larger alpha when the sample
size is small by considering the probabilities of both type I and
type II errors. With a small sample, the probability of a type II
error at the standard alpha of 0.05 may be too high, so we might
wish to act as if the null hypothesis were false even though the
P-value is greater than 0.05, because we fear making such a type II
error. But we would do this **despite** the fact that a P-value
greater than 0.05 is weaker evidence that the null hypothesis is
false than a smaller P-value obtained with a larger sample would be,
**not** because a P-value greater than 0.05 with a small
sample is somehow just as strong evidence against the null hypothesis
as a smaller P-value with a big sample.
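The type I / type II trade-off can be made concrete with a power
calculation. The sketch below is illustrative, not from Moore & McCabe: it
assumes a one-sided z-test with known standard deviation 1 and a true effect
of half a standard deviation, and `power_one_sided` is a hypothetical helper.

```python
import math
from statistics import NormalDist

def power_one_sided(effect, n, alpha, sigma=1.0):
    """Power of the one-sided z-test of H0: mean = 0 when the true mean
    is `effect`, i.e. the probability of avoiding a type II error."""
    z_crit = NormalDist().inv_cdf(1 - alpha)   # rejection threshold for Z
    shift = effect * math.sqrt(n) / sigma      # how far Z is shifted under H1
    return 1 - NormalDist().cdf(z_crit - shift)

# Small sample: power at alpha = 0.05 is poor (type II error is likely)...
print(power_one_sided(0.5, 10, 0.05))
# ...and raising alpha to 0.10 buys back some power at the cost of
# a higher type I error rate.
print(power_one_sided(0.5, 10, 0.10))
# A larger sample makes the trade-off unnecessary.
print(power_one_sided(0.5, 100, 0.05))
```

With n = 10 the power at alpha = 0.05 is well under one half, so rejecting
at a looser threshold can be a defensible decision rule; but that is a
statement about the costs of the two errors, not about the strength of
the evidence.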

The whole point of a P-value is to express the strength of evidence against the null hypothesis in a uniform way that accounts for the sample size, the amount of noise in measurements, and other aspects of the situation. There are other approaches to expressing the strength of evidence, and one of these, the "Bayesian" approach, will in some circumstances give results that vary with sample size in a way different from P-values. Moore & McCabe don't cover the Bayesian approach, however.
