Introduction: What Is the Null Hypothesis?
The hypothesis we test statistically is called the null hypothesis. Let us take a conceptually simple example. Suppose we are testing the efficacy of a new drug on patients with myocardial infarction (heart attack). We divide the patients into two groups—drug and no drug—according to good design procedures and use mortality in the two groups as our outcome measure. It is our hope that the drug lowers mortality, but to test the hypothesis statistically, we have to set it up in a somewhat backward way. We say our hypothesis is that the drug makes no difference, and what we hope to do is to reject this “no difference” hypothesis, based on evidence from our sample of patients. This is known as the null hypothesis. We specify our test hypothesis as follows:
Ho (null hypothesis): death rate in group A (treated with the drug) = death rate in group B (no drug)
This is equivalent to: Ho: (death rate in group A) – (death rate in group B) = 0
We test this against an alternate hypothesis, known as HA: that the difference in death rates between the two groups does not equal 0. We then gather data and note the observed difference in mortality between group A and group B. If this observed difference is sufficiently far from zero, we reject the null hypothesis. If we reject the null hypothesis of no difference, we accept the alternate hypothesis, which is that the drug does make a difference. When you test a hypothesis, this is the type of reasoning you use:
- I will assume the hypothesis that there is no difference is true.
- I will then collect the data and observe the difference between the two groups.
- If the null hypothesis is true, how likely is it that by chance alone I would get results such as these?
- If it is not likely that these results could arise by chance under the assumption that the null hypothesis is true, then I will conclude it is false, and I will “accept” the alternate hypothesis.
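The reasoning above can be sketched as a small simulation. The patient counts below are hypothetical, chosen only for illustration: we assume the null hypothesis is true, pool the outcomes of both groups into one common death rate, reshuffle them many times, and ask how often chance alone produces a difference in mortality as large as the one observed.

```python
import random

random.seed(42)

# Hypothetical data (an assumption for illustration, not from the text):
# 200 patients per group; 20 deaths with the drug, 35 deaths without it.
n_a, deaths_a = 200, 20   # group A: drug
n_b, deaths_b = 200, 35   # group B: no drug
observed_diff = deaths_b / n_b - deaths_a / n_a

# Under Ho the drug makes no difference, so all 400 patients share one
# common death rate; pool the outcomes (1 = died) and reshuffle them.
outcomes = [1] * (deaths_a + deaths_b) + [0] * (n_a + n_b - deaths_a - deaths_b)

n_sims = 10_000
extreme = 0
for _ in range(n_sims):
    random.shuffle(outcomes)
    sim_a = sum(outcomes[:n_a]) / n_a
    sim_b = sum(outcomes[n_a:]) / n_b
    # two-sided: count differences at least as large as the one observed
    if abs(sim_b - sim_a) >= abs(observed_diff):
        extreme += 1

p_value = extreme / n_sims
print(f"observed difference = {observed_diff:.3f}, p ≈ {p_value:.3f}")
```

A small p-value says: if the drug truly made no difference, results like these would rarely arise by chance alone, so we reject the null hypothesis.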
Why Do We Test the Null Hypothesis?
Suppose we believe that drug A is better than drug B in preventing death from a heart attack. Why don’t we test that belief directly and see which drug is better rather than testing the hypothesis that drug A is equal to drug B? The reason is that there is an infinite number of ways in which drug A can be better than drug B, so we would have to test an infinite number of hypotheses. If drug A causes 10 % fewer deaths than drug B, it is better. So first we would have to see if drug A causes 10 % fewer deaths. If it doesn’t cause 10 % fewer deaths, but if it causes 9 % fewer deaths, it is also better. Then we would have to test whether our observations are consistent with a 9 % difference in mortality between the two drugs. Then we would have to test whether there is an 8 % difference, and so on.
Note: each such hypothesis would be set up as a null hypothesis in the following form: drug A – drug B mortality = 10 %, or equivalently
(Drug A – drug B mortality) – 10 % = 0.
(Drug A – drug B mortality) – 9 % = 0.
(Drug A – drug B mortality) – 8 % = 0.
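To see why this family of hypotheses is unmanageable, here is a sketch (with the same hypothetical mortality counts as before, and a normal approximation for the test) that checks each hypothesized difference d in turn. Many values of d survive the data, which is exactly why we test the single value 0 % instead.

```python
import math

# Hypothetical data (illustration only): deaths among 200 patients per arm.
n = 200
rate_a, rate_b = 20 / n, 35 / n          # drug A vs. drug B mortality
diff = rate_b - rate_a
se = math.sqrt(rate_a * (1 - rate_a) / n + rate_b * (1 - rate_b) / n)

# Test the family of null hypotheses Ho: (B – A mortality) – d = 0
# for d = 10 %, 9 %, 8 %, ..., 0 %, using a normal approximation.
for d in [0.10, 0.09, 0.08, 0.0]:
    z = (diff - d) / se
    consistent = abs(z) < 1.96           # not rejected at the 5 % level
    label = "consistent" if consistent else "rejected"
    print(f"d = {d:4.0%}: z = {z:+.2f}  {label}")
```

With these made-up counts, the 10 %, 9 %, and 8 % hypotheses all survive while d = 0 is rejected; a single test of the null difference settles the question the infinite family cannot.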
On the other hand, when we test the null hypothesis of no difference, we only have to test one value—a 0 % difference—and we ask whether our observations are consistent with the hypothesis that there is no difference in mortality between the two drugs. If the observations are consistent with a null difference, then we cannot state that one drug is better than the other. If they are unlikely to have arisen under a null difference, then we can reject that hypothesis and conclude there is a difference.

A common source of confusion arises when the investigator really wishes to show that one treatment is as good as another (in contrast to the above example, where the investigator in her heart of hearts really believes that one drug is better).
For example, in the emergency room, a quicker procedure may have been devised, and the investigator believes it may be as good as the standard procedure, which takes a long time. The temptation in such a situation is to “prove the null hypothesis.” But it is impossible to “prove” the null hypothesis. All statistical tests can do is reject the null hypothesis or fail to reject it. We do not prove the hypothesis by gathering affirmative or supportive evidence, because no matter how many times we did the experiment and found a difference close to zero, we could never be assured that the next time we did such an experiment, we would not find a huge difference that was nowhere near zero. Rather, we try to falsify or reject our assertion of no difference, and if the assertion of zero difference withstands our attempt at refutation, it survives as a hypothesis in which we continue to have belief. Failure to reject it does not mean we have proven that there is really no difference. It simply means that the evidence we have “is consistent with” the null hypothesis: the results we obtained could have arisen by chance alone if the null hypothesis were true. (Perhaps the design of our study was not appropriate. Perhaps we did not have enough patients.)
So what can one do if one really wants to show that two treatments are equivalent? One can design a study that is large enough to detect a small difference if there really is one. If the study has the power (meaning a high likelihood) to detect a difference that is very, very, very small, and one fails to detect it, then one can say with a high degree of confidence that one can’t find a meaningful difference between the two treatments. It is impossible to have a study with sufficient power to detect a 0 % difference. As the difference one wishes to detect approaches zero, the number of subjects necessary for a given power approaches infinity.
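The last two sentences can be made concrete with the standard two-proportion sample-size formula. The baseline mortality rate, significance level, and power target below are assumptions chosen for illustration: as the difference one wishes to detect shrinks toward zero, the required number of subjects per group grows without bound.

```python
import math

def n_per_group(p_control, delta, ):
    """Approximate patients per group needed to detect a mortality
    difference `delta` below `p_control`, using the usual normal-
    approximation formula for comparing two proportions."""
    z_alpha, z_beta = 1.96, 0.84         # two-sided 5 % level, 80 % power
    p1, p2 = p_control, p_control - delta
    var = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * var / delta ** 2)

# As the difference we want to detect approaches 0 %, the required
# sample size per group heads toward infinity.
for delta in [0.10, 0.05, 0.01, 0.001]:
    print(f"detect {delta:.1%}: n ≈ {n_per_group(0.20, delta):,} per group")
```

Detecting a 10 % drop from a 20 % baseline takes a couple of hundred patients per arm; detecting a 0.1 % drop takes millions, and a 0 % difference would take infinitely many.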