4 Hypothesis tests
Statistics is useful because it enables us to answer questions of interest, for example whether a new treatment is better than the current treatment. We start with a null hypothesis and then examine the evidence to see whether it can be sustained. A hypothesis test involves testing a claim, or null hypothesis $$H_{0}$$, against an alternative, $$H_{1}$$. The decision whether or not to reject $$H_{0}$$ is based on a test statistic calculated from the sample data, which is then used to obtain a p-value. The p-value is a useful reformulation of the test statistic. It is the probability, assuming the null hypothesis is true, of obtaining a result at least as extreme as the one observed. $$H_{0}$$ is maintained unless the sample evidence makes it untenable, i.e. unless the p-value is less than or equal to ($$\leq$$) some pre-specified critical value. On the basis of the sample evidence the null hypothesis is therefore either rejected or not rejected. Table 3 shows the different situations that can arise.
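The decision rule can be illustrated with one simple case. The sketch below is a one-sample z-test of a mean, where the population standard deviation is assumed to be known. It computes the test statistic and its two-sided p-value, then compares the p-value with a pre-specified critical value of 0.05. The function names and the study numbers are hypothetical and were chosen only for illustration.

```python
from statistics import NormalDist

def z_test_p_value(sample_mean, mu0, sigma, n):
    """One-sample z-test of H0: mean == mu0, assuming the population SD sigma is known."""
    # Test statistic: distance of the sample mean from mu0 in standard-error units
    z = (sample_mean - mu0) / (sigma / n ** 0.5)
    # Two-sided p-value: probability under H0 of a statistic at least as extreme, P(|Z| >= |z|)
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p

def decide(p, critical_value=0.05):
    """H0 is rejected when p <= the pre-specified critical value, otherwise it is not rejected."""
    return "reject H0" if p <= critical_value else "do not reject H0"

# Hypothetical study: n = 100, sample mean 103, H0 mean 100, known SD 15
z, p = z_test_p_value(sample_mean=103.0, mu0=100.0, sigma=15.0, n=100)
print(f"z = {z:.2f}, p = {p:.4f}: {decide(p)}")  # z = 2.00, p = 0.0455: reject H0
```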
Rejecting $$H_{0}$$ when it is in fact true is a Type I error. The probability of making a Type I error is called the significance level, $$\alpha$$. Failing to reject $$H_{0}$$ when it is in fact false is a Type II error, which has probability $$\beta$$. The acceptable probabilities of committing a Type I error and a Type II error are specified before an analysis is conducted. The acceptable Type I error rate, $$\alpha$$, provides the critical value mentioned above, so $$H_{0}$$ is rejected when $$p \leq \alpha$$. The power of a hypothesis test is the probability of rejecting the null hypothesis when it is actually false (power $$= 1-\beta$$).
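Table 3: Testing hypotheses

| Decision | $$H_{0}$$ true | $$H_{0}$$ false |
|---|---|---|
| Reject $$H_{0}$$ | Type I error (probability $$\alpha$$) | Correct decision (probability $$1-\beta$$, the power) |
| Do not reject $$H_{0}$$ | Correct decision (probability $$1-\alpha$$) | Type II error (probability $$\beta$$) |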
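Simulation offers one way to check the relationship between $$\alpha$$, $$\beta$$ and power. The sketch below assumes a two-sided z-test of a mean with a known standard deviation. That test was chosen only because the normal distribution is available in Python's standard library. The means, standard deviation, sample size and helper names are all hypothetical. When $$H_{0}$$ is true, the proportion of simulated samples in which $$H_{0}$$ is rejected estimates the Type I error rate, which should be close to $$\alpha$$. When $$H_{0}$$ is false, the same proportion estimates the power, $$1-\beta$$, and this estimate is compared with the analytic power of the z-test.

```python
import random
from statistics import NormalDist, fmean

STD_NORMAL = NormalDist()  # standard normal distribution N(0, 1)

def z_test_rejects(sample, mu0, sigma, alpha):
    """Two-sided z-test of H0: mean == mu0 with known SD sigma; True if H0 is rejected."""
    z = (fmean(sample) - mu0) / (sigma / len(sample) ** 0.5)
    p = 2 * (1 - STD_NORMAL.cdf(abs(z)))
    return p <= alpha

def analytic_power(delta, sigma, n, alpha):
    """Power (1 - beta) of the two-sided z-test when the true mean is mu0 + delta."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = delta * n ** 0.5 / sigma
    return STD_NORMAL.cdf(shift - z_crit) + STD_NORMAL.cdf(-shift - z_crit)

def rejection_rate(true_mean, mu0, sigma, n, alpha, reps, rng):
    """Proportion of simulated samples, drawn with mean true_mean, in which H0 is rejected."""
    rejections = 0
    for _ in range(reps):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        if z_test_rejects(sample, mu0, sigma, alpha):
            rejections += 1
    return rejections / reps

rng = random.Random(1)  # fixed seed so the simulation is reproducible
alpha, sigma, n, reps = 0.05, 15.0, 50, 4000
# H0 true (true mean equals mu0 = 100): the rejection rate estimates alpha
type1_rate = rejection_rate(100.0, 100.0, sigma, n, alpha, reps, rng)
# H0 false (true mean 106): the rejection rate estimates the power, 1 - beta
power_sim = rejection_rate(106.0, 100.0, sigma, n, alpha, reps, rng)
power_exact = analytic_power(6.0, sigma, n, alpha)
print(f"Type I error rate ~ {type1_rate:.3f} (alpha = {alpha})")
print(f"power ~ {power_sim:.3f} (analytic {power_exact:.3f}), beta ~ {1 - power_sim:.3f}")
```

With these hypothetical settings the analytic power is about 0.81, so $$\beta$$ is about 0.19. Increasing `n` or the true difference raises the power, while lowering `alpha` reduces it.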
One-sided vs two-sided testing
A two-sided test is one in which the alternative hypothesis does not state a particular direction for the effect or difference. Conversely, a one-sided test is one in which the alternative hypothesis is that an effect or difference is in a particular direction (e.g. greater than zero). A one-sided test should be used only when an effect in the opposite direction is theoretically implausible, or when interest lies solely in one direction. For example, suppose that a new technology or treatment has been developed that is much cheaper than the existing treatment. Interest may then lie only in showing that it is no worse. Provided the new treatment is at least as good as the old one, it will be used because it is much cheaper. This is one of the few occasions when a one-sided test is justifiable in medicine. If a one-sided test is to be used, this should be stated at the design stage.
Simple statistical tests
When comparing two groups it is important to distinguish between independent groups and paired groups. Two groups are considered to be independent when subjects are either randomly sampled from two distinct populations or randomly assigned to one of two groups. Two groups are considered to be paired when they consist of observations made within the same individual or between individuals who are explicitly paired.
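The distinction matters because the same numbers can give very different answers depending on how they are analysed. The sketch below uses hypothetical blood pressure readings on six individuals before and after treatment. It computes two statistics from these readings. The first is Student's independent samples t statistic with a pooled variance. The second is the paired t statistic, which is a one-sample t-test on the within-person differences. Only the statistics and their degrees of freedom are computed, because the t distribution is not in Python's standard library. In practice, `scipy.stats.ttest_ind` and `scipy.stats.ttest_rel` return the same statistics together with their p-values.

```python
from statistics import mean, stdev, variance

def independent_t(x, y):
    """Student's two-sample t statistic (pooled variance) and its degrees of freedom."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    se = (pooled_var * (1 / nx + 1 / ny)) ** 0.5
    return (mean(x) - mean(y)) / se, nx + ny - 2

def paired_t(x, y):
    """Paired t statistic: a one-sample t-test on the within-pair differences x - y."""
    if len(x) != len(y):
        raise ValueError("paired data need exactly one y for each x")
    diffs = [a - b for a, b in zip(x, y)]
    return mean(diffs) / (stdev(diffs) / len(diffs) ** 0.5), len(diffs) - 1

# Hypothetical systolic blood pressure (mmHg) for six people before and after treatment
before = [140, 152, 138, 160, 147, 155]
after = [135, 149, 132, 156, 141, 151]

t_ind, df_ind = independent_t(before, after)
t_pair, df_pair = paired_t(before, after)
print(f"independent: t = {t_ind:.2f} on {df_ind} df")
print(f"paired:      t = {t_pair:.2f} on {df_pair} df")
```

Ignoring the pairing gives t ≈ 0.89 on 10 df, while using it gives t ≈ 9.44 on 5 df. Each person's reading after treatment is strongly related to their reading before, so analysing the differences removes the between-person variation. Treating paired data as independent throws this information away.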
Table 4: Simple statistical methods for comparing two groups

| Comparison | Data type | Assumptions | Method |
|---|---|---|---|
| Difference between two independent groups | Numerical: measurable | Normally distributed | Independent samples t-test |
| | Numerical: measurable | Not Normally distributed | Mann-Whitney U test |
| | Numerical: count | | Mann-Whitney U test |
| | Categorical: binary | Large sample, most expected frequencies > 5 | Chi-squared test |
| | Categorical: binary | Small sample, at least one expected frequency < 5 | Fisher’s exact test |
| | Categorical: nominal (more than two categories) | Most expected frequencies > 5 | Chi-squared test |
| | Categorical: ordinal | | Mann-Whitney U test |
| Difference between paired groups | Numerical: measurable | Differences Normally distributed | Paired t-test |
| | Numerical: measurable | Differences not Normally distributed | Wilcoxon matched pairs test |
| | Numerical: count | | Wilcoxon matched pairs test |
| | Categorical: binary | | McNemar’s test |
| | Categorical: nominal (more than two categories) | | No simple test available; consult a statistician |
| | Categorical: ordinal | | Wilcoxon matched pairs test or sign test |
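For 2x2 tables, the rules for binary data in Table 4 can be sketched directly. The code below is a minimal pure-Python illustration, and its function names and counts are hypothetical. It works through the following steps:

- It computes the expected frequencies for the table.
- It uses Fisher's exact test when any expected frequency is below 5, and the chi-squared test otherwise.
- For paired binary data it applies McNemar's test, which uses only the discordant pairs.

The chi-squared and McNemar statistics here are computed without a continuity correction. In practice, library routines such as `scipy.stats.fisher_exact` and `scipy.stats.chi2_contingency` would normally be used. Note that `chi2_contingency` applies Yates' correction to 2x2 tables unless `correction=False` is passed.

```python
from math import comb, erfc, sqrt

def expected_counts(a, b, c, d):
    """Expected cell counts for the 2x2 table [[a, b], [c, d]] under H0 (no association)."""
    n = a + b + c + d
    rows, cols = (a + b, c + d), (a + c, b + d)
    return [[r * k / n for k in cols] for r in rows]

def chi_squared_2x2(a, b, c, d):
    """Pearson chi-squared statistic (no continuity correction) and its p-value on 1 df."""
    observed = [[a, b], [c, d]]
    expected = expected_counts(a, b, c, d)
    x2 = sum((o - e) ** 2 / e
             for o_row, e_row in zip(observed, expected)
             for o, e in zip(o_row, e_row))
    # On 1 df, P(X2 >= x2) = P(|Z| >= sqrt(x2)) = erfc(sqrt(x2 / 2))
    return x2, erfc(sqrt(x2 / 2))

def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact p-value: total probability of tables no more likely than observed."""
    r1, c1, n = a + b, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability that the top-left cell equals x, with margins fixed
        return comb(c1, x) * comb(n - c1, r1 - x) / comb(n, r1)

    p_obs = prob(a)
    lo, hi = max(0, r1 + c1 - n), min(r1, c1)
    # Small relative tolerance so tables tied with the observed one are not lost to rounding
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))

def compare_two_proportions(a, b, c, d):
    """Independent groups: Fisher's exact test if any expected frequency < 5, else chi-squared."""
    if min(min(row) for row in expected_counts(a, b, c, d)) < 5:
        return "Fisher's exact test", fisher_exact_2x2(a, b, c, d)
    return "Chi-squared test", chi_squared_2x2(a, b, c, d)[1]

def mcnemar(b, c):
    """McNemar's chi-squared (no continuity correction) for paired binary data.

    b and c are the counts of the two kinds of discordant pairs."""
    if b + c == 0:
        raise ValueError("McNemar's test needs at least one discordant pair")
    x2 = (b - c) ** 2 / (b + c)
    return x2, erfc(sqrt(x2 / 2))

# Hypothetical independent-groups table: every expected count is 2, so Fisher's test is used
print(compare_two_proportions(3, 1, 1, 3))
# Hypothetical larger table: expected counts are 25, 15, 25, 15, so chi-squared is used
print(compare_two_proportions(30, 10, 20, 20))
# Hypothetical paired data with 15 and 5 discordant pairs
print(mcnemar(15, 5))
```

For the first table the two-sided Fisher p-value is 34/70 ≈ 0.486. The second table gives $$X^{2} = 16/3$$ with p ≈ 0.021.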