Most researchers do not choose a statistical test. They inherit one. A supervisor used a t-test, so the next student uses a t-test, and the one after that. It works often enough that nobody asks why, until a reviewer does. The reassuring part is that choosing the right test is not a feat of memory. It follows from three plain questions about your data, asked in order. Answer them and the test almost always picks itself.
1. Start with your data, not the test
The common mistake is to start from the test you already know and look for a way to apply it. Start from the other end. What kind of outcome did you measure? How many groups are you comparing? Does your data behave the way the test expects? Those three questions, in that order, lead you to the right test for almost every common study. Figure 1 is the map they lead to.
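The three questions can be read as a small decision function. The sketch below is only an illustration: the function name `suggest_test` and its parameters are hypothetical, it covers only the continuous-outcome branch using the test names given later in this article, and it points back to Figure 1 for the other two families rather than guessing at their contents.

```python
def suggest_test(outcome: str, n_groups: int = 2, paired: bool = False,
                 roughly_normal: bool = True) -> str:
    """Walk the three questions in order and return a test name.

    Hypothetical helper for illustration; not a substitute for judgment.
    """
    # Question one: the outcome type picks the family of tests.
    if outcome == "categorical":
        return "categorical family (see Figure 1)"
    if outcome == "time-to-event":
        return "time-to-event family (see Figure 1)"
    if outcome != "continuous":
        raise ValueError(f"unknown outcome type: {outcome!r}")

    # Question two: how many groups, and are they linked?
    if n_groups < 2:
        raise ValueError("need at least two groups to compare")
    if n_groups >= 3:
        # Question three: parametric or non-parametric.
        # The paired flag is ignored here; the article does not cover
        # three or more linked groups.
        return "ANOVA" if roughly_normal else "Kruskal-Wallis"
    if paired:
        return "paired t-test" if roughly_normal else "Wilcoxon signed-rank"
    return "t-test" if roughly_normal else "Mann-Whitney"


print(suggest_test("continuous", n_groups=2, paired=True))
```

The order of the checks mirrors the order of the questions, which is the point: the outcome type is settled before group structure, and assumptions are checked last.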
2. Question one: what kind of outcome did you measure?
Everything starts with the type of your outcome, the thing you are actually comparing.
- A continuous outcome is a number on a scale, like blood pressure, tumour volume, a lab value or a symptom score.
- A categorical outcome is a label or a count, like alive or dead, responder or non-responder, or the number of events.
- A time-to-event outcome is how long until something happens, like overall survival or time to progression.
Each type points to a different family of tests, the three columns of Figure 1. Get this one wrong and nothing downstream is right. A survival outcome forced into a t-test, for example, throws away the timing and the censoring that make it a survival outcome at all.
3. Question two: how many groups, and are they linked?
For a continuous outcome, the next question is how many groups you compare and whether they are independent or paired.
- Two independent groups, treatment against control, point to a t-test, or to the Mann-Whitney test when the data are skewed.
- Two measurements on the same people, before against after, are paired and need a paired test. Treating paired data as independent is one of the most common errors, and it usually makes your result look weaker than it really is, because the person-to-person variation that pairing would cancel out is left in as noise.
- Three or more groups call for analysis of variance (ANOVA), not a string of t-tests between every pair. Each extra pairwise test is another chance of a false positive.
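The cost of ignoring pairing is easy to see on a toy example. The sketch below uses SciPy's `ttest_ind` and `ttest_rel` on made-up readings for five people (the numbers are invented for illustration, not taken from any study). Every person drops by two or three units, but the people differ widely from each other.

```python
from scipy import stats

# Made-up illustrative readings for the same five people.
before = [120, 130, 140, 150, 160]
after = [118, 127, 137, 148, 157]

# Wrong for this design: treats the two columns as unrelated groups,
# so the large spread between people swamps the small, consistent drop.
independent = stats.ttest_ind(before, after)

# Right for this design: tests the within-person differences.
paired = stats.ttest_rel(before, after)

print(f"independent t-test p = {independent.pvalue:.3f}")
print(f"paired t-test p      = {paired.pvalue:.5f}")
```

The same ten numbers give a far smaller p-value under the paired test, because it looks only at the differences within each person.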
4. Question three: does your data meet the test's assumptions?
The familiar tests, the t-test, ANOVA and Pearson correlation, are called parametric. They assume your data follow roughly a normal distribution, the symmetric bell shape. When the data are clearly skewed, or the sample is small and you cannot tell, the safer choice is a non-parametric test. The Mann-Whitney test replaces the t-test, the Wilcoxon signed-rank test replaces the paired t-test, and the Kruskal-Wallis test replaces ANOVA. These ask less of your data and rarely cost you much. Figure 3 shows the difference that decides it.
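One way to turn this rule into code, for two independent groups, is sketched below. It is an assumption-laden illustration rather than a recommended procedure: the helper `compare_two_groups` is hypothetical, the `min_n` cut-off of 10 is an arbitrary stand-in for "the sample is small and you cannot tell", and the Shapiro-Wilk test (`scipy.stats.shapiro`) is only one of several ways to check for normality. A plot of the data is usually worth looking at as well.

```python
from scipy import stats


def compare_two_groups(a, b, alpha=0.05, min_n=10):
    """Return (test_name, p_value) for two independent groups.

    Hypothetical helper: falls back to Mann-Whitney when either group
    is small (below min_n, an arbitrary threshold) or when Shapiro-Wilk
    gives evidence against normality in either group.
    """
    too_small = min(len(a), len(b)) < min_n
    # A small Shapiro-Wilk p-value is evidence against normality.
    # The check is skipped when the groups are already too small.
    looks_normal = (not too_small) and all(
        stats.shapiro(group).pvalue > alpha for group in (a, b)
    )
    if looks_normal:
        return "t-test", stats.ttest_ind(a, b).pvalue
    result = stats.mannwhitneyu(a, b, alternative="two-sided")
    return "Mann-Whitney", result.pvalue


# Made-up data: one group has a single extreme value, so it is clearly skewed.
skewed = [1, 1, 1, 1, 1, 1, 1, 1, 1, 100]
other = [2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
name, p_value = compare_two_groups(skewed, other)
print(name, round(p_value, 4))
```

Treating small samples as "cannot tell" matters because a normality test on a handful of points rarely rejects anything, which would otherwise push small studies toward the t-test by default.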
5. When you are looking at a relationship, not a difference
Sometimes you are not comparing groups at all. You want to know whether two things move together, or whether one predicts another. For two continuous variables that rise and fall together, use correlation, Pearson when both are roughly normal and Spearman when they are not. When you want to predict an outcome from several variables at once, and to adjust for confounders, you need regression. Linear regression handles a continuous outcome, logistic regression a yes-or-no outcome, and Cox regression a time-to-event outcome. Regression is also how you answer the question reviewers ask most, whether your effect still holds after accounting for age, stage and the other usual suspects.
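The difference between the two correlation coefficients shows up clearly on data that rise together but not along a straight line. The sketch below uses SciPy's `pearsonr` and `spearmanr` on made-up values where `y` is simply the cube of `x`; the data are invented for illustration.

```python
from scipy import stats

# Made-up data: y always increases with x, but along a curve.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [value ** 3 for value in x]

# Pearson measures how close the points lie to a straight line.
pearson_r, pearson_p = stats.pearsonr(x, y)

# Spearman works on ranks, so any strictly increasing relationship scores 1.
spearman_rho, spearman_p = stats.spearmanr(x, y)

print(f"Pearson r    = {pearson_r:.3f}")
print(f"Spearman rho = {spearman_rho:.3f}")
```

Spearman reports a perfect monotonic relationship here, while Pearson comes out lower because the points do not follow a straight line.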
6. The mistakes that cost you a reviewer's trust
A handful of errors come up again and again, and a reviewer spots every one of them in seconds. None is exotic, and all are avoidable once you have answered the three questions honestly. Figure 4 is the short list to hold your own analysis against before you submit.
Choosing a test is not the hard part of research, but choosing the wrong one quietly undermines everything built on top of it. Start from your data, answer the three questions in order, and the choice is usually obvious. When it is not, that is the moment to ask someone before you run the analysis, not after a reviewer has sent it back.