How to Report Statistics: P-values, Confidence Intervals and Effect Sizes (A Reviewer's Checklist)

There is a moment in almost every statistical review where the science looks fine but the reporting is not. A bare p-value with no effect size. The word “significant” used to mean “large.” A table of tests with no mention of which test was used or why. The analysis underneath may be perfectly sound, but the reader cannot tell, and a reviewer who cannot tell starts to doubt the rest.

The reassuring part is that most of these are reporting problems, not analysis problems. Surveys of submitted manuscripts find that the great majority fail to describe their tests clearly, and that a striking share draw conclusions their data do not support. You can avoid almost all of it without touching your analysis, just by reporting it the way a reviewer reads it. Here is how.

1. The p-value is not the result

A p-value answers one narrow question: if there were truly no effect, how surprising would data at least as extreme as yours be? That is all. It is not the size of the effect, not the probability that your hypothesis is true, and not a measure of importance. Reported on its own, it tells the reader almost nothing they can use. Two more numbers carry the actual finding: the effect size and its confidence interval.
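To make that narrow question concrete, here is a minimal sketch of a permutation test using only the Python standard library. The function name `permutation_p_value` and the measurements are invented for illustration. The idea is to shuffle the group labels many times, which simulates a world with no effect, and count how often the shuffled difference is at least as large as the observed one.

```python
import random
from statistics import mean

def permutation_p_value(group_a, group_b, n_permutations=10_000, seed=0):
    """Two-sided permutation p-value for a difference in means."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        # Shuffling the pooled values breaks any link between group and outcome.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            at_least_as_extreme += 1
    # Add-one correction so the estimate is never exactly zero.
    return (at_least_as_extreme + 1) / (n_permutations + 1)

# Hypothetical measurements, invented for illustration only.
treated = [5.1, 4.8, 6.0, 5.5, 5.9, 6.2]
control = [4.2, 4.9, 4.4, 5.0, 4.1, 4.6]

p = permutation_p_value(treated, control)
print(f"p = {p:.3f}")
```

The returned number says only how rarely chance alone produces a gap this large. It does not say how large the gap is; that is the difference in means itself.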

Figure 1. The effect size (the dot) is your answer: how big the difference is. The confidence interval (the bar) is its precision: how sure you are. The p-value collapses all of this into a single yes-or-no about the dashed line. Report all three and the reader sees the whole picture.

2. Pair every effect with a confidence interval

The effect size is the answer to the question your study asked: the difference in means, the odds ratio, the hazard ratio, the correlation. The confidence interval is how precisely you measured it. A narrow interval says you have pinned the effect down; a wide one says you have not, even if the p-value is small. Report them together, always. “Risk fell by 38% (95% CI 12% to 56%)” tells a reviewer everything; “p < 0.05” tells them almost nothing.
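As a rough sketch of how such a sentence can be produced, the snippet below computes a risk ratio, a 95% confidence interval and an exact p-value from a two-by-two table. It assumes a large-sample normal approximation on the log scale (often called the Katz method). The function name `risk_ratio_with_ci` and the event counts are hypothetical, chosen only to give a reduction close to the 38% in the example above.

```python
import math
from statistics import NormalDist

def risk_ratio_with_ci(events_treated, n_treated, events_control, n_control, level=0.95):
    """Risk ratio with a log-scale normal-approximation CI and a two-sided Wald p-value."""
    risk_treated = events_treated / n_treated
    risk_control = events_control / n_control
    rr = risk_treated / risk_control
    # Standard error of log(RR); a large-sample approximation.
    se_log_rr = math.sqrt(
        1 / events_treated - 1 / n_treated + 1 / events_control - 1 / n_control
    )
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)
    lower = math.exp(math.log(rr) - z_crit * se_log_rr)
    upper = math.exp(math.log(rr) + z_crit * se_log_rr)
    # Two-sided p-value for the null hypothesis RR = 1.
    z = abs(math.log(rr)) / se_log_rr
    p = 2 * (1 - NormalDist().cdf(z))
    return rr, lower, upper, p

# Hypothetical counts, invented for illustration only.
rr, lower, upper, p = risk_ratio_with_ci(31, 500, 50, 500)

# A risk ratio below 1 is a reduction; the CI bounds swap when converted.
print(
    f"Risk fell by {1 - rr:.0%} "
    f"(95% CI {1 - upper:.0%} to {1 - lower:.0%}), p = {p:.3f}"
)
```

With small counts or rare events this approximation can be poor, and an exact or model-based interval may be the better choice. Whichever method is used, name it in the methods section.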

Figure 2. The same finding, reported two ways. The second gives the reader the magnitude, the precision and an exact p-value, so they can judge the result instead of taking it on trust.

3. “Significant” is a statistical word, not a synonym for large

In a results section, “significant” should appear only when there is a test and a p-value behind it. Using it to mean “big” or “important” is one of the fastest ways to invite the comment every author dreads: significant in what sense? If you mean the effect was large, say it was large and give the number. Keep “significant” for statistical significance, and even then, let the effect size and interval carry the weight.
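The gap between "significant" and "large" is easy to demonstrate. The sketch below holds a deliberately tiny effect fixed (a standardized difference of 0.05) and changes only the sample size. It assumes equal group sizes, a known common standard deviation and a simple two-sided z-test; the function name and the numbers are hypothetical.

```python
import math
from statistics import NormalDist

def z_test_from_summary(mean_diff, sd, n_per_group):
    """Standardized effect (Cohen's d) and two-sided z-test p-value from summary data."""
    se = sd * math.sqrt(2 / n_per_group)  # standard error of the difference in means
    z = mean_diff / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    cohens_d = mean_diff / sd
    return cohens_d, p

# Same tiny effect every time; only the sample size changes.
results = {
    n: z_test_from_summary(mean_diff=0.5, sd=10.0, n_per_group=n)
    for n in (50, 5_000, 50_000)
}

for n, (d, p) in results.items():
    print(f"n per group = {n:>6}: d = {d:.2f}, p = {p:.2g}")
```

The effect size is identical in every row, while the p-value moves from clearly non-significant to vanishingly small. Only the effect size and its interval tell the reader whether the difference matters.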

4. Name the test, and show you checked its assumptions

The single most common reporting failure is not naming the statistical test, or naming it without justifying it. For every analysis, the reader should be able to see which test you used, why it was the right one, and that you checked what it assumes. A t-test assumes things a Mann-Whitney does not. A linear regression assumes things you should have looked at. State the test, state that the assumptions were checked, and name the software and version you used. This one paragraph removes a whole category of reviewer doubt.
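For the software-and-version part, one option is to let the analysis script write the sentence itself, so the reported versions always match what was actually run. The sketch below assumes a Python workflow. The helper `software_statement` and the package names passed to it are illustrative, not a recommendation of any particular library.

```python
import platform
from importlib import metadata

def software_statement(packages):
    """Build a methods-section sentence listing the interpreter and package versions."""
    parts = [f"Python {platform.python_version()}"]
    for name in packages:
        try:
            parts.append(f"{name} {metadata.version(name)}")
        except metadata.PackageNotFoundError:
            # Flag the gap rather than guess a version number.
            parts.append(f"{name} (not installed)")
    return "Analyses were run in " + ", ".join(parts) + "."

# Package names here are examples; list whatever your analysis imports.
print(software_statement(["scipy", "statsmodels"]))
```

Users of R or other tools can report the equivalent information by hand. The point is that the version string comes from the environment, not from memory.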

5. Give exact numbers, and report everything you pre-specified

Two habits round it out. First, report exact p-values, “p = 0.03,” not “p < 0.05,” and not a bare “n.s.” The exact value carries information the threshold throws away. Second, report all the outcomes you pre-specified, not only the ones that reached significance. Quietly dropping the analyses that did not work, or running many comparisons and reporting the one that hit, is exactly the pattern reviewers are trained to catch. Pre-specify, then report the lot.
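Both habits can be built into the script that produces the results table. The sketch below formats every p-value exactly and prints every pre-specified outcome, whether or not it reached significance. One assumption goes beyond the text above: values below 0.001 are printed as "p < 0.001", a common journal convention, because more decimals rarely add information. The function name, outcome labels and p-values are hypothetical.

```python
def format_p(p, floor=0.001):
    """Format a p-value exactly, to three decimals, with a floor for very small values."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie between 0 and 1")
    if p < floor:
        return f"p < {floor}"
    return f"p = {p:.3f}"

# Hypothetical pre-specified outcomes, invented for illustration only.
prespecified = {
    "primary: event rate": 0.03,
    "secondary: hospital days": 0.41,
    "secondary: quality of life": 0.00004,
}

# Every pre-specified outcome is reported, significant or not.
report_lines = [f"{outcome}: {format_p(p)}" for outcome, p in prespecified.items()]
print("\n".join(report_lines))
```

Because the loop runs over the full pre-specified list, an outcome can only go missing from the report if someone removes it on purpose.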

Figure 3. Six lines that remove almost every statistical comment a reviewer makes about reporting. None of them changes your analysis. They change whether the reader can trust it.

None of this turns a weak result into a strong one, and none of it is about doing fancier statistics. It is about letting the reader see the size of what you found, how sure you are, and how you got there. Do that, and the reviewer reading your results has nothing to flag, which is exactly the position you want to be in.


Priv.-Doz. Dr. med. Sied Kebir, MD, PhD
