Will Changing the p-value Lead to Better Care?




No, not really!


A group of researchers has suggested lowering the p-value accepted for statistical significance from today's conventional 0.05 to 0.005.


While this may reduce some problems in clinical research, it will also give rise to new ones.


And it is not at all where we should focus!

With an estimated 85% of resources spent on clinical trials leading to useless results, there is a clear demand to improve the trustworthiness and relevance of clinical trials. Add to that the problems with p-hacking and the low reproducibility of findings from trials, and the problem is gigantic. After all, clinical research is expected to be a 60 billion market in 2020. So it may seem obvious to raise the bar for when we accept something as proven, as recently proposed by a group of researchers. It is, however, the least of our problems: it will not solve the real problems, and it will give us new challenges to handle.

Significant Results

The most common method to prove the significance of a trial's discovery is to use the p-value. The p-value is, in short, the calculated probability (hence the "p") that the presented findings, or something more extreme, are due to pure chance. The p-value does not relate directly to the research question but to the data collected – or rather, the data analyzed – and it is not a measure of relevant differences between groups in a trial, nor of the strength of the findings.
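The definition above can be illustrated with a short simulation. This is a minimal sketch with made-up numbers (a coin flipped 20 times, 16 heads observed), not an example from the article: the p-value is estimated as the fraction of simulated null-hypothesis outcomes that are at least as extreme as the observed one.

```python
import random

# Minimal sketch (hypothetical numbers): estimate a p-value by simulation.
# Null hypothesis: the coin is fair. Observed: 16 heads in 20 flips.
# The p-value is the probability, under the null, of a result at least
# as extreme as the one observed (two-sided).

random.seed(42)

def simulated_p_value(observed_heads, flips=20, trials=100_000):
    extreme = abs(observed_heads - flips / 2)  # observed distance from expectation
    hits = 0
    for _ in range(trials):
        heads = sum(random.random() < 0.5 for _ in range(flips))
        if abs(heads - flips / 2) >= extreme:  # at least as extreme as observed
            hits += 1
    return hits / trials

p = simulated_p_value(16)
print(f"p ≈ {p:.4f}")  # the exact two-sided binomial p-value is about 0.012
```

Note what the number means: it says how surprising the data would be if chance alone were at work. It says nothing about whether a 16/20 result matters.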

The use of the p-value to define results as "statistically significant" goes almost 100 years back, to 1925, when Sir Ronald A. Fisher presented the idea and definition in his book "Statistical Methods for Research Workers," where he suggested p = 0.05 as the limit for judging whether a deviation should be considered significant or not.

A group of researchers has just proposed lowering this threshold to p = 0.005 as a way to deal with the problems of useless findings and low reproducibility of results. This means lowering the accepted probability that the findings (or something more extreme) are due to chance alone from 1 in 20 to 1 in 200. It may seem like a perfect idea: cut away much of the noise and focus on fewer but better results. Unfortunately, these results are not necessarily better.

The p-value refers only to the analyzed data, cannot distinguish between "true" and "false", and says nothing about the clinical meaningfulness of the results. Being related only to the data analyzed, the relevance of the findings – and thereby of the p-value – depends heavily on the appropriateness of the study design, which includes sample size, inclusion and exclusion criteria, outcome measurement, and much more. The p-value says nothing about any of this.
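The dependence on sample size is easy to demonstrate. In this minimal sketch (all numbers hypothetical), the same clinically trivial difference between two group means – half a point on a 100-point pain scale – produces wildly different p-values depending only on how many patients are enrolled, using a plain two-sample z-test with a known standard deviation:

```python
import math

# Minimal sketch (hypothetical numbers): the same tiny difference between
# two group means yields very different p-values as sample size grows.
# Two-sample z-test, standard deviation assumed known in both groups.

def two_sample_p(mean_a, mean_b, sd, n_per_group):
    se = sd * math.sqrt(2 / n_per_group)  # standard error of the difference
    z = abs(mean_a - mean_b) / se         # standardized difference
    return math.erfc(z / math.sqrt(2))    # two-sided p-value

# A 0.5-point reduction on a 100-point pain scale (clinically meaningless):
for n in (20, 2000, 200_000):
    print(n, round(two_sample_p(50.0, 49.5, 10.0, n), 4))
```

With 20 patients per group the p-value is far above 0.05; with 200,000 per group it is far below 0.005 – yet the effect itself never changed.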


Ronald A. Fisher and "Statistical Methods for Research Workers"

When Ronald A. Fisher, a UK-based statistician, wrote and published "Statistical Methods for Research Workers," he could hardly have had any idea of the enormous impact this book would have on clinical research almost 100 years later. In the book, Fisher wrote:

…it is convenient to take this point [p = 0.05] as a limit in judging whether a deviation is to be considered significant or not. Deviations exceeding twice the standard deviation [approximately equal to p = 0.05] are thus formally regarded as significant

Voila, “are thus formally regarded as significant,” and so it became and continued to be.

No doubt it was never Fisher's intention that this cut-off value of p = 0.05 should be used as a black-or-white way to distinguish between right and wrong. There is no scientific reason for this choice; it is purely arbitrary. In the book, Fisher also wrote:

Whether this procedure should, or should not, be used must be decided, not by the mathematical attainments of the investigator, but by discovering whether it will or will not give a sufficiently accurate answer.

So it was never Fisher's intention to use the p-value to dichotomize between true and false. The p-value is one, and only one, piece of the puzzle when evaluating the usefulness of research results.

Significance and Publication

Multiple studies and analyses have made it clear that it is easier to get a paper published if it presents significant results. Results with p-values higher than 0.05 are often not published: researchers may see them as negative and choose not to submit them, or the papers are rejected by the editors of medical journals. A research project looking at more than 1.6 million articles published from 1990 to 2015 found that 96% of papers presenting p-values had one or more p-values at or below 0.05.

Minimising the risk of post-study adjustments made to achieve the magical small p-values is one of the reasons why researchers are encouraged to publish a formalized description of a study before it starts, on sites like clinicaltrials.gov. Unfortunately, compliance with this pre-published information seems to be rather low. Outcome measures may disappear in the final paper, or new tests may be added. Exclusion and inclusion criteria are sometimes changed, and much more. All this may – just may – be post-study adjustment to reach significance. We call it p-hacking.

Lowering the accepted p-value to 0.005 will obviously make this exercise more challenging, but it will still be possible. One way to increase the chance of a lower p-value is to increase the sample size and the number of tests and parameters. The growing access to big data will often make this an obvious and easy solution, but that does NOT make the results more relevant.
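The multiple-testing side of p-hacking can be sketched in a few lines. This is a hypothetical simulation, not data from any trial: run 1,000 unrelated tests on pure random noise and count how many cross each significance threshold by chance alone.

```python
import math
import random

# Minimal sketch (pure noise, no real effect anywhere): run many unrelated
# tests on random data and count chance "significant" findings.

random.seed(7)

def noise_p_value(n=50):
    # One-sample z-test of "mean == 0" on standard-normal noise (sigma known).
    xs = [random.gauss(0, 1) for _ in range(n)]
    z = abs(sum(xs) / n) * math.sqrt(n)   # |sample mean| / (sigma / sqrt(n))
    return math.erfc(z / math.sqrt(2))    # two-sided p-value

p_values = [noise_p_value() for _ in range(1000)]
print("p < 0.05 :", sum(p < 0.05 for p in p_values))   # expected around 50
print("p < 0.005:", sum(p < 0.005 for p in p_values))  # expected around 5
```

A stricter threshold cuts the count of chance findings by a factor of ten, but with enough tests some noise still gets through – and none of the "hits" mean anything.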

Clinical Relevance

For the results of a clinical trial to be meaningful for a patient, the outcomes must be relevant to the patient and large enough to be acknowledged as a change – like a detectable reduction in pain, not just a statistically significant decrease. Furthermore, persons at least somewhat like the individual patient must be included in the study population, the study must be designed to reflect real-life situations, and there must be 100% compliance with the pre-trial publication of the trial.
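The distinction between statistical and clinical significance amounts to two separate checks. In this minimal sketch (the threshold and numbers are hypothetical, using the common notion of a minimal clinically important difference), a result can pass one check and fail the other:

```python
# Minimal sketch (hypothetical numbers): statistical significance and
# clinical relevance are independent checks on a trial result.

MCID = 2.0  # hypothetical minimal clinically important difference (pain points)

def assess(effect, p_value, alpha=0.05):
    significant = p_value < alpha       # statistical check only
    relevant = abs(effect) >= MCID      # clinical check only
    return significant, relevant

# Huge trial: tiny effect, tiny p-value -> significant but not relevant.
print(assess(effect=0.4, p_value=0.001))  # (True, False)
# Small trial: large effect, p above threshold -> relevant but not "significant".
print(assess(effect=3.1, p_value=0.07))   # (False, True)
```

Lowering alpha from 0.05 to 0.005 tightens only the first check; it does nothing to the second, which is the one that matters to the patient.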

Real-Life Clinical Trials

It is easy to reject a published article presenting a clinical trial due to little or no clinical meaningfulness. Clinical trials are complicated and extremely expensive to run, with many rules and regulatory requirements to follow. The most important goal, however, is not to make it easier for researchers to get their work published, but to help patients to a better life.

Reducing the cut-off p-value for statistical significance to 0.005 may be a small contribution, but it also risks rejecting clinically relevant information, since the p-value has absolutely nothing to do with clinical relevance.

One of the fields in which we see these problems unfold is oncology. We often don't know how well new cancer drugs and diagnostics work, thanks to ill-designed clinical trials with statistically significant findings.

So let's change the focus and go for trustworthy evidence instead. There is so much at stake here: human lives, quality of life, and the economy of society.