Evidence Based Medicine and the p-value


The p-value is a statistical tool for analyzing data. Unfortunately, it is often misunderstood, misused, and miscommunicated in medical research.

Its growing use has been challenged by the American Statistical Association (ASA), followed by a paper and an editorial in the Journal of the American Medical Association (JAMA).

And this is good news.


In clinical trials, the treatment or procedure we want to examine is often compared to another treatment, to placebo, to persons on a waiting list, or to another well-defined group. If no such comparison is made, we have only an observation, with very limited clinical usefulness.

To compare the effect of a treatment (group A) with the effect of another treatment/placebo/whatever (group B), it is not enough to state how many people in group A and how many in group B achieved the pre-specified outcome(s). What we want to know is whether the difference is meaningful. In other words, we want to know if group A did better than group B. The most commonly used method for evaluating this is calculating the p-value, but the problem is that the p-value is not suited to making such an evaluation.

What researchers do, mathematically, is set up the assumption that there is no difference between the groups (called the null hypothesis), and then calculate the probability of observing the measured difference between group A and group B, or an even more extreme difference, if it were due to chance alone.

This probability is what we call the p-value.
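To make this concrete, here is a minimal sketch, in Python, of one common way such a p-value can be computed: a permutation test on a binary outcome. The response counts are made up for illustration, and a permutation test is only one of several methods that produce a p-value.

```python
import random

# Hypothetical data: 1 = achieved the outcome, 0 = did not.
group_a = [1] * 30 + [0] * 20   # 30 of 50 responded in group A
group_b = [1] * 22 + [0] * 28   # 22 of 50 responded in group B

n_a = len(group_a)
observed = sum(group_a) / n_a - sum(group_b) / len(group_b)

# Null hypothesis: group labels make no difference. Reshuffle the
# labels many times and count how often a difference at least as
# extreme as the observed one turns up by chance alone.
pooled = group_a + group_b
rng = random.Random(42)
extreme = 0
n_perm = 10_000
for _ in range(n_perm):
    rng.shuffle(pooled)
    diff = sum(pooled[:n_a]) / n_a - sum(pooled[n_a:]) / (len(pooled) - n_a)
    if abs(diff) >= abs(observed):   # two-sided test
        extreme += 1

p_value = extreme / n_perm
print(f"observed difference: {observed:.2f}, p = {p_value:.3f}")
```

Note that this p-value says nothing about whether the observed difference matters to patients; it only quantifies how surprising the data would be under the null hypothesis.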

For this to have any meaning at all, people in the study population must be randomly allocated to either group A or group B, there must be no cross-over between the groups, and the outcome(s) must be defined before starting the study.

The p-value is not the probability of the research question being true. A p-value is a number calculated from the data with a mathematical method, expressing the chance of finding the presented result, or a more extreme one. This means that a result from a study can be labeled “statistically significant” and have a small p-value without being a meaningful result for the patient.

Statistical significance and p-values are two sides of the same coin. Statistical significance means that the probability of obtaining the findings of a study by chance alone is small. But even if the findings are not just a matter of chance, they still need not be relevant for the patient at the point of care. Often they are not.

The mathematical calculation of the p-value is easy to do with high precision on a laptop or even a smartphone. But just as no house is better than the materials of which it is built, the results of a medical study are no better than the data behind them. And our mathematical precision by far exceeds our ability to design and execute clinically relevant medical studies. So, most importantly: don’t be misled by results that merely seem to have high accuracy.

The validity of scientific conclusions depends on more than the statistical methods themselves. Appropriately chosen techniques, properly conducted analyses, and correct interpretation of statistical results also play a fundamental role in ensuring that conclusions are sound and that the uncertainty surrounding them is represented correctly.

So the p-value:

* Is not proof of an effect of the treatment in group A
* Cannot divide results into “true” and “not true”
* Is not referring to the given treatment, but only to the data analyzed
* Is not necessarily related to meaningful outcome for the patient
* Is not necessarily reproducible (most of the results are not)
* Is not proof of evidence for the treatment
* Is not evidence of absence of an effect when p is greater than 0.05
* Is not an indication of clinical meaningfulness, not even a very small p-value, say <0.001 (see the sketch after this list)
* A p-value of 0.05 = 5% = 1 out of 20 is the threshold most often chosen for “statistical significance”. There is no mathematical or other scientific reason for choosing this value; it is essentially arbitrary.
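The sketch below, again in Python and with made-up numbers, illustrates the point about clinical meaningfulness: with large enough groups, even a difference far too small for any patient to notice produces a very small p-value.

```python
import math

# Hypothetical example: mean pain reduction on a 0-100 scale.
mean_a, mean_b = 31.0, 30.0   # a 1-point difference between groups
sd, n = 15.0, 10_000          # same spread, very large groups

# Two-sample z-test (a reasonable approximation at this sample size).
se = sd * math.sqrt(2 / n)
z = (mean_a - mean_b) / se
p = 2 * (1 - 0.5 * (1 + math.erf(z / math.sqrt(2))))  # two-sided

print(f"z = {z:.2f}, p = {p:.2e}")
# Prints p < 0.001, yet a 1-point difference on a 100-point pain
# scale is far below anything a patient could detect or care about.
```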

To find out whether the difference between group A and group B represents a meaningful difference for patients, we must look at several factors, some of which are:

* The strength of the findings, also compared to clinically meaningful changes
* The study population (including who is included/excluded/dropped out)
* The size of the study population
* The study period
* …and so much more

Meaningful change is a change that can be detected by the patient and that, at the same time, is relevant to the patient.

The American Statistical Association (ASA) is worried about the increasing use and misuse of p-values. So worried, in fact, that it recently issued, for the first time ever, a statement on a specific statistical practice. In this landmark statement, the ASA says:

P-values do not measure the probability that the studied hypothesis is correct or the likelihood that the data were produced by random chance alone.

and, even more serious, or should I say “significant”:

Pragmatic considerations often require binary, “yes-no” decisions, but this does not mean that p-values alone can ensure that a decision is correct or incorrect. The widespread use of “statistical significance” (generally interpreted as “p < 0.05”) as a license for making a claim of a scientific finding (or implied truth) leads to considerable distortion of the scientific process.

Shortly after the ASA statement, another article was published in JAMA about the increasing and widespread use of p-values as a scientific argument for “evidence”. Using text mining, the authors identified 4,572,043 p-values in 1,608,736 MEDLINE abstracts and 3,438,299 p-values in 385,393 full-text articles. They found that

Reporting of P values in abstracts increased from 7.3% in 1990 to 15.6% in 2014. In 2014, P values were reported in 33.0% of abstracts from the 151 core clinical journals (n = 29 725 abstracts), 35.7% of meta-analyses (n = 5620), 38.9% of clinical trials (n = 4624), 54.8% of randomized controlled trials (n = 13 544), and 2.4% of reviews (n = 71 529). The distribution of reported P values in abstracts and in full text showed strong clustering at P values of .05 and of .001 or smaller.

Indeed not a minor problem!

In an accompanying editorial, the message is clear:

Statistical and scientific inference need not be constricted by such rigid thinking… …the automatic application of dichotomized hypothesis testing based on prearranged levels of statistical significance should be substituted with a more complex process using effect estimates, confidence intervals, and even P values, thereby permitting scientists, statisticians, and clinicians to use their own inferential capabilities to assign scientific significance.

In other words: evaluating outcomes of clinical trials that are relevant to the patient is much, much more complicated than just looking at a p-value.
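As a small illustration of the kind of process the editorial suggests, the following sketch (with hypothetical numbers; the MCID threshold is an assumption for the example) compares the confidence interval of an effect estimate against a minimal clinically important difference, rather than just checking p < 0.05.

```python
import math

# Hypothetical: judge a treatment effect against a minimal
# clinically important difference (MCID) instead of only a p-value.
diff, se = 8.0, 3.0   # estimated effect and its standard error
mcid = 10.0           # assumed smallest change that matters to patients

lo = diff - 1.96 * se   # 95% confidence interval
hi = diff + 1.96 * se
print(f"effect: {diff:.1f}, 95% CI: [{lo:.1f}, {hi:.1f}]")

if lo > mcid:
    print("Effect appears both statistically and clinically meaningful.")
elif hi < mcid:
    print("Effect is unlikely to matter to patients, whatever p says.")
else:
    print("Clinical meaningfulness is uncertain; the CI spans the MCID.")
```

The point of this design is exactly what the editorial advocates: the judgment rests on the effect estimate and its uncertainty, weighed against a patient-relevant threshold, rather than on a single dichotomized p-value.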

We are developing Zignifica to do this: an app that uses complex algorithms to analyse the method and design of a study, and to evaluate the clinical meaningfulness of its results. A prototype of the Zignifica app is ready for testing. The prototype contains studies in the field of pain medicine and pain management. We would like to hear YOUR opinion, and invite you to test the Zignifica Prototype and help us with feedback.

Click Here to Be Part of Zignifica Prototype Test Team


START TESTING