The Men Who Mistook p Values for Significant Results: Evidence-Based Medicine, Statistical Fallacy, or Neuropathology?

Ever wondered why coffee is good for you one day and evil the next? Did you play Mozart for your kids and hope they would grow up more intelligent? The same thing happens in medicine: like it or not, evidence-based medicine is one of the pillars of modern medicine. If you think about phrenology, bloodletting, and the many other practices believed to be effective over the past decades and centuries, you will see that we have come a long way. Before diving into the evaluation of research, clinical trials, and evidence-based versus science-based medicine, it is worth briefly reviewing some important features of critical thinking, common logical fallacies, and a few evolutionary facts about the neuroscience behind logic:

Differences between deductive and inductive reasoning: 

Deduction: Based on premises built from established facts and semantic knowledge, e.g. humans are mortal, Socrates is a human, therefore Socrates is mortal. [1]

Induction (the scientific method): The Sherlock Holmesian way of looking at things, based on observation and pattern recognition. It is not always true, but it can transform bogus claims into established facts; in other words, inductive reasoning injects proven knowledge and theories into the deductive realm. It requires constant questioning and debunking, e.g. in ancient times people thought the Earth was flat, and it took thousands of years to delete that “fact” from people’s semantic knowledge. [1,2]

 

Sources of Bias:

As Chavalarias and Ioannidis show in their paper mapping 235 biases in biomedical research, there are many different biases out there that confound and obscure proper research. Biases emerge depending on the type of research, the researchers, and the funding sources, and they can be conscious, unconscious, or subconscious. [11] Here are a few:

Expertise and experience (subjectivism): One of the main sources of bias is the experience and expertise of doctors or scientists. Imagine you have worked your entire life to prove the efficacy of a treatment: you will end up biased about that topic, because you have invested your life in it and you are convinced there is enough evidence to support your idea. This is how dogmatic thinking obscures the scientific vision for further exploration.

Confirmation bias: The urge to reject the null hypothesis because of the researcher’s inclination to prove that the hypothesis is correct.

Financial gain: Wherever and whenever money is involved: industry-funded research, grants from government agencies. After all, researchers have to keep the funding source satisfied to secure future grants and support.

Publication bias: How many studies have you read that report failing to reject the null hypothesis, or in plain language, “our study did not prove anything”? None! Scientific journals have their own unwritten rules, and they selectively publish studies with positive results.

Logical fallacies: Fallacies are another form of bias:

-Post Hoc Ergo Propter Hoc (after this, therefore because of this): e.g. a neurologist (tPA lover and NINDS believer) says: “I gave thrombolytics, and after that the patient’s symptoms resolved, therefore the thrombolytics were the cause.” [9]

-Statistical Fallacy: Drawing inferences about cause and effect based on statistical significance alone, without ruling out alternative explanations or evaluating the study design. e.g. the NINDS trial in 1995 showed the efficacy of thrombolysis in ischemic stroke, despite the fact that it could not rule out the possibility of TIAs and its control group was worse off than the experimental group at baseline. 😉

-Argumentum ad Verecundiam (appeal to authority): Accepting a claim based on the source’s education, credibility, reputation, and achievements. e.g. the fact that the National Institute of Neurological Disorders and Stroke (NINDS) supported the tPA trial does not by itself make the claim valid. ;))

-Argumentum ad Ignorantiam (appeal to ignorance): Using the absence of proof for a claim as evidence for the truth of the opposing proposition. e.g. because it was never proven that TIAs confounded the results, the NINDS trial’s findings must be valid and tPA should be the standard of care for ischemic stroke management. :)))

 

Human Brain, Pattern Recognition, and Flawed Research

“When information overload occurs, pattern recognition is how to determine truth.” (Marshall McLuhan)

We chunk information together, we segment and de-segment auditory input, and we see patterns everywhere: clouds that look like animals and objects. We recognize faces based on facial patterns, diagnose patients based on patterns, invest based on market patterns; schools accept students based on their previous academic patterns. Patterns are everywhere and we are amazing at recognizing them. Sometimes we even see patterns where they do not exist.

Why? Because it makes it easier to deal with large chunks of information. We have comparators in our brains that compare a stimulus with previously stored information (experience). This is why we need practice when exposed to a new learning task, and why we can sometimes tell something is not right without knowing why. If you have never seen a dog in your life, you will never see a cloud that looks like a dog!

How about ribosomes? They are my favourite pattern-recognition machines, and they live inside us. Ribosomes read the genetic code and synthesize proteins. Like Alan Turing’s machine, which could read and recognize endless combinations of 0s and 1s, ribosomes make sense of genetic codes and amino acids.

Well, just like our ribosomes, we do the same thing: we skip the extra effort of checking all the facts and possible explanations in order to save some neural processing, and we make inferences based on the knowledge already present in our brains. This is why the scientific method and critical thinking are quite painful! Pattern recognition without further evaluation of the facts is the main reason dogmatic and fallacious thinking occur.

You might ask, how is this related to research? Quantitative measures are often based on observations and patterns. In medicine we look for common patterns among diseases and their signs and symptoms; in research we look for common patterns within a cohort, assign quantitative values to them, and analyze them statistically. There is a funny unwritten rule in research which suggests that if you believe in something, you can prove it statistically. The basic fact that analyzing the same data with different statistical methods can yield different results is the main reason many think the wiggle room in statistical methods is the culprit; the sketch below illustrates the point.
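
As a toy illustration, not taken from any trial discussed here and using made-up data, the snippet below runs a parametric and a non-parametric test on the same two samples. The two p values will generally differ, and with borderline data one can cross the 0.05 line while the other does not.

```python
# A toy sketch with made-up data: the same two samples analysed with a
# parametric and a non-parametric test. The two p values generally differ,
# which is the "wiggle room" discussed above.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical skewed outcome data for a "treatment" and a "control" group.
treatment = rng.exponential(scale=1.3, size=20)
control = rng.exponential(scale=1.0, size=20)

# Parametric: Student's t-test assumes roughly normal data.
t_stat, p_t = stats.ttest_ind(treatment, control)

# Non-parametric: Mann-Whitney U makes weaker distributional assumptions.
u_stat, p_u = stats.mannwhitneyu(treatment, control, alternative="two-sided")

print(f"t-test p value:       {p_t:.3f}")
print(f"Mann-Whitney p value: {p_u:.3f}")
```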

The P value and hypothesis tests

One thing to keep in mind before talking about the p value is that mathematics is a language with well-defined components made out of numbers and symbols. The reason it does not evolve the way our common languages such as English and French do is that not everybody speaks it, and its purity has been preserved over centuries [1,3,4,11], just like a tribe on an isolated island that has kept its traditions and language the same for ages. This does not mean there is no flexibility in the language of math: we can present mathematical facts in different ways, especially when we assign values to our day-to-day qualitative phenomena to make quantifiable measures out of them.

The p value is the probability, assuming the null hypothesis is true, of obtaining results equal to or more extreme than what was actually observed. It is essentially an informal index used as a measure of discrepancy between the data and the null hypothesis. We all look for a p value of 0.05 or less, as if it meant our hypothesis is 95% likely to be correct and there is only a 5% chance the null hypothesis explains our results. In most studies the reading is “when p = 0.05, there is a 95% chance or greater that the null hypothesis is incorrect,” but the p value is calculated assuming the null is true; it does not measure the probability that the null is false.
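
Here is a minimal simulation sketch, with made-up numbers, of what that definition actually says: assume the null hypothesis is true, generate many “null worlds,” and ask how often a result at least as extreme as the observed one shows up.

```python
# A minimal simulation sketch (made-up numbers) of what a p value is: the
# probability, ASSUMING the null hypothesis is true, of seeing a result at
# least as extreme as the one actually observed.
import numpy as np

rng = np.random.default_rng(42)

observed_diff = 4.0   # hypothetical observed difference in group means
group_size = 30       # hypothetical patients per group
null_sd = 10.0        # hypothetical outcome standard deviation under the null

# Under the null, both groups are drawn from the same distribution, so any
# difference in means is pure sampling noise. Simulate that noise many times.
n_sims = 100_000
null_diffs = (rng.normal(0.0, null_sd, (n_sims, group_size)).mean(axis=1)
              - rng.normal(0.0, null_sd, (n_sims, group_size)).mean(axis=1))

# Two-sided p value: the fraction of simulated "null worlds" producing a
# difference at least as extreme as the one observed.
p_value = np.mean(np.abs(null_diffs) >= observed_diff)
print(f"simulated p value: {p_value:.3f}")

# Note what this number is NOT: it is not the probability that the null
# hypothesis is false, and 1 - p is not the probability that ours is true.
```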

One of the criticisms of the p value is that it is a measure of evidence that does not take into account the size of the observed effect: a small effect in a large sample can have the same p value as a large effect in a small cohort (a quick illustration follows below). This is the main reason confidence intervals are preferred over bare p values, but it was Neyman and Pearson who immortalized the p value as part of their hypothesis testing framework. The outcome of a hypothesis test was to be a behaviour, not an inference: to reject one hypothesis and accept the other based on the data. This exposes the researcher to two types of error: behaving as though two therapies differ when they are actually the same, a false positive (type I error), or concluding that they are the same when in fact they differ, a false negative (type II error). As Dr. Steven Goodman states in his paper “Toward evidence-based medical statistics”: “Hypothesis tests are equivalent to a system of justice that is not concerned with which individual defendant is found guilty or innocent (that is, whether each separate hypothesis is true or false) but tries instead to control the overall number of incorrect verdicts (that is, in the long run of experience, we shall not often be wrong). Controlling mistakes in the long run is a laudable goal, but just as our sense of justice demands that individual persons be correctly judged, scientific intuition says that we should try to draw the proper conclusions from individual studies.”
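
To make the effect-size point concrete, here is a small sketch with entirely made-up data: a trivially small difference measured in a very large sample versus a large difference measured in a small sample, both pushed through the same t-test, with a rough 95% confidence interval printed alongside each p value.

```python
# A small sketch with entirely made-up data: a tiny effect in a huge sample
# versus a large effect in a small sample. The p values can look equally
# "significant," while the effect sizes (and confidence intervals) differ.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)

# Large trial, trivially small effect (mean difference of 0.1 units).
big_treat = rng.normal(0.1, 1.0, 20_000)
big_ctrl = rng.normal(0.0, 1.0, 20_000)

# Small trial, large effect (mean difference of 1.0 unit).
small_treat = rng.normal(1.0, 1.0, 25)
small_ctrl = rng.normal(0.0, 1.0, 25)

for label, a, b in [("large n, tiny effect ", big_treat, big_ctrl),
                    ("small n, large effect", small_treat, small_ctrl)]:
    _, p = stats.ttest_ind(a, b)
    diff = a.mean() - b.mean()
    # Rough 95% confidence interval for the difference in means: it keeps the
    # size of the effect in view, which the bare p value does not.
    se = np.sqrt(a.var(ddof=1) / len(a) + b.var(ddof=1) / len(b))
    lo, hi = diff - 1.96 * se, diff + 1.96 * se
    print(f"{label}: diff = {diff:.2f}, 95% CI ({lo:.2f}, {hi:.2f}), p = {p:.4f}")
```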

EBM

Evidence-based medicine is only the most recent incarnation: experience-based medicine and expert-based medicine were the earlier forms that ruled our practice for centuries and still play a big role in modern medicine. Biostatisticians and epidemiologists, from Hippocrates to John Graunt, and from John Snow to Doll and Hill, have provided amazing insights into cause and effect and the correlations among many disease-causing phenomena, but it was not until recent years that a great shift in the realm of research caused turmoil in our understanding of modern science and medicine in general. [3,6,13]

 

EBM’s emphasis is on randomized controlled trials and statistical analysis to determine the risks and benefits of an intervention, and to help practitioners make decisions based on current evidence. Here you can see the history of EBM from the era of Hippocrates to modern medicine, and the work of one of my great heroes out of McMaster University, the late David Sackett, MD [2,4,7]:

Soon after its glorious debut, EBM began to see a decline in trust due to dubious results coming from “bad research,” “big Pharma,” and “alternative medicine research.” [1,2,5] Is acupuncture better than placebo? How about homeopathy? Is Zofran better than droperidol? Why does the FDA black-box some medications and approve others? Level A, B, C, D recommendations: where do they come from? How do they find significant results? p values of <0.05? These confusing results, and the urge to get published, pushed scientists and clinicians to come up with hypotheses and study designs that in some instances even omitted basic scientific and clinical facts. [3,5,8]

One important thing that has been forgotten in recent years is the value of basic science and clinical experience. We have forgotten the very basic element of science, which rests on well-established, proven facts and theories that are reproducible and have withstood many tests, and substituted it with research papers published in prestigious journals, to the point that such studies have been used in malpractice cases in favour of the practitioner at fault.

 

 

In conclusion I would add that statistical methods and the scientific method have been tremendously helpful to our understanding of the natural and synthetic phenomena around us. P values, hypothesis testing, correlation formulas, and different study designs are all tools that can give us amazing insights into our current problems in medicine and other fields of science, but using these tools in the wrong setting can be harmful and a waste of time, money, and resources. Just like the Ottawa ankle rule, the HEART score, the CT head rule, ABCD2, CHA2DS2-VASc, HAS-BLED, the Wells criteria, and so many other clinical tools that give us amazing but simplified pattern-recognition power, one should exercise clinical judgment, consider individual differences, and know that reproducing the results of a study or a clinical tool requires an identical setting and criteria; these are essential pieces of success. Tacit knowledge, clinical expertise, and those basic science textbooks we studied in undergrad and medical school play an integral part in our knowledge.

 

Watch this video by John Ioannidis, MD, PhD, on “Why Most Clinical Research Is Not Useful”:

And this video concludes this post:

 

 

References:

[1] K. M. Fedak, A. Bernal, Z. A. Capshaw, and S. Gross, “Applying the Bradford Hill criteria in the 21st century: how data integration has changed causal inference in molecular epidemiology,” Emerg Themes Epidemiol, vol. 12, Sep. 2015.
[2] D. L. Sackett and M. Gent, “Controversy in counting and attributing events in clinical trials,” N. Engl. J. Med., vol. 301, no. 26, pp. 1410–1412, Dec. 1979.
[3] D. Grahame-Smith, “Evidence based medicine: Socratic dissent.,” BMJ, vol. 310, no. 6987, pp. 1126–1127, Apr. 1995.
[4] D. L. Sackett, W. M. Rosenberg, J. A. Gray, R. B. Haynes, and W. S. Richardson, “Evidence based medicine: what it is and what it isn’t,” BMJ, vol. 312, no. 7023, pp. 71–72, Jan. 1996.
[5] A. Bhatt, “Evolution of Clinical Research: A History Before and Beyond James Lind,” Perspect Clin Res, vol. 1, no. 1, pp. 6–10, 2010.
[6] R. B. Haynes et al., “Improvement of medication compliance in uncontrolled hypertension,” Lancet, vol. 1, no. 7972, pp. 1265–1268, Jun. 1976.
[7] D. L. Sackett, L. Macdonald, R. B. Haynes, and D. W. Taylor, “Labeling of hypertensive patients,” N. Engl. J. Med., vol. 309, no. 20, p. 1253, Nov. 1983.
[8] M. Gent, S. R. Leeder, and D. L. Sackett, “Making research relevant: experience in a Canadian health region,” Med. J. Aust., vol. 2, no. 24, pp. 807–812, Dec. 1977.
[9] The NNT Group, “Thrombolytics for Stroke,” TheNNT. [Online].
[10] D. L. Sackett and R. B. Haynes, “The architecture of diagnostic research,” BMJ, vol. 324, no. 7336, pp. 539–541, Mar. 2002.
[11] D. Chavalarias and J. P. A. Ioannidis, “Science mapping analysis characterizes 235 biases in biomedical research,” J Clin Epidemiol, vol. 63, no. 11, pp. 1205–1215, Nov. 2010.
[12] “The beta-blocker heart attack trial. beta-Blocker Heart Attack Study Group,” JAMA, vol. 246, no. 18, pp. 2073–2074, Nov. 1981.
[13] “The SkepDoc.” [Online]. Available: http://www.skepdoc.info/. [Accessed: 20-Dec-2016].
