How to read a scientific paper pt 1: H0, Ha, p values

medicRob

Forum Deputy Chief
Background:

Many paramedics have not been fortunate enough to be part of a program that offers a course on research and its evaluation, and I can think of fewer than four paramedic programs in the US that require Probability & Statistics. That leaves EMS professionals largely fending for themselves when it comes to current literature and evidence-based research. Therefore, I have taken the time to compile a few resources showing how to evaluate a scientific research paper, covering topics such as p values, correlation, and regression. In essence, this post is a basic statistics overview for clinicians.

* All of the third-party resources provided below are either freely available to the public with no expectation of royalties or copyright claims (although credit is given where credit is due), or I have the express permission of the controlling entities to distribute this content on the forum (Admins, feel free to contact me for proof).

Resources

The first resource I would like to present is a series of four articles published in the Canadian Medical Association Journal. I have pieced the four articles together into a single PDF for your convenience.

Click Here for the Articles (1-4)

Statistics in Research

Without statistics, research would fall apart. We use statistics in medicine to ensure that our treatments constantly evolve, decrease mortality, and produce better outcomes for our patients. Statistics tell us when a drug is too dangerous to be allowed on the market, or when the risks of a given treatment outweigh its benefits. They also tell us whether something we are doing is or is not working, such as the research showing that early defibrillation is key to survival in cardiac arrest. Statistics are the basis upon which professions develop and evolve.

That being said, there are several key statistics that we use to evaluate scientific research. It is my hope that the articles above, coupled with the videos included in this post, will give you a fundamental understanding of how to evaluate a research paper, and that your initiative and willingness to learn will carry you further into the study of research evaluation.


Common Statistical Techniques

The most common statistic we will come across in research is the p value. Before the p value can be properly explained, however, I need to introduce another term: the null hypothesis. The null hypothesis (H0) is the claim that we evaluate in statistical hypothesis testing; for example, "this drug has no effect on mortality" or "this coin is fair." Let's take a look at this video explaining the null hypothesis (H0).

[YOUTUBE]http://www.youtube.com/watch?v=mGUPOdXN8gA[/YOUTUBE]

Now that we have learned what a null hypothesis is, we can move on to the p value.

[YOUTUBE]http://www.youtube.com/watch?v=ZFXy_UdlQJg&playnext=1&list=PL8D5A2C76EBC21276[/YOUTUBE]

To help you remember:
"If the p is low, the null must go."

The alternative hypothesis (Ha or H1) is known as our research hypothesis. It is the question we are actually asking; we weigh it against H0 to decide whether the evidence is strong enough to reject H0 in its favor.

The following video will give you an overview of the alternative hypothesis.

[YOUTUBE]http://www.youtube.com/watch?v=zZ0fz7KMtng[/YOUTUBE]


So, to sum it up: the null hypothesis (H0) is what our alternative hypothesis (Ha or H1) is set up to test against and, possibly, overturn. The p value is how we measure this; it tells us how unlikely our results would be if H0 were true.
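To make that relationship concrete, here is a minimal sketch in Python (my own illustration, not something from the CMAJ articles) of the decision rule that sits underneath every hypothesis test: pick a significance threshold (alpha) before looking at the data, compute the p value, and reject H0 only if the p value falls below that threshold.

[CODE]
# A bare-bones illustration (not from the CMAJ articles) of the
# hypothesis-testing decision rule.
#
# H0: the "no difference / no effect" claim we are trying to knock down.
# Ha: our research hypothesis, favored only if the data discredit H0.
# p : the probability of results at least this extreme, assuming H0 is true.

ALPHA = 0.05  # the conventional Type I error threshold (see the quoted article below)


def decide(p_value: float, alpha: float = ALPHA) -> str:
    """Apply the rule of thumb: if the p is low, the null must go."""
    if p_value < alpha:
        return "Reject H0 in favor of Ha (statistically significant)"
    return "Fail to reject H0 (not statistically significant)"


print(decide(0.02))  # p = 0.02 -> reject H0
print(decide(0.34))  # p = 0.34 -> fail to reject H0
[/CODE]

Keep in mind that "fail to reject H0" is not the same thing as "H0 is true"; it only means the data were not surprising enough to rule out chance.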

The CMAJ article gives a perfect example of how a p value is used, involving something as simple as flipping a coin. Thus, I quote:

"way that an investigator can go wrong is to conclude
that there is a difference in outcomes between a treatment
and a control group when, in fact, no such difference
exists. In statistical terminology, erroneously concluding
that there is a difference is called a Type I error, and the
probability of making such an error is designated a. Imagine
a situation in which we are uncertain whether a coin is
biased. That is, we suspect (but do not know for sure) that a
coin toss is more likely to result in heads than tails. We
could construct a null hypothesis that the true proportions
of heads and tails are equal. That is, the probability of any
given toss landing heads is 0.5, and so is the probability of
any given toss landing tails. We could test this hypothesis
in an experiment in which the coin is tossed a number of
times. Statistical analysis of the results would address
whether the results observed were consistent with chance.
Let us conduct a thought experiment in which the suspect
coin is tossed 10 times, and on all 10 occasions the result
is heads. How likely is this result if the coin is unbiased?
Most people would conclude that this extreme result is
highly unlikely to be explained by chance. They would
therefore reject the null hypothesis and conclude that the
coin is biased. Statistical methods allow us to be more precise
and state just how unlikely it is that the result occurred
simply by chance if the null hypothesis is true. The probability
of 10 consecutive heads can be found by multiplying
the probability of a single head (0.5) by itself 10 times: 0.5 x
0.5 x 0.5 and so on. Therefore, the probability is slightly
less than one in 1000. In an article we would likely see this
probability expressed as a p value: p < 0.001. What is the
precise meaning of this p value? If the null hypothesis were
true (that is, the coin was unbiased) and we were to repeat
the experiment of the 10 coin tosses many times, 10 consecutive
heads would be expected to occur by chance less than
once in 1000 times. The probability of obtaining either 10
heads or 10 tails is approximately 0.002, or two in 1000.
In the framework of hypothesis testing the experiment
would not be over, for we have yet to make a decision. Are
we willing to reject the null hypothesis and conclude that
the coin is biased? How unlikely would an outcome have to
be before we were willing to dismiss the possibility that the
coin was unbiased? In other words, what chance of making
a Type I error are we willing to accept? This reasoning implies
that there is a threshold probability that marks a
boundary; on one side of the boundary we are unwilling to
reject the null hypothesis, but on the other we conclude
that chance is no longer a plausible explanation for the result.
To return to the example of 10 consecutive heads,
most people would be ready to reject the null hypothesis
when the observed results would be expected to occur by
chance less than once in 1000 times.
Let us repeat the thought experiment with a new coin.
This time we obtain nine tails and one head. Once again, it
is unlikely that the result is due to chance alone. This time
the p value is 0.02. That is, if the null hypothesis were true
and the coin were unbiased, the results observed, or more
extreme than those observed, (10 heads or 10 tails, 9 heads
and 1 tail or 9 tails and I head) would be expected to occur
by chance twice in 100 repetitions of the experiment.
Given this result, are we willing to reject the null hypothesis?
The decision is arbitrary and a matter of judgement.
However, by statistical convention, the boundary or
threshold that separates the plausible and the implausible is
five times in 100 (p = 0.05). This boundary is dignified by
long tradition, although other choices of a boundary value
could be equally reasonable. The results that fall beyond
this boundary (i.e., p < 0.05) are considered "statistically
significant." Statistical significance, therefore, means that a
result is "sufficiently unlikely to be due to chance that we
are ready to reject the null hypothesis."
Let us repeat our experiment twice more with a new coin.
On the first repetition eight heads and two tails are obtained.
The p value associated with such a split tells us that, if the
coin were unbiased, a result as extreme as eight to two (or
two to eight), or more extreme, would occur by chance 1 1
times in 100 (p = 0.111). This result has crossed the conventional
boundary between the plausible and implausible. If we
accept the convention, the results are not statistically significant,
and the null hypothesis is not rejected.
On our final repetition of the experiment seven tails and
three heads are obtained. Experience tells us that such a result,
although it is not the most common, would not be unusual
even if the coin were unbiased. The p value confirms
our intuition: results as extreme as this split would occur
under the null hypothesis 34 times in 100 (p = 0.34). Again,
the null hypothesis is not rejected."1
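If you would like to check the arithmetic in that quote for yourself, the short Python sketch below (again, my own illustration rather than anything taken from the article) reproduces the two-sided p values for each coin-toss split using nothing but the binomial probabilities of a fair coin:

[CODE]
from math import comb


def two_sided_p(heads: int, tosses: int = 10) -> float:
    """Two-sided p value under the null hypothesis of a fair coin:
    the probability of a split at least as lopsided as `heads` vs.
    `tosses - heads`, in either direction."""
    extreme = max(heads, tosses - heads)  # e.g. 8-2 and 2-8 are equally extreme
    # Under H0, P(k heads) = C(n, k) * 0.5**n for a fair coin
    one_tail = sum(comb(tosses, k) for k in range(extreme, tosses + 1)) * 0.5 ** tosses
    return 2 * one_tail  # count both directions (heads-heavy and tails-heavy)


for h in (10, 9, 8, 7):
    print(f"{h} heads / {10 - h} tails: p = {two_sided_p(h):.3f}")

# Output matches the article's numbers:
# 10 heads / 0 tails: p = 0.002
# 9 heads / 1 tails: p = 0.021   (~0.02)
# 8 heads / 2 tails: p = 0.109   (~0.11)
# 7 heads / 3 tails: p = 0.344   (~0.34)
[/CODE]

A statistics package would give the same answers, but doing it with the standard library keeps the arithmetic visible.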

Since I wanted to deliver this a piece at a time, so as not to confuse anyone, I decided to split this into parts, which can later be merged into one post by the admins and stickied.

In the next post, we will discuss: Confidence Intervals

References

1. Guyatt G, Jaeschke R, Heddle N, Cook D, Shannon H, Walter S. Basic statistics for clinicians: 1. Hypothesis testing. CMAJ. 1995;152:27-32
 

JPINFV

Gadfly
Can we insert commentary here on the uselessness of the p value in contrast to things like relative risk or odds ratios?
 