Subject: Biost 517 Q&A: HW #6
QUESTION:
Using question one as an example of a scientific question, I might report
-the mean spd12 for dose 0 with associated CI
-the mean spd12 for dose 0.4 with associated CI
-the p-value
Based on a p-value of <0.05 and non-overlapping CI, I could say the difference
is significant, for example.
Is it also important to report the difference in means and the associated CI
for the difference in means, and then remark that, because the CI for the
difference in means does not cross zero, the difference in means is significant
(if that is the case)?
ANSWER:
It is very important to report the difference in means and the associated CI
for the difference, as that is the measure of treatment effect. As we saw on
the midterm, it is not unusual for there to be a trend in the placebo group
perhaps due to
1) Aging
2) Seasonal trends (but less important with a 1 year timeframe)
3) Secular trends in diet or other behavior
4) Trends in laboratory measurement error
5) "Hawthorne effect" in which subjects being studied modify their
habits
6) Any of a gazillion other possible reasons
Furthermore, as I have said (and will continue to say) repeatedly,
non-overlapping CIs are a very imprecise way to judge statistical significance
of a comparison. While that criterion does serve as "elevator statistics"
sometimes, there is no excuse for not doing the proper analysis to obtain
precise inference on the relevant measure. I have shown you where we had
substantially overlapping CI on problem #4 of the midterm, but a HIGHLY
significant P value for the difference. In fact, the CI for one group actually
included the point estimate for the other group, something that could happen
alongside a significant difference only because the estimates for the two groups
were correlated with each other.
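To see how correlation produces this, here is a minimal sketch with hypothetical numbers (not the midterm data). It assumes normal-based 95% CIs and a known correlation rho between the two group estimates; the names `ci` and `diff_inference` are illustrative only. The variance of a difference is se1^2 + se2^2 - 2*rho*se1*se2, so strong positive correlation (e.g. the same subjects measured twice) shrinks the SE of the difference far below what the separate CIs suggest.

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist().inv_cdf(0.975)  # about 1.96 for 95% intervals

def ci(est, se, z=Z):
    """Normal-based confidence interval (est - z*se, est + z*se)."""
    return (est - z * se, est + z * se)

def diff_inference(est1, se1, est2, se2, rho=0.0):
    """Inference on est2 - est1 when the two estimates have correlation rho.

    Var(est2 - est1) = se1^2 + se2^2 - 2*rho*se1*se2.
    """
    diff = est2 - est1
    se_diff = sqrt(se1**2 + se2**2 - 2 * rho * se1 * se2)
    p = 2 * (1 - NormalDist().cdf(abs(diff / se_diff)))
    return diff, ci(diff, se_diff), p

# Hypothetical group estimates with SE 1.0 each, highly correlated (rho = 0.95).
est1, se1 = 10.0, 1.0
est2, se2 = 11.5, 1.0
print("group 1 CI:", ci(est1, se1))   # contains est2
print("group 2 CI:", ci(est2, se2))   # contains est1
diff, diff_ci, p = diff_inference(est1, se1, est2, se2, rho=0.95)
print(f"difference {diff:.2f}, CI ({diff_ci[0]:.2f}, {diff_ci[1]:.2f}), P = {p:.2g}")
```

Each group's CI contains the other group's point estimate, yet the CI for the difference excludes zero by a wide margin.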
The point is that we find a single number that measures the scientific
quantity. When looking for the effect of a risk factor, it will usually
represent the difference or ratio of some summary measure of the distribution.
We then make inference on that single number. (See below for comments about
reasons for reporting results for each group as well.) Full inference will
include a point estimate of the difference or ratio, a CI for the difference or
ratio, and a P value testing the null hypothesis that the true difference is
zero or that the true ratio is 1 (unless you had reason to test some other null
hypothesis).
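As a sketch of that "full inference" for two independent samples, the following uses a large-sample normal approximation. That is an assumption: with small samples a t-based method (e.g. Welch's) would be more usual. A ratio of geometric means is handled by analyzing the logged data and exponentiating the estimate and CI limits. The function names are hypothetical.

```python
from math import exp, log, sqrt
from statistics import NormalDist, mean, stdev

def diff_of_means(x, y, level=0.95, null=0.0):
    """Point estimate, CI, and two-sided P value for mean(y) - mean(x).

    Large-sample normal approximation; independent samples, variances
    allowed to differ.
    """
    est = mean(y) - mean(x)
    se = sqrt(stdev(x) ** 2 / len(x) + stdev(y) ** 2 / len(y))
    z = NormalDist().inv_cdf(0.5 + level / 2)
    p = 2 * (1 - NormalDist().cdf(abs((est - null) / se)))
    return est, (est - z * se, est + z * se), p

def ratio_of_geometric_means(x, y, level=0.95):
    """Estimate and CI for GM(y) / GM(x); the P value tests a ratio of 1."""
    est, (lo, hi), p = diff_of_means([log(v) for v in x], [log(v) for v in y], level)
    return exp(est), (exp(lo), exp(hi)), p

# Hypothetical data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [3, 4, 5, 6, 7]
est, (lo, hi), p = diff_of_means(x, y)
print(f"difference {est:.2f}, 95% CI ({lo:.2f}, {hi:.2f}), P = {p:.3f}")
```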
For emphasis: I STRONGLY urge each and every one of you to use the criterion of
non-overlapping CIs as a criterion of last resort. First and foremost, you
should ask for and obtain inference on the true measure of treatment effect.
This includes all tasks you perform in this class, and ought to include all
presentations you see in other classes, seminars, research papers, newspaper
articles, ...
Elevator statistics are worthwhile only when you are standing in an elevator or
confronted with a situation where your requests for proper inference will
necessarily go unanswered (the scientific literature if you aren't the
referee). Then you might want to be able to interpret CI from separate groups
in order to make a comparison. But if you want to use this criterion (around
me, at any rate), you had better be able to remember the ENTIRE rules (I bet you
will find it hard to remember all the disclaimers, but on future exams you will
be required to establish all the conditions before I will accept this criterion
in response to a question):
1) If you have CI computed from INDEPENDENT samples and they do not overlap,
the difference between the respective summary measures will be statistically
significant at the corresponding level of confidence ON THE SCALE ON WHICH THEY
WERE INITIALLY ANALYZED.
-- So if you computed CI for means directly, then the difference of means is
OK. If you computed CI for log geometric means, then the difference of log
geometric means is OK. If you computed CI for the log means (and we do this in
Poisson regression models), then the difference of log means is OK.
-- This approach does not make any particular statement about what the CI for
the difference will have ruled out, nor what strength of evidence there would
be in the P value for the difference
2) If you have CI computed from INDEPENDENT samples and the CI for one group
includes the point estimate from the other group, the difference (see all the
disclaimers above) will not be statistically significantly different from zero
at the corresponding level of confidence.
-- Again, this approach does not make any particular statement about what the
CI for the difference will have ruled out, nor what strength of evidence there
would be in the P value for the difference. You would be unable to interpret a
"negative" study (see my lecture slides on the importance of CI relative to P
values: using this approach we can say, for instance, P > 0.05, but nothing
else).
3) In any other setting, you can say "I haven't a clue".
-- These other settings include
. some overlapping CI from independent samples
. all CI from correlated samples
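The three rules can be written down as a small decision function. This is a sketch assuming both CIs are symmetric, normal-based intervals at the same confidence level, computed on the scale on which the data were analyzed (e.g. the log scale for geometric means); `elevator_verdict` is a hypothetical name.

```python
def elevator_verdict(est1, ci1, est2, ci2, independent):
    """Apply the three 'elevator statistics' rules to two group-specific CIs.

    ci1 and ci2 are (lower, upper) tuples on the analysis scale.
    """
    if not independent:
        return "I haven't a clue"          # rule 3: correlated samples
    lo1, hi1 = ci1
    lo2, hi2 = ci2
    if hi1 < lo2 or hi2 < lo1:
        return "significant"               # rule 1: the CIs do not overlap
    if lo1 <= est2 <= hi1 or lo2 <= est1 <= hi2:
        return "not significant"           # rule 2: one CI holds the other estimate
    return "I haven't a clue"              # rule 3: partial overlap

# Equal SEs of 1 and a true difference of 3 (P about 0.03 when done properly).
print(elevator_verdict(0.0, (-1.96, 1.96), 3.0, (1.04, 4.96), independent=True))
```

The example shows why rule 3 comes up so often: with equal SEs, the CIs fail to overlap only when the estimates differ by more than 2 x 1.96 = 3.92 SE, while the difference is significant once it exceeds 1.96 x sqrt(2), about 2.77 SE.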
There is a HUGE difference between "I don't know whether or not the difference
is statistically significant" (which is what we have to say under the third
rule) and "We know it is not statistically significant".
I think that anyone who does a study and merely gives the "I don't know
whether..." response needs to be fired, and, if they are using NIH funds,
refund me (at least) my tax money.
I do note that if you can figure out how the CI were computed, and if that
method involved a normally distributed statistic, then we can often figure out
what the SE was and be able to compute the SE for the difference, providing the
CI were from independent samples. This is what I usually do when confronted
with problems in the scientific literature, but since I no longer carry a
calculator (and contrary to the stereotype, even in my days in physics or in an
engineering school, I never carried a slide rule except to exams), I cannot
usually do this on an elevator.
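The calculation is short enough to sketch. Assuming the published CIs are symmetric and normal-based (so the width is 2 x z x SE) and the samples are independent, the SE of each estimate can be backed out and the variances added. The function names are hypothetical; for ratios whose CIs were computed on the log scale, the same arithmetic is applied to the logged limits.

```python
from math import exp, log, sqrt
from statistics import NormalDist

def se_from_ci(lower, upper, level=0.95):
    """Recover the SE from a symmetric, normal-based CI: width = 2*z*SE."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return (upper - lower) / (2 * z)

def compare_published(est1, ci1, est2, ci2, level=0.95, log_scale=False):
    """Estimate, CI, and P value for group 2 vs group 1 from published CIs.

    Assumes INDEPENDENT samples. With log_scale=True the inputs are
    ratios (e.g. geometric means) and the result is a ratio.
    """
    f = log if log_scale else (lambda v: v)
    se1 = se_from_ci(f(ci1[0]), f(ci1[1]), level)
    se2 = se_from_ci(f(ci2[0]), f(ci2[1]), level)
    diff = f(est2) - f(est1)
    se_diff = sqrt(se1**2 + se2**2)   # independence: variances add
    z = NormalDist().inv_cdf(0.5 + level / 2)
    p = 2 * (1 - NormalDist().cdf(abs(diff / se_diff)))
    lo, hi = diff - z * se_diff, diff + z * se_diff
    if log_scale:
        return exp(diff), (exp(lo), exp(hi)), p
    return diff, (lo, hi), p

# Hypothetical published results: the CIs overlap from 11.04 to 11.96.
diff, (lo, hi), p = compare_published(10, (8.04, 11.96), 13, (11.04, 14.96))
print(f"difference {diff:.2f}, 95% CI ({lo:.2f}, {hi:.2f}), P = {p:.3f}")
```

With these hypothetical numbers the two CIs overlap, yet the P value for the difference is about 0.03.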
All of that having been said, as a general rule, I do think it relevant to
present for each group:
-- mean, SD (not SE), min, max, percent with response above some scientifically
meaningful threshold (all of these allow us to assess possible individual
toxicity)
-- CI for the summary measure within each group
-- Perhaps: A P value for the within group response IF the summary of response
within each risk group is some sort of change over time or place. I will not
make too much of this due to all the possible reasons for trends in the placebo
group. I note that a nonsignificant change in the placebo group and a
significant change in the treatment group can still lead to a nonsignificant
difference between the two groups. I have seen way too many erroneous (and
potentially harmful) reports in the scientific literature that merely look for
nonsignificance in the Placebo group and then overinterpret a significant
result in the new treatment group, when the proper analysis says that there was
no significant difference between the groups. To avoid anyone making this
mistake, I sometimes suppress the inference that is irrelevant to the primary
question.
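A small numerical sketch of that error, using made-up mean changes from baseline and their SEs (independent groups, normal approximation; `two_sided_p` is a hypothetical helper):

```python
from math import sqrt
from statistics import NormalDist

def two_sided_p(est, se):
    """Two-sided P value for H0: true value = 0, normal approximation."""
    return 2 * (1 - NormalDist().cdf(abs(est / se)))

# Hypothetical mean changes from baseline (NOT real study data).
placebo_change, placebo_se = 1.0, 0.6
treated_change, treated_se = 1.5, 0.6

p_placebo = two_sided_p(placebo_change, placebo_se)
p_treated = two_sided_p(treated_change, treated_se)

# The relevant comparison: difference in mean change between the groups.
effect = treated_change - placebo_change
effect_se = sqrt(placebo_se**2 + treated_se**2)
p_effect = two_sided_p(effect, effect_se)

print(f"placebo change:   P = {p_placebo:.3f}")
print(f"treated change:   P = {p_treated:.3f}")
print(f"treatment effect: P = {p_effect:.3f}")
```

Here the placebo change is not significant (P about 0.10) and the treated change is (P about 0.01), yet the treatment effect itself has P about 0.56.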
I note that editors will often not let you present all that many aspects of the
inference, in which case the descriptive statistics for each group (mean, SD,
range, etc.) should be presented in a table somewhere, but the CI and P values
for the groups can safely be omitted.
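A minimal sketch of such a descriptive table, with hypothetical data and an arbitrary threshold standing in for a scientifically meaningful one (`describe` is an illustrative name):

```python
from statistics import mean, stdev

def describe(values, threshold):
    """Per-group summary: n, mean, SD (not SE), min, max, and the percent
    of subjects with response above a chosen threshold."""
    return {
        "n": len(values),
        "mean": mean(values),
        "sd": stdev(values),
        "min": min(values),
        "max": max(values),
        "pct_above": 100 * sum(v > threshold for v in values) / len(values),
    }

# Hypothetical data and threshold, for illustration only.
threshold = 4.0
groups = {
    "placebo": [2.1, 3.4, 1.8, 2.9, 4.6],
    "treated": [3.0, 4.2, 5.1, 2.7, 6.3],
}

print(f"{'group':<10}{'n':>3}{'mean':>8}{'SD':>8}{'min':>7}{'max':>7}{'% >thr':>8}")
for name, vals in groups.items():
    s = describe(vals, threshold)
    print(f"{name:<10}{s['n']:>3}{s['mean']:>8.2f}{s['sd']:>8.2f}"
          f"{s['min']:>7.1f}{s['max']:>7.1f}{s['pct_above']:>8.1f}")
```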
And as I have stressed in class, presenting the point estimate, the CI, and the
full P value is important. Merely stating "the difference is statistically
significant" does not allow a reader with more stringent or less stringent
standards of evidence to evaluate your results. I do note that it is rarely
useful to present a P value to more than 4 decimal places, and often two
suffice.
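One way to follow that advice in code is a formatter that prints the actual P value to a chosen number of decimal places and falls back to an upper bound only when the value is too small to show. This is a sketch; `format_p` and its 4-place default are illustrative choices, not a standard.

```python
def format_p(p, digits=4):
    """Report the full P value at a fixed number of decimal places.

    Values too small to show at that precision are reported as an upper
    bound, e.g. 'P < 0.0001', rather than as 'P = 0.0000'.
    """
    if not 0 <= p <= 1:
        raise ValueError("P value must be in [0, 1]")
    floor = 10 ** -digits
    if p < floor:
        return f"P < {floor:.{digits}f}"
    return f"P = {p:.{digits}f}"

print(format_p(0.0339))
print(format_p(0.0123, digits=2))
print(format_p(0.00001))
```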
Scott