Biost 518: Q&A - Measured vs Computed SD of change
QUESTION:
I have a question regarding homework 2 problem 3. It would seem to me
that the variance we use in calculating V should be the variance we found
in question 1b (square of the standard deviation of the difference in
bilirubin). However, I notice on the homework key from last year (where
problem 4 is like our current problem 3), the variance used was analogous
to the one in question 1a of the current homework; the variance of the
difference was not even calculated on last year's homework.
Could you explain why it is possible to use just variance of the baseline
values in this situation, when we are trying to find sample size given a
study design based on differences in values? What kind of assumptions are
being made? Is this how one would handle a situation for which the
baseline values' variance was available, but no follow-up data were
available for calculating a variance of the difference?
ANSWER:
You are correct on all counts, but thanks for asking and letting me
expound on this.
1) When we have data on the relevant changes, we should simply use the SD of those measurements. That is, if we know that we are going to compare the difference
Yfinal - Ybsln
for each individual, then we should find out the SD of those differences,
just as I had you do in Problem 1. And, ideally, that is what would be
most relevant.
2) Of course, if we have the SD at baseline, the SD at follow-up, and the correlation rho between the two measurements, then we can compute the SD of change from the variance
of the baseline measurements (Vbsln) and the variance of the final
measurements (Vfinal):
Vchange = Vbsln + Vfinal - 2 * rho * sqrt(Vbsln * Vfinal)
And the SD would just be the square root of the variance Vchange.
So, if the SD at final were equal to the SD at baseline, we would get the same answer whether we used the SD of the change directly or the formula given in problem 3:
Vchange = 2 * Vbsln * (1 - rho) = 2 * Vfinal * (1 - rho)
This is the reason I suggested you compare the value of V used in problem
3 to the square of the SD for change in problem 1. I believe you will find
that they do not agree all that well. That is because in these data Vbsln
and Vfinal are not the same. That might just be random sampling error, or
it might be real. Personally, I think it is real: We restricted entry to
the study to those subjects having bilirubin less than 3. So at the start
of the study we have a more homogeneous group. Over time, however, some
patients progressed, others did not. This would create greater variability
of bilirubin after 3 years on study.
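As a rough illustration of point 2, the two formulas can be put side by side in a few lines of Python. This is only a sketch: the function names and the numbers below are made up for illustration and are not the homework values. It shows that the shortcut formula matches the general one when the two variances are equal, and that the two can differ quite a bit when they are not.

```python
import math

def var_change(v_bsln, v_final, rho):
    """General formula: variance of (Yfinal - Ybsln)."""
    return v_bsln + v_final - 2.0 * rho * math.sqrt(v_bsln * v_final)

def var_change_equal(v, rho):
    """Shortcut that assumes baseline and final variances both equal v."""
    return 2.0 * v * (1.0 - rho)

# Hypothetical inputs (NOT from the homework data): the final variance
# is four times the baseline variance, and the correlation is 0.5.
v_bsln, v_final, rho = 1.0, 4.0, 0.5

general = var_change(v_bsln, v_final, rho)        # 1 + 4 - 2*0.5*2 = 3.0
shortcut_bsln = var_change_equal(v_bsln, rho)     # 2*1*(1 - 0.5) = 1.0
shortcut_final = var_change_equal(v_final, rho)   # 2*4*(1 - 0.5) = 4.0
sd_change = math.sqrt(general)                    # SD is the square root of the variance

print("general formula:      ", general)
print("shortcut using Vbsln: ", shortcut_bsln)
print("shortcut using Vfinal:", shortcut_final)
print("SD of change:         ", round(sd_change, 3))
```

With unequal variances, the shortcut based on Vbsln understates the variance of change and the one based on Vfinal overstates it, at least for these made-up numbers.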
3) Quite often, we do not have longitudinal data, and we are forced to
just make educated guesses. Most often we guess by using some sort of
pilot data on a cross-sectional variance (e.g., either bsln or final) and
having some estimate of the correlation from some other source (perhaps
our imagination or our fervent wishes). This sometimes turns into the
category of making up two numbers in order to arrive at one. But it is
what we do quite often.
Okay, maybe I am being too harsh on our practices. But we may have
cross-sectional data that we believe best reflect the planned entry
criteria, and then we have an estimate of the correlation from
another dataset entirely. The methods you used here would be as good a
guess as any in this situation.
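To get a feel for how much the guessed correlation matters in point 3, one could tabulate the SD of change over a range of guesses. The sketch below assumes equal variances at baseline and follow-up, so it uses Vchange = 2 * V * (1 - rho). The cross-sectional variance and the list of correlations are hypothetical placeholders, not values from any dataset.

```python
import math

def sd_change_from_guess(v_cross, rho):
    """SD of change, assuming both time points share the variance v_cross."""
    return math.sqrt(2.0 * v_cross * (1.0 - rho))

# Hypothetical cross-sectional variance and guessed correlations.
v_cross = 1.0
rho_guesses = [0.3, 0.5, 0.7, 0.9]

sd_by_rho = {rho: sd_change_from_guess(v_cross, rho) for rho in rho_guesses}

for rho, sd in sd_by_rho.items():
    print(f"rho = {rho:.1f}   SD of change = {sd:.3f}")
```

The SD of change shrinks as the guessed correlation grows, so an optimistic guess for rho translates directly into an optimistic sample size.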
NOW for the question you didn't ask:
Q: And why would we use the methods in problem 3, anyway? Won't we always do better using the ANCOVA model in a randomized clinical trial?
A: Yes.
But I do note that when doing power calculations for the ANCOVA model, we
would truly have the same issues about the variance at baseline and the
variance at follow-up. A better formula for V in the ANCOVA case would be
V = Vfinal + rho^2 * (Vbsln - 2 * sqrt(Vbsln * Vfinal) )
If Vbsln = Vfinal, this reduces to the formula
V = Vfinal * (1 - rho^2) = Vbsln * (1 - rho^2)
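A quick numerical check of the two ANCOVA expressions above, taking the formulas exactly as written. The function names and inputs are hypothetical. The sketch also compares the equal-variance ANCOVA value with the equal-variance change-score value 2 * V * (1 - rho) from problem 3.

```python
import math

def v_ancova_as_written(v_bsln, v_final, rho):
    """V for the ANCOVA case, copied term by term from the formula above."""
    return v_final + rho**2 * (v_bsln - 2.0 * math.sqrt(v_bsln * v_final))

def v_ancova_equal(v, rho):
    """Reduced form when Vbsln = Vfinal = v."""
    return v * (1.0 - rho**2)

def v_change_equal(v, rho):
    """Change-score variance when Vbsln = Vfinal = v."""
    return 2.0 * v * (1.0 - rho)

# Hypothetical inputs, chosen only to exercise the formulas.
v, rho = 2.0, 0.6

full = v_ancova_as_written(v, v, rho)   # 2 + 0.36*(2 - 4) = 1.28
reduced = v_ancova_equal(v, rho)        # 2*(1 - 0.36)     = 1.28
change = v_change_equal(v, rho)         # 2*2*(1 - 0.6)    = 1.60

print("ANCOVA V, full formula:   ", round(full, 4))
print("ANCOVA V, reduced formula:", round(reduced, 4))
print("change-score V:           ", round(change, 4))
```

With equal variances, V * (1 - rho^2) = V * (1 - rho) * (1 + rho), which can never exceed 2 * V * (1 - rho) because 1 + rho is at most 2. That is consistent with the "Yes" above.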
Scott
#####################################################################
Scott S. Emerson, M.D., Ph.D. Biost Dept: (O) 206-543-1044
Professor of Biostatistics (F) 206-543-3286
Department of Biostatistics Box 357232 ROC: (O) 206-221-4185
University of Washington (F) 206-543-0131
Seattle, Washington 98195 semerson@u.washington.edu
#####################################################################