Biost 518: Q&A - Measured vs Computed SD of change

QUESTION:

I have a question regarding homework 2, problem 3. It would seem to me that the variance we use in calculating V should be the variance we found in question 1b (the square of the standard deviation of the difference in bilirubin). However, I notice that on the homework key from last year (where problem 4 is like our current problem 3), the variance used was analogous to the one in question 1a of the current homework; the variance of the difference was not even calculated on last year's homework. Could you explain why it is possible to use just the variance of the baseline values in this situation, when we are trying to find the sample size for a study design based on differences in values? What kind of assumptions are being made? Is this how one would handle a situation in which the baseline values' variance was available, but no follow-up data were available for calculating a variance of the difference?

ANSWER:

You are correct on all counts, but thanks for asking and letting me expound on this.

1) When we have data on relevant changes, we should just base our calculations on the SD of those measurements. That is, if we know that we are going to compare the difference Yfinal - Ybsln for each individual, then we should find the SD of those differences, just as I had you do in Problem 1. Ideally, that is the most relevant quantity.

2) Of course, if we had the SD at baseline, the SD at follow-up, and the correlation rho between them, then we could compute the variance of the change from the variance of the baseline measurements (Vbsln) and the variance of the final measurements (Vfinal):

    Vchange = Vbsln + Vfinal - 2 * rho * sqrt(Vbsln * Vfinal)

The SD of change is then just the square root of Vchange.
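Points 1 and 2 describe two routes to the same quantity: measure the SD of the individual changes directly, or compute it from Vbsln, Vfinal, and rho. The Python sketch below checks that the two agree on simulated data. It assumes bivariate normal measurements; the SDs (0.5 at baseline, 1.0 at follow-up), rho = 0.6, and the helper names `var_change` and `simulate_pairs` are all hypothetical choices for illustration, not the bilirubin data from the homework.

```python
import math
import random
import statistics


def var_change(v_bsln: float, v_final: float, rho: float) -> float:
    """Variance of (Yfinal - Ybsln) computed from the two cross-sectional
    variances and the baseline/final correlation rho."""
    return v_bsln + v_final - 2.0 * rho * math.sqrt(v_bsln * v_final)


def simulate_pairs(n: int, sd_bsln: float, sd_final: float, rho: float,
                   seed: int = 518) -> list[tuple[float, float]]:
    """Draw n hypothetical (baseline, final) pairs from a bivariate normal
    with the given SDs and correlation rho (illustration only)."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        y_bsln = sd_bsln * z1
        # Mixing z1 and z2 this way gives Var = sd_final^2 and Corr = rho.
        y_final = sd_final * (rho * z1 + math.sqrt(1.0 - rho ** 2) * z2)
        pairs.append((y_bsln, y_final))
    return pairs


if __name__ == "__main__":
    # Hypothetical inputs: follow-up more variable than baseline.
    sd_bsln, sd_final, rho = 0.5, 1.0, 0.6
    pairs = simulate_pairs(20000, sd_bsln, sd_final, rho)

    # Measured: SD of the individual changes (point 1).
    diffs = [yf - yb for yb, yf in pairs]
    measured_sd = statistics.stdev(diffs)

    # Computed: SD from Vbsln, Vfinal and rho (point 2).
    computed_sd = math.sqrt(var_change(sd_bsln ** 2, sd_final ** 2, rho))

    print(f"measured SD of change: {measured_sd:.3f}")
    print(f"computed SD of change: {computed_sd:.3f}")
```

With a large simulated sample the two SDs should agree closely; with real data they agree only up to the sampling error in the three estimated inputs.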
So, if the SD at the final measurement were equal to the SD at baseline, either approach would give us the same answer, whether we used the SD of the change or the formula given in problem 3:

    Vchange = 2 * Vbsln * (1 - rho) = 2 * Vfinal * (1 - rho)

This is the reason I suggested you compare the value of V used in problem 3 to the square of the SD of change in problem 1. I believe you will find that they do not agree all that well. That is because in these data Vbsln and Vfinal are not the same. That might just be random sampling error, or it might be real. Personally, I think it is real: we restricted entry to the study to those subjects having bilirubin less than 3, so at the start of the study we have a more homogeneous group. Over time, however, some patients progressed and others did not. This would create greater variability in bilirubin after 3 years on study.

3) Quite often we do not have longitudinal data, and we are forced to make educated guesses. Most often we guess by using some sort of pilot data on a cross-sectional variance (e.g., either baseline or final) and taking an estimate of the correlation from some other source (perhaps our imagination or our fervent wishes). This sometimes amounts to making up two numbers in order to arrive at one. But it is what we do quite often. Okay, maybe I am being too harsh on our practices. We may have cross-sectional data that we believe to be most relevant for the intended entry criteria, and then an estimate of the correlation from another dataset entirely. The methods you used here would be as good a guess as any in that situation.

NOW for the question you didn't ask:

Q: And why would we use the methods in problem 3 anyway? Won't we always do better using the ANCOVA model in a randomized clinical trial?

A: Yes. But I do note that when doing power calculations for the ANCOVA model, we would truly have the same issues about the variance at baseline and the variance at follow-up.
A better formula for V in the ANCOVA case would be

    V = Vfinal + rho^2 * (Vbsln - 2 * sqrt(Vbsln * Vfinal))

If Vbsln = Vfinal, this reduces to the formula

    V = Vfinal * (1 - rho^2) = Vbsln * (1 - rho^2)

Scott

Scott S. Emerson, M.D., Ph.D.
Professor of Biostatistics
Department of Biostatistics, Box 357232
University of Washington
Seattle, Washington 98195
semerson@u.washington.edu
Biost Dept: (O) 206-543-1044  (F) 206-543-3286
ROC: (O) 206-221-4185  (F) 206-543-0131
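The Python sketch below evaluates the ANCOVA formula for V, confirms that it reduces to Vfinal * (1 - rho^2) when the two variances are equal, and compares it with the change-score variance 2 * V * (1 - rho). Algebraically, the general formula equals the variance of Yfinal - rho * Ybsln; the helper `var_linear_combo` makes that explicit. All function names and input values here are hypothetical, chosen for illustration.

```python
import math


def var_linear_combo(v_bsln: float, v_final: float, rho: float, c: float) -> float:
    """Variance of Yfinal - c * Ybsln when Corr(Ybsln, Yfinal) = rho."""
    return v_final + c ** 2 * v_bsln - 2.0 * c * rho * math.sqrt(v_bsln * v_final)


def v_ancova(v_bsln: float, v_final: float, rho: float) -> float:
    """The ANCOVA formula for V quoted in the answer."""
    return v_final + rho ** 2 * (v_bsln - 2.0 * math.sqrt(v_bsln * v_final))


def v_change(v_bsln: float, v_final: float, rho: float) -> float:
    """Variance of the change Yfinal - Ybsln."""
    return v_bsln + v_final - 2.0 * rho * math.sqrt(v_bsln * v_final)


if __name__ == "__main__":
    v, rho = 1.0, 0.6  # hypothetical equal variances and correlation
    print("ANCOVA V:      ", v_ancova(v, v, rho), "vs", v * (1 - rho ** 2))
    print("change-score V:", v_change(v, v, rho), "vs", 2 * v * (1 - rho))
    # The general formula is the same as Var(Yfinal - rho * Ybsln).
    print(v_ancova(2.0, 3.0, 0.4), var_linear_combo(2.0, 3.0, 0.4, c=0.4))
```

With equal variances the ratio of the ANCOVA V to the change-score V is (1 + rho) / 2, which is never larger than 1, so in that case the ANCOVA V is never larger than the change-score V.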