|
1
|
- Issues in Implementing Stopping Rules
- Schedule of Analyses
- Estimation of Statistical Information
- Constraining Boundaries at Prior Analyses
- Flexible Determination of Boundaries
- Boundary scales
- Measuring study time
- Monitoring Secondary Endpoints
|
|
2
|
|
|
3
|
- Design of clinical trial
- Selection of stopping rule to provide desired operating characteristics
- Type I error
- Statistical power to detect design alternative
- Efficiency
- Bayesian properties
- Futility considerations
|
|
4
|
- At time of study design
- Sample size (power, alternative) calculations based on
- Specifying a maximum of J analyses
- Specifying sample sizes at which analyses will be performed
|
|
5
|
- During conduct of study
- Timing of analyses may be different
- Monitoring scheduled by calendar time
- Slow (or fast) accrual
- Estimation of available information at time of locking database
- External causes
- (should not be influenced by study results)
|
|
6
|
- Example: Stopping rule chosen at design
- Test of normal mean:
- Null: μ ≤ 0.0
- Alternative: μ ≥ 0.5
- One-sided symmetric test
- Size .025, Power .975
- Four equally spaced analyses
- Pocock (1977) boundary relationships
|
|
7
|
- Example: Stopping rule chosen at design (cont.)
- One-sided test of a greater alternative:
- Null : Theta <= 0 (size = 0.025)
- Alt : Theta >= 0.5 (power = 0.975)
- (Emerson & Fleming (1989) symmetric test)
- STOPPING BOUNDARIES: Sample Mean scale
-                           a        d
- Time 1 (N=  86.31)   0.0000   0.5000
- Time 2 (N= 172.62)   0.1464   0.3536
- Time 3 (N= 258.92)   0.2113   0.2887
- Time 4 (N= 345.23)   0.2500   0.2500
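The tabulated values are consistent with a simple closed form on the sample mean scale. The sketch below assumes the Pocock shape d_j = d_J * Pi_j^(-1/2), with the symmetric test giving a_j = alternative - d_j and a final boundary of alternative / 2. The function name and arguments are illustrative, not S+SeqTrial calls, and the maximal sample size (which needs numerical integration) is not computed here.

```python
from math import sqrt

def pocock_sample_mean_boundaries(fractions, alternative):
    """Return (a_j, d_j) pairs on the sample mean scale.

    Assumes a one-sided symmetric test whose boundaries meet at
    alternative / 2 at the final analysis, and the Pocock shape
    d_j = d_J * Pi_j ** -0.5 (Pi_j = proportion of maximal information).
    """
    final_bound = alternative / 2.0
    bounds = []
    for frac in fractions:
        d = final_bound / sqrt(frac)   # upper (efficacy) boundary
        a = alternative - d            # lower boundary, by symmetry
        bounds.append((round(a, 4), round(d, 4)))
    return bounds

# Design schedule: four equally spaced analyses, alternative 0.5
design = pocock_sample_mean_boundaries([0.25, 0.50, 0.75, 1.00], 0.5)

# Five equally spaced analyses, same alternative (power maintained)
five_looks = pocock_sample_mean_boundaries([0.2, 0.4, 0.6, 0.8, 1.0], 0.5)

for j, (a, d) in enumerate(design, start=1):
    print(f"Time {j}: a = {a:.4f}, d = {d:.4f}")
```

Only the boundary shape is reproduced; changing the schedule also changes the maximal N needed to maintain power, which is why the later slides report different sample sizes for the same final boundary.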
|
|
8
|
- Example: Analyses after 40%, 60%, 80%, 100% (maintain power)
- Null: Theta <= 0 (size = 0.025)
- Alt : Theta >= 0.5 (power = 0.975)
- (Emerson & Fleming (1989) symmetric test)
- STOPPING BOUNDARIES: Sample Mean scale
-                           a        d
- Time 1 (N= 131.97)   0.1047   0.3953
- Time 2 (N= 197.95)   0.1773   0.3227
- Time 3 (N= 263.93)   0.2205   0.2795
- Time 4 (N= 329.91)   0.2500   0.2500
|
|
9
|
- Example: Analyses after 40%, 60%, 80%, 100% (maintain maximal sample size)
- Null: Theta <= 0 (size = 0.025)
- Alt : Theta >= 0.4888 (power = 0.975)
- (Emerson & Fleming (1989) symmetric test)
- STOPPING BOUNDARIES: Sample Mean scale
-                           a        d
- Time 1 (N= 138.09)   0.1024   0.3864
- Time 2 (N= 207.14)   0.1733   0.3155
- Time 3 (N= 276.19)   0.2155   0.2732
- Time 4 (N= 345.23)   0.2444   0.2444
|
|
10
|
- During conduct of study
- Number of analyses may be different
- Monitoring scheduled by calendar time
- Slow (or fast) accrual
- External causes
- (should not be influenced by study results)
|
|
11
|
- Example: Stopping rule chosen at design (cont.)
- One-sided test of a greater alternative:
- Null : Theta <= 0 (size = 0.025)
- Alt : Theta >= 0.5 (power = 0.975)
- (Emerson & Fleming (1989) symmetric test)
- STOPPING BOUNDARIES: Sample Mean scale
-                           a        d
- Time 1 (N=  86.31)   0.0000   0.5000
- Time 2 (N= 172.62)   0.1464   0.3536
- Time 3 (N= 258.92)   0.2113   0.2887
- Time 4 (N= 345.23)   0.2500   0.2500
|
|
12
|
- Example: Analyses after 20%, 40%, 60%, 80%, 100% (maintain power)
- Null: Theta <= 0 (size = 0.025)
- Alt : Theta >= 0.5 (power = 0.975)
- (Emerson & Fleming (1989) symmetric test)
- STOPPING BOUNDARIES: Sample Mean scale
-                           a        d
- Time 1 (N=  72.10)  -0.0590   0.5590
- Time 2 (N= 144.20)   0.1047   0.3953
- Time 3 (N= 216.31)   0.1773   0.3227
- Time 4 (N= 288.41)   0.2205   0.2795
- Time 5 (N= 360.51)   0.2500   0.2500
|
|
13
|
- Example: Analyses after 20%, 40%, 60%, 80%, 100% (maintain maximal sample size)
- Null: Theta <= 0 (size = 0.025)
- Alt : Theta >= 0.5109 (power = 0.975)
- (Emerson & Fleming (1989) symmetric test)
- STOPPING BOUNDARIES: Sample Mean scale
-                           a        d
- Time 1 (N=  69.05)  -0.0603   0.5713
- Time 2 (N= 138.09)   0.1070   0.4039
- Time 3 (N= 207.14)   0.1811   0.3298
- Time 4 (N= 276.19)   0.2253   0.2856
- Time 5 (N= 345.23)   0.2555   0.2555
|
|
14
|
- Summary for Pocock boundary relationships
- Analysis Times             Alt    Max N    Final Bound
- ========================   ====   ======   ===========
- .25, .50, .75, 1.00        .500   345.23   .2500
- .40, .60, .80, 1.00        .500   329.91   .2500
- .40, .60, .80, 1.00        .489   345.23   .2444
- .20, .40, .60, .80, 1.00   .500   360.51   .2500
- .20, .40, .60, .80, 1.00   .511   345.23   .2555
|
|
15
|
- Summary for O’Brien-Fleming boundary relationships
- Analysis Times             Alt    Max N    Final Bound
- ========================   ====   ======   ===========
- .25, .50, .75, 1.00        .500   256.83   .2500
- .40, .60, .80, 1.00        .500   259.44   .2500
- .40, .60, .80, 1.00        .503   256.83   .2513
- .20, .40, .60, .80, 1.00   .500   259.45   .2500
- .20, .40, .60, .80, 1.00   .503   256.83   .2513
|
|
16
|
- Need methods that allow flexibility in determining number and timing of
analyses
- Should maintain some (but not, in general, all) desired operating
characteristics, e.g.:
- Type I error
- Type II error
- Maximal sample size
- Futility properties
- Bayesian properties
|
|
17
|
- Validity of flexible determination of analysis times
- Inference conditional on actual schedule of analyses
- Can disregard rule for scheduling analyses if it is independent of
measures of treatment effect
- If all possible adaptations maintain particular operating
characteristics, then so will adaptive rule
- (May affect other operating characteristics of design)
|
|
18
|
|
|
19
|
- At time of study design
- Sample size (power, alternative) calculations based on
- Specifying statistical information available from each sampling unit
|
|
20
|
- During conduct of study
- Statistical information from a sampling unit may be different than
originally estimated
- Variance of measurements
- Baseline event rates
- (Altered sampling distribution for treatment levels)
|
|
21
|
- Sample size formulas used in group sequential test design
- n is the maximal number of sampling units
- δ1 is the alternative for which a standardized form of a level α test has power β
- 1 / V is the statistical information contributed by each sampling unit
|
|
22
|
- Parallels with fixed sample test design
- Sample size formulas used in group sequential test design are
completely analogous to those used in fixed sample studies
- In fixed sample two arm tests of a normal mean
|
|
23
|
- Effect of using incorrect estimates of statistical information at the
design stage
- Using the specified sample size,
the design alternative will not be detected with the desired power
- Using the specified sample size, the alternative detected with the
desired power will not be the design alternative
- In order to detect the design alternative with the desired power, a
different sample size is needed
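The three consequences listed above can be illustrated numerically. The formulas themselves did not survive in the slide text, so this sketch assumes the usual fixed-sample form n = δ1² V / Δ², with δ1 = z(1-α) + z(β) and a null value of 0. The variance values (V = 4 at design, V = 5 in truth) are hypothetical, and all function names are illustrative. For a group sequential design δ1 would come from the stopping rule rather than from this closed form.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()

def standardized_alternative(alpha, power):
    """Fixed-sample delta1 = z_(1-alpha) + z_(power), one-sided level-alpha test."""
    return STD_NORMAL.inv_cdf(1 - alpha) + STD_NORMAL.inv_cdf(power)

def sample_size(delta1, v, alternative):
    """Assumed form n = delta1^2 * V / alternative^2 (null value taken as 0)."""
    return delta1 ** 2 * v / alternative ** 2

def detectable_alternative(delta1, v, n):
    """Alternative detected with the design power when n units are sampled."""
    return delta1 * sqrt(v / n)

def fixed_sample_power(alternative, v, n, alpha):
    """Power of the one-sided fixed-sample test at the given alternative."""
    return STD_NORMAL.cdf(alternative * sqrt(n / v) - STD_NORMAL.inv_cdf(1 - alpha))

alpha, power, alternative = 0.025, 0.975, 0.5
v_design, v_actual = 4.0, 5.0          # hypothetical: V was underestimated at design

delta1 = standardized_alternative(alpha, power)
n_design = sample_size(delta1, v_design, alternative)

# 1. Same n, design alternative: power falls below the target
power_actual = fixed_sample_power(alternative, v_actual, n_design, alpha)
# 2. Same n, same power: the detectable alternative moves away from 0.5
alt_actual = detectable_alternative(delta1, v_actual, n_design)
# 3. Same alternative, same power: a different n is needed
n_needed = sample_size(delta1, v_actual, alternative)

print(f"n at design: {n_design:.2f}")
print(f"power at design alternative with true V: {power_actual:.3f}")
print(f"alternative detected with design power: {alt_actual:.4f}")
print(f"n needed with true V: {n_needed:.2f}")
```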
|
|
24
|
- If maximal sample size is maintained, the study discriminates between
null hypothesis and an alternative measured in units of statistical
information
|
|
25
|
- If statistical power is maintained, the study sample size is measured in
units of statistical information
|
|
26
|
- Validity of flexible determination of sample size
- Inference conditional on actual sample size
- Can disregard rule for determining sample size if it is independent of
measures of treatment effect
- If all possible adaptations maintain particular operating
characteristics, then so will adaptive rule
- (May affect other operating characteristics of design)
|
|
27
|
|
|
28
|
- Previously described methods for implementing stopping rules
- (Adhere exactly to monitoring plan)
- (Approximations based on design parameters: Emerson and Fleming, 1989)
- Christmas tree approximation for triangular tests: Whitehead and
Stratton, 1983
- Error spending functions: Lan and DeMets, 1983; Pampallona, Tsiatis,
and Kim, 1995
- Constrained boundaries in unified design family: Emerson, 2000
|
|
29
|
- Common features
- Stopping rule specified at design parameterizes the boundary for some
statistic (boundary scale)
- At the first interim analysis, parametric form is used to compute the
boundary for actual time on study
- At successive analyses, the boundaries are recomputed accounting for
the exact boundaries used at previously conducted analyses
- Maximal sample size estimates may be updated
|
|
30
|
- Specification of implementation strategy
- Boundary scale used to modify boundaries
- How analysis times will be determined (maintain blind)
- How study time will be measured
- Operating characteristics which will be maintained
|
|
31
|
|
|
32
|
- Families of group sequential stopping rules can be defined on a number
of scales
- Parametric family relates stopping boundaries at successive analyses
- Pj = proportion
of maximal information available at j-th analysis
- dj = stopping boundary at j-th analysis for some statistic
- dj = f(Pj)
is parametric boundary function
|
|
33
|
- Unified family of group sequential designs (Kittelson and Emerson, 1999)
- Defined for estimate of treatment effect (sample mean scale)
- Includes Pocock (1977), O’Brien and Fleming (1979), Whitehead and
Stratton (1983), Wang and Tsiatis (1987), Emerson and Fleming (1989),
Pampallona and Tsiatis (1994), Xiong (1995)
|
|
34
|
- Error spending family (Kim and DeMets, 1987; Jennison and Turnbull,
1989)
- Power family for error spending function
- Pampallona, Tsiatis, and Kim (1995) describe a family by interpolating
the error spending function for tests defined on the sample mean scale
|
|
35
|
- Extensions to those parametric families in S+SeqTrial: Constrained
boundaries
- Motivation: Extreme conservatism of the O’Brien-Fleming design
- Specify a design whose stopping boundaries are the less extreme of an
O’Brien-Fleming boundary relationship or a fixed sample P value of .001
|
|
36
|
- Example: O’Brien-Fleming boundaries on fixed sample P value scale
- Null: Theta <= 0 (size = 0.025)
- Alt : Theta >= 0.5 (power = 0.975)
- (Emerson & Fleming (1989) symmetric test)
- STOPPING BOUNDARIES: Fixed Sample P-value scale
-                           a        d
- Time 1 (N=  64.21)   0.9774   0.0000
- Time 2 (N= 128.41)   0.5000   0.0023
- Time 3 (N= 192.62)   0.1237   0.0104
- Time 4 (N= 256.83)   0.0226   0.0226
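The link between the sample mean scale and the fixed sample P-value scale can be sketched as follows. The sketch assumes the O'Brien-Fleming shape d_j = d_J / Pi_j on the sample mean scale, a final boundary of 0.25, and a normal model in which the estimate at sample size N has variance V / N. V = 4 is an assumption not stated on the slide, chosen because it reproduces the tabulated values (it would correspond, for example, to a two-arm comparison with unit variance and N the total sample size). Function and variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

V = 4.0  # ASSUMPTION: per-unit variance term, not given on the slide

def fixed_sample_p_value(sample_mean, n, v=V):
    """One-sided P value that a fixed-sample test would report for this estimate."""
    z = sample_mean / sqrt(v / n)
    return 1.0 - NormalDist().cdf(z)

alternative = 0.5
final_bound = alternative / 2.0      # symmetric test: boundaries meet at 0.25
max_n = 256.83                       # maximal sample size from the table above
fractions = [0.25, 0.50, 0.75, 1.00]

d_p_values = []
a_p_values = []
for frac in fractions:
    d_mean = final_bound / frac      # O'Brien-Fleming shape on the sample mean scale
    a_mean = alternative - d_mean    # lower boundary, by symmetry
    n_j = max_n * frac
    d_p_values.append(fixed_sample_p_value(d_mean, n_j))
    a_p_values.append(fixed_sample_p_value(a_mean, n_j))

for j, (a, d) in enumerate(zip(a_p_values, d_p_values), start=1):
    print(f"Time {j}: a = {a:.4f}, d = {d:.4f}")
```

On this scale the conservatism of the early O'Brien-Fleming efficacy boundary is easy to see: the first-analysis threshold is far below any conventional significance level.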
|
|
37
|
- Example: Constrained O’Brien-Fleming boundaries on fixed sample P value scale
- Null: Theta <= 0 (size = 0.025)
- Alt : Theta >= 0.5 (power = 0.975)
- (Emerson & Fleming (1989) symmetric test)
- STOPPING BOUNDARIES: Fixed Sample P-value scale
-                           a        d
- Time 1 (N=  64.31)   0.9773   0.0005
- Time 2 (N= 128.61)   0.4989   0.0023
- Time 3 (N= 192.92)   0.1231   0.0102
- Time 4 (N= 257.23)   0.0224   0.0224
|
|
38
|
- Example: Display of boundaries
|
|
39
|
- Example: Display of power curves
|
|
40
|
- Example: Display of ASN curves
|
|
41
|
- Constrained boundaries also defined for error spending family
- Allows arbitrary departures from the parametric families
|
|
42
|
- Use of constrained families in flexible implementation of stopping rules
- At the first analysis, compute stopping boundary from parametric family
- At successive analyses, use parametric family with constraints (on some
scale) for the previously conducted interim analyses
- When the error spending scale is used, this is just the error spending
approach of Lan & DeMets or Pampallona, Tsiatis, & Kim
|
|
43
|
- Example: Stopping rule chosen at first analysis (with estimates for
later analyses)
- One-sided test of a greater alternative:
- Null : Theta <= 0 (size = 0.025)
- Alt : Theta >= 0.5 (power = 0.975)
- (Emerson & Fleming (1989) symmetric test)
- STOPPING BOUNDARIES: Sample Mean scale
-                           a        d
- Time 1 (N=  77.08)  -0.0743   0.5744
- Time 2 (N= 154.16)   0.1211   0.3789
- Time 3 (N= 231.23)   0.2029   0.2971
- Time 4 (N= 308.31)   0.2500   0.2500
|
|
44
|
- Example: Stopping rule based on updated schedule (when first analysis
boundary unconstrained)
- One-sided test of a greater alternative:
- Null : Theta <= 0 (size = 0.025)
- Alt : Theta >= 0.4973 (power = 0.975)
- (Emerson & Fleming (1989) symmetric test)
- STOPPING BOUNDARIES: Sample Mean scale
-                           a        d
- Time 1 (N=  77.08)  -0.0740   0.5713
- Time 2 (N= 100.00)   0.0087   0.4887
- Time 3 (N= 231.23)   0.2018   0.2955
- Time 4 (N= 308.31)   0.2487   0.2487
|
|
45
|
- Use of constrained families is necessary because critical values are
dependent upon exact schedule
- In Unified Family, boundary at first analysis is affected by timing of
later analyses
- Compare boundary at first analysis when timing of second analysis
differs:
- ‘a’ boundary: -0.0743 versus -0.0740
- ‘d’ boundary: 0.5744 versus 0.5713
- Must constrain first boundaries at the levels actually used, and then
use parametric form for future analyses
|
|
46
|
|
|
47
|
- Flexible methods compute boundaries at an interim analysis according to
study time at that analysis
- Study time can be measured by
- Proportion of planned number of subjects accrued (maintains maximal
sample size)
- Proportion of planned statistical information accrued (maintains
statistical power)
- (Calendar time: not really advised)
|
|
48
|
- In either case, we must decide how we will deal with estimates of
statistical information at each analysis when constraining boundaries
- Statistical information in clinical trials typically has two parts
- V = variability associated with a single sampling unit
- The distribution of sampled levels of treatment
- In many clinical trials, the dependence on the distribution of
treatment levels across analyses is only on the sample size N
|
|
49
|
- Possible approaches
- At each analysis estimate the statistical information available, and
use that estimate at all future analyses
- Theoretically, this can result in estimates of negative information
gained between analyses
- At each analysis use the sample size with the current best estimate of
V
- The 1:1 correspondence between boundary scales is thus broken at
previously conducted analyses
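The negative-information problem under the first approach can be seen with a small numeric sketch. It assumes information at analysis j is estimated as N_j / V, following the earlier statement that each sampling unit contributes information 1 / V. The sample sizes and variance estimates are hypothetical, and the function name is illustrative.

```python
def information_increments(sample_sizes, variance_estimates):
    """Estimated information gained between successive analyses.

    Information at analysis j is taken as N_j / V_j, where V_j is the
    variance estimate attached to that analysis.
    """
    info = [n / v for n, v in zip(sample_sizes, variance_estimates)]
    return [later - earlier for earlier, later in zip(info, info[1:])]

# Hypothetical trial: a short gap between analyses 1 and 2, during which
# the estimate of V rises from 3.5 to 4.2
sample_sizes = [100, 110, 200]
variance_at_each_analysis = [3.5, 4.2, 4.0]

# Approach 1: keep the estimate made at each analysis for all future analyses
frozen = information_increments(sample_sizes, variance_at_each_analysis)

# Approach 2: apply the current best estimate of V to every analysis
current_v = variance_at_each_analysis[-1]
current_best = information_increments(sample_sizes, [current_v] * len(sample_sizes))

print("frozen estimates:    ", [round(x, 2) for x in frozen])
print("current best estimate:", [round(x, 2) for x in current_best])
```

With frozen estimates the first increment is negative even though ten more subjects were accrued. Using a single current estimate keeps the increments proportional to the added sample size, at the cost of changing the information attributed to earlier analyses.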
|
|
50
|
- Possible approaches (cont.)
- In S+SeqTrial, all probability models have statistical information
directly proportional to sample size for block randomized experiments,
thus we chose to update V at all analyses using the current best
estimate
- Other statistical packages (PEST, ?EaSt) constrain boundaries using the
estimate of statistical information available at the previous analyses.
- There is no clear best approach
|
|
51
|
- Example
- A clinical trial of a binary endpoint is designed using a unified
family design
- One-sided test for an increased event probability
- Designed with 5 analyses
- O’Brien-Fleming efficacy boundary
- Futility boundary intermediate to O’Brien-Fleming and Pocock
|
|
52
|
- Example: At first interim analysis using unified family approach
- The use of the parametric form for the boundary function will result in
a boundary on the same curve as the original design
- I had the sample size re-estimated to allow for errors in guessing the
baseline rate at the design phase
|
|
53
|
|
|
54
|
- Example: Comparison with error spending approach using interpolated
error spending function
- The stopping boundary based on the error spending function will not
agree exactly with the curve for the original design, because the error
spending function is not linear for this design.
- Had the monitoring occurred at the prespecified time, the two curves
would agree.
|
|
55
|
|
|
56
|
- Example: Superposed stopping rules from first and second interim
analyses using unified family
- Because the monitoring bounds were constrained on the sample mean
scale, the stopping boundaries computed for the first analysis agree at
both analyses when plotted on the sample mean scale
- If the boundaries were plotted on some other scale, they would not
agree
|
|
57
|
|
|
58
|
- Example: Superposed stopping rules from first and second interim
analyses using error spending
- When plotted on the sample mean scale, the monitoring bounds from the
first and second analyses will not agree if the boundary at the first
analysis is constrained on the error spending scale
- This is due to the need to estimate the statistical information
|
|
59
|
|
|
60
|
- I think it makes more sense to use the best estimate of the variance of
an observation when estimating a sampling distribution. This avoids the
possibility of negative information, but allows the conflicting results
described above.
- In the absence of a need to estimate the statistical information,
monitoring on the sample mean or error spending scales would agree
exactly (modulo interpolation to obtain the error spending function).
|
|
61
|
- When estimating the statistical information, all approaches merely
approximate the sampling distribution of the test statistic. At this
point there is no clear “best” approach
- On purely esthetic grounds, I prefer that the monitoring bounds match
across analyses on the sample mean scale
|
|
62
|
|
|
63
|
- So far, we have stressed the monitoring of the primary endpoint
- Of course, far more time in a DSMB meeting is devoted to monitoring the
secondary endpoints related to patient safety than is devoted to
examining the primary endpoint
|
|
64
|
- Role of DSMB: Maintain validity of informed consent
- Evaluate the safety of the trial in light of information made available
since the start of study
- Data from current trial
- Data from related trials
- Changing clinical environment
|
|
65
|
- Safety issues to be addressed
- Is there evidence that individual patients are being harmed?
- Serious adverse experiences
- Individual abnormal lab values
- Is there evidence of trends toward harm in the population of treated
patients?
- Proportion with adverse experiences
- Average (median) lab values
|
|
66
|
- Statistical issues due to rare events
- Invariably, there is very little statistical precision to establish
increased rates of Serious Adverse Experiences (SAEs) or increased
rates of individual toxicities
- As a general rule, the DSMB therefore must act based on their prior
knowledge and principles of conservatism
- E.g., decisions to modify entry criteria by age due to statistically
nonsignificant trends in the data
|
|
67
|
- Statistical issues due to rare events (cont.)
- The increased error rate of acting on such trends is a necessary evil
- Of some solace is the fact that most new treatments do not prove
beneficial, so such conservatism is probably not too harmful in the
quest for new treatments
- In essence, we decide to look only at the safest treatments (and
the trials that tended to result in the safest profile)
|
|
68
|
- Statistical issues when using aggregate statistics to examine the safety
profile
- When examining the safety profile statistically, must consider multiple
comparison problems
- over multiple adverse experience categories
- (the DSMB is largely on their own)
- over multiple analyses of the accruing data
- (group sequential methods can be used as a guideline)
|
|
69
|
- Statistical issues when using aggregate statistics to examine the safety
profile (cont.)
- Group sequential methods for monitoring safety profiles
- Bayesian approaches
- But how do you ever detect unexpected toxicities? Where is the
burden of proof?
- Frequentist approaches
- Using group sequential stopping rules to compute
- Repeated confidence intervals
- Ersatz P values
|
|
70
|
- Presentation of results to the DSMB
- Generally avoid providing any P values or RCI for specific analyses to
avoid their difficult interpretation
- Have to account for multiple comparisons across endpoints
- Have to consider tradeoffs between efficacy and toxicity
- Statistical significance may be secondary to safety concerns; the DSMB
may need to act before statistical significance is attained
|
|
71
|
- Presentation of results to the DSMB (cont.)
- If an issue arises where stopping a trial for safety reasons is
potentially indicated, it is useful to have some sort of guideline
available for reference
|
|
72
|
- Selection of stopping rules for use with safety endpoints
- Need to consider whether harm should be proven
- I think that the general philosophy of clinical testing dictates that
such a stopping rule should not be as conservative as those typically
used for efficacy endpoints
- An O’Brien-Fleming guideline is probably too conservative for safety
|