thr3ads.net - R help - [R] Analyzing an unbalanced AB/BA cross-over design [Jan 2003]

If this information is useful, please help other people find it:
Share via:

martin wolfsegger

2003-Jan-29 11:02 UTC

[R] Analyzing an unbalanced AB/BA cross-over design

I am looking for help to analyze an unbalanced AB/BA cross-over design by
requesting the type III SS !

# Example 3.1 from S. Senn (1993). Cross-over Trials in Clinical
Research
outcome<-c(310,310,370,410,250,380,330,270,260,300,390,210,350,365,370,310,380,290,260,90,385,400,410,320,340,220)
subject<-as.factor(c(1,4,6,7,10,11,14,1,4,6,7,10,11,14,2,3,5,9,12,13,2,3,5,9,12,13))
period<-as.factor(c(1,1,1,1,1,1,1,2,2,2,2,2,2,2,1,1,1,1,1,1,2,2,2,2,2,2))
treatment<-as.factor(c('F','F','F','F','F','F','F','S','S','S','S','S','S','S','S','S','S','S','S','S','F','F','F','F','F','F'))
sequence<-as.factor(c(1,1,1,1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,2))
example<-data.frame(outcome,subject,period,treatment,sequence)

The recommended SAS code equals

PROC GLM DATA=example;
CLASS subject period treatment sequence;
MODEL outcome = treatment sequence period subject(sequence);
RANDOM subject(sequence);
RUN;

For PROC GLM, the random effects are treated in a post hoc fashion after the
complete fixed effect model is fit. This distinction affects other features
in the GLM procedure, such as the results of the LSMEANS and ESTIMATE
statements. Looking only on treatment, period, sequence and subject effects, the
random statement can be omitted.

The R code for type I SS equals

example.lm<-lm(outcome~treatment+period+sequence+subject%in%sequence,
data=example)
anova(example.lm)

Response: outcome
                 Df Sum Sq Mean Sq F value    Pr(>F)    
treatment         1  13388   13388 17.8416  0.001427 ** 
period            1   1632    1632  2.1749  0.168314    
sequence          1    335     335  0.4467  0.517697    
sequence:subject 11 114878   10443 13.9171 6.495e-05 ***
Residuals        11   8254     750                     

According to the unbalanced design, I requested the type III SS which
resulted in an error statement

library(car)
Anova(example.lm, type="III")

Error in linear.hypothesis.lm(mod, hyp.matrix, summary.model = sumry,  : 
        One or more terms aliased in model.
 

by using glm I got results with 0 df for the sequence effect !!!!

example.glm<-glm(outcome~treatment+period+sequence+subject%in%sequence,
data=example, family=gaussian)
library(car)
Anova(example.glm,type="III",test.statistic="F")

Anova Table (Type III tests)

Response: outcome
                     SS Df       F    Pr(>F)    
treatment         14036  1 18.7044  0.001205 ** 
period             1632  1  2.1749  0.168314    
sequence              0  0                      
sequence:subject 114878 11 13.9171 6.495e-05 ***
Residuals          8254 11            


The questions based on this output are

1) Why was there an error statement requesting type III SS based on lm ?
2) Why I got a result by using glm with 0 df for the period effect ?
3) How can I get the estimate, the StdError for the constrast (-1,1) of the
treatment effect ?


Thanks

--

Peter Dalgaard BSA

2003-Jan-29 12:30 UTC

head link

[R] Analyzing an unbalanced AB/BA cross-over design

martin wolfsegger <wolfseggerm at gmx.at> writes:
> I am looking for help to analyze an unbalanced AB/BA cross-over design by
> requesting the type III SS !
> 
> # Example 3.1 from S. Senn (1993). Cross-over Trials in Clinical
> Research
>
outcome<-c(310,310,370,410,250,380,330,270,260,300,390,210,350,365,370,310,380,290,260,90,385,400,410,320,340,220)
>
subject<-as.factor(c(1,4,6,7,10,11,14,1,4,6,7,10,11,14,2,3,5,9,12,13,2,3,5,9,12,13))
>
period<-as.factor(c(1,1,1,1,1,1,1,2,2,2,2,2,2,2,1,1,1,1,1,1,2,2,2,2,2,2))
>
treatment<-as.factor(c('F','F','F','F','F','F','F','S','S','S','S','S','S','S','S','S','S','S','S','S','F','F','F','F','F','F'))
>
sequence<-as.factor(c(1,1,1,1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,2))
> example<-data.frame(outcome,subject,period,treatment,sequence)
> 
> The recommended SAS code equals
> 
> PROC GLM DATA=example;
> CLASS subject period treatment sequence;
> MODEL outcome = treatment sequence period subject(sequence);
> RANDOM subject(sequence);
> RUN;
> 
> For PROC GLM, the random effects are treated in a post hoc fashion after
the
> complete fixed effect model is fit. This distinction affects other features
> in the GLM procedure, such as the results of the LSMEANS and ESTIMATE
> statements. Looking only on treatment, period, sequence and subject
effects, the
> random statement can be omitted.
> 
> The R code for type I SS equals
> 
> example.lm<-lm(outcome~treatment+period+sequence+subject%in%sequence,
> data=example)
> anova(example.lm)
> 
> Response: outcome
>                  Df Sum Sq Mean Sq F value    Pr(>F)    
> treatment         1  13388   13388 17.8416  0.001427 ** 
> period            1   1632    1632  2.1749  0.168314    
> sequence          1    335     335  0.4467  0.517697    
> sequence:subject 11 114878   10443 13.9171 6.495e-05 ***
> Residuals        11   8254     750                     
> 
> According to the unbalanced design, I requested the type III SS which
> resulted in an error statement
> 
> library(car)
> Anova(example.lm, type="III")
> 
> Error in linear.hypothesis.lm(mod, hyp.matrix, summary.model = sumry,  : 
>         One or more terms aliased in model.
>  
> 
> by using glm I got results with 0 df for the sequence effect !!!!
> 
> example.glm<-glm(outcome~treatment+period+sequence+subject%in%sequence,
> data=example, family=gaussian)
> library(car)
> Anova(example.glm,type="III",test.statistic="F")
> 
> Anova Table (Type III tests)
> 
> Response: outcome
>                      SS Df       F    Pr(>F)    
> treatment         14036  1 18.7044  0.001205 ** 
> period             1632  1  2.1749  0.168314    
> sequence              0  0                      
> sequence:subject 114878 11 13.9171 6.495e-05 ***
> Residuals          8254 11            
> 
> 
> The questions based on this output are
> 
> 1) Why was there an error statement requesting type III SS based on lm ?
> 2) Why I got a result by using glm with 0 df for the period effect ?
> 3) How can I get the estimate, the StdError for the constrast (-1,1) of the
> treatment effect ?
The type I/III issues my be better left to people who actually
understand them, but note that sequence is really a embedded in
subject (7 subjects had one sequence and 6 subjects the other) so
sequence:subject is equivalent to just subject, and this is likely
also the reason that sequence is aliased within subject. Basically,
this is a fixed-effects model, and if each subject has his own
parameter, you can't estimate the sequence effect. 

I'd do this kind of design with an explicit mixed-effects model:
> summary(aov(outcome~treatment+period+sequence+Error(subject)))
Error: subject
          Df Sum Sq Mean Sq F value Pr(>F)
sequence   1    335     335  0.0321  0.861
Residuals 11 114878   10443

Error: Within
          Df  Sum Sq Mean Sq F value   Pr(>F)
treatment  1 13388.5 13388.5 17.8416 0.001427 **
period     1  1632.1  1632.1  2.1749 0.168314
Residuals 11  8254.5   750.4

possibly substituting treatment:period for sequence (it's the same
thing). 

-- 
   O__  ---- Peter Dalgaard             Blegdamsvej 3  
  c/ /'_ --- Dept. of Biostatistics     2200 Cph. N   
 (*) \(*) -- University of Copenhagen   Denmark      Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)             FAX: (+45) 35327907

John Fox

2003-Jan-29 13:53 UTC

head link

[R] Analyzing an unbalanced AB/BA cross-over design

Dear Martin,

As Peter Dalgaard has pointed out, the model you fit has redundant 
parameters, as you can see from its summary (or from the coefficients):

     > summary(example.lm)

     . . .

     Coefficients: (13 not defined because of singularities)

     . . .

     > coef(example.lm)
             (Intercept)          treatmentS             period2 
sequence2  sequence1:subject2
             305.35714           -46.60714            15.89286 
-135.00000                  NA
     sequence2:subject2  sequence1:subject3  sequence2:subject3 
sequence1:subject4  sequence2:subject4
             222.50000                  NA           200.00000 
-5.00000                  NA
     sequence1:subject5  sequence2:subject5  sequence1:subject6 
sequence2:subject6  sequence1:subject7
                     NA           240.00000            45.00000 
      NA           110.00000
     sequence2:subject7  sequence1:subject9  sequence2:subject9 
sequence1:subject10 sequence2:subject10
                     NA                  NA           150.00000 
-60.00000                  NA
     sequence1:subject11 sequence2:subject11 sequence1:subject12 
sequence2:subject12 sequence1:subject13
             75.00000                  NA                  NA 
145.00000                  NA
     sequence2:subject13 sequence1:subject14 sequence2:subject14
                     NA            57.50000                  NA

The computational procedure used by Anova (in the car package) assumes a 
full-rank parametrization of the model. The difference between the Anova 
methods for lm and glm is that the former uses the coefficient estimates 
and their covariance matrix to compute sums of squares while the latter 
refits the model. You got a result from SAS because it works with a 
deficient-rank parametrization. Finally (and more generally), if you really 
want "type-III" sums of squares in R, be careful with the kind of 
contrast-coding that you use. The default "treatment" contrasts
won't give
you what you want.

I'm sorry that you encountered this problem.
  John


At 11:00 AM 1/29/2003 +0100, martin wolfsegger wrote:>I am looking for help to analyze an unbalanced AB/BA cross-over design by
>requesting the type III SS !
>
># Example 3.1 from S. Senn (1993). Cross-over Trials in Clinical
>Research
>outcome<-c(310,310,370,410,250,380,330,270,260,300,390,210,350,365,370,310,380,290,260,90,385,400,410,320,340,220)
>subject<-as.factor(c(1,4,6,7,10,11,14,1,4,6,7,10,11,14,2,3,5,9,12,13,2,3,5,9,12,13))
>period<-as.factor(c(1,1,1,1,1,1,1,2,2,2,2,2,2,2,1,1,1,1,1,1,2,2,2,2,2,2))
>treatment<-as.factor(c('F','F','F','F','F','F','F','S','S','S','S','S','S','S','S','S','S','S','S','S','F','F','F','F','F','F'))
>sequence<-as.factor(c(1,1,1,1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,2))
>example<-data.frame(outcome,subject,period,treatment,sequence)
>
>The recommended SAS code equals
>
>PROC GLM DATA=example;
>CLASS subject period treatment sequence;
>MODEL outcome = treatment sequence period subject(sequence);
>RANDOM subject(sequence);
>RUN;
>
>For PROC GLM, the random effects are treated in a post hoc fashion after the
>complete fixed effect model is fit. This distinction affects other features
>in the GLM procedure, such as the results of the LSMEANS and ESTIMATE
>statements. Looking only on treatment, period, sequence and subject 
>effects, the
>random statement can be omitted.
>
>The R code for type I SS equals
>
>example.lm<-lm(outcome~treatment+period+sequence+subject%in%sequence,
>data=example)
>anova(example.lm)
>
>Response: outcome
>                  Df Sum Sq Mean Sq F value    Pr(>F)
>treatment         1  13388   13388 17.8416  0.001427 **
>period            1   1632    1632  2.1749  0.168314
>sequence          1    335     335  0.4467  0.517697
>sequence:subject 11 114878   10443 13.9171 6.495e-05 ***
>Residuals        11   8254     750
>
>According to the unbalanced design, I requested the type III SS which
>resulted in an error statement
>
>library(car)
>Anova(example.lm, type="III")
>
>Error in linear.hypothesis.lm(mod, hyp.matrix, summary.model = sumry,  :
>         One or more terms aliased in model.
>
>
>by using glm I got results with 0 df for the sequence effect !!!!
>
>example.glm<-glm(outcome~treatment+period+sequence+subject%in%sequence,
>data=example, family=gaussian)
>library(car)
>Anova(example.glm,type="III",test.statistic="F")
>
>Anova Table (Type III tests)
>
>Response: outcome
>                      SS Df       F    Pr(>F)
>treatment         14036  1 18.7044  0.001205 **
>period             1632  1  2.1749  0.168314
>sequence              0  0
>sequence:subject 114878 11 13.9171 6.495e-05 ***
>Residuals          8254 11
>
>
>The questions based on this output are
>
>1) Why was there an error statement requesting type III SS based on lm ?
>2) Why I got a result by using glm with 0 df for the period effect ?
>3) How can I get the estimate, the StdError for the constrast (-1,1) of the
>treatment effect ?
>
>
>Thanks
>
>--
>
>______________________________________________
>R-help at stat.math.ethz.ch mailing list
>http://www.stat.math.ethz.ch/mailman/listinfo/r-help
-----------------------------------------------------
John Fox
Department of Sociology
McMaster University
Hamilton, Ontario, Canada L8S 4M4
email: jfox at mcmaster.ca
phone: 905-525-9140x23604
web: www.socsci.mcmaster.ca/jfox
-----------------------------------------------------

Russell Reeve

2003-Jan-29 16:50 UTC

head link

[R] Analyzing an unbalanced AB/BA cross-over design

For Type III SS, the sequence effect is determined by the subject, since subject
is nested within sequence Type III gives the additional reduction in the
residual SS after accounting for the other model terms. For Type I SS, you get
the reduction in the residual SS after accounting for the model terms before the
term in question. Since subject within sequence comes after subject, you get
subject Type I SS. Note that you can omit the sequence effect entirely

lm(outcome~treatment+period+subject, data=example)

The contrast of interest (on treatment) will not be affected. One should also
note that sequence is confounded with period by treatment interaction, so
beware. Further, the subject that has an observation missing is essentially
removed from the analysis (doesn't affect the results). But that is not true
if you use a mixed effect modeling engine, i.e.,

library(nlme)
my.lme <- lme(outcome~treatment+period, data=example, random=~1|subject)

In this model, all 8+7 observations affect the likelihood, and are used in
fitting the fixed effects: The mixed effect model and the purely fixed effect
model will give difference results for unbalanced data, same results for
balanced data.

summary(my.lme) give the output:

Linear mixed-effects model fit by REML
 Data: example 
       AIC      BIC    logLik
  265.1293 270.8068 -127.5647

Random effects:
 Formula: ~1 | subject
        (Intercept) Residual
StdDev:    66.52344 27.39351

Fixed effects: outcome ~ treatment + period 
               Value Std.Error DF   t-value p-value
(Intercept) 333.8187  20.56392 12 16.233219  <.0001
treatmentS  -46.6071  10.77655 11 -4.324868  0.0012
period2      15.8929  10.77655 11  1.474763  0.1683
 Correlation: 
           (Intr) trtmnS
treatmentS -0.242       
period2    -0.242 -0.077

Standardized Within-Group Residuals:
        Min          Q1         Med          Q3         Max 
-1.69842428 -0.41583473  0.06268074  0.52314988  1.28230120 

Number of Observations: 26
Number of Groups: 13 

The difference in means of treatment levels is S - F = -46.6, with SE 10.78.

If you add the sequence term to test for period by treatment interaction, you
get fixed effects

Fixed effects: outcome ~ treatment + period + sequence 
               Value Std.Error DF   t-value p-value
(Intercept) 337.1429  28.27660 11 11.923035  <.0001
treatmentS  -46.6071  10.77654 11 -4.324871  0.0012
period2      15.8929  10.77654 11  1.474765  0.1683
sequence2    -7.2024  40.20272 11 -0.179152  0.8611

The sequence effect is not statistically significant (P=0.8611), and so one
would not worry about treatment by period interaction here. If one had observed
significant sequence effect, then one would need to either (a) analyze only the
first period data, or (b) explain why the interaction is not real and can be
ignored.

The advantages to the lme() analysis over the lm() analysis are: (1) sequence
effect automatically gets appropriate denominator (not so for the lm version)
(2) all data are actually used in the analysis, (3) We are trying to infer to
the population of subjects, hence they should be thought of as random effects,
not fixed effects (unless you really are interested in only those particular
subjects).

If any of this is confusing, let me know.

Russell Reeve, Ph.D.
Dir of Experimental Design, Analysis, and Quality
rreeve at liposcience.com

Possibly Parallel Threads

Search for more seemingly similar threads

R help - Jan 2003 - Analyzing an unbalanced AB/BA cross-over design

[R] Analyzing an unbalanced AB/BA cross-over design

[R] Analyzing an unbalanced AB/BA cross-over design

[R] Analyzing an unbalanced AB/BA cross-over design

[R] Analyzing an unbalanced AB/BA cross-over design

Possibly Parallel Threads