martin wolfsegger
2003-Jan-29 11:02 UTC
[R] Analyzing an unbalanced AB/BA cross-over design
I am looking for help to analyze an unbalanced AB/BA cross-over design by requesting the type III SS ! # Example 3.1 from S. Senn (1993). Cross-over Trials in Clinical Research outcome<-c(310,310,370,410,250,380,330,270,260,300,390,210,350,365,370,310,380,290,260,90,385,400,410,320,340,220) subject<-as.factor(c(1,4,6,7,10,11,14,1,4,6,7,10,11,14,2,3,5,9,12,13,2,3,5,9,12,13)) period<-as.factor(c(1,1,1,1,1,1,1,2,2,2,2,2,2,2,1,1,1,1,1,1,2,2,2,2,2,2)) treatment<-as.factor(c('F','F','F','F','F','F','F','S','S','S','S','S','S','S','S','S','S','S','S','S','F','F','F','F','F','F')) sequence<-as.factor(c(1,1,1,1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,2)) example<-data.frame(outcome,subject,period,treatment,sequence) The recommended SAS code equals PROC GLM DATA=example; CLASS subject period treatment sequence; MODEL outcome = treatment sequence period subject(sequence); RANDOM subject(sequence); RUN; For PROC GLM, the random effects are treated in a post hoc fashion after the complete fixed effect model is fit. This distinction affects other features in the GLM procedure, such as the results of the LSMEANS and ESTIMATE statements. Looking only on treatment, period, sequence and subject effects, the random statement can be omitted. The R code for type I SS equals example.lm<-lm(outcome~treatment+period+sequence+subject%in%sequence, data=example) anova(example.lm) Response: outcome Df Sum Sq Mean Sq F value Pr(>F) treatment 1 13388 13388 17.8416 0.001427 ** period 1 1632 1632 2.1749 0.168314 sequence 1 335 335 0.4467 0.517697 sequence:subject 11 114878 10443 13.9171 6.495e-05 *** Residuals 11 8254 750 According to the unbalanced design, I requested the type III SS which resulted in an error statement library(car) Anova(example.lm, type="III") Error in linear.hypothesis.lm(mod, hyp.matrix, summary.model = sumry, : One or more terms aliased in model. by using glm I got results with 0 df for the sequence effect !!!! example.glm<-glm(outcome~treatment+period+sequence+subject%in%sequence, data=example, family=gaussian) library(car) Anova(example.glm,type="III",test.statistic="F") Anova Table (Type III tests) Response: outcome SS Df F Pr(>F) treatment 14036 1 18.7044 0.001205 ** period 1632 1 2.1749 0.168314 sequence 0 0 sequence:subject 114878 11 13.9171 6.495e-05 *** Residuals 8254 11 The questions based on this output are 1) Why was there an error statement requesting type III SS based on lm ? 2) Why I got a result by using glm with 0 df for the period effect ? 3) How can I get the estimate, the StdError for the constrast (-1,1) of the treatment effect ? Thanks --
Peter Dalgaard BSA
2003-Jan-29 12:30 UTC
[R] Analyzing an unbalanced AB/BA cross-over design
martin wolfsegger <wolfseggerm at gmx.at> writes:> I am looking for help to analyze an unbalanced AB/BA cross-over design by > requesting the type III SS ! > > # Example 3.1 from S. Senn (1993). Cross-over Trials in Clinical > Research > outcome<-c(310,310,370,410,250,380,330,270,260,300,390,210,350,365,370,310,380,290,260,90,385,400,410,320,340,220) > subject<-as.factor(c(1,4,6,7,10,11,14,1,4,6,7,10,11,14,2,3,5,9,12,13,2,3,5,9,12,13)) > period<-as.factor(c(1,1,1,1,1,1,1,2,2,2,2,2,2,2,1,1,1,1,1,1,2,2,2,2,2,2)) > treatment<-as.factor(c('F','F','F','F','F','F','F','S','S','S','S','S','S','S','S','S','S','S','S','S','F','F','F','F','F','F')) > sequence<-as.factor(c(1,1,1,1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,2)) > example<-data.frame(outcome,subject,period,treatment,sequence) > > The recommended SAS code equals > > PROC GLM DATA=example; > CLASS subject period treatment sequence; > MODEL outcome = treatment sequence period subject(sequence); > RANDOM subject(sequence); > RUN; > > For PROC GLM, the random effects are treated in a post hoc fashion after the > complete fixed effect model is fit. This distinction affects other features > in the GLM procedure, such as the results of the LSMEANS and ESTIMATE > statements. Looking only on treatment, period, sequence and subject effects, the > random statement can be omitted. > > The R code for type I SS equals > > example.lm<-lm(outcome~treatment+period+sequence+subject%in%sequence, > data=example) > anova(example.lm) > > Response: outcome > Df Sum Sq Mean Sq F value Pr(>F) > treatment 1 13388 13388 17.8416 0.001427 ** > period 1 1632 1632 2.1749 0.168314 > sequence 1 335 335 0.4467 0.517697 > sequence:subject 11 114878 10443 13.9171 6.495e-05 *** > Residuals 11 8254 750 > > According to the unbalanced design, I requested the type III SS which > resulted in an error statement > > library(car) > Anova(example.lm, type="III") > > Error in linear.hypothesis.lm(mod, hyp.matrix, summary.model = sumry, : > One or more terms aliased in model. > > > by using glm I got results with 0 df for the sequence effect !!!! > > example.glm<-glm(outcome~treatment+period+sequence+subject%in%sequence, > data=example, family=gaussian) > library(car) > Anova(example.glm,type="III",test.statistic="F") > > Anova Table (Type III tests) > > Response: outcome > SS Df F Pr(>F) > treatment 14036 1 18.7044 0.001205 ** > period 1632 1 2.1749 0.168314 > sequence 0 0 > sequence:subject 114878 11 13.9171 6.495e-05 *** > Residuals 8254 11 > > > The questions based on this output are > > 1) Why was there an error statement requesting type III SS based on lm ? > 2) Why I got a result by using glm with 0 df for the period effect ? > 3) How can I get the estimate, the StdError for the constrast (-1,1) of the > treatment effect ?The type I/III issues my be better left to people who actually understand them, but note that sequence is really a embedded in subject (7 subjects had one sequence and 6 subjects the other) so sequence:subject is equivalent to just subject, and this is likely also the reason that sequence is aliased within subject. Basically, this is a fixed-effects model, and if each subject has his own parameter, you can't estimate the sequence effect. I'd do this kind of design with an explicit mixed-effects model:> summary(aov(outcome~treatment+period+sequence+Error(subject)))Error: subject Df Sum Sq Mean Sq F value Pr(>F) sequence 1 335 335 0.0321 0.861 Residuals 11 114878 10443 Error: Within Df Sum Sq Mean Sq F value Pr(>F) treatment 1 13388.5 13388.5 17.8416 0.001427 ** period 1 1632.1 1632.1 2.1749 0.168314 Residuals 11 8254.5 750.4 possibly substituting treatment:period for sequence (it's the same thing). -- O__ ---- Peter Dalgaard Blegdamsvej 3 c/ /'_ --- Dept. of Biostatistics 2200 Cph. N (*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918 ~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk) FAX: (+45) 35327907
Dear Martin, As Peter Dalgaard has pointed out, the model you fit has redundant parameters, as you can see from its summary (or from the coefficients): > summary(example.lm) . . . Coefficients: (13 not defined because of singularities) . . . > coef(example.lm) (Intercept) treatmentS period2 sequence2 sequence1:subject2 305.35714 -46.60714 15.89286 -135.00000 NA sequence2:subject2 sequence1:subject3 sequence2:subject3 sequence1:subject4 sequence2:subject4 222.50000 NA 200.00000 -5.00000 NA sequence1:subject5 sequence2:subject5 sequence1:subject6 sequence2:subject6 sequence1:subject7 NA 240.00000 45.00000 NA 110.00000 sequence2:subject7 sequence1:subject9 sequence2:subject9 sequence1:subject10 sequence2:subject10 NA NA 150.00000 -60.00000 NA sequence1:subject11 sequence2:subject11 sequence1:subject12 sequence2:subject12 sequence1:subject13 75.00000 NA NA 145.00000 NA sequence2:subject13 sequence1:subject14 sequence2:subject14 NA 57.50000 NA The computational procedure used by Anova (in the car package) assumes a full-rank parametrization of the model. The difference between the Anova methods for lm and glm is that the former uses the coefficient estimates and their covariance matrix to compute sums of squares while the latter refits the model. You got a result from SAS because it works with a deficient-rank parametrization. Finally (and more generally), if you really want "type-III" sums of squares in R, be careful with the kind of contrast-coding that you use. The default "treatment" contrasts won't give you what you want. I'm sorry that you encountered this problem. John At 11:00 AM 1/29/2003 +0100, martin wolfsegger wrote:>I am looking for help to analyze an unbalanced AB/BA cross-over design by >requesting the type III SS ! > ># Example 3.1 from S. Senn (1993). Cross-over Trials in Clinical >Research >outcome<-c(310,310,370,410,250,380,330,270,260,300,390,210,350,365,370,310,380,290,260,90,385,400,410,320,340,220) >subject<-as.factor(c(1,4,6,7,10,11,14,1,4,6,7,10,11,14,2,3,5,9,12,13,2,3,5,9,12,13)) >period<-as.factor(c(1,1,1,1,1,1,1,2,2,2,2,2,2,2,1,1,1,1,1,1,2,2,2,2,2,2)) >treatment<-as.factor(c('F','F','F','F','F','F','F','S','S','S','S','S','S','S','S','S','S','S','S','S','F','F','F','F','F','F')) >sequence<-as.factor(c(1,1,1,1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,2)) >example<-data.frame(outcome,subject,period,treatment,sequence) > >The recommended SAS code equals > >PROC GLM DATA=example; >CLASS subject period treatment sequence; >MODEL outcome = treatment sequence period subject(sequence); >RANDOM subject(sequence); >RUN; > >For PROC GLM, the random effects are treated in a post hoc fashion after the >complete fixed effect model is fit. This distinction affects other features >in the GLM procedure, such as the results of the LSMEANS and ESTIMATE >statements. Looking only on treatment, period, sequence and subject >effects, the >random statement can be omitted. > >The R code for type I SS equals > >example.lm<-lm(outcome~treatment+period+sequence+subject%in%sequence, >data=example) >anova(example.lm) > >Response: outcome > Df Sum Sq Mean Sq F value Pr(>F) >treatment 1 13388 13388 17.8416 0.001427 ** >period 1 1632 1632 2.1749 0.168314 >sequence 1 335 335 0.4467 0.517697 >sequence:subject 11 114878 10443 13.9171 6.495e-05 *** >Residuals 11 8254 750 > >According to the unbalanced design, I requested the type III SS which >resulted in an error statement > >library(car) >Anova(example.lm, type="III") > >Error in linear.hypothesis.lm(mod, hyp.matrix, summary.model = sumry, : > One or more terms aliased in model. > > >by using glm I got results with 0 df for the sequence effect !!!! > >example.glm<-glm(outcome~treatment+period+sequence+subject%in%sequence, >data=example, family=gaussian) >library(car) >Anova(example.glm,type="III",test.statistic="F") > >Anova Table (Type III tests) > >Response: outcome > SS Df F Pr(>F) >treatment 14036 1 18.7044 0.001205 ** >period 1632 1 2.1749 0.168314 >sequence 0 0 >sequence:subject 114878 11 13.9171 6.495e-05 *** >Residuals 8254 11 > > >The questions based on this output are > >1) Why was there an error statement requesting type III SS based on lm ? >2) Why I got a result by using glm with 0 df for the period effect ? >3) How can I get the estimate, the StdError for the constrast (-1,1) of the >treatment effect ? > > >Thanks > >-- > >______________________________________________ >R-help at stat.math.ethz.ch mailing list >http://www.stat.math.ethz.ch/mailman/listinfo/r-help----------------------------------------------------- John Fox Department of Sociology McMaster University Hamilton, Ontario, Canada L8S 4M4 email: jfox at mcmaster.ca phone: 905-525-9140x23604 web: www.socsci.mcmaster.ca/jfox -----------------------------------------------------
For Type III SS, the sequence effect is determined by the subject, since subject is nested within sequence Type III gives the additional reduction in the residual SS after accounting for the other model terms. For Type I SS, you get the reduction in the residual SS after accounting for the model terms before the term in question. Since subject within sequence comes after subject, you get subject Type I SS. Note that you can omit the sequence effect entirely lm(outcome~treatment+period+subject, data=example) The contrast of interest (on treatment) will not be affected. One should also note that sequence is confounded with period by treatment interaction, so beware. Further, the subject that has an observation missing is essentially removed from the analysis (doesn't affect the results). But that is not true if you use a mixed effect modeling engine, i.e., library(nlme) my.lme <- lme(outcome~treatment+period, data=example, random=~1|subject) In this model, all 8+7 observations affect the likelihood, and are used in fitting the fixed effects: The mixed effect model and the purely fixed effect model will give difference results for unbalanced data, same results for balanced data. summary(my.lme) give the output: Linear mixed-effects model fit by REML Data: example AIC BIC logLik 265.1293 270.8068 -127.5647 Random effects: Formula: ~1 | subject (Intercept) Residual StdDev: 66.52344 27.39351 Fixed effects: outcome ~ treatment + period Value Std.Error DF t-value p-value (Intercept) 333.8187 20.56392 12 16.233219 <.0001 treatmentS -46.6071 10.77655 11 -4.324868 0.0012 period2 15.8929 10.77655 11 1.474763 0.1683 Correlation: (Intr) trtmnS treatmentS -0.242 period2 -0.242 -0.077 Standardized Within-Group Residuals: Min Q1 Med Q3 Max -1.69842428 -0.41583473 0.06268074 0.52314988 1.28230120 Number of Observations: 26 Number of Groups: 13 The difference in means of treatment levels is S - F = -46.6, with SE 10.78. If you add the sequence term to test for period by treatment interaction, you get fixed effects Fixed effects: outcome ~ treatment + period + sequence Value Std.Error DF t-value p-value (Intercept) 337.1429 28.27660 11 11.923035 <.0001 treatmentS -46.6071 10.77654 11 -4.324871 0.0012 period2 15.8929 10.77654 11 1.474765 0.1683 sequence2 -7.2024 40.20272 11 -0.179152 0.8611 The sequence effect is not statistically significant (P=0.8611), and so one would not worry about treatment by period interaction here. If one had observed significant sequence effect, then one would need to either (a) analyze only the first period data, or (b) explain why the interaction is not real and can be ignored. The advantages to the lme() analysis over the lm() analysis are: (1) sequence effect automatically gets appropriate denominator (not so for the lm version) (2) all data are actually used in the analysis, (3) We are trying to infer to the population of subjects, hence they should be thought of as random effects, not fixed effects (unless you really are interested in only those particular subjects). If any of this is confusing, let me know. Russell Reeve, Ph.D. Dir of Experimental Design, Analysis, and Quality rreeve at liposcience.com