thr3ads.net - R help - [R] parameterization of glm nested design [Jun 2010]

If this information is useful, please help other people find it:
Share via:

Huso, Manuela

2010-Jun-30 18:15 UTC

[R] parameterization of glm nested design

Dear R community,

I am new to R, a reforming SAS user :)  I am running R 2.10.1 on a Windows XP
machine.   I would like to write linear functions of my coefficient parameter
estimates from a glm, but am having a difficult time understanding the
parameterization R uses.   In the toy example below I am running a glm on
binomial data, with clones and lines within clones as fixed effects, each with 6
replicates.  I cannot figure out the algorithm R uses for determining which
combination of levels is used as the reference.  Crawley, in the chapter on
Statistical Modeling in the R Book states "The writers of R agree that
treatment contrasts represent the best solution.  This method does away with
parameter a, the overall mean. The mean of the factor level that comes first in
the alphabet (control, in our example) is promoted to pole position, and the
other effects are shown as differences (contrasts) between this mean and the
other four factor level means."

This pattern seems to hold for full factorials, but it doesn't appear to
work with this example in which Line is nested within Clone.  In this example, R
appears to use the last line in alphabet order of the first clone (Clone 1) as
the intercept.  Thereafter, the reference levels for Clones are the last lines
of Clones 2 and 3, but the first line (Line 1) of Clone 4.    The first line of
Clone 4, is also the first line over all lines.  Can someone please explain what
process R uses to determine the parameterization of this model?  Of course, I
know what the parameterization is in this model and can write the functions I
need.  However, I am trying to understand the algorithm R uses to determine the
parameterization.  This is only a toy example of a much larger study and I would
like to know when I should use the last level as the reference and when I should
use the first.

Many thanks,
Manuela

tmp <- data.frame(Cl=rep(1:4,c(12,18,18,18)),
                   L
=rep(c(91,104,14,91,96,"84-1",96,116,1,9,14),each=6),
                   N =rep(10,(12+18*3)),
                   D =c(1,6,8,1,1,1,2,6,10,3,3,1,2,1,1,0,4,4,6,5,3,5,1,3,
                        1,6,8,5,5,3,5,2,1,0,4,5,1,0,2,3,6,7,4,0,2,5,3,8,1,
                        4,7,0,6,3,7,2,3,6,1,9,7,2,1,3,0,1) )
tmp$N[c(13,15,59)] <- c(8,9,9)  # not always 10 trials
tmp$Cl <- factor(tmp$Cl)            # Make sure clone and line within clone
are factors
tmp$L  <- factor(tmp$L)

model <- formula(cbind(D,(N-D))~ Cl/L)
tmp.glm<-glm(data=tmp,formula=model,
family=quasibinomial(link="logit"))

with(tmp,table(L,Cl))
summary.glm(tmp.glm)$coefficients

unique(tmp[order(tmp$L),1:2])
# The coeff estimates R gives are in this order,
#   but R is using the rows c(1,8,10,11) as reference
# Why?

	[[alternative HTML version deleted]]

Charles C. Berry

2010-Jul-01 16:30 UTC

head link

[R] parameterization of glm nested design

On Wed, 30 Jun 2010, Huso, Manuela wrote:
> Dear R community,
>
> I am new to R, a reforming SAS user :)
Welcome aboard!
> I am running R 2.10.1 on a Windows XP machine.  I would like to write 
> linear functions of my coefficient parameter estimates from a glm, but 
> am having a difficult time understanding the parameterization R uses. 
> In the toy example below I am running a glm on binomial data, with 
> clones and lines within clones as fixed effects, each with 6 replicates. 
> I cannot figure out the algorithm R uses for determining which 
> combination of levels is used as the reference.  Crawley, in the chapter 
> on Statistical Modeling in the R Book states "The writers of R agree 
> that treatment contrasts represent the best solution.  This method does 
> away with parameter a, the overall mean. The mean of the factor level 
> that comes first in the alphabet (control, in our example) is promoted 
> to pole position, and the other effects are shown as differences 
> (contrasts) between this mean and the other four factor level means."
>
> This pattern seems to hold for full factorials, but it doesn't appear
to
> work with this example in which Line is nested within Clone.  In this 
> example, R appears to use the last line in alphabet order of the first 
> clone (Clone 1) as the intercept.
Not in my locale:
> sort(levels(tmp$Cl))[1] "1" "2" "3"
"4">
> sort(levels(tmp$L))[1] "1"    "104"  "116"  "14"  
"84-1" "9"    "91"  
"96"> 
> cat(names(coef(tmp.glm)),fill=50)(Intercept) Cl2 Cl3 Cl4 Cl1:L104 Cl2:L104
Cl3:L104 Cl4:L104 Cl1:L116 Cl2:L116 Cl3:L116
Cl4:L116 Cl1:L14 Cl2:L14 Cl3:L14 Cl4:L14
Cl1:L84-1 Cl2:L84-1 Cl3:L84-1 Cl4:L84-1 Cl1:L9
Cl2:L9 Cl3:L9 Cl4:L9 Cl1:L91 Cl2:L91 Cl3:L91
Cl4:L91 Cl1:L96 Cl2:L96 Cl3:L96 Cl4:L96>
> # list any terms ending in Cl1 or L1 
> grep("Cl1$|L1$",names(coef(tmp.glm)))
integer(0)> # so those terms were dropped!!
Cl1 is dropped as are Cl[1234]*:L1.
> Thereafter, the reference levels for 
> Clones are the last lines of Clones 2 and 3, but the first line (Line 1) 
> of Clone 4.  The first line of Clone 4, is also the first line over all 
> lines.  Can someone please explain what process R uses to determine the 
> parameterization of this model?
Huh?

AFAICS, it dropped the first level in sort(levels(...)) of Cl and the 
first level of L in the Cl:L terms as noted above.

And if you want something different, "contrasts()<-" is your
friend.

See
 	?contrasts

and the See Also's there.


Perhaps you are asking why the pivoting dropped some terms and not 
others??

If so, you'll need to look at tmp.glm$qr$pivot or just 
qr(model.matrix(~Cl/L,tmp))$pivot and dig through the qr() source codes to 
follow the algebra.

You do know that the design is not of full rank, right?

HTH,

Chuck

p.s. Thanks for following the posting guide's dictum to include a 
self-contained example.

p.p.s. Here is my locale:
> sessionInfo()$locale[1] "LC_COLLATE=English_United States.1252;LC_CTYPE=English_United 
States.1252;LC_MONETARY=English_United 
States.1252;LC_NUMERIC=C;LC_TIME=English_United
States.1252">
> Of course, I know what the parameterization is in this model and can 
> write the functions I need.  However, I am trying to understand the 
> algorithm R uses to determine the parameterization.  This is only a toy 
> example of a much larger study and I would like to know when I should 
> use the last level as the reference and when I should use the first.
>
> Many thanks,
> Manuela
>
> tmp <- data.frame(Cl=rep(1:4,c(12,18,18,18)),
>                   L
=rep(c(91,104,14,91,96,"84-1",96,116,1,9,14),each=6),
>                   N =rep(10,(12+18*3)),
>                   D =c(1,6,8,1,1,1,2,6,10,3,3,1,2,1,1,0,4,4,6,5,3,5,1,3,
>                        1,6,8,5,5,3,5,2,1,0,4,5,1,0,2,3,6,7,4,0,2,5,3,8,1,
>                        4,7,0,6,3,7,2,3,6,1,9,7,2,1,3,0,1) )
> tmp$N[c(13,15,59)] <- c(8,9,9)  # not always 10 trials
> tmp$Cl <- factor(tmp$Cl)            # Make sure clone and line within
clone are factors
> tmp$L  <- factor(tmp$L)
>
> model <- formula(cbind(D,(N-D))~ Cl/L)
> tmp.glm<-glm(data=tmp,formula=model,
family=quasibinomial(link="logit"))
>
> with(tmp,table(L,Cl))
> summary.glm(tmp.glm)$coefficients
>
> unique(tmp[order(tmp$L),1:2])
> # The coeff estimates R gives are in this order,
> #   but R is using the rows c(1,8,10,11) as reference
> # Why?
>
> 	[[alternative HTML version deleted]]
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
Charles C. Berry                            (858) 534-2098
                                             Dept of Family/Preventive Medicine
E mailto:cberry at tajo.ucsd.edu	            UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901

Reasonably Related Threads

Search for more maybe matching threads

R help - Jun 2010 - parameterization of glm nested design

[R] parameterization of glm nested design

[R] parameterization of glm nested design

Reasonably Related Threads