E Joffe
2013-Sep-13 09:15 UTC
[R] Creating dummy vars with contrasts - why does the returned identity matrix contain all levels (and not n-1 levels) ?
Hello,
I have a problem with creating an identity matrix for glmnet by using the
contrasts function.
I have a factor with 4 levels.
When I create dummy variables I think there should be n-1 variables (in this
case 3) - so that the contrasts would be against the baseline level.
This is also what is written in the help file for 'contrasts'.
The problem is that the function creates a matrix with n variables (i.e. the
same as the number of levels) and not n-1 (where I would have a baseline
level for comparison).
My questions are:
1. How can I create a matrix with n-1 dummy vars ? was I supposed to
define explicitly that I want contr.treatment (contrasts) ?
2. If it is not possible, how should I interpret the hazard ratios in
the Cox regression I am generating (I use glmnet for variable selection and
then generate a Cox regression) - That is, if I get an HR of 3 for the
variable 300mg what does it mean ? the hazard is 3 times higher of what ?
Here is some code to reproduce the issue:
# Create a 4 level example factor
trt <- factor( sample( c("PLACEBO", "300 MG", "600 MG", "1200 MG"),
                       100, replace=TRUE ) )
# Use contrasts to get the identity matrix of dummy variables to be used in glmnet
trt2 <- contrasts (trt,contrasts=FALSE)
Results (as you can see all levels are represented in the identity matrix):
> levels (trt)
[1] "1200 MG" "300 MG"  "600 MG"  "PLACEBO"
> print (trt2)
        1200 MG 300 MG 600 MG PLACEBO
1200 MG       1      0      0       0
300 MG        0      1      0       0
600 MG        0      0      1       0
PLACEBO       0      0      0       1
Thank you,
Erel
[[alternative HTML version deleted]]
David Winsemius
2013-Sep-13 13:04 UTC
[R] Creating dummy vars with contrasts - why does the returned identity matrix contain all levels (and not n-1 levels) ?
On Sep 13, 2013, at 4:15 AM, E Joffe wrote:

> Hello,
>
> I have a problem with creating an identity matrix for glmnet by using the
> contrasts function.

Why do you want to do this?

> I have a factor with 4 levels.
>
> When I create dummy variables I think there should be n-1 variables (in this
> case 3) - so that the contrasts would be against the baseline level.
>
> This is also what is written in the help file for 'contrasts'.
>
> The problem is that the function creates a matrix with n variables (i.e. the
> same as the number of levels) and not n-1 (where I would have a baseline
> level for comparison).

Only if you specify contrasts=FALSE does it do so, and this is documented in
that help file.

> My questions are:
>
> 1. How can I create a matrix with n-1 dummy vars ?

See below.

> was I supposed to
> define explicitly that I want contr.treatment (contrasts) ?

No need to do so.

> 2. If it is not possible, how should I interpret the hazard ratios in
> the Cox regression I am generating (I use glmnet for variable selection and
> then generate a Cox regression) - That is, if I get an HR of 3 for the
> variable 300mg what does it mean ? the hazard is 3 times higher of what ?

Relative hazards are generally referenced to the "baseline hazard", i.e. the
hazard for a group with the omitted level for treatment contrasts and the mean
value for any numeric predictors.

> Here is some code to reproduce the issue:
>
> # Create a 4 level example factor
>
> trt <- factor( sample( c("PLACEBO", "300 MG", "600 MG", "1200 MG"),
>                        100, replace=TRUE ) )

# If your intent is to use contrasts different than the defaults used by
# regression functions, these factor contrasts need to be assigned, either
# within the construction of the factor or after the fact.

> contrasts(trt)
        300 MG 600 MG PLACEBO
1200 MG      0      0       0
300 MG       1      0       0
600 MG       0      1       0
PLACEBO      0      0       1

# The default value for the contrasts parameter is TRUE and the default type
# is treatment.

# That did not cause any change to the 'trt'-object:
trt

# To make a change you need to use the `contrasts<-` function:
contrasts (trt) <- contrasts(trt)
trt

> # Use contrasts to get the identity matrix of dummy variables to be used in
> # glmnet
>
> trt2 <- contrasts (trt,contrasts=FALSE)
>
> Results (as you can see all levels are represented in the identity matrix):
>
>> levels (trt)
> [1] "1200 MG" "300 MG"  "600 MG"  "PLACEBO"
>
>> print (trt2)
>         1200 MG 300 MG 600 MG PLACEBO
> 1200 MG       1      0      0       0
> 300 MG        0      1      0       0
> 600 MG        0      0      1       0
> PLACEBO       0      0      0       1
>
> [[alternative HTML version deleted]]

Rhelp is a plain text mailing list.

--
David Winsemius, MD
Alameda, CA, USA
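
Editorial sketch (not part of the original posts): the points in the reply above can be illustrated with a few lines of base R. It compares the default contrasts with contrasts=FALSE, then builds a per-observation dummy matrix with n-1 columns. The set.seed() call, the use of relevel() to make PLACEBO the reference level, the use of model.matrix(), and the names trt_ref and x are assumptions added for illustration; neither poster proposed them, and this is only one possible way to prepare a predictor matrix for a function such as glmnet.

```r
# Assumption: a fixed seed, only so the example is reproducible.
set.seed(1)
trt <- factor(sample(c("PLACEBO", "300 MG", "600 MG", "1200 MG"),
                     100, replace = TRUE))

# Default (treatment) contrasts: 4 levels by 3 dummy columns.
# The first level, "1200 MG", is the omitted baseline.
dim(contrasts(trt))                     # 4 3

# contrasts = FALSE requests the full identity matrix instead.
dim(contrasts(trt, contrasts = FALSE))  # 4 4

# Assumption: PLACEBO is the clinically natural reference, so make it
# the first level; effects are then read relative to placebo.
trt_ref <- relevel(trt, ref = "PLACEBO")

# One row per observation and n-1 dummy columns; the intercept column
# produced by model.matrix() is dropped.
x <- model.matrix(~ trt_ref)[, -1, drop = FALSE]
dim(x)                                  # 100 3
colnames(x)
```

With the levels left in their default alphabetical order, the omitted level is "1200 MG", so a hazard ratio of 3 for "300 MG" would be read as 3 times the hazard of the 1200 MG group, not of placebo. After relevel(), the same coefficient would be read against PLACEBO.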