E Joffe
2013-Sep-13 09:15 UTC
[R] Creating dummy vars with contrasts - why does the returned identity matrix contain all levels (and not n-1 levels) ?
Hello,

I have a problem with creating an identity matrix for glmnet by using the contrasts function. I have a factor with 4 levels. When I create dummy variables I think there should be n-1 variables (in this case 3), so that the contrasts would be against the baseline level. This is also what is written in the help file for 'contrasts'. The problem is that the function creates a matrix with n variables (i.e. the same as the number of levels) and not n-1 (where I would have a baseline level for comparison).

My questions are:

1. How can I create a matrix with n-1 dummy vars? Was I supposed to define explicitly that I want contr.treatment (contrasts)?

2. If it is not possible, how should I interpret the hazard ratios in the Cox regression I am generating (I use glmnet for variable selection and then generate a Cox regression)? That is, if I get an HR of 3 for the variable 300mg, what does it mean? The hazard is 3 times higher of what?

Here is some code to reproduce the issue:

# Create a 4 level example factor
trt <- factor( sample( c("PLACEBO", "300 MG", "600 MG", "1200 MG"),
               100, replace=TRUE ) )

# Use contrasts to get the identity matrix of dummy variables to be used in glmnet
trt2 <- contrasts (trt,contrasts=FALSE)

Results (as you can see all levels are represented in the identity matrix):

> levels (trt)
[1] "1200 MG" "300 MG"  "600 MG"  "PLACEBO"

> print (trt2)
        1200 MG 300 MG 600 MG PLACEBO
1200 MG       1      0      0       0
300 MG        0      1      0       0
600 MG        0      0      1       0
PLACEBO       0      0      0       1

Thank you,
Erel

[[alternative HTML version deleted]]
David Winsemius
2013-Sep-13 13:04 UTC
[R] Creating dummy vars with contrasts - why does the returned identity matrix contain all levels (and not n-1 levels) ?
On Sep 13, 2013, at 4:15 AM, E Joffe wrote:

> Hello,
>
> I have a problem with creating an identity matrix for glmnet by using the
> contrasts function.

Why do you want to do this?

> I have a factor with 4 levels.
>
> When I create dummy variables I think there should be n-1 variables (in this
> case 3) - so that the contrasts would be against the baseline level.
>
> This is also what is written in the help file for 'contrasts'.
>
> The problem is that the function creates a matrix with n variables (i.e. the
> same as the number of levels) and not n-1 (where I would have a baseline
> level for comparison).

Only if you specify contrasts=FALSE does it do so, and this is documented in that help file.

> My questions are:
>
> 1. How can I create a matrix with n-1 dummy vars ?

See below.

> was I supposed to
> define explicitly that I want contr.treatment (contrasts) ?

No need to do so.

> 2. If it is not possible, how should I interpret the hazard ratios in
> the Cox regression I am generating (I use glmnet for variable selection and
> then generate a Cox regression) - That is, if I get an HR of 3 for the
> variable 300mg what does it mean ? the hazard is 3 times higher of what ?

Relative hazards are generally referenced to the "baseline hazard", i.e. the hazard for a group with the omitted level for treatment contrasts and the mean value for any numeric predictors.

> Here is some code to reproduce the issue:
>
> # Create a 4 level example factor
>
> trt <- factor( sample( c("PLACEBO", "300 MG", "600 MG", "1200 MG"),
>                100, replace=TRUE ) )

# If your intent is to use contrasts different than the defaults used by
# regression functions, these factor contrasts need to be assigned, either
# within the construction of the factor or after the fact.

> contrasts(trt)
        300 MG 600 MG PLACEBO
1200 MG      0      0       0
300 MG       1      0       0
600 MG       0      1       0
PLACEBO      0      0       1

# The default value for the contrasts parameter is TRUE and the default type is treatment.

# That did not cause any change to the 'trt'-object:
trt

# To make a change you need to use the `contrasts<-` function:
contrasts (trt) <- contrasts(trt)
trt

> # Use contrasts to get the identity matrix of dummy variables to be used in
> glmnet
>
> trt2 <- contrasts (trt,contrasts=FALSE)
>
> Results (as you can see all levels are represented in the identity matrix):
>
>> levels (trt)
> [1] "1200 MG" "300 MG"  "600 MG"  "PLACEBO"
>
>> print (trt2)
>
>         1200 MG 300 MG 600 MG PLACEBO
> 1200 MG       1      0      0       0
> 300 MG        0      1      0       0
> 600 MG        0      0      1       0
> PLACEBO       0      0      0       1
>
> [[alternative HTML version deleted]]

Rhelp is a plain text mailing list.

--
David Winsemius, MD
Alameda, CA, USA
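
For readers following the thread, here is a minimal sketch of one way to get the n-1 dummy columns and a sensible reference level. It is not taken from either message: the use of relevel() and model.matrix(), the variable name x, and the seed are assumptions made for illustration. Both functions are in base R / stats, and the idea is only that the default treatment contrasts already omit the first level, so making PLACEBO the first level makes it the baseline against which the other doses are compared.

```r
# Illustrative sketch only; the seed and the name 'x' are arbitrary choices.
set.seed(1)
trt <- factor(sample(c("PLACEBO", "300 MG", "600 MG", "1200 MG"),
                     100, replace = TRUE))

# By default the levels are sorted, so "1200 MG" would be the omitted
# (reference) level. Move PLACEBO to the front to make it the baseline.
trt <- relevel(trt, ref = "PLACEBO")

# Default treatment contrasts: 4 levels give 3 dummy columns.
contrasts(trt)

# Expand the factor into a numeric design matrix and drop the intercept,
# leaving one 0/1 column per non-reference level.
x <- model.matrix(~ trt)[, -1, drop = FALSE]
dim(x)
colnames(x)
```

With this coding, a coefficient for the "300 MG" column in a subsequent model would be read relative to the PLACEBO group, which matches the "omitted level" interpretation given in the reply above. Whether a reference-level coding is the right input for a penalized fit is a separate modelling decision that this sketch does not address.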