Sean Zhang
2009-Jun-20 04:13 UTC
[R] how to apply the dummy coding rule in a dataframe with complete factor levels to another dataframe with incomplete factor levels?
Dear R helpers:

Sorry to bother you with a basic question about model.matrix.
Basically, I want to apply the dummy coding rule from a data frame with
complete factor levels to another data frame with incomplete factor levels.
I used model.matrix, but could not get what I want.
The following is an example.

# Suppose I have two data frames, A and B
dfA = data.frame(f1=factor(c('a','b','c')), f2=factor(c('aa','bb','cc')))
dfB = data.frame(f1=factor(c('a','b','b')), f2=factor(c('aa','bb','bb')))
# dfB's factor variables have fewer levels

# use model.matrix on dfA
(matA <- model.matrix(~f1+f2, data=dfA))
# use model.matrix on dfB
(matB <- model.matrix(~f1+f2, data=dfB))
# I would actually like to dummy code dfB using the dummy coding rule
# defined by model.matrix(~f1+f2, data=dfA)
# matB_wanted is below
(matB_wanted <- rbind(c(1,0,0,0,0), c(1,1,0,1,0), c(1,1,0,1,0)))
colnames(matB_wanted) <- colnames(matA)
matB_wanted

Can someone kindly show me how to get matB_wanted?
Many thanks in advance!

-Sean
Kingsford Jones
2009-Jun-20 16:34 UTC
[R] how to apply the dummy coding rule in a dataframe with complete factor levels to another dataframe with incomplete factor levels?
Hi Sean,

The levels attribute of a factor can contain levels that are not
represented in the data. So, in your example, we can get the desired
result by adding the missing levels via the levels argument to the
factor function:

> dfB = data.frame(f1=factor(c('a','b','b'), levels=c('a','b','c')),
+                  f2=factor(c('aa','bb','bb'), levels=c('aa','bb','cc')))
> model.matrix(~f1+f2, data=dfB)
  (Intercept) f1b f1c f2bb f2cc
1           1   0   0    0    0
2           1   1   0    1    0
3           1   1   0    1    0
attr(,"assign")
[1] 0 1 1 2 2
attr(,"contrasts")
attr(,"contrasts")$f1
[1] "contr.treatment"

attr(,"contrasts")$f2
[1] "contr.treatment"

hth,
Kingsford Jones
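When dfA is available, the levels suggested in the reply do not have to be typed by hand; they can be copied from dfA column by column. The following is only a sketch of that idea, assuming both data frames share the same column names and that the columns of interest are factors in dfA. The loop variable `v` is made up for this example and is not part of the original thread.

```r
# Reference data frame with the complete set of factor levels
dfA <- data.frame(f1 = factor(c("a", "b", "c")),
                  f2 = factor(c("aa", "bb", "cc")))
# Data frame whose factors are missing some levels
dfB <- data.frame(f1 = factor(c("a", "b", "b")),
                  f2 = factor(c("aa", "bb", "bb")))

# Assumption: dfA and dfB have matching column names.
# Rebuild each factor in dfB using the levels of the same column in dfA.
for (v in names(dfB)) {
  if (is.factor(dfA[[v]])) {
    dfB[[v]] <- factor(dfB[[v]], levels = levels(dfA[[v]]))
  }
}

matA <- model.matrix(~ f1 + f2, data = dfA)
matB <- model.matrix(~ f1 + f2, data = dfB)

# matB now has the same columns as matA, including the all-zero
# columns for the levels that do not occur in dfB.
print(colnames(matB))
print(matB)
```

Note that any value in dfB that is not among dfA's levels would become NA after this re-leveling, so it may be worth checking for NAs before calling model.matrix.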