Hi,

I'm looking for an easy way to discretize factors in R.

I've noticed that the lm function does this automatically, with a nice result. If I have

group <- c("A", "B","B","C","C","C")

and run:

lm(result ~ x1 + group)

then lm splits group into separate binary {0,1} variables before performing the regression. I now have:

groupA
groupB
groupC

Some of the other models that I want to try won't accept factors, so the factors need to be discretized this way.

Is there a command in R for this, or some easy shortcut? (I tried digging into the lm code, but couldn't find where this is being done.)

Thanks!

-N
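This is a minimal sketch of where the expansion happens. lm() builds its design matrix with model.matrix(), and you can recover that matrix from a fitted model with model.matrix(fit). The values of result and x1 below are made up for illustration and are not from the post. With an intercept in the model, the default treatment contrasts keep only groupB and groupC, because level A is absorbed into the intercept.

```r
# Hypothetical illustration data (not from the original post)
group  <- factor(c("A", "B", "B", "C", "C", "C"))
x1     <- c(2.1, 3.4, 1.8, 4.0, 3.3, 2.7)
result <- c(1.0, 2.2, 1.9, 3.1, 2.8, 3.0)

fit <- lm(result ~ x1 + group)

# lm() fits on the design matrix produced by model.matrix();
# the same matrix can be pulled back out of the fitted model
X <- model.matrix(fit)
colnames(X)
# -> "(Intercept)" "x1" "groupB" "groupC"
```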
Maybe this?

group <- factor(c("A", "B","B","C","C","C"))
model.matrix(~0+group)

-tgs

On Sat, May 15, 2010 at 2:02 PM, Noah Silverman <noah@smartmediacorp.com> wrote:
> I'm looking for an easy way to discretize factors in R
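The following sketch checks what this returns, assuming the same six-element group factor. Without an intercept, every level gets its own 0/1 column. The hand-built version using sapply() is an alternative included only for comparison. Nobody in the thread proposed it.

```r
group <- factor(c("A", "B", "B", "C", "C", "C"))

# Dropping the intercept (~ 0 + ...) makes model.matrix() emit one
# 0/1 indicator column for every level instead of leaving one out
dummies <- model.matrix(~ 0 + group)
colnames(dummies)
# -> "groupA" "groupB" "groupC"

# For comparison: the same indicators built by hand, one column per level
manual <- sapply(levels(group), function(lev) as.numeric(group == lev))
colnames(manual) <- paste0("group", levels(group))
```

Each row of the indicator matrix has exactly one 1, in the column of that row's level.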
> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org]
> On Behalf Of Noah Silverman
> Sent: Saturday, May 15, 2010 11:03 AM
> To: r-help at r-project.org
> Subject: [R] Discretize factors?
>
> I'm looking for an easy way to discretize factors in R

Noah,

You might try something like

model.matrix(~ group - 1)

Hope this is helpful,

Dan

Daniel Nordlund
Bothell, WA USA
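Dan's `~ group - 1` and tgs's `~ 0 + group` are two spellings of the same no-intercept formula. This quick sketch confirms that they match, using the plain character vector from the question. model.matrix() converts a character variable to a factor on its own.

```r
# Character vector, exactly as in the original question
group <- c("A", "B", "B", "C", "C", "C")

m1 <- model.matrix(~ group - 1)   # "- 1" removes the intercept
m2 <- model.matrix(~ 0 + group)   # "0 +" does the same

# Both spellings should give the same full set of indicator columns
all.equal(m1, m2)
colnames(m1)
# -> "groupA" "groupB" "groupC"
```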
Update: I have it working, but now it's producing really ugly labels. It must be a small adjustment to the code. Any ideas?

## Create example data.frame
group <- c("A", "B","B","C","C","C")
a <- c(1,4,3,4,5,6)
b <- c(5,4,5,3,4,5)
d <- data.frame(a, b, group)   # data.frame(cbind(a, b, group)) would coerce a and b to character

## Create new frame with discretized group
> cbind(d[,1:2], model.matrix(~0+d[,3]))
  a b d[, 3]A d[, 3]B d[, 3]C
1 1 5       1       0       0
2 4 4       0       1       0
3 3 5       0       1       0
4 4 3       0       0       1
5 5 4       0       0       1
6 6 5       0       0       1

So, as you can see, it works, but the column labels for the groups are ugly.

I then tried using the column name instead of the column number and still got ugly labels:

> cbind(d[,1:2], model.matrix(~0+d[,"group"]))
  a b d[, "group"]A d[, "group"]B d[, "group"]C
1 1 5             1             0             0
2 4 4             0             1             0
3 3 5             0             1             0
4 4 3             0             0             1
5 5 4             0             0             1
6 6 5             0             0             1

Any ideas?

-N

On 5/15/10 11:02 AM, Noah Silverman wrote:
> I'm looking for an easy way to discretize factors in R
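Here is one possible fix, sketched here and not taken from the thread. model.matrix() names each indicator column by pasting the variable's text onto the level, so an expression like d[, 3] ends up in the label. If you name the variable in the formula and pass the data frame through the data argument, the labels come out as groupA, groupB and groupC.

```r
# Example data from the post, built without cbind() so that a and b stay numeric
d <- data.frame(a = c(1, 4, 3, 4, 5, 6),
                b = c(5, 4, 5, 3, 4, 5),
                group = c("A", "B", "B", "C", "C", "C"))

# Referring to the column by name (with data = d) keeps the term short,
# so the indicator columns are labelled groupA, groupB, groupC
d2 <- cbind(d[, c("a", "b")], model.matrix(~ 0 + group, data = d))
names(d2)
# -> "a" "b" "groupA" "groupB" "groupC"
```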