Josip Dasovic
2009-Feb-09 19:30 UTC
[R] Generating new variable based on values of an existing variable
Dear R Help-Listers:

I have a problem that seems like it should have a simple solution, but I've spent hours on it (and searching the r-help archives) to no avail. What I'd like to do is to generate a new variable within a data frame, the values of which are dependent upon the values of an existing variable within that data frame.

Assume that I have the following data:

mydf<-data.frame(region=c(rep("North", 5), rep("East", 5), rep("South", 5), rep("West", 5)))

Assume, in addition, that I have a factor vector with four values (I actually have a factor with almost two-hundred values):

element<-c("earth", "water", "air", "fire")

I would like to add a new variable to the data frame (called "element") such that the value of "element" is "earth" in each observation for which mydf$region=="North", etc. In STATA, this was relatively easy; is there a simple way to do this in R?

This is what the final result should look like:

> mydf
   region element
1   North   earth
2   North   earth
3   North   earth
4   North   earth
5   North   earth
6    East   water
7    East   water
8    East   water
9    East   water
10   East   water
11  South     air
12  South     air
13  South     air
14  South     air
15  South     air
16   West    fire
17   West    fire
18   West    fire
19   West    fire
20   West    fire

Thanks in advance,
Josip
Christos Hatzis
2009-Feb-09 19:48 UTC
[R] Generating new variable based on values of an existing variable
One way to do this is through transform, assuming that there is a one-to-one correspondence between regions and elements:

mydf <- data.frame(region=c(rep("North", 5), rep("East", 5), rep("South", 5), rep("West", 5)))
elements <- c("earth", "water", "air", "fire")
transform(mydf, element = factor(region, levels=c("North", "East", "South", "West"), labels=elements))

-Christos
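The transform() call above returns a modified copy of the data frame rather than changing mydf in place, so the result has to be assigned to keep the new column. A minimal sketch of that step, plus a cross-tabulation to check the assumed one-to-one correspondence, might look like this (the name region_levels is introduced here for illustration only and is not from the thread):

```r
# Example data from the original question
mydf <- data.frame(region = rep(c("North", "East", "South", "West"), each = 5))
elements <- c("earth", "water", "air", "fire")

# Hypothetical helper name: the region levels, in the same order as 'elements'
region_levels <- c("North", "East", "South", "West")

# transform() returns a new data frame, so assign the result back to mydf
mydf <- transform(mydf,
                  element = factor(region,
                                   levels = region_levels,
                                   labels = elements))

# Each region (row) should have counts in exactly one element (column)
table(mydf$region, mydf$element)
```

If any row of the cross-tabulation shows counts in more than one column, the one-to-one assumption does not hold for the data.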
Marc Schwartz
2009-Feb-09 20:07 UTC
[R] Generating new variable based on values of an existing variable
I am going to presume that, unlike your example data above, the real data may not be sequenced in unique sequential runs.

Thus, a more general approach would be to set mydf$region as a factor, with the factor levels set to match 1:1 the sequence in 'element':

mydf$region <- factor(mydf$region, levels = c("North", "East", "South", "West"))

element <- c("earth", "water", "air", "fire")

# Set mydf$element to the value in 'element' which corresponds to the
# underlying factor integer code for mydf$region
mydf$element <- element[as.numeric(mydf$region)]

> mydf
   region element
1   North   earth
2   North   earth
3   North   earth
4   North   earth
5   North   earth
6    East   water
7    East   water
8    East   water
9    East   water
10   East   water
11  South     air
12  South     air
13  South     air
14  South     air
15  South     air
16   West    fire
17   West    fire
18   West    fire
19   West    fire
20   West    fire

HTH,

Marc Schwartz
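With close to two hundred values, as mentioned in the original question, keeping the level order and the element order in step by hand can be error-prone. One possible alternative, sketched here under the assumption that the region-to-element pairs are available as a small lookup table, is to index by value with match(). The names lookup and mydf2 are hypothetical and used only for this illustration:

```r
# Hypothetical lookup table: one row per region with its element
lookup <- data.frame(region  = c("North", "East", "South", "West"),
                     element = c("earth", "water", "air", "fire"),
                     stringsAsFactors = FALSE)

# Example data in arbitrary (non-sequential) order
mydf2 <- data.frame(region = c("West", "North", "South", "North", "East"),
                    stringsAsFactors = FALSE)

# match() gives, for each row of mydf2, the position of its region in lookup;
# indexing lookup$element by those positions pulls out the paired element
mydf2$element <- lookup$element[match(mydf2$region, lookup$region)]

mydf2
```

A region that does not appear in the lookup table gets NA in the new column, which makes gaps in the mapping easy to spot. merge(mydf2, lookup, by = "region") would give the same pairing, but by default it sorts the rows by the key column.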