Dimitri Liakhovitski
2012-Dec-07 12:27 UTC
[R] Assigning cases to groupings based on the values of several variables
Dear R-ers,

my task is simple: to assign cases to desired groupings based on the combined values of 2 variables. I can think of 3 methods of doing it.
Method 1 seems to me pretty R-like, but it requires a lot of lines of code - onerous.
Method 2 is a loop, so not very good, as it loops through all rows of mydata.
Method 3 is a loop but loops through fewer lines, so it seems to me more efficient.
Can you please tell me:
1. Which of my methods is more efficient?
2. Is there maybe an even more efficient R-like way of doing it?
Imagine that "mydata" is actually a very tall data frame.
Thanks a lot!
Dimitri

### My Data:
mydata<-data.frame(sex=rep(c(rep("m",4),rep("f",4)),2),age=rep(c(1:4,1:4),2))
(mydata)

### My desired assignments (in column "mygroup")
groupings<-data.frame(sex=c(rep("m",4),rep("f",4)),age=c(1:4,1:4),mygroup=1:8)
(groupings)

# No, I don't need a solution where the last column of "groupings" is
# stacked twice and bound to "mydata"

# Method 1 of assigning to groups - requires a lot of lines of code:
mydata$mygroup.m1<-NA
mydata[(mydata$sex %in% "m")&(mydata$age %in% 1),"mygroup.m1"]<-1
mydata[(mydata$sex %in% "m")&(mydata$age %in% 2),"mygroup.m1"]<-2
mydata[(mydata$sex %in% "m")&(mydata$age %in% 3),"mygroup.m1"]<-3
mydata[(mydata$sex %in% "m")&(mydata$age %in% 4),"mygroup.m1"]<-4
mydata[(mydata$sex %in% "f")&(mydata$age %in% 1),"mygroup.m1"]<-5
mydata[(mydata$sex %in% "f")&(mydata$age %in% 2),"mygroup.m1"]<-6
mydata[(mydata$sex %in% "f")&(mydata$age %in% 3),"mygroup.m1"]<-7
mydata[(mydata$sex %in% "f")&(mydata$age %in% 4),"mygroup.m1"]<-8
(mydata)

# Method 2 of assigning to groups - very "loopy":
mydata$mygroup.m2<-NA
for(i in 1:nrow(mydata)){  # i<-1
  mysex<-mydata[i,"sex"]
  myage<-mydata[i,"age"]
  mydata[i,"mygroup.m2"]<-groupings[(groupings$sex %in% mysex)&(groupings$age %in% myage),"mygroup"]
}
(mydata)

# Method 3 of assigning to groups - also "loopy", but less than Method 2:
mydata$mygroup.m3<-NA
for(i in 1:nrow(groupings)){  # i<-1
  mysex<-groupings[i,"sex"]
  myage<-groupings[i,"age"]
  mydata[(mydata$sex %in% mysex)&(mydata$age %in% myage),"mygroup.m3"]<-groupings[i,"mygroup"]
}
(mydata)

--
Dimitri Liakhovitski
gfk.com <http://marketfusionanalytics.com/>
Duncan Murdoch
2012-Dec-07 12:54 UTC
[R] Assigning cases to groupings based on the values of several variables
On 12-12-07 7:27 AM, Dimitri Liakhovitski wrote:
> Dear R-ers,
>
> my task is simple: to assign cases to desired groupings based on the
> combined values of 2 variables. I can think of 3 methods of doing it.
> Method 1 seems to me pretty R-like, but it requires a lot of lines of code
> - onerous.

Since your groups are so regular, you can compute the groups directly. Convert each column to a factor (this might have happened automatically, depending on your data and options), then use as.integer to convert to a numeric value. So a simple solution would be

mydata$mygroup.m4 <- with(mydata, 4*(2-as.integer(factor(sex))) + as.integer(factor(age)))

Here factor(sex) has the levels "f" and "m" in alphabetical order, so 2-as.integer(factor(sex)) is 0 for "m" and 1 for "f"; males therefore get groups 1 to 4 and females get groups 5 to 8. It would be a little simpler if you wanted the sex factor in alphabetical order; then you wouldn't need to subtract from 2.

If your real data wasn't so regular, another approach would be to set up a matrix, indexed by sex and age, that gives the desired group number. That is somewhat like your "groupings" solution; I'm not sure it would be preferable to what you did.

Duncan Murdoch

> Method 2 is a loop, so not very good - as it loops through all rows of
> mydata.
> Method 3 is a loop but loops through fewer lines, so it seems to me more
> efficient.
> Can you please tell me:
> 1. Which of my methods is more efficient?
> 2. Is there maybe an even more efficient R-like way of doing it?
> Imagine - "mydata" is actually a very tall data frame.
> Thanks a lot!
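The matrix idea mentioned at the end of the reply above can be sketched roughly as follows. This is only one possible reading of it, not code from the thread: the names "lookup" and "mygroup.m5" are made up for illustration, and building the matrix with tapply() is an assumption about how one might set it up. The sketch relies on the fact that a matrix with dimnames can be indexed by a two-column character matrix, one row of (sex, age) per case.

```r
# Data as defined in the original post
mydata <- data.frame(sex = rep(c(rep("m", 4), rep("f", 4)), 2),
                     age = rep(c(1:4, 1:4), 2))
groupings <- data.frame(sex = c(rep("m", 4), rep("f", 4)),
                        age = c(1:4, 1:4),
                        mygroup = 1:8)

# Hypothetical lookup matrix: rows are sexes, columns are ages,
# cells hold the desired group number (each cell has exactly one value here).
lookup <- tapply(groupings$mygroup,
                 list(groupings$sex, groupings$age),
                 function(g) g[1])

# Index the matrix with a two-column character matrix: column 1 is matched
# against the row names (sex), column 2 against the column names (age).
idx <- cbind(as.character(mydata$sex), as.character(mydata$age))
mydata$mygroup.m5 <- lookup[idx]

(mydata)
```

No explicit loop over rows is needed, and combinations that are absent from "groupings" would simply be NA in the matrix, provided that sex and that age each appear somewhere in "groupings".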
arun
2012-Dec-07 13:09 UTC
[R] Assigning cases to groupings based on the values of several variables
Hi,

In your method 2 and method 3, you are using the groupings data. If that is the case, is it possible for you to use merge() or join() from library(plyr)?

join(mydata,groupings,by=c("sex","age"),type="inner")
#   sex age mygroup
#1    m   1       1
#2    m   2       2
#3    m   3       3
#4    m   4       4
#5    f   1       5
#6    f   2       6
#7    f   3       7
#8    f   4       8
#9    m   1       1
#10   m   2       2
#11   m   3       3
#12   m   4       4
#13   f   1       5
#14   f   2       6
#15   f   3       7
#16   f   4       8

A.K.

----- Original Message -----
From: Dimitri Liakhovitski <dimitri.liakhovitski at gmail.com>
To: r-help <r-help at r-project.org>
Cc:
Sent: Friday, December 7, 2012 7:27 AM
Subject: [R] Assigning cases to groupings based on the values of several variables

Dear R-ers,

my task is simple: to assign cases to desired groupings based on the combined values of 2 variables. I can think of 3 methods of doing it.
Method 1 seems to me pretty R-like, but it requires a lot of lines of code - onerous.
Method 2 is a loop, so not very good - as it loops through all rows of mydata.
Method 3 is a loop but loops through fewer lines, so it seems to me more efficient.
Can you please tell me:
1. Which of my methods is more efficient?
2. Is there maybe an even more efficient R-like way of doing it?
Imagine - "mydata" is actually a very tall data frame.
Thanks a lot!
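The reply above shows join() from plyr but only mentions merge(). A minimal base-R sketch with merge() might look like the following. It is an illustration, not code from the thread: the helper column "row.id" and the result name "merged" are made up, and all.x = TRUE is an assumption about wanting to keep cases that have no matching row in "groupings". Since merge() sorts its result on the key columns by default, the sketch records the original row order first and restores it afterwards.

```r
# Data as defined in the original post
mydata <- data.frame(sex = rep(c(rep("m", 4), rep("f", 4)), 2),
                     age = rep(c(1:4, 1:4), 2))
groupings <- data.frame(sex = c(rep("m", 4), rep("f", 4)),
                        age = c(1:4, 1:4),
                        mygroup = 1:8)

# Hypothetical helper column: remember where each case sat originally,
# because merge() reorders rows by the key columns.
mydata$row.id <- seq_len(nrow(mydata))

# Left join on both keys; all.x = TRUE keeps unmatched cases with mygroup = NA.
merged <- merge(mydata, groupings, by = c("sex", "age"), all.x = TRUE)

# Restore the original case order and tidy the row names.
merged <- merged[order(merged$row.id), ]
rownames(merged) <- NULL

(merged)
```

Like join(), this does the whole assignment in one vectorised call, so there is no loop over the rows of "mydata" or of "groupings".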