Walter Anderson
2011-Apr-06 20:02 UTC
[R] Need a more efficient way to implement this type of logic in R
I have cobbled together the following logic. It works but is very slow. I'm sure that there must be a better r-specific way to implement this kind of thing, but have been unable to find/understand one. Any help would be appreciated.

    hh.sub <- households[c("HOUSEID","HHFAMINC")]
    for (indx in 1:length(hh.sub$HOUSEID)) {
      if ((hh.sub$HHFAMINC[indx] == '01') | (hh.sub$HHFAMINC[indx] == '02') |
          (hh.sub$HHFAMINC[indx] == '03') | (hh.sub$HHFAMINC[indx] == '04') |
          (hh.sub$HHFAMINC[indx] == '05'))
        hh.sub$CS_FAMINC[indx] <- 1 # Less than $25,000
      if ((hh.sub$HHFAMINC[indx] == '06') | (hh.sub$HHFAMINC[indx] == '07') |
          (hh.sub$HHFAMINC[indx] == '08') | (hh.sub$HHFAMINC[indx] == '09') |
          (hh.sub$HHFAMINC[indx] == '10'))
        hh.sub$CS_FAMINC[indx] <- 2 # $25,000 to $50,000
      if ((hh.sub$HHFAMINC[indx] == '11') | (hh.sub$HHFAMINC[indx] == '12') |
          (hh.sub$HHFAMINC[indx] == '13') | (hh.sub$HHFAMINC[indx] == '14') |
          (hh.sub$HHFAMINC[indx] == '15'))
        hh.sub$CS_FAMINC[indx] <- 3 # $50,000 to $75,000
      if ((hh.sub$HHFAMINC[indx] == '16') | (hh.sub$HHFAMINC[indx] == '17'))
        hh.sub$CS_FAMINC[indx] <- 4 # $75,000 to $100,000
      if ((hh.sub$HHFAMINC[indx] == '18'))
        hh.sub$CS_FAMINC[indx] <- 5 # More than $100,000
      if ((hh.sub$HHFAMINC[indx] == '-7') | (hh.sub$HHFAMINC[indx] == '-8') |
          (hh.sub$HHFAMINC[indx] == '-9'))
        hh.sub$CS_FAMINC[indx] = 0
    }
Duncan Murdoch
2011-Apr-06 20:48 UTC
[R] Need a more efficient way to implement this type of logic in R
On 06/04/2011 4:02 PM, Walter Anderson wrote:
> I have cobbled together the following logic. It works but is very
> slow. I'm sure that there must be a better r-specific way to implement
> this kind of thing, but have been unable to find/understand one. Any
> help would be appreciated.
>
> hh.sub <- households[c("HOUSEID","HHFAMINC")]
> for (indx in 1:length(hh.sub$HOUSEID)) {
>   if ((hh.sub$HHFAMINC[indx] == '01') | (hh.sub$HHFAMINC[indx] == '02')
>       | (hh.sub$HHFAMINC[indx] == '03') | (hh.sub$HHFAMINC[indx] == '04') |
>       (hh.sub$HHFAMINC[indx] == '05'))
>     hh.sub$CS_FAMINC[indx] <- 1 # Less than $25,000

The answer is to think in terms of vectors and logical indexing. The code above is equivalent to

    hh.sub$CS_FAMINC[ hh.sub$HHFAMINC %in% c('01', '02', '03', '04', '05') ] <- 1

I've left off the rest of the loop, but I think it's similar.

Duncan Murdoch
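For readers who want to see the remaining groups spelled out, the logical-indexing idea above could be extended roughly as follows. This is only a sketch: the small `hh.sub` data frame is made-up stand-in data (the real `households` table is not shown in the thread), and starting the new column at NA is an illustrative choice, not something the original loop does.

```r
# Made-up stand-in for the poster's data; the real `households` table is not shown.
hh.sub <- data.frame(
  HOUSEID  = 1:8,
  HHFAMINC = c("01", "05", "06", "12", "17", "18", "-7", "-9"),
  stringsAsFactors = FALSE
)

# Start from NA so that any code not covered below is easy to spot (an assumption).
hh.sub$CS_FAMINC <- NA_integer_

# One vectorized assignment per income group replaces the whole loop.
hh.sub$CS_FAMINC[hh.sub$HHFAMINC %in% c("01", "02", "03", "04", "05")] <- 1L  # Less than $25,000
hh.sub$CS_FAMINC[hh.sub$HHFAMINC %in% c("06", "07", "08", "09", "10")] <- 2L  # $25,000 to $50,000
hh.sub$CS_FAMINC[hh.sub$HHFAMINC %in% c("11", "12", "13", "14", "15")] <- 3L  # $50,000 to $75,000
hh.sub$CS_FAMINC[hh.sub$HHFAMINC %in% c("16", "17")]                   <- 4L  # $75,000 to $100,000
hh.sub$CS_FAMINC[hh.sub$HHFAMINC == "18"]                              <- 5L  # More than $100,000
hh.sub$CS_FAMINC[hh.sub$HHFAMINC %in% c("-7", "-8", "-9")]             <- 0L

hh.sub
```

Each line touches every matching row in a single step, so the cost no longer grows with one data-frame assignment per row.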
Joshua Wiley
2011-Apr-06 20:49 UTC
[R] Need a more efficient way to implement this type of logic in R
Hi Walter,

Take a look at the function ?cut. It is designed to take a continuous variable and categorize it, and will be much simpler and faster. The only qualification is that your data would need to be numeric, not character. However, if your only values are the ones you put in quotes in your code ('02' etc), a simple call to as.numeric(variablename) ought to do the trick.

Beyond being faster, you can probably get down to one line of code, which should be much easier on the eyes. To see some examples with cut(), type (at the console):

    example(cut)

Hope this helps,

Josh

P.S. If you are planning on doing any modelling with this data, why not leave it continuous?

--
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://www.joshuawiley.com/
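One caveat on the as.numeric() step: the thread does not say whether HHFAMINC is stored as character or as a factor, and the two behave differently. The sketch below uses made-up codes (`codes_chr` and `codes_fac` are hypothetical names) to show the difference and the usual workaround.

```r
codes_chr <- c("01", "07", "18", "-7")   # character codes, as quoted in the loop
codes_fac <- factor(codes_chr)           # the same codes stored as a factor

from_chr <- as.numeric(codes_chr)                 # the numbers the strings spell out
from_fac <- as.numeric(codes_fac)                 # internal level indices, NOT the codes
via_chr  <- as.numeric(as.character(codes_fac))   # convert to character first to recover the codes

from_chr
from_fac
via_chr
```

If the column turns out to be a factor, going through as.character() first is the safe route before handing the values to cut().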
Phil Spector
2011-Apr-06 20:58 UTC
[R] Need a more efficient way to implement this type of logic in R
Walter -

Since your codes represent numbers, you could use something like this:

    chk = as.numeric(hh.sub$HHFAMINC)
    hh.sub$CS_FAMINC = cut(chk,c(-10,0,5,10,15,17,18),labels=c(0,1:5))

- Phil Spector
  Statistical Computing Facility
  Department of Statistics
  UC Berkeley
  spector at stat.berkeley.edu
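To check that these break points reproduce the grouping in the original loop, the cut() call can be tried on a handful of codes. This is a sketch on made-up values; `codes`, `grp`, and `grp_int` are illustrative names, and the conversion back to integers is an optional extra step, not part of Phil's suggestion.

```r
codes <- c("01", "05", "06", "10", "11", "15", "16", "17", "18", "-7", "-8", "-9")
chk <- as.numeric(codes)

# Intervals are right-closed by default:
# (-10,0], (0,5], (5,10], (10,15], (15,17], (17,18]
grp <- cut(chk, breaks = c(-10, 0, 5, 10, 15, 17, 18), labels = c(0, 1:5))

# cut() returns a factor; go through as.character() to get plain integers back.
grp_int <- as.integer(as.character(grp))

data.frame(code = codes, group = grp_int)
```

Because the intervals are closed on the right, the boundary codes 05, 10, 15, and 17 land in the lower group, which matches the loop.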
Alexander Engelhardt
2011-Apr-06 21:04 UTC
[R] Need a more efficient way to implement this type of logic in R
Hi,

the for-loop is entirely unnecessary. You can, as a first step, rewrite the code like this:

    hh.sub$CS_FAMINC[(hh.sub$HHFAMINC == '01') | (hh.sub$HHFAMINC == '02') |
                     (hh.sub$HHFAMINC == '03') | (hh.sub$HHFAMINC == '04') |
                     (hh.sub$HHFAMINC == '05')] <- 1 # Less than $25,000

The comparisons operate on the whole column at once and produce a logical vector that selects the rows to assign to. (A plain if() would not work here, because it expects a single TRUE or FALSE rather than a vector.) This very basic concept is called "vectorization" in R. You should read about it, it rocks.
In this case, though, you don't even need to do that: if you convert the variable HHFAMINC into a number like this:

    hh.sub$HHFAMINC <- as.numeric(hh.sub$HHFAMINC)

then you can apply the cut() function to create a factor variable:

    hh.sub$myawesomefactor <- cut(hh.sub$HHFAMINC, breaks=c(0, 5.5, 10.5, 15.5, 17.5, 18.5))

or something like that should do the trick. You will then have to rename the factor levels. The function for that is levels(); alternatively, pass labels= to cut() directly.

Also, this might be my OCD speaking, but I would use NA instead of 0 for non-available values. With the breaks above, the negative codes fall outside every interval and so become NA automatically.

Have fun,
Alex
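Put together, the cut()-with-NA idea might look like the sketch below. The values in `inc` are made up, the label text is borrowed from the comments in the original loop, and treating the negative codes as "missing" is an assumption about what they mean in this survey.

```r
inc <- as.numeric(c("03", "08", "13", "16", "18", "-7", "-9"))

# Treat the negative codes as missing before binning (an assumption about their meaning).
inc[inc < 0] <- NA

fam_inc <- cut(
  inc,
  breaks = c(0, 5.5, 10.5, 15.5, 17.5, 18.5),
  labels = c("Less than $25,000", "$25,000 to $50,000", "$50,000 to $75,000",
             "$75,000 to $100,000", "More than $100,000")
)

levels(fam_inc)                   # the five category names
table(fam_inc, useNA = "ifany")   # counts per category, plus the NA count
```

Supplying labels= in the cut() call avoids a separate renaming step afterwards.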
Petr PIKAL
2011-Apr-07 06:24 UTC
[R] Re: Need a more efficient way to implement this type of logic in R
Hi

Take advantage of factors.
If hh.sub$HHFAMINC were a factor, you could recode it by assigning to levels(hh.sub$HHFAMINC) a vector of new levels with the same length as the existing levels.

Something like

    > x <- factor(letters[1:5])
    > x
    [1] a b c d e
    Levels: a b c d e
    > levels(x) <- c(1,1,2,2,1)
    > x
    [1] 1 1 2 2 1
    Levels: 1 2

Regards
Petr
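Applied to the income codes, the same levels() trick can also take a named list, which avoids having to line up a replacement vector with the current level order. The following is a sketch on made-up data; `f` is a hypothetical stand-in for hh.sub$HHFAMINC stored as a factor.

```r
f <- factor(c("01", "06", "11", "16", "18", "-7", "03", "-9"))

# Named list: each name is a new level, each element lists the old levels merged into it.
# Old levels that do not occur in the data (e.g. "-8") are simply ignored.
levels(f) <- list(
  "0" = c("-7", "-8", "-9"),
  "1" = c("01", "02", "03", "04", "05"),
  "2" = c("06", "07", "08", "09", "10"),
  "3" = c("11", "12", "13", "14", "15"),
  "4" = c("16", "17"),
  "5" = "18"
)

as.character(f)
```

Only the levels are rewritten, not the individual observations, so the work depends on the number of distinct codes rather than on the number of rows.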