Juan Carlos Laguardia
2009-May-29 23:17 UTC
[R] code optimization problem ... using or not using "which" function
hello all,

I have two data sets that share certain fields of interest (facility, unit, date) which I want to match up, and from this extract information from one dataset and store it in the other.

My first idea (which I know is bad) goes like this:

## capacity and new_trayloc are datasets in example code:

for (i in 1:nrow(new_trayloc)) {

  theshifts <- which(as.Date(capacity$shift_dt) == new_trayloc$admit_dt[i] &
    as.character(capacity$unit) == as.character(new_trayloc$UNIT_1[i]) &
    as.character(capacity$fac_id) == as.character(new_trayloc$ORIG_FAC_ID[i]))

  thenightshifts <- which(as.Date(capacity$shift_dt) == new_trayloc$admit_dt[i] - 1 &
    as.character(capacity$unit) == as.character(new_trayloc$UNIT_1[i]) &
    as.character(capacity$fac_id) == as.character(new_trayloc$ORIG_FAC_ID[i]))

  ..... obtain information by using theshifts and thenightshifts objects and store in new_trayloc

}

Running system.time on the entire for loop for 5 iterations, I get a time of

   user  system elapsed
  25.66    1.04   26.72

That seems really bad... and plus, I need to run it for over 100,000 iterations.

Any suggestions in either the way I match the fields, or my approach to my problem?

Cheers,
Juan Carlos
jim holtman
2009-May-30 00:55 UTC
[R] code optimization problem ... using or not using "which" function
For a start, do all your conversions to character and Date once, outside the loop, so you are not repeating them on every iteration.

I'm not exactly sure what you are doing, but with the 'and's it looks like you are only selecting the rows where all three fields agree. You might want to use the 'match' function, like:

x <- match(capacity$shift_dt, new_trayloc$admit_dt)

to get where each of the items match. Once you have done that for the three conditions, you then find the rows that have the same number in all three results, indicating that all conditions match for that row.

--
Jim Holtman
Cincinnati, OH

What is the problem that you are trying to solve?
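The two suggestions in this reply (convert once, then look rows up instead of rescanning) might be sketched roughly as follows. This is only an illustration under stated assumptions: the toy data, the `beds` column, the `make_key` helper and the `day_beds`/`night_beds` result columns are all made up for the example, and the key separator "|" is assumed not to occur in any facility or unit value. It also swaps in split() on one composite key in place of three separate match() calls, because match() returns only the first matching position and a given facility/unit/date may have several shifts.

```r
# Toy stand-ins for the two data sets. Column names follow the post;
# the values and the 'beds' column are invented for illustration.
capacity <- data.frame(
  fac_id   = c("A", "A", "A", "B"),
  unit     = c("ICU", "ICU", "ICU", "ER"),
  shift_dt = c("2009-05-01", "2009-05-01", "2009-04-30", "2009-05-01"),
  beds     = c(10, 12, 8, 5),
  stringsAsFactors = FALSE
)
new_trayloc <- data.frame(
  ORIG_FAC_ID = c("A", "B", "A"),
  UNIT_1      = c("ICU", "ER", "ICU"),
  admit_dt    = as.Date(c("2009-05-01", "2009-05-01", "2009-05-02")),
  stringsAsFactors = FALSE
)

# Do every conversion once, outside any loop.
cap_date <- as.Date(capacity$shift_dt)
cap_unit <- as.character(capacity$unit)
cap_fac  <- as.character(capacity$fac_id)
adm_date <- as.Date(new_trayloc$admit_dt)
adm_unit <- as.character(new_trayloc$UNIT_1)
adm_fac  <- as.character(new_trayloc$ORIG_FAC_ID)

# Hypothetical helper: one composite key per row (facility, unit, date).
make_key <- function(fac, unit, date) {
  paste(fac, unit, format(date), sep = "|")
}

# Group the row numbers of 'capacity' by key, once.
cap_key     <- make_key(cap_fac, cap_unit, cap_date)
rows_by_key <- split(seq_along(cap_key), cap_key)

# Look up the shifts on the admit date and on the previous day for
# every row of 'new_trayloc'. Keys with no match give NULL entries.
theshifts      <- unname(rows_by_key[make_key(adm_fac, adm_unit, adm_date)])
thenightshifts <- unname(rows_by_key[make_key(adm_fac, adm_unit, adm_date - 1)])

# Example of using the matched rows: total 'beds' per admission.
sum_beds <- function(idx) sum(capacity$beds[idx])
new_trayloc$day_beds   <- vapply(theshifts, sum_beds, numeric(1))
new_trayloc$night_beds <- vapply(thenightshifts, sum_beds, numeric(1))
```

Here theshifts[[i]] and thenightshifts[[i]] hold the same row numbers of capacity that the which() calls in the original loop would return for row i, but capacity is scanned once to build the keys rather than once per row of new_trayloc.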