Lingyi Ma
2014-Aug-05 06:51 UTC
[R] How to optimizing the code for calculating the launch time
My dataset: Item_Id Year_Month B65623262 201204 B58279745 201204 B33671102 201204 B36630946 201204 B63270151 201204 B63270133 201204 I have written my code to calculate one more column which is the product maturity time as the following: launchtime<-function(g){ su<-data.frame(NULL) #product maturity time g<-g[order(g$Item_Id,g$Year_Month),] #get the lauching time index2<-unique(g$Item_Id) for(u in 1:length(index2)){ m2<-g[g$Item_Id==index2[u],] lt<-numeric(0) lt[1]<-0 year<-as.numeric(substring(m2$Year_Month,1,4)) month<-as.numeric(substring(m2$Year_Month,5,6)) if(dim(m2)[1]==1){}else{ for(i in 1:(dim(m2)[1]-1)){ j<-i+1 lt[j]<-(12*(year[j]-year[1])+month[j])-(month[1]) }} g2<-cbind(m2,lt) su<-rbind(su,g2) } return(su) } How to optimize my code-. it takes so long time to run. Kind regards, Tammy [[alternative HTML version deleted]]
David McPearson
2014-Aug-06 07:59 UTC
[R] How to optimizing the code for calculating the launch time
On Tue, 5 Aug 2014 09:51:56 +0300 Lingyi Ma <lingyi.ma at gmail.com> wrote> My dataset: > > Item_Id Year_Month > B65623262 201204 > B58279745 201204 > B33671102 201204 > B36630946 201204 > B63270151 201204 > B63270133 201204 > > > > I have written my code to calculate one more column which is the product > maturity time as the following: > > launchtime<-function(g){ > > su<-data.frame(NULL) #product maturity time > > g<-g[order(g$Item_Id,g$Year_Month),] > > #get the lauching time > index2<-unique(g$Item_Id) > > for(u in 1:length(index2)){ > m2<-g[g$Item_Id==index2[u],] > > lt<-numeric(0) > lt[1]<-0 > year<-as.numeric(substring(m2$Year_Month,1,4)) > month<-as.numeric(substring(m2$Year_Month,5,6)) > if(dim(m2)[1]==1){}else{ > for(i in 1:(dim(m2)[1]-1)){ > j<-i+1 > lt[j]<-(12*(year[j]-year[1])+month[j])-(month[1]) > }} > g2<-cbind(m2,lt) > su<-rbind(su,g2) > } > > return(su) > > } > > > > How to optimize my code-. it takes so long time to run. > > > Kind regards, > Tammy > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code.Please read the posting guide - and the footer to just about every message on the list (just above): Do not post html - it makes your messages hard to read. Sample data and an example of the expected output will also go a long way to help us understand your problem and help you. Having writ that, welcome to Circles 1, 2, 3, 3.1, and 8.1.30 of The R Inferno http://www.burns-stat.com/documents/books/the-r-inferno/ Get a copy. Read it. The time it takes will pay off ten-fold in coping with common R mistakes. (And we should all thank Dr Burns for providing the download at the best of all possible prices...) (Don't try and understand it all at once - take note of the bits that make sense.) To optimise your code: * get rid of the inner loop (I don't think you need it) * do not grow objects if possible, especially do not grow them inside a loop * to get the results (I think) you want, put in the parentheses you (seem to have) missed. My version of your code below. Hopefully the comments will be of assistance. #=================================================================== launchtime <- function(g) { g <- cbind(g[order(g$Item_Id,g$Year_Month),], launch = NA] # Inside a function you are working with a copy of the data. # This will not change the original and will (should be) safe. #get the lauching time index2 <- unique(g$Item_Id) next_row <- 1 # for(u in 1:length(index2)){ # m2<-g[g$Item_Id==index2[u],] # for (u in seq_along(index2)) is better, but in this case I prefer for (u in index2) { m2 <- g[g$Item_Id == index2] year<-as.numeric(substring(m2$Year_Month,1,4)) month<-as.numeric(substring(m2$Year_Month,5,6)) # hmmm - probably better to use "as.integer()" in this case. # Also, if Year_Month is already numeric, the following might # work better (or not...) # year <- floor(m2$Year_Month / 100) # month <- m2$Year_Month %% 100 # I don't think you need to loop... lt <- 12 * (year - year[1]) + month - month[1] # (Parentheses are mine - but I'm sure you need them) g$launch[next_row:(next_row + length(lt) - 1)] <- lt next_row <- next_row + length(lt) # given that we've eliminated the inner loop, you could probably # cbind() and rbind() OK - but this should be (at least a bit) # faster } g } # #===================================================================Cheers. ____________________________________________________________ South Africas premier free email service - www.webmail.co.za Cotlands - Shaping tomorrows Heroes http://www.cotlands.org.za/