Faranak Golestaneh
2015-Jan-15 10:38 UTC
[R] How to speed up the simulation when processing time and date
Dear Friends,

I am trying to program a forecasting method in R. The predictors are weather variables plus lagged measurements of Power. The data have a resolution of one minute, and the time and date of each observation are available.

To add lagged Power values to the predictor list, I want to use the values from the last ten minutes. If I were sure that the database was complete, with a value for every minute of the year, I could simply shift the Power column. Since that may not always be the case, I use the following code to check, for each time t, whether all of its lag values are available, and then extract them and store them in a matrix. The problem is that this process is very slow. A reproducible example is given below. I was wondering whether any of you could suggest a better approach.

Thank you.

rm(list = ls())
cat("\014")

library(abind)   # provides abind()
library(pracma)  # assumed source of size(); the post does not say

st = "2012/01/01"
et = "2012/02/27"
st <- as.POSIXlt(as.Date(st))
et <- as.POSIXlt(as.Date(et))
time = seq(from = st, to = et, by = 60)
# seq() on date-times returns POSIXct; keep that class so abind() stores plain seconds
time <- as.POSIXct(time)

# Window is the number of lag values
# leadTime is look-ahead time (forecast horizon)
leadTime = 10;
Window = 15;

zzzz = time[1:8000]
Total_Zone1 = abind(matrix(rnorm(4000*2), 4000*2, 1),
                    matrix(rnorm(4000*2), 4000*2, 1),
                    matrix(rnorm(4000*2), 4000*2, 1),
                    time[1:8000])
N_Train = nrow(Total_Zone1);
lag_Power = matrix(0, N_Train, Window)
colnames(Total_Zone1) <- c("airtemp", "humidity", "Power", "time")
Total_Zone1 <- as.data.frame(Total_Zone1)

for (tt in 4000:N_Train) {
  # Lag window: (t - (leadTime + Window) min, t - leadTime min]
  Statlag = Total_Zone1$time[tt] - (leadTime + Window)*60
  EndLag = Total_Zone1$time[tt] - (leadTime)*60
  Index_lags = which((Total_Zone1$time > Statlag) & (Total_Zone1$time <= EndLag))
  if (size(Index_lags)[2] < Window) {
    # Some lags are missing: pad with the Power value from 24 hours earlier
    Statlag2 = Total_Zone1$time[tt] - 24*60*60
    Index_lags2 = which(Total_Zone1$time == Statlag2)
    tem1 = rep(Total_Zone1[Index_lags2, c("Power")], Window - size(Index_lags)[2])
    lag_Power[tt, ] = t(c(Total_Zone1[Index_lags, c("Power")], tem1))
  } else {
    lag_Power[tt, ] = t(Total_Zone1[Index_lags, c("Power")])
  }
}
MacQueen, Don
2015-Jan-15 20:07 UTC
[R] How to speed up the simulation when processing time and date
I don't have time to look at your example in detail, but a couple of things caught my eye.

Use as.POSIXct() instead of as.POSIXlt(). I don't see anything that requires POSIXlt, and POSIXct is simpler.

If everything in Total_Zone1 is numeric, then leave it as a matrix; do not convert it to a data frame. If you use as.POSIXct(), the times are actually the number of seconds since an origin and can therefore be treated as numeric, which makes it possible to leave Total_Zone1 as a matrix. If it is a matrix, you can refer to the times using Total_Zone1[,'time'] instead of Total_Zone1$time.

Either of these might help speed things up, though I can't be sure without trying it.

-- 
Don MacQueen
Lawrence Livermore National Laboratory
7000 East Ave., L-627
Livermore, CA 94550
925-423-1062
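The two suggestions above can be illustrated with a small sketch. It is not the poster's code and has not been timed against it. It assumes synthetic one-minute data with no gaps, and it goes one step beyond the reply by using match() to look up the lag timestamps instead of scanning with which() inside a loop. The names zone, offsets, targets and lag_power_fast are hypothetical. Missing timestamps would show up as NA; the 24-hour fallback from the original loop is not reproduced here.

```r
# Sketch only: synthetic data, hypothetical names, no gap handling beyond NA.
set.seed(1)
n <- 8000
lead_time <- 10   # forecast horizon in minutes
window <- 15      # number of lag values

# POSIXct stores seconds since 1970-01-01 UTC, so it converts cleanly to numeric.
start <- as.POSIXct("2012-01-01 00:00:00", tz = "UTC")
time_sec <- as.numeric(seq(from = start, by = 60, length.out = n))

# Keep everything in one numeric matrix rather than a data frame.
zone <- cbind(airtemp = rnorm(n), humidity = rnorm(n),
              Power = rnorm(n), time = time_sec)

# Offsets in seconds of the lag timestamps relative to time t, oldest first:
# t - (lead_time + window - 1) min, ..., t - lead_time min.
offsets <- -((lead_time + window - 1):lead_time) * 60

rows <- 4000:n
# One target timestamp per (row, lag) pair.
targets <- outer(zone[rows, "time"], offsets, "+")
# match() returns the row holding each target time, or NA if it is absent.
idx <- match(targets, zone[, "time"])
# idx is in column-major order, the same order matrix() fills in.
lag_power_fast <- matrix(zone[idx, "Power"],
                         nrow = length(rows), ncol = window)
```

Row i of lag_power_fast corresponds to observation rows[i]. A check such as anyNA(lag_power_fast) shows whether any lag timestamp was missing, and those rows could then be handled separately.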