Hi!

I am comparing lapply with a for loop, and I find that the for loop is faster than lapply.

What I have done:

    n<-100000
    set.seed(123)
    x<-rnorm(n)
    y<-x+rnorm(n)
    rand.data<-data.frame(x,y)
    k<-100
    samples<-split(sample(1:n),rep(1:k,length=n))

    res<-list()
    t<-Sys.time()
    for(i in 1:100){
      modelo<-lm(y~x,rand.data[-samples[[i]]])
      prediccion<-predict(modelo,rand.data[samples[[i]],])
      res[[i]] <- (prediccion - rand.data$y[samples[[i]]])
    }
    print(Sys.time()-t)

Which takes 8.042 seconds,

and using lapply:

    cv.fold.fun <- function(index){
      fit <- lm(y~x, data = rand.data[-samples[[index]],])
      pred <- predict(fit, newdata = rand.data[samples[[index]],])
      return((pred - rand.data$y[samples[[index]]])^2)
    }

    t<-Sys.time()
    nuevo<-lapply(seq(along = samples),cv.fold.fun)
    print(Sys.time()-t)

Which takes 9.56 seconds.

So... has the for loop been improved in R?

Thanks!
The lapply loop and the for loop have very similar speed characteristics. Differences seen are almost always due to how you use memory in the body of the loop. This fact is not new.

You may be under the incorrect assumption that using lapply is somehow equivalent to "vectorization", which it is not.
-- 
Sent from my phone. Please excuse my brevity.

On August 7, 2017 7:29:58 AM PDT, "Jesús Para Fernández" <j.para.fernandez at hotmail.com> wrote:
>Hi!
>
>I am comparing lapply with a for loop, and I find that the for loop is
>faster than lapply.
>
>______________________________________________
>R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>https://stat.ethz.ch/mailman/listinfo/r-help
>PLEASE do read the posting guide
>http://www.R-project.org/posting-guide.html
>and provide commented, minimal, self-contained, reproducible code.
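A minimal sketch of the point about memory use and vectorization, not taken from the thread: the helper names `grow`, `prealloc`, `applied` and `vectorized`, the size `n`, and the choice of `sqrt` as the per-element work are all assumptions made for illustration. It contrasts a for loop that grows its result, a for loop that writes into a preallocated vector, an lapply call, and a genuinely vectorized call. No timings are quoted because they depend on the machine and the R version.

```r
# Illustrative sketch (hypothetical helper names, arbitrary n).
n <- 20000

# (a) for loop that grows the result with c(): every step copies the vector
grow <- function(n) {
  out <- numeric(0)
  for (i in seq_len(n)) out <- c(out, sqrt(i))
  out
}

# (b) for loop that writes into a preallocated vector
prealloc <- function(n) {
  out <- numeric(n)
  for (i in seq_len(n)) out[i] <- sqrt(i)
  out
}

# (c) lapply: still one R-level function call per element, so this is
#     a loop in disguise, not vectorization
applied <- function(n) unlist(lapply(seq_len(n), function(i) sqrt(i)))

# (d) vectorized: one call, the iteration happens in compiled code
vectorized <- function(n) sqrt(seq_len(n))

# All four variants return the same values
stopifnot(
  isTRUE(all.equal(grow(n), prealloc(n))),
  isTRUE(all.equal(prealloc(n), applied(n))),
  isTRUE(all.equal(applied(n), vectorized(n)))
)

# Time each variant on your own machine
print(system.time(grow(n)))
print(system.time(prealloc(n)))
print(system.time(applied(n)))
print(system.time(vectorized(n)))
```

In a comparison like this, any gap between (b) and (c) would come from the loop construct itself, while gaps involving (a) or (d) come from memory handling and from vectorization respectively.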
A Google search on "lapply vs for r" or "lapply vs loop r" might have saved you some trouble. Many people have debunked this myth. Strangely they all start out with "everyone knows" or "it is commonly said that." I'm sure someone must have said it, but no one seems to be able to provide an authoritative citation before proceeding to demonstrate that it is false.

-------------------------------------
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352

-----Original Message-----
From: R-help [mailto:r-help-bounces at r-project.org] On Behalf Of Jesús Para Fernández
Sent: Monday, August 7, 2017 9:30 AM
To: r-help at r-project.org
Subject: [R] Has For bucle be impooved in R
Dear Jesus,

The difference is marginal when each code chunk does the same things. Your for loop does not yield the same output as the lapply. Here is the cleaned version of your code.

    n <- 10000
    set.seed(123)
    x <- rnorm(n)
    y <- x + rnorm(n)
    rand.data <- data.frame(x, y)
    k <- 100
    samples <- split(sample(n), rep(seq_len(k), length = n))

    library(microbenchmark)
    microbenchmark(
      "for" = {
        # preallocate the result list, then fill it fold by fold
        res <- vector("list", length(samples))
        for (index in seq_along(samples)) {
          fit <- lm(y ~ x, data = rand.data[-samples[[index]], ])
          pred <- predict(fit, newdata = rand.data[samples[[index]], ])
          res[[index]] <- ((pred - rand.data$y[samples[[index]]])^2)
        }
      },
      lapply = {
        cv.fold.fun <- function(index) {
          fit <- lm(y ~ x, data = rand.data[-samples[[index]], ])
          pred <- predict(fit, newdata = rand.data[samples[[index]], ])
          return((pred - rand.data$y[samples[[index]]])^2)
        }
        lapply(seq_along(samples), cv.fold.fun)
      }
    )

    Unit: milliseconds
       expr      min       lq     mean   median       uq      max neval cld
        for 866.4196 897.3137 949.8155 926.1918 946.8390 1767.463   100   a
     lapply 837.7804 889.6620 947.2401 909.9946 939.6379 2476.415   100   a

Best regards,

ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest
team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance
Kliniekstraat 25
1070 Anderlecht
Belgium

To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher
The plural of anecdote is not data. ~ Roger Brinner
The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data. ~ John Tukey

2017-08-07 16:48 GMT+02:00 Jeff Newmiller <jdnewmil at dcn.davis.ca.us>:
> The lapply loop and the for loop have very similar speed characteristics.
> Differences seen are almost always due to how you use memory in the body of
> the loop. This fact is not new.
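Before timing two variants against each other, it helps to confirm that they compute the same thing. The following is a minimal sketch of such a check, assuming a smaller data set than in the thread (`n = 1000`, `k = 10`) purely to keep it quick. The names `res_for`, `res_lapply`, `cv_mse` and the local variable `test` are introduced here for illustration and are not from the original posts.

```r
# Sketch: run the cross-validation body through a for loop and through
# lapply, then verify that the results agree.
set.seed(123)
n <- 1000   # assumed smaller size, only to keep the check fast
k <- 10     # assumed number of folds
x <- rnorm(n)
y <- x + rnorm(n)
rand.data <- data.frame(x, y)
samples <- split(sample(n), rep(seq_len(k), length.out = n))

# Squared prediction errors for one held-out fold
cv.fold.fun <- function(index) {
  test <- samples[[index]]                     # row indices of the held-out fold
  fit <- lm(y ~ x, data = rand.data[-test, ])  # note the comma: drop rows
  pred <- predict(fit, newdata = rand.data[test, ])
  (pred - rand.data$y[test])^2
}

# for loop with a preallocated result list
res_for <- vector("list", length(samples))
for (index in seq_along(samples)) {
  res_for[[index]] <- cv.fold.fun(index)
}

# lapply over the same fold indices
res_lapply <- lapply(seq_along(samples), cv.fold.fun)

# Both variants should agree; only then is a timing comparison meaningful
stopifnot(isTRUE(all.equal(res_for, res_lapply)))

# Overall cross-validated mean squared error
cv_mse <- mean(unlist(res_lapply))
print(cv_mse)
```

Sharing one function between both variants means any remaining timing difference comes from the loop construct rather than from differences in the loop body.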