thr3ads.net - R help - [R] Has the for loop been improved in R? [Aug 2017]

Jesús Para Fernández

2017-Aug-07 14:29 UTC

[R] Has the for loop been improved in R?

Hi!

I am comparing lapply with a for loop, and I find that the for loop is faster
than lapply.


What I have done:



n<-100000
set.seed(123)
x<-rnorm(n)
y<-x+rnorm(n)
rand.data<-data.frame(x,y)
k<-100
samples<-split(sample(1:n),rep(1:k,length=n))

res<-list()
t<-Sys.time()
for(i in 1:100){
  modelo<-lm(y~x,rand.data[-samples[[i]]])
  prediccion<-predict(modelo,rand.data[samples[[i]],])
  res[[i]] <- (prediccion - rand.data$y[samples[[i]]])

}
print(Sys.time()-t)

which takes 8.042 seconds,

and using lapply:

cv.fold.fun <- function(index){
   fit <- lm(y~x, data = rand.data[-samples[[index]],])
   pred <- predict(fit, newdata = rand.data[samples[[index]],])
   return((pred - rand.data$y[samples[[index]]])^2)
  }


t<-Sys.time()

nuevo<-lapply(seq(along = samples),cv.fold.fun)
print(Sys.time()-t)


which takes 9.56 seconds.

So... has the for loop been improved in R?

Thanks!

Jeff Newmiller

2017-Aug-07 14:48 UTC

The lapply loop and the for loop have very similar speed characteristics.
Differences seen are almost always due to how you use memory in the body of the
loop. This fact is not new. You may be under the incorrect assumption that using
lapply is somehow equivalent to "vectorization", which it is not.
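
A minimal sketch of that point, assuming a toy task (taking square roots) and a hypothetical size n that are not part of the original post. It contrasts a for loop that grows its result, a for loop that preallocates, lapply, and a truly vectorized call. The function names are made up for illustration, and no timings are quoted because they depend on the machine and the R version.

```r
n <- 1e4  # hypothetical size, chosen only for illustration

# (a) for loop that grows its result: the vector is copied on every pass
grow <- function(n) {
  out <- numeric(0)
  for (i in seq_len(n)) out <- c(out, sqrt(i))
  out
}

# (b) for loop that writes into a preallocated vector
prealloc <- function(n) {
  out <- numeric(n)
  for (i in seq_len(n)) out[i] <- sqrt(i)
  out
}

# (c) lapply: still one R-level function call per element
via_lapply <- function(n) unlist(lapply(seq_len(n), sqrt))

# (d) vectorized: a single call, with the loop running in compiled code
vectorized <- function(n) sqrt(seq_len(n))

# All four produce the same values
stopifnot(
  isTRUE(all.equal(grow(n), prealloc(n))),
  isTRUE(all.equal(prealloc(n), via_lapply(n))),
  isTRUE(all.equal(via_lapply(n), vectorized(n)))
)

# Compare the timings on your own machine
print(system.time(grow(n)))
print(system.time(prealloc(n)))
print(system.time(via_lapply(n)))
print(system.time(vectorized(n)))
```

In a comparison like this, any gap between (b) and (c) would be expected to be small next to the gap between (a) and (b), or between either loop and (d).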
-- 
Sent from my phone. Please excuse my brevity.

David L Carlson

2017-Aug-07 14:57 UTC

A Google search on "lapply vs for r" or "lapply vs loop r"
might have saved you some trouble. Many people have debunked this myth.
Strangely they all start out with "everyone knows" or "it is
commonly said that." I'm sure someone must have said it, but no one
seems to be able to provide an authoritative citation before proceeding to
demonstrate that it is false.

-------------------------------------
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352

Thierry Onkelinx

2017-Aug-07 14:57 UTC

Dear Jesus,

The difference is marginal when each code chunk does the same thing. Your
for loop does not yield the same output as the lapply. Here is a cleaned-up
version of your code.

n<-10000
set.seed(123)
x<-rnorm(n)
y<-x+rnorm(n)
rand.data<-data.frame(x,y)
k<-100
samples <- split(sample(n), rep(seq_len(k),length=n))

library(microbenchmark)
microbenchmark(
  "for" = {
    res <- vector("list", length(samples))
    for(index in seq_along(samples)) {
      fit <- lm(y~x, data = rand.data[-samples[[index]],])
      pred <- predict(fit, newdata = rand.data[samples[[index]],])
      res[[index]] <- ((pred - rand.data$y[samples[[index]]])^2)
    }
  },
  lapply = {
    cv.fold.fun <- function(index){
      fit <- lm(y~x, data = rand.data[-samples[[index]],])
      pred <- predict(fit, newdata = rand.data[samples[[index]],])
      return((pred - rand.data$y[samples[[index]]])^2)
    }
    lapply(seq_along(samples), cv.fold.fun)
  }
)

Unit: milliseconds
   expr      min       lq     mean   median       uq      max neval cld
    for 866.4196 897.3137 949.8155 926.1918 946.8390 1767.463   100   a
 lapply 837.7804 889.6620 947.2401 909.9946 939.6379 2476.415   100   a
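
Reading the originally posted code, two differences between the for loop and the lapply version stand out: the for loop stores the raw prediction errors rather than their squares, and it indexes the training data as rand.data[-samples[[i]]], without the comma. The sketch below illustrates what the missing comma does, using a small hypothetical data frame df and a hypothetical fold held_out that are not part of the thread.

```r
# Small hypothetical data frame, used only to show the indexing difference
df <- data.frame(x = 1:6, y = (1:6) * 10)
held_out <- c(3, 5)  # hypothetical fold: rows held out for testing

# With a comma, the negative index drops rows: 4 training rows remain
train_rows <- df[-held_out, ]

# Without a comma, a data frame is indexed like a list, so the index
# refers to columns; here there is no column 3 or 5, so nothing is dropped
no_comma <- df[-held_out]

stopifnot(
  nrow(train_rows) == 4,
  nrow(no_comma) == 6,  # the held-out rows are still in the "training" data
  ncol(no_comma) == 2
)
```

So in the comma-less form the model is fitted to rows that were supposed to be held out, which means the two original timings were not measuring the same work.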

Best regards,


ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature and
Forest
team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance
Kliniekstraat 25
1070 Anderlecht
Belgium

To call in the statistician after the experiment is done may be no more
than asking him to perform a post-mortem examination: he may be able to say
what the experiment died of. ~ Sir Ronald Aylmer Fisher
The plural of anecdote is not data. ~ Roger Brinner
The combination of some data and an aching desire for an answer does not
ensure that a reasonable answer can be extracted from a given body of data.
~ John Tukey
