@vi@e@gross m@iii@g oii gm@ii@com
2024-Jun-16 16:33 UTC
[R] slowness when I use a list comprehension
Laurent, Thank you for introducing me to a package I did not know existed as I use features like list comprehension in python all the time and could see using it in R now that I know it is available. As to why you see your example as slow, I see you used a fairly complex and nested expression and wonder if it was a better way to go. As you are dealing with an interpreter doing delayed evaluation, I can imagine reasons it can be slow. But note the package comprehenr may not be designed to be more efficient than loops or of the more built-in functional methods that can be faster. The package is there perhaps more as a compatibility helper that allows you to write closer to the python style and perhaps re-shapes what you wrote into a set of instructions in more native R. Just for comparison, in python, things like comprehensions for list or dictionaries or tuples often are syntactic sugar and the interpreter may simply rewrite them more like the first program you typed and evaluates that. The comprehensions are more designed for users who can think another way and write things more compactly as one-liners. Depending on implementations, they may be faster or slower on some examples. I am not saying there is nothing else that is slowing it down for you. I am suggesting that using the feature as currently implemented may not be an advantage except in your thought process. It may be it could be improved, such as by replacing more functionality out of R and into compiled languages as has been done for many packages. Avi -----Original Message----- From: R-help <r-help-bounces at r-project.org> On Behalf Of Laurent Rhelp Sent: Sunday, June 16, 2024 11:28 AM To: r-help at r-project.org Subject: [R] slowness when I use a list comprehension Dear RHelp-list, I try to use the package comprehenr to replace a for loop by a list comprehension. I wrote the code but I certainly miss something because it is very slower compared to the for loops. May you please explain to me why the list comprehension is slower in my case. Here is my example. I do the calculation of the square difference between the values of two vectors vec1 and vec2, the ratio sampling between vec1 and vec2 is equal to ratio_sampling. I have to use only the 500th value of the first serie before doing the difference with the value of the second serie (vec2). Thank you Best regards Laurent library(tictoc) library(comprehenr) ratio_sampling <- 500 ## size of the first serie N1 <- 70000 ## size of the second serie N2 <- 100 ## mock data set.seed(123) vec1 <- rnorm(N1) vec2 <- runif(N2) ## 1. with the "for" loops ## the square differences will be stored in a vector S_diff2 <- numeric((N1-(N2-1)*ratio_sampling)) tic() for( j in 1:length(S_diff2)){ sum_squares <- 0 for( i in 1:length(vec2)){ sum_squares = sum_squares + ((vec1[(i-1)*ratio_sampling+j] - vec2[i])**2) } S_diff2[j] <- sum_squares } toc() ## 0.22 sec elapsed which.max(S_diff2) ## 7857 ## 2. with the lists comprehension tic() S_diff2 <- to_vec(for( j in 1:length(S_diff2)) sum(to_vec(for( i in 1:length(vec2)) ((vec1[(i-1)*ratio_sampling+j] - vec2[i])**2)))) toc() ## 25.09 sec elapsed which.max(S_diff2) ## 7857 ______________________________________________ R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
I would be more strong on this advice: learn to think in R, rather than thinking in Python, when programming in R. R has atomic vectors... Python does not (until you import a package that implements them). I find that while it is possible to import R thinking into Python, Python programmers seem to object for stylistic reasons even though such thinking speeds up Python also. A key step in that direction is to stop using lists directly for numeric calculations... use them to manage numeric vactors. In some cases you can switch to matrices or arrays to remove even more list manipulations from the script. library(microbenchmark) ratio_sampling <- 500 ## size of the first serie N1 <- 70000 ## size of the second serie N2 <- 100 ## mock data set.seed(123) vec1 <- rnorm(N1) vec2 <- runif(N2) dloop <- function( N1, M2, ratio_sampling, vec1, vec2 ) { S_diff2 <- numeric( N1-(N2-1)*ratio_sampling ) for( j in 1:length(S_diff2) ) { sum_squares <- 0 for( i in 1:length(vec2)){ sum_squares <- ( sum_squares + ( vec1[ (i-1)*ratio_sampling+j ] - vec2[i] )**2 ) } S_diff2[j] <- sum_squares } S_diff2 } vloop <- function( N1, M2, ratio_sampling, vec1, vec2 ) { S_diff3 <- numeric( N1-(N2-1)*ratio_sampling ) i <- seq_along( vec2 ) k <- (i-1)*ratio_sampling for( j in seq_along( S_diff3 ) ) { S_diff3[j] <- sum( ( vec1[ j + k ] - vec2 )^2 ) } S_diff3 } microbenchmark( S_diff2 <- dloop( N1, M2, ratio_sampling, vec1, vec2 ) , S_diff3 <- vloop( N1, M2, ratio_sampling, vec1, vec2 ) , times = 20 ) all.equal( S_diff2, S_diff3 ) On June 16, 2024 9:33:54 AM PDT, avi.e.gross at gmail.com wrote:>Laurent, > >Thank you for introducing me to a package I did not know existed as I use features like list comprehension in python all the time and could see using it in R now that I know it is available. > >As to why you see your example as slow, I see you used a fairly complex and nested expression and wonder if it was a better way to go. As you are dealing with an interpreter doing delayed evaluation, I can imagine reasons it can be slow. But note the package comprehenr may not be designed to be more efficient than loops or of the more built-in functional methods that can be faster. The package is there perhaps more as a compatibility helper that allows you to write closer to the python style and perhaps re-shapes what you wrote into a set of instructions in more native R. > >Just for comparison, in python, things like comprehensions for list or dictionaries or tuples often are syntactic sugar and the interpreter may simply rewrite them more like the first program you typed and evaluates that. The comprehensions are more designed for users who can think another way and write things more compactly as one-liners. Depending on implementations, they may be faster or slower on some examples. > >I am not saying there is nothing else that is slowing it down for you. I am suggesting that using the feature as currently implemented may not be an advantage except in your thought process. It may be it could be improved, such as by replacing more functionality out of R and into compiled languages as has been done for many packages. > >Avi > >-----Original Message----- >From: R-help <r-help-bounces at r-project.org> On Behalf Of Laurent Rhelp >Sent: Sunday, June 16, 2024 11:28 AM >To: r-help at r-project.org >Subject: [R] slowness when I use a list comprehension > >Dear RHelp-list, > > I try to use the package comprehenr to replace a for loop by a list >comprehension. > > I wrote the code but I certainly miss something because it is very >slower compared to the for loops. May you please explain to me why the >list comprehension is slower in my case. > >Here is my example. I do the calculation of the square difference >between the values of two vectors vec1 and vec2, the ratio sampling >between vec1 and vec2 is equal to ratio_sampling. I have to use only the >500th value of the first serie before doing the difference with the >value of the second serie (vec2). > >Thank you > >Best regards > >Laurent > >library(tictoc) >library(comprehenr) > >ratio_sampling <- 500 >## size of the first serie >N1 <- 70000 >## size of the second serie >N2 <- 100 >## mock data >set.seed(123) >vec1 <- rnorm(N1) >vec2 <- runif(N2) > > >## 1. with the "for" loops > >## the square differences will be stored in a vector >S_diff2 <- numeric((N1-(N2-1)*ratio_sampling)) >tic() >for( j in 1:length(S_diff2)){ > sum_squares <- 0 > for( i in 1:length(vec2)){ > sum_squares = sum_squares + ((vec1[(i-1)*ratio_sampling+j] - >vec2[i])**2) > } > S_diff2[j] <- sum_squares >} >toc() >## 0.22 sec elapsed >which.max(S_diff2) >## 7857 > >## 2. with the lists comprehension >tic() >S_diff2 <- to_vec(for( j in 1:length(S_diff2)) sum(to_vec(for( i in >1:length(vec2)) ((vec1[(i-1)*ratio_sampling+j] - vec2[i])**2)))) >toc() >## 25.09 sec elapsed >which.max(S_diff2) >## 7857 > >______________________________________________ >R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see >https://stat.ethz.ch/mailman/listinfo/r-help >PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >and provide commented, minimal, self-contained, reproducible code. > >______________________________________________ >R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see >https://stat.ethz.ch/mailman/listinfo/r-help >PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >and provide commented, minimal, self-contained, reproducible code.-- Sent from my phone. Please excuse my brevity.