luke-tierney at uiowa.edu
2022-May-03 03:52 UTC
[R] [External] Somewhat disconcerting behavior of seq.int()
Something is very different about your system. On my Linux system I get

> microbenchmark(l1 <- sieve1(1e5), times =50)
Unit: milliseconds
                expr     min       lq     mean   median       uq     max neval
 l1 <- sieve1(1e+05) 5.04615 5.350576 6.967507 5.787626 7.323502 28.3085    50
> microbenchmark(l2 <- sieve2(1e5), times =50)
Unit: milliseconds
                expr      min       lq     mean   median      uq      max neval
 l2 <- sieve2(1e+05) 14.58763 15.79368 17.00738 16.29299 17.0723 30.57338    50

Similar on an Intel Mac.

Best,

luke

On Tue, 3 May 2022, Bert Gunter wrote:

> ** Disconcerting to me, anyway; perhaps not to others**
> (Apologies if this has been discussed before. I was a bit nonplussed by
> it, but maybe I'm just clueless.) Anyway:
>
> Here are two almost identical versions of the Sieve of Eratosthenes.
> The difference between them is only in the call to seq.int() that is
> highlighted
>
> sieve1 <- function(m){
>    if(m < 2) return(NULL)
>    a <- floor(sqrt(m))
>    pr <- Recall(a)
>    ####################
>    s <- seq.int(2, to = m) ## Only difference here
>    ######################
>    for( i in pr) s <- s[as.logical(s %% i)]
>    c(pr,s)
> }
>
> sieve2 <- function(m){
>    if(m < 2) return(NULL)
>    a <- floor(sqrt(m))
>    pr <- Recall(a)
>    ####################
>    s <- seq.int(2, to = m, by =1) ## Only difference here
>    #######################
>    for( i in pr) s <- s[as.logical(s %% i)]
>    c(pr,s)
> }
>
> However, execution time is *quite* different.
>
> library(microbenchmark)
>
>> microbenchmark(l1 <- sieve1(1e5), times =50)
> Unit: milliseconds
>                 expr      min       lq     mean  median       uq      max neval
>  l1 <- sieve1(1e+05) 3.957084 3.997959 4.732045 4.01698 4.184918 7.627751    50
>
>> microbenchmark(l2 <- sieve2(1e5), times =50)
> Unit: milliseconds
>                 expr      min      lq     mean   median       uq      max neval
>  l2 <- sieve2(1e+05) 681.6209 682.555 683.8279 682.9368 685.2253 687.9464    50
>
> Now note that:
>> identical(l1, l2)
> [1] FALSE
>
> ## Because:
>> str(l1)
>  int [1:9592] 2 3 5 7 11 13 17 19 23 29 ...
>
>> str(l2)
>  num [1:9592] 2 3 5 7 11 13 17 19 23 29 ...
>
> I therefore assume that seq.int(), an internal generic, is dispatching
> to a method that uses integer arithmetic for sieve1 and floating point
> for sieve2. Is this correct? If not, what do I fail to understand? And
> is this indeed the source of the large difference in execution time?
>
> Further, ?seq.int says:
> "The interpretation of the unnamed arguments of seq and seq.int is not
> standard, and it is recommended always to name the arguments when
> programming."
>
> The above suggests that maybe this advice should be qualified, and/or
> adding some comments to the Help file regarding this behavior might be
> useful to naïfs like me.
>
> In case it makes a difference (and it might!):
>
>> sessionInfo()
> R version 4.2.0 (2022-04-22)
> Platform: x86_64-apple-darwin17.0 (64-bit)
> Running under: macOS Monterey 12.3.1
>
> Matrix products: default
> LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib
>
> locale:
> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
> [1] microbenchmark_1.4.9
>
> loaded via a namespace (and not attached):
> [1] compiler_4.2.0 tools_4.2.0
>
>
> Thanks for any enlightenment and again apologies if I am plowing old ground.
>
> Best to all,
>
> Bert Gunter
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

-- 
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa                  Phone:  319-335-3386
Department of Statistics and        Fax:    319-335-3017
   Actuarial Science
241 Schaeffer Hall                  email:  luke-tierney at uiowa.edu
Iowa City, IA 52242                 WWW:    http://www.stat.uiowa.edu
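The type difference reported in the original post (str(l1) gives int, str(l2) gives num) can be checked directly on the two seq.int() calls, without the sieve around them. The following is only a minimal sketch; the variable names s_noby and s_by are made up for illustration, and since ?seq says programmers should not rely on which type is returned, the results may differ in other R versions.

```r
m <- 1e5

# The two calls from sieve1() and sieve2(), taken out of the functions.
s_noby <- seq.int(2, to = m)          # from/to only, as in sieve1()
s_by   <- seq.int(2, to = m, by = 1)  # explicit double 'by', as in sieve2()

typeof(s_noby)   # "integer" in the R 4.2.0 session shown in the thread
typeof(s_by)     # "double"  in the R 4.2.0 session shown in the thread

# Same values, different storage type, hence identical() is FALSE.
all(s_noby == s_by)
identical(s_noby, s_by)

# Coercing the double sequence makes the two vectors identical.
identical(as.integer(s_by), s_noby)
```

This only shows why identical(l1, l2) was FALSE. It does not by itself account for the 680 ms timing, which was not reproduced on other systems.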
Bert Gunter
2022-May-03 04:37 UTC
[R] [External] Somewhat disconcerting behavior of seq.int()
Well, I'm on an M1 Mac, so that is certainly different than either of your systems. I installed the precompiled binary, which may also have something to do with it. Whether these make a difference I have no clue.

However, the fact remains that the Help file *does* warn that the type of the seq.int() value is essentially indeterminate, and when I explicitly cast it to integer, all is well. So mea culpa.

I will fool around some tomorrow with more careful profiling to see if I can learn anything, but the best I can say at present is: it is what it is. Unless, of course, someone provides an answer before then.

Bert Gunter
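The explicit cast mentioned in this message might look like the sketch below. The function name sieve_int is hypothetical (it is not in the thread), and this is one plausible reading of "explicitly cast it to integer", not necessarily the exact code that was run.

```r
# Variant of sieve2() from the thread with the sequence coerced to integer,
# so the result type no longer depends on how seq.int() was called.
sieve_int <- function(m) {
  if (m < 2) return(NULL)
  a  <- floor(sqrt(m))
  pr <- Recall(a)                               # primes up to sqrt(m)
  s  <- as.integer(seq.int(2, to = m, by = 1))  # explicit integer cast
  for (i in pr) s <- s[as.logical(s %% i)]      # drop multiples of each prime
  c(pr, s)
}

sieve_int(30)
# [1]  2  3  5  7 11 13 17 19 23 29

typeof(sieve_int(1e5))   # "integer"
length(sieve_int(1e5))   # 9592, matching str(l1) in the original post
```

Because the recursion bottoms out in NULL and every level returns c(pr, s) with an integer s, the whole result stays integer regardless of the type seq.int() happens to return.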
Bert Gunter
2022-May-03 15:23 UTC
[R] [External] Somewhat disconcerting behavior of seq.int()
I resolved the problem by reinstalling R. See below. No clue as to what may have been the cause (just an ignorant wild guess that is not worth sharing). Thanks again to all for your help.

Bert

> s1 <- seq.int(2, 1e5, by =1)
> s2 = as.integer(s1)
>
> microbenchmark( v1 <- s1 %% 2, times = 50)
Unit: microseconds
        expr     min      lq     mean   median      uq     max neval
 v1 <- s1%%2 396.839 410.943 433.6234 432.3245 457.068 491.057    50
> microbenchmark( v2 <- s2 %% 2L, times = 50)
Unit: microseconds
         expr     min      lq     mean  median      uq    max neval
 v2 <- s2%%2L 145.837 150.019 159.5441 162.032 164.943 177.12    50

sieve1 <- function(m){
   if(m < 2) return(NULL)
   a <- floor(sqrt(m))
   pr <- Recall(a)
   s <- seq.int(2, to = m) ## Only difference here
   for( i in pr) s <- s[as.logical(s %% i)]
   c(pr,s)
}

sieve2 <- function(m){
   if(m < 2) return(NULL)
   a <- floor(sqrt(m))
   pr <- Recall(a)
   s <-seq.int(2L, to = m, by =1) ## Only difference here
   for( i in pr) s <- s[as.logical(s %% i)]
   c(pr,s)
}

> microbenchmark(l1 <- sieve1(1e5), times =50)
Unit: milliseconds
                expr     min       lq     mean  median       uq      max neval
 l1 <- sieve1(1e+05) 3.69533 4.068307 5.679122 4.28327 7.561425 10.07493    50
> microbenchmark(l2 <- sieve2(1e5), times =50)
Unit: milliseconds
                expr      min       lq     mean   median       uq      max neval
 l2 <- sieve2(1e+05) 5.367679 6.128229 8.013111 8.940788 9.430246 11.52822    50
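For anyone who wants to repeat the %% comparison without installing microbenchmark, a rough base-R version is sketched below. It assumes system.time() is precise enough for this purpose, which is debatable at the microsecond scale, so the loop count reps is an arbitrary choice. No timings are claimed here because they depend on the machine and the R build.

```r
s_dbl <- seq.int(2, 1e5, by = 1)   # double sequence, as s1 in the message
s_int <- as.integer(s_dbl)         # integer copy, as s2 in the message

# The modulus keeps the storage type of its operands.
v_dbl <- s_dbl %% 2
v_int <- s_int %% 2L
typeof(v_dbl)          # "double"
typeof(v_int)          # "integer"
all(v_dbl == v_int)    # same remainders either way

# Crude timing: repeat each operation and compare elapsed seconds.
reps  <- 200L
t_dbl <- system.time(for (k in seq_len(reps)) s_dbl %% 2)[["elapsed"]]
t_int <- system.time(for (k in seq_len(reps)) s_int %% 2L)[["elapsed"]]
c(double = t_dbl, integer = t_int)
```

If the double case comes out orders of magnitude slower than the integer case, as in the original 680 ms report, that would point at the installation and not at seq.int() itself, which is consistent with the reinstall fixing it.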