Bert Gunter
2022-May-03 04:37 UTC
[R] [External] Somewhat disconcerting behavior of seq.int()
Well, I'm on an M1 Mac, so that is certainly different than either of
your systems. I installed the precompiled binary, which may also have
something to do with it. Whether these make a difference I have no
clue.

However, the fact remains that the Help file *does* warn that the type
of the seq.int() value is essentially indeterminate, and when I
explicitly cast it to integer, all is well. So mea culpa.

I will fool around some tomorrow with more careful profiling to see if
I can learn anything, but the best I can say at present is: it is what
it is. Unless, of course, someone provides an answer before then.

Bert Gunter


On Mon, May 2, 2022 at 8:53 PM <luke-tierney at uiowa.edu> wrote:
>
> Something is very different about your system. On my Linux system I get
>
> > microbenchmark(l1 <- sieve1(1e5), times =50)
> Unit: milliseconds
>                 expr     min       lq     mean   median       uq     max neval
>  l1 <- sieve1(1e+05) 5.04615 5.350576 6.967507 5.787626 7.323502 28.3085    50
> > microbenchmark(l2 <- sieve2(1e5), times =50)
> Unit: milliseconds
>                 expr      min       lq     mean   median      uq      max neval
>  l2 <- sieve2(1e+05) 14.58763 15.79368 17.00738 16.29299 17.0723 30.57338    50
>
> Similar on an Intel Mac.
>
> Best,
>
> luke
>
> On Tue, 3 May 2022, Bert Gunter wrote:
>
> > ** Disconcerting to me, anyway; perhaps not to others**
> > (Apologies if this has been discussed before. I was a bit nonplussed by
> > it, but maybe I'm just clueless.) Anyway:
> >
> > Here are two almost identical versions of the Sieve of Eratosthenes.
> > The difference between them is only in the call to seq.int() that is
> > highlighted
> >
> > sieve1 <- function(m){
> >    if(m < 2) return(NULL)
> >    a <- floor(sqrt(m))
> >    pr <- Recall(a)
> >    ####################
> >    s <- seq.int(2, to = m) ## Only difference here
> >    ######################
> >    for( i in pr) s <- s[as.logical(s %% i)]
> >    c(pr,s)
> > }
> >
> > sieve2 <- function(m){
> >    if(m < 2) return(NULL)
> >    a <- floor(sqrt(m))
> >    pr <- Recall(a)
> >    ####################
> >    s <- seq.int(2, to = m, by =1) ## Only difference here
> >    #######################
> >    for( i in pr) s <- s[as.logical(s %% i)]
> >    c(pr,s)
> > }
> >
> > However, execution time is *quite* different.
> >
> > library(microbenchmark)
> >
> >> microbenchmark(l1 <- sieve1(1e5), times =50)
> > Unit: milliseconds
> >                 expr      min       lq     mean  median       uq      max
> >  l1 <- sieve1(1e+05) 3.957084 3.997959 4.732045 4.01698 4.184918 7.627751
> >  neval
> >     50
> >
> >> microbenchmark(l2 <- sieve2(1e5), times =50)
> > Unit: milliseconds
> >                 expr      min      lq     mean   median       uq      max
> >  l2 <- sieve2(1e+05) 681.6209 682.555 683.8279 682.9368 685.2253 687.9464
> >  neval
> >     50
> >
> > Now note that:
> >> identical(l1, l2)
> > [1] FALSE
> >
> > ## Because:
> >> str(l1)
> >  int [1:9592] 2 3 5 7 11 13 17 19 23 29 ...
> >
> >> str(l2)
> >  num [1:9592] 2 3 5 7 11 13 17 19 23 29 ...
> >
> > I therefore assume that seq.int(), an internal generic, is dispatching
> > to a method that uses integer arithmetic for sieve1 and floating point
> > for sieve2. Is this correct? If not, what do I fail to understand? And
> > is this indeed the source of the large difference in execution time?
> >
> > Further, ?seq.int says:
> > "The interpretation of the unnamed arguments of seq and seq.int is not
> > standard, and it is recommended always to name the arguments when
> > programming."
> >
> > The above suggests that maybe this advice should be qualified, and/or
> > adding some comments to the Help file regarding this behavior might be
> > useful to naïfs like me.
> >
> > In case it makes a difference (and it might!):
> >
> >> sessionInfo()
> > R version 4.2.0 (2022-04-22)
> > Platform: x86_64-apple-darwin17.0 (64-bit)
> > Running under: macOS Monterey 12.3.1
> >
> > Matrix products: default
> > LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib
> >
> > locale:
> > [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
> >
> > attached base packages:
> > [1] stats     graphics  grDevices utils     datasets  methods   base
> >
> > other attached packages:
> > [1] microbenchmark_1.4.9
> >
> > loaded via a namespace (and not attached):
> > [1] compiler_4.2.0 tools_4.2.0
> >
> >
> > Thanks for any enlightenment and again apologies if I am plowing old ground.
> >
> > Best to all,
> >
> > Bert Gunter
> >
> > ______________________________________________
> > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
>
> --
> Luke Tierney
> Ralph E. Wareham Professor of Mathematical Sciences
> University of Iowa                  Phone:             319-335-3386
> Department of Statistics and        Fax:               319-335-3017
>    Actuarial Science
> 241 Schaeffer Hall                  email:   luke-tierney at uiowa.edu
> Iowa City, IA 52242                 WWW:  http://www.stat.uiowa.edu
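For readers following along, here is a minimal sketch of the type difference discussed in this message, and of the explicit integer cast mentioned as the workaround. The name sieve3 is hypothetical (it does not appear in the thread); it is simply sieve2 with the sequence cast to integer, and it is offered as an illustration rather than as anyone's recommended fix.

```r
m <- 1e5

s_default <- seq.int(2, to = m)          # 'by' omitted
s_by_one  <- seq.int(2, to = m, by = 1)  # 'by' supplied as a double
s_cast    <- as.integer(s_by_one)        # explicit cast to integer

typeof(s_default)  # "integer", consistent with str(l1) in the thread
typeof(s_by_one)   # "double", consistent with str(l2) in the thread
typeof(s_cast)     # "integer"

# sieve3: hypothetical variant of sieve2 that casts the sequence to integer,
# so that s %% i is carried out on integers throughout the recursion.
sieve3 <- function(m) {
  if (m < 2) return(NULL)
  a <- floor(sqrt(m))
  pr <- Recall(a)                              # primes up to sqrt(m)
  s <- as.integer(seq.int(2, to = m, by = 1))  # force integer storage
  for (i in pr) s <- s[as.logical(s %% i)]     # drop multiples of each prime
  c(pr, s)
}

p <- sieve3(1e5)
typeof(p)  # "integer"
length(p)  # 9592, the same count reported by str() in the thread
```

Whether the cast also removes the timing gap on a given machine is something to measure locally; the thread reports very different ratios on different systems.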
Bert Gunter
2022-May-03 05:08 UTC
[R] [External] Somewhat disconcerting behavior of seq.int()
Just confirming that it's %% in integers vs. doubles on my system:

> s1 <- seq.int(2, 1e5, by =1) ## doubles
> s2 = as.integer(s1)

## **Note units below**

> microbenchmark( v1 <- s1 %% 2, times = 50) ## floating point
Unit: milliseconds
        expr      min       lq    mean   median       uq      max neval
 v1 <- s1%%2 69.28204 69.60496 69.8957 69.81379 70.01729 71.36125    50

> microbenchmark( v2 <- s2 %% 2L, times = 50) ## integer
Unit: microseconds
         expr     min      lq     mean   median      uq     max neval
 v2 <- s2%%2L 166.626 167.042 172.7431 170.5215 177.667 194.334    50

I have no idea why the big difference, but I am pretty sure it's way
beyond me. Maybe Mac gurus can figure it out. I may post this on
r-sig-mac to see.

Bert


On Mon, May 2, 2022 at 9:37 PM Bert Gunter <bgunter.4567 at gmail.com> wrote:
>
> Well, I'm on an M1 Mac, so that is certainly different than either of
> your systems. I installed the precompiled binary, which may also have
> something to do with it. Whether these make a difference I have no
> clue.
>
> However, the fact remains that the Help file *does* warn that the type
> of the seq.int() value is essentially indeterminate, and when I
> explicitly cast it to integer, all is well. So mea culpa.
>
> I will fool around some tomorrow with more careful profiling to see if
> I can learn anything, but the best I can say at present is: it is what
> it is. Unless, of course, someone provides an answer before then.
>
> Bert Gunter
>
>
> On Mon, May 2, 2022 at 8:53 PM <luke-tierney at uiowa.edu> wrote:
> >
> > Something is very different about your system. On my Linux system I get
> >
> > > microbenchmark(l1 <- sieve1(1e5), times =50)
> > Unit: milliseconds
> >                 expr     min       lq     mean   median       uq     max neval
> >  l1 <- sieve1(1e+05) 5.04615 5.350576 6.967507 5.787626 7.323502 28.3085    50
> > > microbenchmark(l2 <- sieve2(1e5), times =50)
> > Unit: milliseconds
> >                 expr      min       lq     mean   median      uq      max neval
> >  l2 <- sieve2(1e+05) 14.58763 15.79368 17.00738 16.29299 17.0723 30.57338    50
> >
> > Similar on an Intel Mac.
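The comparison in this message can also be repeated without the microbenchmark package. The sketch below uses only base R, checks that the two %% results agree in value while differing in storage type, and takes a rough timing with system.time(). The repetition count of 50 mirrors times = 50 in the thread; the variable names t_dbl and t_int are made up for this illustration. No timings are shown because they depend on the machine and the R build.

```r
s1 <- seq.int(2, 1e5, by = 1)  # doubles, as in the message above
s2 <- as.integer(s1)           # same values stored as integers

v1 <- s1 %% 2    # double %% double
v2 <- s2 %% 2L   # integer %% integer

typeof(v1)       # "double"
typeof(v2)       # "integer"
all(v1 == v2)    # TRUE: the remainders agree, only the storage type differs

# Rough base-R timing over 50 repetitions; results vary by system.
t_dbl <- system.time(for (k in 1:50) s1 %% 2)[["elapsed"]]
t_int <- system.time(for (k in 1:50) s2 %% 2L)[["elapsed"]]
c(double = t_dbl, integer = t_int)
```

system.time() is coarser than microbenchmark(), so it is best read as a sanity check on the order of magnitude rather than as a precise measurement.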