> On Jun 8, 2018, at 10:37 AM, Hervé Pagès <hpages at fredhutch.org> wrote:
>
> Also the TRUEs cause problems if some dimensions are 0:
>
>   > matrix(raw(0), nrow=5, ncol=0)[1:3 , TRUE]
>   Error in matrix(raw(0), nrow = 5, ncol = 0)[1:3, TRUE] :
>     (subscript) logical subscript too long

OK. But this is easy enough to handle.

> H.
>
> On 06/08/2018 10:29 AM, Hadley Wickham wrote:
>> I suspect this will have suboptimal performance since the TRUEs will
>> get recycled. (Maybe there is, or could be, ALTREP support for
>> recycling)
>> Hadley

AFAICS, it is not an issue. Taking

arr <- array(rnorm(2^22),c(2^10,4,4,4))

as a test case and using a function that will either use the literal code `x[i,,,,drop=FALSE]' or `eval(mc)':

subset_ROW4 <-
function(x, i, useLiteral=FALSE)
{
    literal <- quote(x[i,,,,drop=FALSE])
    mc <- quote(x[i])
    nd <- max(1L, length(dim(x)))
    mc[seq(4,length=nd-1L)] <- rep(TRUE, nd-1L)
    mc[["drop"]] <- FALSE
    if (useLiteral)
        eval(literal)
    else
        eval(mc)
}

I get identical times with

system.time(for (i in 1:10000) subset_ROW4(arr,seq(1,length=10,by=100),TRUE))

and with

system.time(for (i in 1:10000) subset_ROW4(arr,seq(1,length=10,by=100),FALSE))

Changing the dimensions to c(2^5, 2^7, 4, 4) and running something similar also shows equal times.

Chuck

>> On Fri, Jun 8, 2018 at 10:16 AM, Berry, Charles <ccberry at ucsd.edu> wrote:
>>>
>>>> On Jun 8, 2018, at 8:45 AM, Hadley Wickham <h.wickham at gmail.com> wrote:
>>>>
>>>> Hi all,
>>>>
>>>> Is there a better way to subset the ROWs (in the sense of NROW) of
>>>> a vector, matrix, data frame or array than this?
>>>
>>> You can use TRUE to fill the subscripts for dimensions 2:nd
>>>
>>>> subset_ROW <- function(x, i) {
>>>>   nd <- length(dim(x))
>>>>   if (nd <= 1L) {
>>>>     x[i]
>>>>   } else {
>>>>     dims <- rep(list(quote(expr = )), nd - 1L)
>>>>     do.call(`[`, c(list(quote(x), quote(i)), dims, list(drop = FALSE)))
>>>>   }
>>>> }
>>>
>>> subset_ROW <-
>>> function(x,i)
>>> {
>>>     mc <- quote(x[i])
>>>     nd <- max(1L, length(dim(x)))
>>>     mc[seq(4, length=nd-1L)] <- rep(list(TRUE), nd - 1L)
>>>     mc[["drop"]] <- FALSE
>>>     eval(mc)
>>> }
>>>
>>>> subset_ROW(1:10, 4:6)
>>>> #> [1] 4 5 6
>>>>
>>>> str(subset_ROW(array(1:10, c(10)), 2:4))
>>>> #>  int [1:3(1d)] 2 3 4
>>>> str(subset_ROW(array(1:10, c(10, 1)), 2:4))
>>>> #>  int [1:3, 1] 2 3 4
>>>> str(subset_ROW(array(1:10, c(5, 2)), 2:4))
>>>> #>  int [1:3, 1:2] 2 3 4 7 8 9
>>>> str(subset_ROW(array(1:10, c(10, 1, 1)), 2:4))
>>>> #>  int [1:3, 1, 1] 2 3 4
>>>>
>>>> subset_ROW(data.frame(x = 1:10, y = 10:1), 2:4)
>>>> #>   x y
>>>> #> 2 2 9
>>>> #> 3 3 8
>>>> #> 4 4 7
>>>
>>> HTH,
>>>
>>> Chuck
>
> --
> Hervé Pagès
>
> Program in Computational Biology
> Division of Public Health Sciences
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N, M1-B514
> P.O. Box 19024
> Seattle, WA 98109-1024
>
> E-mail: hpages at fredhutch.org
> Phone:  (206) 667-5791
> Fax:    (206) 667-1319
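
The message above says the zero-extent case is "easy enough to handle" without showing how. One possible way, as a sketch only and not necessarily what was intended: keep TRUE for ordinary dimensions and substitute integer(0) for any dimension of extent 0. The name subset_ROW_guarded is hypothetical, and the call is built with do.call() on values rather than by editing a quoted call.

```r
# Hypothetical helper, not from the thread: fill dimensions 2:nd with TRUE,
# except zero-extent dimensions, which get integer(0) so that the
# "logical subscript too long" error cannot be triggered.
subset_ROW_guarded <- function(x, i) {
  d <- dim(x)
  if (length(d) <= 1L) {
    # plain vectors and 1-d arrays: ordinary single-subscript indexing
    return(x[i])
  }
  fill <- lapply(d[-1L], function(n) if (n == 0L) integer(0) else TRUE)
  do.call(`[`, c(list(x, i), fill, list(drop = FALSE)))
}

m0 <- matrix(raw(0), nrow = 5, ncol = 0)
dim(subset_ROW_guarded(m0, 1:3))

a <- array(1:10, c(5, 2))
identical(subset_ROW_guarded(a, 2:4), a[2:4, , drop = FALSE])
```

This keeps the TRUE-filling idea intact for the common case and only changes behaviour when a trailing dimension is empty.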
On Fri, Jun 8, 2018 at 11:38 AM, Berry, Charles <ccberry at ucsd.edu> wrote:
> I get identical times with
>
> system.time(for (i in 1:10000) subset_ROW4(arr,seq(1,length=10,by=100),TRUE))
>
> and with
>
> system.time(for (i in 1:10000) subset_ROW4(arr,seq(1,length=10,by=100),FALSE))

I think that's because you used a relatively low-precision timing
mechanism, and included the index generation in the timing. I see:

arr <- array(rnorm(2^22),c(2^10,4,4,4))
i <- seq(1,length = 10, by = 100)

bench::mark(
  arr[i, TRUE, TRUE, TRUE],
  arr[i, , , ]
)
#> # A tibble: 2 x 1
#>   expression        min    mean   median      max  n_gc
#>   <chr>         <bch:t> <bch:t> <bch:tm> <bch:tm> <dbl>
#> 1 arr[i, TRUE,…   7.4µs  10.9µs  10.66µs   1.22ms     2
#> 2 arr[i, , , ]   7.06µs   8.8µs   7.85µs 538.09µs     2

So not a huge difference, but it's there.

Hadley

--
http://hadley.nz
A missing subscript is still preferable to a TRUE though because it
carries the meaning "take it all". A TRUE also achieves this but via
implicit recycling.

For example x[ , , ] and x[TRUE, TRUE, TRUE] achieve the same thing
(if length(x) != 0) and are both no-ops, but the subsetting code gets a
chance to immediately and easily detect the former as a no-op whereas it
will probably not be able to do it so easily for the latter. So in this
case it will most likely generate a copy of 'x' and fill the new array
by taking a full walk on it.

H.

On 06/08/2018 11:52 AM, Hadley Wickham wrote:
> bench::mark(
>   arr[i, TRUE, TRUE, TRUE],
>   arr[i, , , ]
> )
> #> # A tibble: 2 x 1
> #>   expression        min    mean   median      max  n_gc
> #>   <chr>         <bch:t> <bch:t> <bch:tm> <bch:tm> <dbl>
> #> 1 arr[i, TRUE,…   7.4µs  10.9µs  10.66µs   1.22ms     2
> #> 2 arr[i, , , ]   7.06µs   8.8µs   7.85µs 538.09µs     2
>
> So not a huge difference, but it's there.
>
> Hadley

--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fredhutch.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319
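
The equivalence described in this message, and its length(x) != 0 caveat, can be checked with a small sketch. The object names (x, x0, same, res_missing, res_true) are made up for illustration, and the failure for the zero-extent array is assumed to be the same "logical subscript too long" error reported earlier in the thread for the matrix case. Nothing here measures whether a copy is made; it only compares results.

```r
# Non-empty array: missing subscripts and TRUE subscripts give the same result.
x <- array(1:24, c(2, 3, 4))
same <- identical(x[, , ], x[TRUE, TRUE, TRUE])

# Array with a zero-extent dimension: missing subscripts still work ...
x0 <- array(integer(0), c(2, 0, 4))
res_missing <- x0[, , ]

# ... whereas a TRUE cannot be matched against an extent of 0, so the
# condition object is captured here instead of stopping the script.
res_true <- tryCatch(x0[TRUE, TRUE, TRUE], error = function(e) e)

same
dim(res_missing)
inherits(res_true, "error")
```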
> On Jun 8, 2018, at 11:52 AM, Hadley Wickham <h.wickham at gmail.com> wrote:
>
> I think that's because you used a relatively low-precision timing
> mechanism, and included the index generation in the timing. I see:
>
> arr <- array(rnorm(2^22),c(2^10,4,4,4))
> i <- seq(1,length = 10, by = 100)
>
> bench::mark(
>   arr[i, TRUE, TRUE, TRUE],
>   arr[i, , , ]
> )
> #> # A tibble: 2 x 1
> #>   expression        min    mean   median      max  n_gc
> #>   <chr>         <bch:t> <bch:t> <bch:tm> <bch:tm> <dbl>
> #> 1 arr[i, TRUE,…   7.4µs  10.9µs  10.66µs   1.22ms     2
> #> 2 arr[i, , , ]   7.06µs   8.8µs   7.85µs 538.09µs     2
>
> So not a huge difference, but it's there.

Funny. I get similar results to yours above, albeit with smaller differences. Usually < 5 percent.

But with subset_ROW4 I see no consistent difference.

In this example, it runs faster on average using `eval(mc)' to return the result:

> arr <- array(rnorm(2^22),c(2^10,4,4,4))
> i <- seq(1,length=10,by=100)
> bench::mark(subset_ROW4(arr,i,FALSE), subset_ROW4(arr,i,TRUE))[,1:8]
# A tibble: 2 x 8
  expression                      min     mean   median      max `itr/sec` mem_alloc  n_gc
  <chr>                      <bch:tm> <bch:tm> <bch:tm> <bch:tm>     <dbl> <bch:byt> <dbl>
1 subset_ROW4(arr, i, FALSE)   28.9µs   34.9µs   32.1µs   1.36ms    28686.    5.05KB     5
2 subset_ROW4(arr, i, TRUE)    28.9µs     35µs   32.4µs 875.11µs    28572.    5.05KB     5
>

And on subsequent reps the lead switches back and forth.

Chuck
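
For readers who want the call-building style of subset_ROW4 together with the missing subscripts of the original post, the two can be combined. The following is a sketch under that assumption; subset_ROW_missing is a hypothetical name and this variant was not benchmarked in the thread.

```r
# Hypothetical variant: build x[i, , ..., drop = FALSE] with genuinely
# missing subscripts for dimensions 2:nd, then evaluate the call.
subset_ROW_missing <- function(x, i) {
  nd <- length(dim(x))
  if (nd <= 1L) {
    return(x[i])
  }
  # quote(expr = ) yields the empty symbol, i.e. a missing argument
  empties <- rep(list(quote(expr = )), nd - 1L)
  mc <- as.call(c(list(as.name("["), quote(x), quote(i)),
                  empties, list(drop = FALSE)))
  # x and i are looked up in this function's frame
  eval(mc)
}

subset_ROW_missing(1:10, 4:6)
str(subset_ROW_missing(array(1:10, c(5, 2)), 2:4))
dim(subset_ROW_missing(matrix(raw(0), nrow = 5, ncol = 0), 1:3))
```

Because no TRUE is involved, nothing is recycled and a zero-extent dimension does not raise the "logical subscript too long" error.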