thr3ads.net - R devel - [Rd] Subsetting the "ROW"s of an object [Jun 2018]

Berry, Charles

2018-Jun-08 21:09 UTC

[Rd] Subsetting the "ROW"s of an object

> On Jun 8, 2018, at 1:49 PM, Hadley Wickham <h.wickham at gmail.com> wrote:
> 
> Hmmm, yes, there must be some special case in the C code to avoid
> recycling a length-1 logical vector:

Here is a version that (I think) handles Herve's issue of arrays having one
or more 0 dimensions.

subset_ROW <-
    function(x,i)
{
    dims <- dim(x)
    index_list <- which(dims[-1] != 0L) + 3
    mc <- quote(x[i])
    nd <- max(1L, length(dims))
    mc[ index_list ] <- list(TRUE)
    mc[[ nd + 3L ]] <- FALSE
    names( mc )[ nd+3L ] <- "drop"
    eval(mc)
}

Curiously enough the timing is *much* better for this implementation than for
the first version I sent.

Constructing a version of `mc' that looks like `x[i,,,,drop=FALSE]' can
be done with `alist(a=)' in place of `list(TRUE)' in the earlier version
but seems to slow things down noticeably. It requires almost twice (!!) as much
time as the version above.

Best,

Chuck

Hadley Wickham

2018-Jun-08 21:15 UTC

[Rd] Subsetting the "ROW"s of an object

On Fri, Jun 8, 2018 at 2:09 PM, Berry, Charles <ccberry at ucsd.edu> wrote:
>
>> On Jun 8, 2018, at 1:49 PM, Hadley Wickham <h.wickham at gmail.com> wrote:
>>
>> Hmmm, yes, there must be some special case in the C code to avoid
>> recycling a length-1 logical vector:
>
>
> Here is a version that (I think) handles Herve's issue of arrays having
> one or more 0 dimensions.
>
> subset_ROW <-
>     function(x,i)
> {
>     dims <- dim(x)
>     index_list <- which(dims[-1] != 0L) + 3
>     mc <- quote(x[i])
>     nd <- max(1L, length(dims))
>     mc[ index_list ] <- list(TRUE)
>     mc[[ nd + 3L ]] <- FALSE
>     names( mc )[ nd+3L ] <- "drop"
>     eval(mc)
> }
>
> Curiously enough the timing is *much* better for this implementation than
> for the first version I sent.
>
> Constructing a version of `mc' that looks like `x[i,,,,drop=FALSE]'
> can be done with `alist(a=)' in place of `list(TRUE)' in the earlier
> version but seems to slow things down noticeably. It requires almost twice (!!)
> as much time as the version above.

I think that's probably because alist() is a slow way to generate a
missing symbol:

bench::mark(
  alist(x = ),
  list(x = quote(expr = )),
  check = FALSE
)[1:5]
#> # A tibble: 2 x 5
#>   expression                    min     mean   median      max
#>   <chr>                    <bch:tm> <bch:tm> <bch:tm> <bch:tm>
#> 1 alist(x = )                 2.8µs   3.54µs   3.29µs   34.9µs
#> 2 list(x = quote(expr = ))    169ns 219.38ns    181ns   24.2µs

(note the units)

Hadley


-- 
http://hadley.nz
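
To make the role of the missing symbol concrete, here is a minimal sketch (not from the thread itself) that builds the call `m[i, , drop = FALSE]` by hand in base R. The names `cl`, `call_text`, `m`, `i`, `res`, and `same` are illustrative only. It assumes, as the benchmark above suggests, that `alist(x = )` and `list(x = quote(expr = ))` produce equivalent lists.

```r
# Build the call m[i, , drop = FALSE] by hand. The fourth element is the
# empty symbol returned by quote(expr = ); it stands for an omitted subscript.
cl <- as.call(list(as.name("["), quote(m), quote(i),
                   quote(expr = ), drop = FALSE))
call_text <- deparse(cl)   # text form of the constructed call

# Evaluate the call against a small matrix (illustrative data).
m <- matrix(1:6, nrow = 3)
i <- 2:3
res <- eval(cl)            # same rows as m[2:3, , drop = FALSE]

# Compare the two ways of generating a one-element list holding the
# empty symbol; only their construction cost is expected to differ.
same <- identical(alist(x = ), list(x = quote(expr = )))
```

Note that the empty symbol is kept inside a list or call here; binding it directly to a variable and then referring to that variable triggers a "missing argument" error.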

Berry, Charles

2018-Jun-08 21:49 UTC

[Rd] Subsetting the "ROW"s of an object

> On Jun 8, 2018, at 2:15 PM, Hadley Wickham <h.wickham at gmail.com> wrote:
> 
> On Fri, Jun 8, 2018 at 2:09 PM, Berry, Charles <ccberry at ucsd.edu> wrote:
>> 
>> 
>>> On Jun 8, 2018, at 1:49 PM, Hadley Wickham <h.wickham at gmail.com> wrote:
>>> 
>>> Hmmm, yes, there must be some special case in the C code to avoid
>>> recycling a length-1 logical vector:
>> 
>> 
>> Here is a version that (I think) handles Herve's issue of arrays
>> having one or more 0 dimensions.
>> 
>> subset_ROW <-
>>    function(x,i)
>> {
>>    dims <- dim(x)
>>    index_list <- which(dims[-1] != 0L) + 3
>>    mc <- quote(x[i])
>>    nd <- max(1L, length(dims))
>>    mc[ index_list ] <- list(TRUE)
>>    mc[[ nd + 3L ]] <- FALSE
>>    names( mc )[ nd+3L ] <- "drop"
>>    eval(mc)
>> }
>> 
>> Curiously enough the timing is *much* better for this implementation
>> than for the first version I sent.
>> 
>> Constructing a version of `mc' that looks like
>> `x[i,,,,drop=FALSE]' can be done with `alist(a=)' in place of
>> `list(TRUE)' in the earlier version but seems to slow things down
>> noticeably. It requires almost twice (!!) as much time as the version above.
> 
> I think that's probably because alist() is a slow way to generate a
> missing symbol:
> 
> bench::mark(
>  alist(x = ),
>  list(x = quote(expr = )),
>  check = FALSE
> )[1:5]
> #> # A tibble: 2 x 5
> #>   expression                    min     mean   median      max
> #>   <chr>                    <bch:tm> <bch:tm> <bch:tm> <bch:tm>
> #> 1 alist(x = )                 2.8µs   3.54µs   3.29µs   34.9µs
> #> 2 list(x = quote(expr = ))    169ns 219.38ns    181ns   24.2µs
> 
> (note the units)

Yes. That accounts for about half the difference, and I guess the rest comes
from getting rid of seq(). This seems a bit quicker than anything else and
satisfies Herve's objections:

subset_ROW <-
      function(x,i)
  {
      dims <- dim(x)
      nd <- length(dims)
      index_list <- if (nd > 1) 2L + 2L:nd else 0
      mc <- quote(x[i])
      mc[ index_list ] <- list(quote(expr=))
      mc[[ "drop" ]] <- FALSE
      eval(mc)
  }

Chuck
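
As a quick check of the version above, the following sketch (not part of the original message) applies it to a plain vector, a matrix, a 3-d array, and an array with a zero-extent dimension. The function body is repeated from the message, with comments added, so the example is self-contained. The test objects `v`, `m`, `a`, `z` and the result names are illustrative assumptions.

```r
# subset_ROW() as posted above, repeated here with comments added.
subset_ROW <- function(x, i) {
    dims <- dim(x)
    nd <- length(dims)
    # Slots of the call: 1 is `[`, 2 is x, 3 is i, so the subscripts for
    # dimensions 2..nd go in slots 4..(nd + 2). Index 0 selects nothing,
    # which leaves the call untouched for plain vectors.
    index_list <- if (nd > 1) 2L + 2L:nd else 0
    mc <- quote(x[i])
    mc[ index_list ] <- list(quote(expr=))   # empty subscripts: x[i, , ...]
    mc[[ "drop" ]] <- FALSE                  # keep all dimensions
    eval(mc)
}

v <- 1:5                                   # plain vector, dim(v) is NULL
m <- matrix(1:6, nrow = 3)                 # 3 x 2 matrix
a <- array(1:24, dim = c(2, 3, 4))         # 2 x 3 x 4 array
z <- array(numeric(0), dim = c(3, 0, 2))   # one zero-extent dimension

r_v <- subset_ROW(v, 2:3)   # expected to match v[2:3]
r_m <- subset_ROW(m, 2:3)   # expected to match m[2:3, , drop = FALSE]
r_a <- subset_ROW(a, 1)     # first "row" of a, all dimensions kept
r_z <- subset_ROW(z, 2:3)   # zero-extent dimension is preserved

dim(r_m)
dim(r_a)
dim(r_z)
```

Because the trailing subscripts are left empty rather than set to TRUE, no special handling of zero-length dimensions is needed.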

Hervé Pagès

2018-Jun-08 21:58 UTC

[Rd] Subsetting the "ROW"s of an object

On 06/08/2018 02:15 PM, Hadley Wickham wrote:
> On Fri, Jun 8, 2018 at 2:09 PM, Berry, Charles <ccberry at ucsd.edu> wrote:
>>
>>
>>> On Jun 8, 2018, at 1:49 PM, Hadley Wickham <h.wickham at gmail.com> wrote:
>>>
>>> Hmmm, yes, there must be some special case in the C code to avoid
>>> recycling a length-1 logical vector:
>>
>>
>> Here is a version that (I think) handles Herve's issue of arrays
>> having one or more 0 dimensions.
>>
>> subset_ROW <-
>>      function(x,i)
>> {
>>      dims <- dim(x)
>>      index_list <- which(dims[-1] != 0L) + 3
>>      mc <- quote(x[i])
>>      nd <- max(1L, length(dims))
>>      mc[ index_list ] <- list(TRUE)
>>      mc[[ nd + 3L ]] <- FALSE
>>      names( mc )[ nd+3L ] <- "drop"
>>      eval(mc)
>> }
>>
>> Curiously enough the timing is *much* better for this implementation
>> than for the first version I sent.
>>
>> Constructing a version of `mc' that looks like
>> `x[i,,,,drop=FALSE]' can be done with `alist(a=)' in place of
>> `list(TRUE)' in the earlier version but seems to slow things down
>> noticeably. It requires almost twice (!!) as much time as the version above.
> 
> I think that's probably because alist() is a slow way to generate a
> missing symbol:
> 
> bench::mark(
>    alist(x = ),
>    list(x = quote(expr = )),
>    check = FALSE
> )[1:5]
> #> # A tibble: 2 x 5
> #>   expression                    min     mean   median      max
> #>   <chr>                    <bch:tm> <bch:tm> <bch:tm> <bch:tm>
> #> 1 alist(x = )                 2.8µs   3.54µs   3.29µs   34.9µs
> #> 2 list(x = quote(expr = ))    169ns 219.38ns    181ns   24.2µs
> 
> (note the units)

That's a good one. Need to change this in S4Vectors::default_extractROWS()
and other places. Thanks!

H.
> 
> Hadley
> 
> 
-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fredhutch.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319
