thr3ads.net - R devel - [Rd] RFC: sapply() limitation from vector to matrix, but not further [Dec 2010]


Martin Maechler

2010-Dec-01 08:39 UTC

[Rd] RFC: sapply() limitation from vector to matrix, but not further

sapply() stems from S / S+ times and hence has a long tradition.
In spite of that I think that it should be enhanced...

As the subject mentions, sapply() produces a matrix in cases
where the list components of the lapply(.) results are of the
same length (and ...).
However, it unfortunately "stops there".
E.g., if you *nest* two sapply() calls where the inner one
produces a matrix, very often the logical behavior would be for
the outer sapply() to stack these matrices into an array of 
rank 3 ["array rank"(x) := length(dim(x))].
However, it does not do that, e.g., in this artificial example

p0 <- function(...) paste(..., sep="")
myF <- function(x,y) {
    stopifnot(length(x) <= 3)
    x <- rep(x, length.out=3)
    ny <- length(y)
    r <- outer(x,y)
    dimnames(r) <- list(p0("r",1:3), p0("C", seq_len(ny)))
    r
}

and
> (v <- structure(10*(5:8), names=LETTERS[1:4]))
 A  B  C  D 
50 60 70 80 

If we tell sapply() not to simplify, we see the list of same-size
matrices it produces:
> sapply(v, myF, y = 2*(1:5), simplify=FALSE)
$A
    C1  C2  C3  C4  C5
r1 100 200 300 400 500
r2 100 200 300 400 500
r3 100 200 300 400 500

$B
    C1  C2  C3  C4  C5
r1 120 240 360 480 600
r2 120 240 360 480 600
r3 120 240 360 480 600

$C
    C1  C2  C3  C4  C5
r1 140 280 420 560 700
r2 140 280 420 560 700
r3 140 280 420 560 700

$D
    C1  C2  C3  C4  C5
r1 160 320 480 640 800
r2 160 320 480 640 800
r3 160 320 480 640 800

However, quite deceptively
> sapply(v, myF, y = 2*(1:5))
        A   B   C   D
 [1,] 100 120 140 160
 [2,] 100 120 140 160
 [3,] 100 120 140 160
 [4,] 200 240 280 320
 [5,] 200 240 280 320
 [6,] 200 240 280 320
 [7,] 300 360 420 480
 [8,] 300 360 420 480
 [9,] 300 360 420 480
[10,] 400 480 560 640
[11,] 400 480 560 640
[12,] 400 480 560 640
[13,] 500 600 700 800
[14,] 500 600 700 800
[15,] 500 600 700 800
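
The 15 rows are not arbitrary: each 3 x 5 matrix is flattened column by column into one column of the result. A minimal sketch, reusing myF and v from above (repeated, slightly condensed, so that the block runs on its own); the names lst, m and a are introduced here for illustration only:

```r
p0 <- function(...) paste(..., sep = "")
myF <- function(x, y) {
    stopifnot(length(x) <= 3)
    x <- rep(x, length.out = 3)
    r <- outer(x, y)
    dimnames(r) <- list(p0("r", 1:3), p0("C", seq_along(y)))
    r
}
v <- structure(10 * (5:8), names = LETTERS[1:4])

lst <- lapply(v, myF, y = 2 * (1:5))   # list of four 3 x 5 matrices
m   <- sapply(v, myF, y = 2 * (1:5))   # 15 x 4 matrix

# Column "A" of m is lst$A read off in column-major order.
all.equal(m[, "A"], as.vector(lst$A))

# Restoring the lost dimensions by hand gives the rank-3 array.
a <- array(m, dim = c(3, 5, 4),
           dimnames = c(dimnames(lst[[1]]), list(names(v))))
all.equal(a[, , "B"], lst$B)
```

So no information is lost in the 15 x 4 result, but the row and column labels are, and the caller has to know the inner dimensions to undo the flattening.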


My proposal -- implemented and "make check" tested --
is to add an optional argument  'ARRAY'
which allows
> sapply(v, myF, y = 2*(1:5), ARRAY=TRUE)
, , A

    C1  C2  C3  C4  C5
r1 100 200 300 400 500
r2 100 200 300 400 500
r3 100 200 300 400 500

, , B

    C1  C2  C3  C4  C5
r1 120 240 360 480 600
r2 120 240 360 480 600
r3 120 240 360 480 600

, , C

    C1  C2  C3  C4  C5
r1 140 280 420 560 700
r2 140 280 420 560 700
r3 140 280 420 560 700

, , D

    C1  C2  C3  C4  C5
r1 160 320 480 640 800
r2 160 320 480 640 800
r3 160 320 480 640 800
-----------

In the best of all worlds, the default would be 'ARRAY = TRUE',
but of course, given the long-standing different behavior,
it seems much too "risky", and my proposal includes remaining
back-compatible with default ARRAY = FALSE.
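
The patch itself is not shown in this message. Purely as a rough sketch of the intended behaviour, a hypothetical wrapper (sapply_array is an invented name, not part of R and not the proposed implementation) could stack equally dimensioned results along a new last dimension and otherwise fall back to the plain list:

```r
# Hypothetical helper for illustration only; not the patch from this thread.
sapply_array <- function(X, FUN, ...) {
    res <- lapply(X, FUN, ...)
    if (length(res) == 0L) return(res)
    d <- dim(res[[1L]])
    same_dim <- !is.null(d) &&
        all(vapply(res, function(r) identical(dim(r), d), logical(1)))
    if (!same_dim) return(res)            # nothing to stack: keep the list
    dn <- dimnames(res[[1L]])
    if (is.null(dn)) dn <- vector("list", length(d))
    array(unlist(res, use.names = FALSE),
          dim = c(d, length(res)),        # one extra dimension per element of X
          dimnames = c(dn, list(names(res))))
}

v <- c(A = 50, B = 60, C = 70, D = 80)
a <- sapply_array(v, function(x) outer(rep(x, 3), 2 * (1:5)))
dim(a)

# Results of unequal shape are returned unsimplified.
ragged <- sapply_array(1:3, function(n) diag(n))
is.list(ragged)
```

Only the dimnames of the first result are carried over here, which is a simplifying assumption of the sketch.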

Martin Maechler,
ETH Zurich

Marc Schwartz

2010-Dec-01 13:59 UTC


On Dec 1, 2010, at 2:39 AM, Martin Maechler wrote:
> [...]
>
> In the best of all worlds, the default would be 'ARRAY = TRUE',
> but of course, given the long-standing different behavior,
> it seems much too "risky", and my proposal includes remaining
> back-compatible with default ARRAY = FALSE.
> 
> Martin Maechler,
> ETH Zurich

Seems to me to be a reasonable proposal Martin, obviously with the proviso that
the current default behavior is unaltered, as you note.

Regards,

Marc

Hadley Wickham

2010-Dec-01 14:26 UTC


I think an even better approach would be to extract the
"simplification" component out of sapply, so that one could write

sapply <- function(...) simplify(lapply(...))

(although obviously some arguments would go to lapply and some to simplify).

The advantage of this would be that you could use the same
simplification algorithm in other places.
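
Later R releases ship a stand-alone simplifier of this kind: base R's simplify2array(), which sapply() calls internally. A sketch of the decomposition, assuming such a version of R; my_sapply is a hypothetical name used only for illustration:

```r
# Hypothetical wrapper: lapply() does the work, simplify2array() the shaping.
my_sapply <- function(X, FUN, ..., higher = TRUE) {
    simplify2array(lapply(X, FUN, ...), higher = higher)
}

v <- c(A = 50, B = 60, C = 70, D = 80)
a <- my_sapply(v, function(x) outer(rep(x, 3), 2 * (1:5)))
dim(a)   # 3 5 4

# The same simplifier applied to a list that did not come from sapply().
b <- simplify2array(Map(function(lo, hi) outer(1:2, lo:hi), 1:3, 4:6))
dim(b)   # 2 4 3
```

The second call illustrates the reuse argument: any list of equally dimensioned results can be stacked, whichever function produced it.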

Hadley

On Wed, Dec 1, 2010 at 8:39 AM, Martin Maechler
<maechler at stat.math.ethz.ch> wrote:
> [...]


-- 
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/

Gabor Grothendieck

2010-Dec-27 22:06 UTC


On Wed, Dec 1, 2010 at 3:39 AM, Martin Maechler
<maechler at stat.math.ethz.ch> wrote:
> My proposal -- implemented and "make check" tested --
> is to add an optional argument 'ARRAY'
> which allows
>
>> sapply(v, myF, y = 2*(1:5), ARRAY=TRUE)

It would reduce the proliferation of arguments if the simplify argument were
extended to allow this, e.g. simplify = "array", or
perhaps simplify = n to allow a maximum of n dimensions.
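
For reference, later versions of R accept the first spelling: sapply(..., simplify = "array") stacks equally dimensioned results into a higher-rank array, while the default simplify = TRUE keeps the matrix behaviour. A short check, assuming such a version of R (the helper f is introduced here only for illustration; the numeric form simplify = n is not shown):

```r
v <- c(A = 50, B = 60, C = 70, D = 80)
f <- function(x) outer(rep(x, 3), 2 * (1:5))   # 3 x 5 matrix per element

m <- sapply(v, f)                       # default: flattened to 15 x 4
a <- sapply(v, f, simplify = "array")   # stacked to 3 x 5 x 4

dim(m)
dim(a)
dimnames(a)[[3]]                        # names of v label the third dimension
```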

-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
