thr3ads.net - R devel - [Rd] head.matrix can return 1000s of columns -- limit to n or add new argument? [Oct 2019]

Martin Maechler

2019-Sep-17 16:17 UTC

[Rd] head.matrix can return 1000s of columns -- limit to n or add new argument?

>>>>> Fox, John 
>>>>>     on Tue, 17 Sep 2019 12:32:13 +0000 writes:
    > Dear Herve,
    > Sorry, I should have said "matrices" rather than "data
    > frames" -- brief() has methods for both.

    > Best,
    > John

    > -----------------------------
    > John Fox, Professor Emeritus
    > McMaster University
    > Hamilton, Ontario, Canada
    > Web: http://socserv.mcmaster.ca/jfox

    >> On Sep 17, 2019, at 8:29 AM, Fox, John <jfox at mcmaster.ca> wrote:
    >> 
    >> Dear Herve,
    >> 
    >> The brief() generic function in the car package does something very
    >> similar to that for data frames (and has methods for other classes of
    >> objects as well).
    >> 
    >> Best,
    >> John
    >> 
    >> -----------------------------
    >> John Fox, Professor Emeritus
    >> McMaster University
    >> Hamilton, Ontario, Canada
    >> Web: http://socserv.mcmaster.ca/jfox
    >> 
    >>> On Sep 17, 2019, at 2:52 AM, Pages, Herve <hpages at fredhutch.org> wrote:
    >>> 
    >>> Hi,
    >>> 
    >>> Alternatively, how about a new glance() generic that would do
    >>> something like this:
    >>> 
    >>>> library(DelayedArray)
    >>>> glance <- DelayedArray:::show_compact_array
    >>> 
    >>>> M <- matrix(rnorm(1e6), nrow = 1000L, ncol = 2000L)
    >>>> glance(M)
    >>> <1000 x 2000> matrix object of type "double":
    >>>                [,1]        [,2]        [,3] ...    [,1999]    [,2000]
    >>>    [1,]  -0.8854896   1.8010288   1.3051341   . -0.4473593  0.4684985
    >>>    [2,]  -0.8563415  -0.7102768  -0.9309155   . -1.8743504  0.4300557
    >>>    [3,]   1.0558159  -0.5956583   1.2689806   .  2.7292249  0.2608300
    >>>    [4,]   0.7547356   0.1465714   0.1798959   . -0.1778017  1.3417423
    >>>    [5,]   0.8037360  -2.7081809   0.9766657   . -0.9902788  0.1741957
    >>>     ...           .           .           .   .          .          .
    >>>  [996,]  0.67220752  0.07804320 -0.38743454   .  0.4438639 -0.8130713
    >>>  [997,] -0.67349962 -1.15292067 -0.54505567   .  0.4630923 -1.6287694
    >>>  [998,]  0.03374595 -1.68061325 -0.88458368   . -0.2890962  0.2552267
    >>>  [999,]  0.47861492  1.25530912  0.19436708   . -0.5193121 -1.1695501
    >>> [1000,]  1.52819218  2.23253275 -1.22051720   . -1.0342430 -0.1703396
    >>> 
    >>>> A <- array(rnorm(1e6), c(50, 20, 10, 100))
    >>>> glance(A)
    >>> <50 x 20 x 10 x 100> array object of type "double":
    >>> ,,1,1
    >>>             [,1]       [,2]       [,3] ...      [,19]      [,20]
    >>>  [1,] 0.78319619 0.82258390 0.09122269   .  1.7288189  0.7968574
    >>>  [2,] 2.80687459 0.63709640 0.80844430   . -0.3963161 -1.2768284
    >>>   ...          .          .          .   .          .          .
    >>> [49,] -1.0696320 -0.1698111  2.0082890   .  0.4488292  0.5215745
    >>> [50,] -0.7012526 -2.0818229  0.7750518   .  0.3189076  0.1437394
    >>>
    >>> ...
    >>>
    >>> ,,10,100
    >>>             [,1]       [,2]       [,3] ...      [,19]      [,20]
    >>>  [1,]  0.5360649  0.5491561 -0.4098350   .  0.7647435  0.5640699
    >>>  [2,]  0.7924093 -0.7395815 -1.3792913   .  0.1980287 -0.2897026
    >>>   ...          .          .          .   .          .          .
    >>> [49,]  0.6266209  0.3778512  1.4995778   . -0.3820651 -1.4241691
    >>> [50,]  1.9218715  3.5475949  0.5963763   .  0.4005210  0.4385623
    >>> 
    >>> H.

Thank you, Hervé and John.
Both glance() and brief() are nice, and I think a version of one of
them could also make a nice addition to the 'utils' package.

However, there's a principal difference between them and the
proposed generalized head {or tail} :
The latter really does *return* a sub matrix/array of chosen
dimensions with modified dimnames and that *object* then is
printed if not assigned.

OTOH, glance() and brief() are rather versions of print(), and I
think they have a dedicated "display-only" purpose {yes, I see they
do return something: glance() returns a character object, brief()
returns the principal argument invisibly, the same as any
"correct" print() method.}

From the above, I think it may make sense to entertain both a
generalization of head() and one glance() / brief() /.. -like
function which, for a matrix or data frame, shows all 4 corners.
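To make the distinction above concrete: a display-only helper prints and hands back its argument, while head() returns a new, smaller object. The sketch below is a minimal base-R illustration under that assumption; the name `corners()` and its `n` argument are hypothetical and are not the API of utils, car::brief() or DelayedArray.

```r
## Hypothetical display-only helper (not in utils, car or DelayedArray):
## prints the four n x n corners of a matrix or data frame and, like a
## "correct" print() method, returns x invisibly.
corners <- function(x, n = 3L) {
    stopifnot(length(dim(x)) == 2L, n >= 1L)
    nr <- nrow(x)
    nc <- ncol(x)
    ## first/last n positions; setdiff() avoids duplicates when they overlap
    top    <- seq_len(min(n, nr))
    bottom <- setdiff(seq.int(to = nr, length.out = min(n, nr)), top)
    left   <- seq_len(min(n, nc))
    right  <- setdiff(seq.int(to = nc, length.out = min(n, nc)), left)
    rows <- c(top, bottom)
    cols <- c(left, right)
    ## format as character so all column types print uniformly
    out <- as.matrix(format(x[rows, cols, drop = FALSE]))
    ## label with original positions unless x already has dimnames
    rownames(out) <- if (is.null(rownames(x))) paste0("[", rows, ",]") else rownames(x)[rows]
    colnames(out) <- if (is.null(colnames(x))) paste0("[,", cols, "]") else colnames(x)[cols]
    cat(sprintf("<%d x %d> %s\n", nr, nc, class(x)[1L]))
    print(out, quote = FALSE, right = TRUE)
    invisible(x)
}

M <- matrix(rnorm(2e6), nrow = 1000L, ncol = 2000L)
res <- corners(M)   # prints rows 1-3 and 998-1000, columns 1-3 and 1998-2000
identical(res, M)   # the (invisible) return value is M itself
```

A head()-style generalization would instead return `M[rows, cols, drop = FALSE]` as a new object, which is the "principal difference" described above.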

There's another important criterion here:  __Simplicity__ in the
code that's added (and will have to be maintained as part of R
"forever" into the future)...
AFAICS, the DelayedArray stuff is beautifully modular, but
possibly also much entangled in the dependent packages and classes we
cannot require for 'utils'.

The current source for head() and tail() and all their methods
in utils is just 83 lines of code  {file utils/R/head.R minus
the initial mostly copyright comments}.
I am very reluctant to consider blowing that up by factors...


Martin

    >>> On 9/16/19 00:54, Michael Chirico wrote:
    >>>> Awesome. Gabe, since you already have a workshopped version, would you
    >>>> like to proceed? Feel free to ping me to review the patch once it's
    >>>> posted.
    >>>> 
    >>>> On Mon, Sep 16, 2019 at 3:26 PM Martin Maechler
    >>>> <maechler at stat.math.ethz.ch> wrote:
    >>>> 
    >>>>>>>>>> Michael Chirico
    >>>>>>>>>>     on Sun, 15 Sep 2019 20:52:34 +0800 writes:
    >>>>>
    >>>>>     > Finally read in detail your response Gabe. Looks great,
    >>>>>     > and I agree it's quite intuitive, as well as agree against
    >>>>>     > non-recycling.
    >>>>>
    >>>>>     > Once the length(n) == length(dim(x)) behavior is enabled,
    >>>>>     > I don't think there's any need/desire to have head() do
    >>>>>     > x[1:6,1:6] anymore. head(x, c(6, 6)) is quite clear for
    >>>>>     > those familiar with head(x, 6), it would seem to me.
    >>>>>
    >>>>>     > Mike C
    >>>>>
    >>>>> Thank you, Gabe, and Michael.
    >>>>> I did like Gabe's proposal already back in July but was
    >>>>> busy and/or vacationing then ...
    >>>>>
    >>>>> If you submit this with a patch (that includes changes to both
    >>>>> *.R and *.Rd, including some example) as a "wishlist" item to R's
    >>>>> bugzilla, I'm willing/happy to check and commit this to R-devel.
    >>>>> 
    >>>>> Martin
    >>>>> 
    >>>>>> On Sat, Jul 13, 2019 at 8:35 AM Gabriel Becker
    >>>>>> <gabembecker at gmail.com> wrote:
    >>>>>>
    >>>>>>> Hi Michael and Abby,
    >>>>>>> 
    >>>>>>> So one thing that could happen that would be backwards
    >>>>>>> compatible (with the exception of something that was an
    >>>>>>> error no longer being an error) is that head and tail could
    >>>>>>> take, for n, vectors of length length(dim(x)) rather than
    >>>>>>> integers of length 1, with the default n = 6 being equivalent
    >>>>>>> to n = c(6, dim(x)[2], <...>, dim(x)[k]), at least for
    >>>>>>> the deprecation cycle, if not permanently. Not recycling
    >>>>>>> would be unexpected given the behavior of many R functions,
    >>>>>>> but it would preserve the current behavior while granting
    >>>>>>> more fine-grained control to users that feel they need it.
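The selection rule described above (entries missing from n mean "take the whole dimension", negative entries mean "all but that many", and n is never recycled) can be isolated in a small helper. This is only a sketch; `resolve_n()` is a hypothetical name, not an existing utils function.

```r
## Hypothetical helper (not part of utils): turn head()'s n into one
## count per dimension of an object whose dim() is 'dims'.
resolve_n <- function(n, dims) {
    vapply(seq_along(dims), function(i) {
        ## dimensions not covered by n keep their full extent (no recycling)
        ni <- if (length(n) >= i) n[i] else dims[i]
        ## negative: "all but |ni|"; positive: clamp to the extent
        as.numeric(if (ni < 0) max(dims[i] + ni, 0) else min(ni, dims[i]))
    }, numeric(1))
}

resolve_n(6L, c(10, 10))          # 6 10: same rows/columns as today's head(<matrix>)
resolve_n(c(2, 3), c(10, 10))     # 2 3
resolve_n(c(2, -9), c(10, 10))    # 2 1
resolve_n(c(20, 20), c(10, 10))   # 10 10: clamped to the extent
```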
    >>>>>>> 
    >>>>>>> A rapidly thrown-together prototype of such a method for
    >>>>>>> the head of a matrix case is as follows:
    >>>>>>>
    >>>>>>> head2 = function(x, n = 6L, ...) {
    >>>>>>>     indvecs = lapply(seq_along(dim(x)), function(i) {
    >>>>>>>         ## use n[i] if supplied, else keep all of dimension i
    >>>>>>>         ni = if(length(n) >= i) n[i] else dim(x)[i]
    >>>>>>>         ## negative n drops that many from the end of dimension i
    >>>>>>>         ni = if(ni < 0L) max(dim(x)[i] + ni, 0L) else min(ni, dim(x)[i])
    >>>>>>>         seq_len(ni)
    >>>>>>>     })
    >>>>>>>     lstargs = c(list(x), indvecs, drop = FALSE)
    >>>>>>>     do.call("[", lstargs)
    >>>>>>> }
    >>>>>>> 
    >>>>>>> 
    >>>>>>> > mat = matrix(1:100, 10, 10)
    >>>>>>> > head(mat)
    >>>>>>>      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
    >>>>>>> [1,]    1   11   21   31   41   51   61   71   81    91
    >>>>>>> [2,]    2   12   22   32   42   52   62   72   82    92
    >>>>>>> [3,]    3   13   23   33   43   53   63   73   83    93
    >>>>>>> [4,]    4   14   24   34   44   54   64   74   84    94
    >>>>>>> [5,]    5   15   25   35   45   55   65   75   85    95
    >>>>>>> [6,]    6   16   26   36   46   56   66   76   86    96
    >>>>>>> > head2(mat)
    >>>>>>>      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
    >>>>>>> [1,]    1   11   21   31   41   51   61   71   81    91
    >>>>>>> [2,]    2   12   22   32   42   52   62   72   82    92
    >>>>>>> [3,]    3   13   23   33   43   53   63   73   83    93
    >>>>>>> [4,]    4   14   24   34   44   54   64   74   84    94
    >>>>>>> [5,]    5   15   25   35   45   55   65   75   85    95
    >>>>>>> [6,]    6   16   26   36   46   56   66   76   86    96
    >>>>>>> > head2(mat, c(2, 3))
    >>>>>>>      [,1] [,2] [,3]
    >>>>>>> [1,]    1   11   21
    >>>>>>> [2,]    2   12   22
    >>>>>>> > head2(mat, c(2, -9))
    >>>>>>>      [,1]
    >>>>>>> [1,]    1
    >>>>>>> [2,]    2
    >>>>>>> 
    >>>>>>> Now, one thing to keep in mind here is that I think we'd
    >>>>>>> either a) have to make the non-recycling behavior
    >>>>>>> permanent, or b) have head treat data.frames and matrices
    >>>>>>> differently with respect to the subsets they grab (which
    >>>>>>> strikes me as a *Bad Plan* (tm)).
    >>>>>>> 
    >>>>>>> So I don't think the default behavior would ever be
    >>>>>>> mat[1:6, 1:6], not because of backwards compatibility,
    >>>>>>> but because, at least in my intuition, that is just not
    >>>>>>> what head on a data.frame should do by default, and I
    >>>>>>> think the behaviors for the basic rectangular datatypes
    >>>>>>> should "stick together". I mean, also because of
    >>>>>>> backwards compatibility, but that could *in theory*
    >>>>>>> change across a long enough deprecation cycle, whereas the
    >>>>>>> conceptually right thing to do with a data.frame probably
    >>>>>>> won't.
    >>>>>>> 
    >>>>>>> All of that said, is head(mat, c(6, 6)) really that much
    >>>>>>> easier to type/better than just mat[1:6, 1:6, drop=FALSE]
    >>>>>>> (I know this will behave differently if any of the dims
    >>>>>>> of mat are less than 6, but if so why are you heading it
    >>>>>>> in the first place ;) )? I don't really have a strong
    >>>>>>> feeling on the answer to that.
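The caveat about dimensions shorter than 6 is easy to see in base R: indexing with 1:6 fails outright, whereas a head()-style selection clamps each count to the available extent. A minimal sketch, where `first_k()` is a hypothetical helper:

```r
m <- matrix(1:12, nrow = 3, ncol = 4)

## Plain indexing: 1:6 runs past nrow(m) = 3 and ncol(m) = 4
res <- tryCatch(m[1:6, 1:6, drop = FALSE], error = conditionMessage)
res   # "subscript out of bounds"

## head()-style selection clamps each count to the extent first
first_k <- function(k, extent) seq_len(min(k, extent))
m[first_k(6, nrow(m)), first_k(6, ncol(m)), drop = FALSE]   # all of m, 3 x 4
```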
    >>>>>>> 
    >>>>>>> I'm happy to put together a patch for head.matrix,
    >>>>>>> head.data.frame, tail.matrix and tail.data.frame, plus
    >>>>>>> documentation, if people on R-core are interested in
    >>>>>>> this.
    >>>>>>> 
    >>>>>>> Note, as most here probably know, and as alluded to
    >>>>>>> above, length(n) > 1 for head or tail currently gives an
    >>>>>>> error, so this would be an extension of the existing
    >>>>>>> functionality in the mathematical extension sense, where
    >>>>>>> all existing behavior would remain identical, but the
    >>>>>>> support/valid parameter space would grow.
    >>>>>>> 
    >>>>>>> Best, ~G
    >>>>>>> 
    >>>>>>> 
    >>>>>>> On Fri, Jul 12, 2019 at 4:03 PM Abby Spurdle
    >>>>>>> <spurdle.a at gmail.com> wrote:
    >>>>>>> 
    >>>>>>>> > I assume there are lots of backwards-compatibility issues as
    >>>>>>>> > well as valid use cases for this behavior, so I guess
    >>>>>>>> > defaulting to M[1:6, 1:6] is out of the question.
    >>>>>>>> 
    >>>>>>>> Agree.
    >>>>>>>> 
    >>>>>>>> > Is there any scope for adding a new argument to head.matrix
    >>>>>>>> > that would allow this flexibility?
    >>>>>>>> 
    >>>>>>>> I agree with what you're trying to achieve. However,
    >>>>>>>> I'm not sure this is as simple as you're suggesting.
    >>>>>>>>
    >>>>>>>> What if the user wants "head" in rows but "tail" in
    >>>>>>>> columns? Or "head" in rows, and both "head" and "tail"
    >>>>>>>> in columns? With head and tail alone, there's a
    >>>>>>>> combinatorial explosion.
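One way to sidestep that explosion, sketched here only as an illustration, is to build one index vector per dimension and let head() or tail() act on seq_len() of each extent; any head/tail mix then becomes ordinary `[` indexing. The helpers `take_first()` and `take_last()` are hypothetical names.

```r
x <- matrix(1:100, nrow = 10, ncol = 10)

## Hypothetical helpers: index vectors for the first/last k positions
take_first <- function(extent, k) head(seq_len(extent), k)
take_last  <- function(extent, k) tail(seq_len(extent), k)

## "head" in rows, "tail" in columns: rows 1-2, columns 8-10
r1 <- x[take_first(nrow(x), 2), take_last(ncol(x), 3), drop = FALSE]

## "head" in rows, both "head" and "tail" in columns: columns 1, 2, 9, 10
r2 <- x[take_first(nrow(x), 2),
        c(take_first(ncol(x), 2), take_last(ncol(x), 2)), drop = FALSE]
r1
r2
```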
    >>>>>>>> 
    >>>>>>>> Also, when using tail on an unnamed matrix, it may be
    >>>>>>>> desirable to name rows and columns.
    >>>>>>>>
    >>>>>>>> And all of this assumes standard matrix objects. Add in
    >>>>>>>> matrix subclasses and related objects, and things get
    >>>>>>>> more complex still.
    >>>>>>>>
    >>>>>>>> As I suggested in another thread a few days ago, I'm
    >>>>>>>> planning to write an R package for matrices and
    >>>>>>>> matrix-like objects (possibly extending the Matrix
    >>>>>>>> package), with an initial emphasis on subsetting,
    >>>>>>>> printing and formatting. So, I'm interested to hear
    >>>>>>>> more suggestions on this topic.
    >>>>>>>> 
    >>> 
    >>> -- 
    >>> Hervé Pagès
    >>> 
    >>> Program in Computational Biology
    >>> Division of Public Health Sciences
    >>> Fred Hutchinson Cancer Research Center
    >>> 1100 Fairview Ave. N, M1-B514
    >>> P.O. Box 19024
    >>> Seattle, WA 98109-1024
    >>> 
    >>> E-mail: hpages at fredhutch.org
    >>> Phone:  (206) 667-5791
    >>> Fax:    (206) 667-1319
    >>> ______________________________________________
    >>> R-devel at r-project.org mailing list
    >>> https://stat.ethz.ch/mailman/listinfo/r-devel

Gabriel Becker

2019-Oct-18 18:59 UTC

Hi Martin et al.

Sorry for not getting back to this sooner. I've been pretty well buried
under travel plus being sick for a bit, but I will be happy to roll up a
patch for this, including documentation, and put it into a wishlist item.

I'll aim to do that at some point next week.

Thanks @Martin Maechler <maechler at stat.math.ethz.ch> for engaging with
us and being willing to consider the patch.

Best,
~G

Gabriel Becker

2019-Oct-29 19:43 UTC

Hi all,

So I've started working on this and I ran into something that I didn't
know, namely that for x a multi-dimensional (2+) array, head(x) and tail(x)
ignore dimensions completely, treat x as an atomic vector, and return an
(unclassed) atomic vector:
> x = array(100, c(4, 5, 5))
> dim(x)
[1] 4 5 5
> head(x, 1)
[1] 100
> class(head(x))
[1] "numeric"


(For a 1d array, it does return another 1d array).

When extending head/tail to understand multiple dimensions as discussed in
this thread, then, should the behavior for 2+d arrays be explicitly
retained, or should head and tail do the analogous thing (with head(<2d
array>) behaving the same as head(<matrix>), which honestly is what I
expected to already be happening)?

Are people using/relying on this behavior in their code, and if so, why/for
what?

Even more generally, one way forward is to have the default methods check
for dimensions, and use length if it is null:

tail.default <- tail.data.frame <- function(x, n = 6L, ...)
{
    if(any(n == 0))
        stop("n must be non-zero or unspecified for all dimensions")
    if(!is.null(dim(x)))
        dimsx <- dim(x)
    else
        dimsx <- length(x)

    ## this returns a list of vectors of indices, one per
    ## dimension, regardless of the length of the n
    ## argument
    sel <- lapply(seq_along(dimsx), function(i) {
        dxi <- dimsx[i]
        ## select all indices (full dim) if not specified
        ni <- if(length(n) >= i) n[i] else dxi
        ## handle negative ns
        ni <- if (ni < 0L) max(dxi + ni, 0L) else min(ni, dxi)
        seq.int(to = dxi, length.out = ni)
    })
    args <- c(list(x), sel, drop = FALSE)
    do.call("[", args)
}
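For symmetry, a head() counterpart of the tail.default sketch above only needs a different index vector: seq_len(ni) instead of counting back from the end. The sketch below factors the shared logic into one helper; the names `nd_indices()`, `head_nd()` and `tail_nd()` are hypothetical (chosen so as not to mask utils::head() and utils::tail()), and the code only mirrors the proposal in this message, including its rejection of n = 0, rather than any implementation in utils.

```r
## Hypothetical shared index builder: one index vector per dimension;
## an object without dim() is treated as a single dimension of length(x).
nd_indices <- function(x, n, from_end) {
    if (any(n == 0))
        stop("n must be non-zero or unspecified for all dimensions")
    dimsx <- if (!is.null(dim(x))) dim(x) else length(x)
    lapply(seq_along(dimsx), function(i) {
        dxi <- dimsx[i]
        ## unspecified dimensions are kept whole (no recycling of n)
        ni <- if (length(n) >= i) n[i] else dxi
        ## negative n: all but |n|; positive n: clamp to the extent
        ni <- if (ni < 0L) max(dxi + ni, 0L) else min(ni, dxi)
        if (from_end) seq.int(to = dxi, length.out = ni) else seq_len(ni)
    })
}

## drop = FALSE keeps the number of dimensions of the input
head_nd <- function(x, n = 6L, ...)
    do.call("[", c(list(x), nd_indices(x, n, from_end = FALSE), drop = FALSE))
tail_nd <- function(x, n = 6L, ...)
    do.call("[", c(list(x), nd_indices(x, n, from_end = TRUE), drop = FALSE))

a  <- array(1:100, c(4, 5, 5))
df <- data.frame(a = 1:10, b = letters[1:10])

dim(head_nd(a, 1))            # 1 5 5: all three dimensions kept
dim(tail_nd(a, c(2, 2, 2)))   # 2 2 2
tail_nd(df, 3)                # rows 8 to 10, both columns
head_nd(letters, 3)           # "a" "b" "c"
```

Because every dimension is indexed with drop = FALSE, this sketch always returns an object with as many dimensions as its input, which corresponds to the first option in the question that follows; keeping today's vector-like result for 3+d arrays would have to be special-cased explicitly.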


I think this obviates the need for a separate data.frame method at all,
actually, though (I would think) tail.data.frame would still be defined and
exported for backwards compatibility. (The matrix method has some extra
bits, so my current conception of it is still separate, though it might not
NEED to be.)

The question then becomes: should head/tail always return something with
the same dimensionality (number of dims) as what it got, or should
data.frame and matrix be special-cased in this regard, as they are now?

What are people's thoughts?
~G
