thr3ads.net - R devel - [Rd] nrow(rbind(character(), character())) returns 2 (as documented but very unintuitive, IMHO) [May 2019]

Gabriel Becker

2019-May-16 22:47 UTC

[Rd] nrow(rbind(character(), character())) returns 2 (as documented but very unintuitive, IMHO)

Hi Hadley,

Thanks for the counterpoint. Response below.

On Thu, May 16, 2019 at 1:59 PM Hadley Wickham <h.wickham at gmail.com>
wrote:
> The existing behaviour seems intuitive to me. I would consider these
> invariants for n vector x_i's each with size m:
>
> * nrow(rbind(x_1, x_2, ..., x_n)) equals n
>
Personally, no, I wouldn't. I would consider m == 0 a degenerate case, where
there is no data, but I personally find matrices (or data.frames) with rows
but no columns a very strange concept. The converse is not true: I
understand the utility of columns but no rows, particularly in the
data.frame case, but rows with no columns are observations we didn't
observe anything about. Strange, imho.

Also, I know that you said *each with size m*, but the generalization would
be

for n vectors with m = max(length(x_i))
nrow(rbind(x_1, ..., x_n)) = m

And that is the behavior now as documented, but *only* when length(x_i) > 0
for all i (or, currently, when m == 0, so all vectors are length 0):

> nrow(rbind(1:5, numeric()))
[1] 1

So that is where I was coming from. Length-zero vectors don't add rows
because they contain no observed information.
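
The two cases contrasted above can be checked side by side. This is only a small sketch in base R; the variable names `empty2` and `mixed` are made up for illustration.

```r
# Two zero-length vectors: each one still contributes a row.
empty2 <- rbind(character(), character())
dim(empty2)   # 2 rows, 0 columns

# Mixed case: the zero-length vector is ignored, so only 1:5 forms a row.
mixed <- rbind(1:5, numeric())
dim(mixed)    # 1 row, 5 columns
```

So a zero-length argument counts as a row only when every argument is zero-length.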

I do see where you're coming from, but it does make interrogating
nrow(rbind(x_1, ..., x_n)) NOT mean "give me the number of observations
for which I have data", which is what it means in non-degenerate contexts,
and that seems pretty important too.

Robin does also have an interesting point below about argument names, but
I'll leave that for another mail.

Best,
~G


Gabriel Becker

2019-May-16 23:11 UTC

[Rd] nrow(rbind(character(), character())) returns 2 (as documented but very unintuitive, IMHO)

On Thu, May 16, 2019 at 3:47 PM Gabriel Becker <gabembecker at gmail.com>
wrote:
> Hi Hadley,
>
> Thanks for the counterpoint. Response below.
>
> On Thu, May 16, 2019 at 1:59 PM Hadley Wickham <h.wickham at gmail.com>
> wrote:
>
>> The existing behaviour seems intuitive to me. I would consider these
>> invariants for n vector x_i's each with size m:
>>
>> * nrow(rbind(x_1, x_2, ..., x_n)) equals n
>>
>
> Personally, no I wouldn't. I would consider m==0 a degenerate case, where
> there is no data, but I personally find matrices (or data.frames) with rows
> but no columns a very strange concept. The converse is not true, I
> understand the utility of columns but no rows, particularly in the
> data.frame case, but rows with no columns are observations we didn't
> observe anything about. Strange, imho.
>
> Also, I know that you said *each with size m*, but the generalization
> would be
>
> for n vectors with m = max(length(x_i))
> nrow(rbind(x_1, ..., x_n)) = m
>
Ugh, obviously that should say == n, not = m, and then we have
ncol(rbind(x_1, ..., x_n)) == m
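
As a quick check of the corrected statement (nrow == n and ncol == m), here is a minimal sketch. The helper `rbind_dim` is hypothetical, written only for this illustration, and it assumes all n vectors have the same length m.

```r
# Hypothetical helper: bind n vectors of length m and return the dimensions.
rbind_dim <- function(n, m) {
  vecs <- replicate(n, seq_len(m), simplify = FALSE)
  dim(do.call(rbind, vecs))
}

rbind_dim(3, 4)   # 3 rows, 4 columns
rbind_dim(3, 0)   # 3 rows, 0 columns: the degenerate case under discussion
```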

~G


>
> And that is the behavior now as documented, but *only* when
> length(x_i) > 0 for all i (or, currently, when m == 0, so all vectors
> are length 0).
>
> > nrow(rbind(1:5, numeric()))
>
> [1] 1
>
>
> So that is where I was coming from. Length-zero vectors don't add rows
> because they contain no observed information.
>
> I do see where you're coming from, but it does make interrogating
> nrow(rbind(x_1, ..., x_n)) NOT mean "give me the number of observations
> for which I have data", which is what it means in non-degenerate contexts,
> and that seems pretty important too.
>
> Robin does also have an interesting point below about argument names, but
> I'll leave that for another mail.
>
> Best,
> ~G
>

Martin Maechler

2019-May-17 07:32 UTC

[Rd] nrow(rbind(character(), character())) returns 2 (as documented but very unintuitive, IMHO)

>>>>> Gabriel Becker 
>>>>>     on Thu, 16 May 2019 15:47:57 -0700 writes:
    > Hi Hadley,
    > Thanks for the counterpoint. Response below.

    > On Thu, May 16, 2019 at 1:59 PM Hadley Wickham <h.wickham at gmail.com> wrote:

    >> The existing behaviour seems intuitive to me. I would consider these
    >> invariants for n vector x_i's each with size m:
    >>
    >> * nrow(rbind(x_1, x_2, ..., x_n)) equals n
    >>

    > Personally, no I wouldn't. I would consider m==0 a degenerate case, where
    > there is no data, but I personally find matrices (or data.frames) with rows
    > but no columns a very strange concept. The converse is not true, I
    > understand the utility of columns but no rows, particularly in the
    > data.frame case, but rows with no columns are observations we didn't
    > observe anything about. Strange, imho.

Gabe, here I have to very strongly disagree.

Matrices (and higher-order arrays) are always, definitely, meant to
behave "symmetrically" / "uniformly" with respect to all of
their dimensions.

We (and the S developers before us) have always taken a lot of
care trying to ensure that this is true.

So for the matrix case, if rows and columns behaved differently
that would be a bug "by definition".

Of course there's one place where this uniformity / symmetry
must be violated: in the coercion from and to atomic vectors.
There, 'by column' (generalized for arrays to "earlier dimensions
vary faster than later ones") has been chosen, not least because
this had been adopted for Fortran (first, AFAIK) and all related ABIs
dealing with matrix-vector arithmetic, for very good (numerical,
performance, known convention) reasons that made it possible to know
how fast numerical linear algebra should be implemented.
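
The 'by column' convention described above is easy to see directly. This is only a short illustration in base R; the names `m` and `a` are arbitrary.

```r
# A matrix is filled column by column from its underlying vector.
m <- matrix(1:6, nrow = 2)
m[2, 1]        # 2: the second element goes down the first column
m[1, 2]        # 3: the third element starts the second column
as.vector(m)   # coercing back recovers 1:6 in the same order

# For arrays, earlier dimensions vary faster than later ones.
a <- array(1:24, dim = c(2, 3, 4))
a[2, 1, 1]     # 2
a[1, 2, 1]     # 3
a[1, 1, 2]     # 7
```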

Martin

Gabriel Becker

2019-May-17 08:06 UTC

[Rd] nrow(rbind(character(), character())) returns 2 (as documented but very unintuitive, IMHO)

Hi Martin,

Thanks for chiming in. Responses inline.

On Fri, May 17, 2019 at 12:32 AM Martin Maechler
<maechler at stat.math.ethz.ch> wrote:
> >>>>> Gabriel Becker
> >>>>>     on Thu, 16 May 2019 15:47:57 -0700 writes:
>
>     > Hi Hadley,
>     > Thanks for the counterpoint. Response below.
>
>     > On Thu, May 16, 2019 at 1:59 PM Hadley Wickham <h.wickham at gmail.com>
>     > wrote:
>
>     >> The existing behaviour seems intuitive to me. I would consider these
>     >> invariants for n vector x_i's each with size m:
>     >>
>     >> * nrow(rbind(x_1, x_2, ..., x_n)) equals n
>     >>
>
>     > Personally, no I wouldn't. I would consider m==0 a degenerate case,
>     > where there is no data, but I personally find matrices (or data.frames)
>     > with rows but no columns a very strange concept. The converse is not
>     > true, I understand the utility of columns but no rows, particularly in
>     > the data.frame case, but rows with no columns are observations we
>     > didn't observe anything about. Strange, imho.
>
> Gabe, here I have to very strongly disagree.
>
> Matrices (and higher-order arrays) are always, definitely, meant to
> behave "symmetrically" / "uniformly" with respect to all of their
> dimensions.
>
> We (and the S developers before us) have always taken a lot of
> care trying to ensure that this is true.
>
> So for the matrix case, if rows and columns behaved differently
> that would be a bug "by definition".
>
I realize now I could have been clearer and more explicit about this, but I
wasn't arguing that the behavior should be different between columns and
rows, just that the behavior in the rows case didn't necessarily make a ton
of sense to me. I was arguing that a change to both rbind and cbind be
considered when all length-zero vectors are passed, not that rbind change
without cbind also changing. I will admit even here to feeling much more
strongly about the data.frame case.

That said, I do see that the cbind/columns argument seems harder (though
not impossible) for me to make. And maybe that's a good enough reason not
to consider such a change, because as I say, I agree the symmetry is
important, and would (also) want cbind to change the same way rbind did if
such a change happened, and that might bother many (?) more people than the
rbind case would. Maybe not though, based on the other responses in the
thread.

Honestly, the most intuitive thing for me if you rbind or cbind a bunch of
length-zero vectors together would be a 0x0 matrix, at the very least in
the non-named arguments case. It's a matrix with 0 elements in it, after
all. It seems perhaps that my intuition is just somewhat non-standard
though.
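
For reference, the current behaviour is already symmetric between the two functions, and the result has zero elements in both cases even though it is not 0x0. A small sketch in base R; the names `r`, `cc` and `zero` are arbitrary.

```r
r  <- rbind(character(), character())   # one row per argument
cc <- cbind(character(), character())   # one column per argument

dim(r)    # 2 0
dim(cc)   # 0 2
identical(dim(t(r)), dim(cc))   # TRUE: the two results mirror each other

length(r)   # 0 elements, despite having two rows

# The 0x0 alternative can be built explicitly for comparison.
zero <- matrix(character(), nrow = 0, ncol = 0)
dim(zero)   # 0 0
```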

> Of course there's one thing where this uniformity / symmetry
> must be violated: in the coercion from and to atomic vectors:
> There, 'by column' (generalized for arrays to "earlier dimensions vary
> faster than later one") has been chosen, not the least because this had
> been adapted for Fortran (first, AFAIK) and all related ABIs
> dealing with Matrix vector arithmetic for very good (numerical,
> performance, known convention) reasons that enabled to know how
> fast numerical linear algebra should be implemented.
>
I do understand here, and would never suggest anything that could damage
numerical linear algebra capabilities, in R or more broadly. That said, can
numerical algebra routines operate meaningfully in the degenerate case where
one/both/all dimensions are 0 anyway? Maybe they do; I'd be somewhat
surprised, but it's not my area of expertise.
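
For what it's worth, base R's matrix product does accept zero-extent operands. The following is only a sketch of that one operation, not a statement about numerical routines in general; the names `a`, `b`, `p` and `g` are arbitrary.

```r
a <- matrix(numeric(), nrow = 2, ncol = 0)   # 2 x 0
b <- matrix(numeric(), nrow = 0, ncol = 3)   # 0 x 3

# Each entry of the product is a sum over zero terms, so the result is a
# 2 x 3 matrix of zeros.
p <- a %*% b
dim(p)        # 2 3
all(p == 0)   # TRUE

# Multiplying the other way round collapses to a 0 x 0 matrix.
g <- t(a) %*% a
dim(g)        # 0 0
```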

Best,
~G
>
> Martin
>