Gabriel Becker
2019-May-16  22:47 UTC
[Rd] nrow(rbind(character(), character())) returns 2 (as documented but very unintuitive, IMHO)
Hi Hadley,

Thanks for the counterpoint. Response below.

On Thu, May 16, 2019 at 1:59 PM Hadley Wickham <h.wickham at gmail.com> wrote:

> The existing behaviour seems intuitive to me. I would consider these
> invariants for n vectors x_i, each with size m:
>
> * nrow(rbind(x_1, x_2, ..., x_n)) equals n

Personally, no, I wouldn't. I would consider m == 0 a degenerate case where there is no data, and I personally find matrices (or data.frames) with rows but no columns a very strange concept. The converse is not true: I understand the utility of columns but no rows, particularly in the data.frame case, but rows with no columns are observations we didn't observe anything about. Strange, IMHO.

Also, I know that you said *each with size m*, but the generalization would be

for n vectors with m = max(length(x_i))
nrow(rbind(x_1, ..., x_n)) = m

And that is the behavior now, as documented, but *only* when length(x_i) > 0 for all i (or, currently, when m == 0, so all vectors are length 0).

    > nrow(rbind(1:5, numeric()))
    [1] 1

So that is where I was coming from. Length-zero vectors don't add rows because they contain no observed information.

I do see where you're coming from, but it does make nrow(rbind(x_1, ..., x_n)) NOT mean "give me the number of observations for which I have data", which is what it means in non-degenerate contexts, and that seems pretty important too.

Robin does also have an interesting point below about argument names, but I'll leave that for another mail.

Best,
~G
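A quick way to see the documented behaviour under discussion (this snippet is an illustration added here, not code from the thread): per `?rbind`, zero-length vector arguments are ignored unless the result would otherwise have zero columns, so they only count as rows when every argument is empty. `rows_with_data()` is a hypothetical helper sketching the "count only observations with data" reading Gabe argues for; it is not part of base R.

```r
# Documented rbind() behaviour for zero-length vector arguments:
# they are dropped unless *every* argument is zero-length, in which
# case each one still contributes a row.
mixed <- rbind(1:5, numeric())   # numeric() is ignored
dim(mixed)                       # 1 5

all_empty <- rbind(character(), character())
dim(all_empty)                   # 2 0

# Hypothetical helper (not base R): count only arguments that carry data,
# i.e. the row count Gabe would find intuitive.
rows_with_data <- function(...) sum(lengths(list(...)) > 0)
rows_with_data(character(), character())   # 0
rows_with_data(1:5, numeric())             # 1
```

The two approaches agree whenever at least one argument is non-empty; they only diverge in the all-empty case that the thread is about.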
Gabriel Becker
2019-May-16  23:11 UTC
On Thu, May 16, 2019 at 3:47 PM Gabriel Becker <gabembecker at gmail.com> wrote:

> Also, I know that you said *each with size m*, but the generalization
> would be
>
> for n vectors with m = max(length(x_i))
> nrow(rbind(x_1, ..., x_n)) = m

Ugh, obviously that should say == n, not = m, and then we have

ncol(rbind(x_1, ..., x_n)) == m

~G
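The corrected invariants can be checked on a small example (an illustrative sketch, not from the thread; the vectors in `xs` are arbitrary). It assumes every vector has positive length and that the longest length is a multiple of the others, so shorter vectors are recycled along their row without a warning.

```r
# n vectors of positive length, m = the longest length.
xs <- list(1:4, 1:2, 7L)
n <- length(xs)
m <- max(lengths(xs))

res <- do.call(rbind, xs)

# Corrected invariants: one row per vector, one column per element
# of the longest vector.
stopifnot(nrow(res) == n, ncol(res) == m)

# Shorter vectors are recycled to length m.
stopifnot(identical(res[2, ], c(1L, 2L, 1L, 2L)),
          identical(res[3, ], rep(7L, 4)))
```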
Martin Maechler
2019-May-17  07:32 UTC
>>>>> Gabriel Becker
>>>>>     on Thu, 16 May 2019 15:47:57 -0700 writes:

    > Hi Hadley,
    > Thanks for the counterpoint. Response below.

    > On Thu, May 16, 2019 at 1:59 PM Hadley Wickham <h.wickham at gmail.com> wrote:

    >> The existing behaviour seems intuitive to me. I would consider these
    >> invariants for n vectors x_i, each with size m:
    >>
    >> * nrow(rbind(x_1, x_2, ..., x_n)) equals n

    > Personally, no, I wouldn't. I would consider m == 0 a degenerate case
    > where there is no data, and I personally find matrices (or data.frames)
    > with rows but no columns a very strange concept. The converse is not
    > true: I understand the utility of columns but no rows, particularly in
    > the data.frame case, but rows with no columns are observations we didn't
    > observe anything about. Strange, IMHO.

Gabe, here I have to very strongly disagree.

Matrices (and higher-order arrays) are definitely always meant to behave "symmetrically" / "uniformly" with respect to all of their dimensions.

We (and the S developers before us) have always taken a lot of care to ensure that this is true.

So for the matrix case, if rows and columns behaved differently, that would be a bug "by definition".

Of course, there's one place where this uniformity / symmetry must be violated: the coercion from and to atomic vectors. There, "by column" (generalized for arrays to "earlier dimensions vary faster than later ones") has been chosen, not least because this had been adopted for Fortran (first, AFAIK) and all related ABIs dealing with matrix-vector arithmetic, for very good (numerical, performance, known-convention) reasons that established how fast numerical linear algebra should be implemented.

Martin
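Both of Martin's points can be seen directly in base R (a sketch added for illustration, not code from the thread): the all-zero-length case is handled symmetrically by `rbind()` and `cbind()`, while conversion between matrices/arrays and plain vectors is column-major.

```r
# Symmetry: with only zero-length inputs, each argument still becomes
# a row (rbind) or a column (cbind); the results are transposes.
r <- rbind(character(), character())
k <- cbind(character(), character())
stopifnot(identical(dim(r), c(2L, 0L)),
          identical(dim(k), c(0L, 2L)),
          identical(dim(t(r)), dim(k)))

# The deliberate asymmetry: vector <-> matrix conversion is column-major
# ("earlier dimensions vary faster"), as in Fortran.
a <- matrix(1:6, nrow = 2)
a[, 1]          # 1 2: the first column holds the first two elements
as.vector(a)    # 1 2 3 4 5 6

arr <- array(1:24, dim = c(2, 3, 4))
arr[2, 1, 1]    # 2: the first index varies fastest
arr[1, 1, 2]    # 7: the last index varies slowest
```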
Gabriel Becker
2019-May-17  08:06 UTC
Hi Martin,

Thanks for chiming in. Responses inline.

On Fri, May 17, 2019 at 12:32 AM Martin Maechler <maechler at stat.math.ethz.ch> wrote:

> Gabe, here I have to very strongly disagree.
>
> Matrices (and higher-order arrays) are definitely always meant to
> behave "symmetrically" / "uniformly" with respect to all of their
> dimensions.
>
> We (and the S developers before us) have always taken a lot of
> care to ensure that this is true.
>
> So for the matrix case, if rows and columns behaved differently,
> that would be a bug "by definition".

I realize now I could have been clearer about this, but I wasn't arguing that the behavior should differ between columns and rows, just that the behavior in the rows case didn't necessarily make a ton of sense to me. I was arguing that a change to both rbind and cbind be considered when only length-zero vectors are passed, not that rbind change without cbind changing too. I will admit, even here, to feeling much more strongly about the data.frame case. That said, I do see that the argument is harder (though not impossible) for me to make for cbind/columns.
And maybe that's a good enough reason not to consider such a change: as I say, I agree the symmetry is important, and I would (also) want cbind to change the same way rbind did if such a change happened, and that might bother more people (many more?) than the rbind case would. Maybe not, though, based on the other responses in the thread.

Honestly, the most intuitive result for me when you rbind or cbind a bunch of length-zero vectors together would be a 0x0 matrix, at the very least in the unnamed-arguments case. It's a matrix with 0 elements in it, after all. It seems my intuition is perhaps just somewhat non-standard, though.

> Of course, there's one place where this uniformity / symmetry
> must be violated: the coercion from and to atomic vectors.
> There, "by column" (generalized for arrays to "earlier dimensions
> vary faster than later ones") has been chosen, not least because
> this had been adopted for Fortran (first, AFAIK) and all related
> ABIs dealing with matrix-vector arithmetic, for very good (numerical,
> performance, known-convention) reasons that established how fast
> numerical linear algebra should be implemented.

I do understand here, and would never suggest anything that could damage numerical linear algebra capabilities, in R or more broadly. That said, can numerical linear algebra routines operate meaningfully in the degenerate case where one, both, or all dimensions are 0 anyway? Maybe they do; I'd be somewhat surprised, but it's not my area of expertise.

Best,
~G
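On the degenerate-dimension question, a quick check (an illustration added here, not from the thread, and assuming a reasonably recent R; other linear algebra libraries may behave differently) suggests that base R's `%*%` accepts zero-extent operands: a zero inner dimension gives an all-zero product (the empty sum), and a zero outer dimension gives an empty result of the expected shape.

```r
# Zero inner dimension: 2x0 times 0x3 is a 2x3 matrix of zeros,
# since each entry is a sum over zero terms.
A <- matrix(numeric(0), nrow = 2, ncol = 0)
B <- matrix(numeric(0), nrow = 0, ncol = 3)
AB <- A %*% B
dim(AB)          # 2 3
all(AB == 0)     # TRUE

# Zero outer dimension: 0x3 times 3x4 is an empty 0x4 matrix.
C <- matrix(numeric(0), nrow = 0, ncol = 3)
D <- matrix(1, nrow = 3, ncol = 4)
CD <- C %*% D
dim(CD)          # 0 4
```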