Gabriel Becker
2019-May-16 20:17 UTC
[Rd] nrow(rbind(character(), character())) returns 2 (as documented but very unintuitive, IMHO)
Hi all,

Apologies if this has been asked before (a quick Google search didn't find it for me), and I know this is a case of behaving as documented, but it's so unintuitive (to me at least) that I figured I'd bring it up here anyway. I figure it's probably not going to be changed, but I'm happy to submit a patch if this is something R-core feels can/should change.

So I recently got bitten by the fact that

> nrow(rbind(character(), character()))
[1] 2

I was checking whether the result of an rbind call had more than one row, and that unexpectedly returned true, causing all sorts of shenanigans downstream, as I'm sure you can imagine.

Now I know that from ?rbind

    For ‘cbind’ (‘rbind’), vectors of zero length (including ‘NULL’)
    are ignored unless the result would have zero rows (columns), for
    S compatibility. (Zero-extent matrices do not occur in S3 and are
    not ignored in R.)

But there are a couple of things here. First, for the rbind case this reads as "if there would be zero columns, the vectors will not be ignored". This wording implies to me that not ignoring the vectors is a remedy to the "problem" of the potential for a zero-column return, but that's not the case. The result still has 0 columns; it just does not also have zero rows. So even if the behavior is not changed, perhaps this wording can be massaged for clarity?

The other issue, which I admit is likely a problem with my intuition, but which I don't think I'm alone in having, is that even if I can't have a 0x0 matrix (which is what I'd prefer), I would have expected/preferred a 1x0 matrix. The reasoning is that if we must avoid a 0x0 return value, we would do the minimum required to avoid it, which is to not ignore the first length-0 vector, to ensure a non-zero-extent matrix, but then ignore the remaining ones, as they contain information for 0 new rows.

Of course I can program around this now that I know the behavior, but again, it's so unintuitive (even for someone with a fairly well developed intuition for R's sometimes "quirky" behavior) that I figured I'd bring it up.

Thoughts?

Best,
~G
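For readers who want to reproduce the behavior Gabriel describes, and one possible way to "program around" it, here is a minimal sketch. The helper `n_data_rows` is hypothetical (it is not part of base R and does not appear in the thread), and it assumes every argument is a plain vector rather than a matrix.

```r
# All arguments zero-length: every one of them is kept as a row.
all_empty <- rbind(character(), character())
dim(all_empty)  # 2 0

# At least one non-empty argument: zero-length arguments are dropped.
one_empty <- rbind(c("a", "b"), character())
dim(one_empty)  # 1 2

# Hypothetical helper: count only the arguments that actually carry data,
# instead of asking nrow() of the rbind() result.
n_data_rows <- function(...) {
  sum(lengths(list(...)) > 0L)
}

n_data_rows(character(), character())   # 0
n_data_rows(c("a", "b"), character())   # 1
```

Checking the inputs before binding avoids depending on how rbind() treats the all-empty case.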
Hadley Wickham
2019-May-16 20:59 UTC
The existing behaviour seems intuitive to me. I would consider these invariants for n vectors x_i, each with size m:

* nrow(rbind(x_1, x_2, ..., x_n)) equals n
* ncol(rbind(x_1, x_2, ..., x_n)) equals m

Additionally, wouldn't you expect

rbind(x_1[i], x_2[i])

to equal

rbind(x_1, x_2)[, i, drop = FALSE]

?

Hadley

--
http://hadley.nz
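The invariants Hadley lists can be checked directly. The following is only a sketch: `check_invariants` is a hypothetical helper name, it assumes all vectors in the list have the same length, and the subsetting identity is compared on dimensions only because rbind() derives row names from symbol arguments.

```r
# Hypothetical helper: test both invariants for a list of equal-length vectors.
check_invariants <- function(xs) {
  m <- do.call(rbind, xs)
  c(nrow_is_n = nrow(m) == length(xs),
    ncol_is_m = ncol(m) == length(xs[[1]]))
}

non_empty <- check_invariants(list(1:3, 4:6))
all_empty <- check_invariants(list(character(), character()))

# Column-subsetting identity with an empty index, compared on dim() only.
x1 <- 1:3
x2 <- 4:6
i <- integer(0)
same_dims <- identical(dim(rbind(x1[i], x2[i])),
                       dim(rbind(x1, x2)[, i, drop = FALSE]))
```

Under the current behavior both invariants hold in the all-empty case, and the subsetting identity holds for an empty index as well, which is the consistency argument being made here.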
robin hankin
2019-May-16 21:25 UTC
Gabriel,

you ask an insightful and instructive question. One of R's great strengths is that we have a forum where this kind of edge case can be fruitfully discussed.

My interest in this would be the names of the arguments; in the magic package I make heavy use of the dimnames of zero-extent arrays.

> rbind(a='x',b='y')
  [,1]
a "x"
b "y"
> rbind(a='x',b=character())
  [,1]
a "x"
> rbind(a=character(),b=character())

a
b

The first and third idioms are fine. The result of the second one, in which we rbind() a length-one to a length-zero vector, is desirable IMO on the grounds that the content of a two-row matrix cannot be defined sensibly, so R takes the perfectly reasonable stance of deciding to ignore the second argument... which carries with it the implication that the name ('b') be ignored too. If the second argument *could* be recycled, I would want the name; otherwise I wouldn't. And this is what R does.

best wishes,
hankin.robin at gmail.com
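Robin's point about argument names can be inspected through the row names of each result. This is a small sketch of the three calls from his message; the variable names are illustrative only.

```r
named <- rbind(a = "x", b = "y")                  # both arguments kept
mixed <- rbind(a = "x", b = character())          # 'b' is dropped, and so is its name
empty <- rbind(a = character(), b = character())  # both kept, zero columns

rownames(named)  # "a" "b"
rownames(mixed)  # "a"
rownames(empty)  # "a" "b"
dim(empty)       # 2 0
```

The name of an argument survives exactly when the argument itself contributes a row.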
Gabriel Becker
2019-May-16 22:47 UTC
Hi Hadley,

Thanks for the counterpoint. Response below.

On Thu, May 16, 2019 at 1:59 PM Hadley Wickham <h.wickham at gmail.com> wrote:

> The existing behaviour seems intuitive to me. I would consider these
> invariants for n vector x_i's each with size m:
>
> * nrow(rbind(x_1, x_2, ..., x_n)) equals n

Personally, no, I wouldn't. I would consider m == 0 a degenerate case, where there is no data, but I personally find matrices (or data.frames) with rows but no columns a very strange concept. The converse is not true; I understand the utility of columns but no rows, particularly in the data.frame case, but rows with no columns are observations we didn't observe anything about. Strange, IMHO.

Also, I know that you said *each with size m*, but the generalization would be for n vectors with m = max(length(x_i))

nrow(rbind(x_1, ..., x_n)) = m

And that is the behavior now as documented, but *only* when length(x_i) > 0 for all i (or, currently, when m == 0, so all vectors are length 0).

> nrow(rbind(1:5, numeric()))
[1] 1

So that is where I was coming from. Length-zero vectors don't add rows because they contain no observed information.

I do see where you're coming from, but it does make interrogating nrow(rbind(x_1, ..., x_n)) NOT mean "give me the number of observations for which I have data", which is what it means in non-degenerate contexts, and that seems pretty important too.

Robin does also have an interesting point below about argument names, but I'll leave that for another mail.

Best,
~G
Pages, Herve
2019-May-16 23:45 UTC
Hi Gabe,

  ncol(data.frame(aa=c("a", "b", "c"), AA=c("A", "B", "C")))
  # [1] 2

  ncol(data.frame(aa="a", AA="A"))
  # [1] 2

  ncol(data.frame(aa=character(0), AA=character(0)))
  # [1] 2

  ncol(cbind(aa=c("a", "b", "c"), AA=c("A", "B", "C")))
  # [1] 2

  ncol(cbind(aa="a", AA="A"))
  # [1] 2

  ncol(cbind(aa=character(0), AA=character(0)))
  # [1] 2

  nrow(rbind(aa=c("a", "b", "c"), AA=c("A", "B", "C")))
  # [1] 2

  nrow(rbind(aa="a", AA="A"))
  # [1] 2

  nrow(rbind(aa=character(0), AA=character(0)))
  # [1] 2

hmmm... not sure why ncol(cbind(aa=character(0), AA=character(0))) or nrow(rbind(aa=character(0), AA=character(0))) should do anything different from what they do.

In my experience, and more generally speaking, the desire to treat 0-length vectors as a special case that deviates from the non-zero-length case has never been productive.

H.

--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fredhutch.org
Phone: (206) 667-5791
Fax: (206) 667-1319
Gabriel Becker
2019-May-17 00:48 UTC
Hi Herve,

Inline.

On Thu, May 16, 2019 at 4:45 PM Pages, Herve <hpages at fredhutch.org> wrote:

> Hi Gabe,
>
>    ncol(data.frame(aa=c("a", "b", "c"), AA=c("A", "B", "C")))
>    # [1] 2
>
>    ncol(data.frame(aa="a", AA="A"))
>    # [1] 2
>
>    ncol(data.frame(aa=character(0), AA=character(0)))
>    # [1] 2
>
>    ncol(cbind(aa=c("a", "b", "c"), AA=c("A", "B", "C")))
>    # [1] 2
>
>    ncol(cbind(aa="a", AA="A"))
>    # [1] 2
>
>    ncol(cbind(aa=character(0), AA=character(0)))
>    # [1] 2
>
>    nrow(rbind(aa=c("a", "b", "c"), AA=c("A", "B", "C")))
>    # [1] 2
>
>    nrow(rbind(aa="a", AA="A"))
>    # [1] 2
>
>    nrow(rbind(aa=character(0), AA=character(0)))
>    # [1] 2

Sure, but

> nrow(rbind(aa = c("a", "b", "c"), AA = c("a", "b", "c")))
[1] 2
> nrow(rbind(aa = c("a", "b", "c"), AA = "a"))
[1] 2
> nrow(rbind(aa = c("a", "b", "c"), AA = character()))
[1] 1

So even if I ultimately "lose" this debate (which really wouldn't shock me; even if R-core did agree with me, there's backwards compatibility to consider), you have to concede that the current behavior is more complicated than the above is acknowledging.

By rights of the invariance that you and Hadley are advocating, as far as I understand it, the last should give 2 rows, one of which is all NAs, rather than giving only one row as it currently does (and, I assume, always has).

So there are two different behavior patterns that could coherently (and internally consistently) be generalized to apply to the rbind(character(), character()) case, not just one. I'm making the case that the other one (that length-0 vectors do not add rows because they don't contain data) would be equally valid, and to N>1 people, at least equally intuitive.

Best,
~G
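The two internally consistent generalizations Gabriel contrasts can be sketched as two small wrappers. Both function names are hypothetical, neither is proposed in the thread as an actual patch, and both assume plain vector arguments. The first pads zero-length arguments with NA so that every argument yields a row; the second drops zero-length arguments unconditionally, returning a 0x0 matrix when nothing is left.

```r
# Hypothetical: every argument contributes a row; empty ones become all-NA rows.
rbind_keep_all <- function(...) {
  xs <- list(...)
  m <- max(lengths(xs), 0L)
  padded <- lapply(xs, function(x) if (length(x) == 0L) rep(NA, m) else x)
  do.call(rbind, padded)
}

# Hypothetical: zero-length arguments never contribute a row.
rbind_drop_empty <- function(...) {
  xs <- list(...)
  keep <- xs[lengths(xs) > 0L]
  if (length(keep) > 0L) return(do.call(rbind, keep))
  empty <- unlist(xs)                       # zero-length vector, or NULL
  if (is.null(empty)) empty <- logical(0)
  matrix(empty, nrow = 0L, ncol = 0L)
}

kept    <- rbind_keep_all(aa = c("a", "b", "c"), AA = character())
dropped <- rbind_drop_empty(aa = c("a", "b", "c"), AA = character())
nothing <- rbind_drop_empty(character(), character())

dim(kept)     # 2 3, second row is all NA
dim(dropped)  # 1 3
dim(nothing)  # 0 0
```

Base rbind() behaves like `rbind_drop_empty` in the mixed case and like `rbind_keep_all` in the all-empty case, which is the inconsistency under discussion.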