Gabriel Becker
2019-May-17 00:48 UTC
[Rd] nrow(rbind(character(), character())) returns 2 (as documented but very unintuitive, IMHO)
Hi Herve,

Inline.

On Thu, May 16, 2019 at 4:45 PM Pages, Herve <hpages at fredhutch.org> wrote:

> Hi Gabe,
>
> ncol(data.frame(aa=c("a", "b", "c"), AA=c("A", "B", "C")))
> # [1] 2
>
> ncol(data.frame(aa="a", AA="A"))
> # [1] 2
>
> ncol(data.frame(aa=character(0), AA=character(0)))
> # [1] 2
>
> ncol(cbind(aa=c("a", "b", "c"), AA=c("A", "B", "C")))
> # [1] 2
>
> ncol(cbind(aa="a", AA="A"))
> # [1] 2
>
> ncol(cbind(aa=character(0), AA=character(0)))
> # [1] 2
>
> nrow(rbind(aa=c("a", "b", "c"), AA=c("A", "B", "C")))
> # [1] 2
>
> nrow(rbind(aa="a", AA="A"))
> # [1] 2
>
> nrow(rbind(aa=character(0), AA=character(0)))
> # [1] 2

Sure, but

> nrow(rbind(aa = c("a", "b", "c"), AA = c("a", "b", "c")))
[1] 2
> nrow(rbind(aa = c("a", "b", "c"), AA = "a"))
[1] 2
> nrow(rbind(aa = c("a", "b", "c"), AA = character()))
[1] 1

So even if I ultimately "lose" this debate (which really wouldn't shock
me, even if R-core did agree with me there's backwards compatibility to
consider), you have to concede that the current behavior is more
complicated than the above is acknowledging.

By rights of the invariance that you and Hadley are advocating, as far as
I understand it, the last should give 2 rows, one of which is all NAs,
rather than giving only one row as it currently does (and, I assume?,
always has).

So there are two different behavior patterns that could coherently (and
internally-consistently) be generalized to apply to the
rbind(character(), character()) case, not just one. I'm making the case
that the other one (that length 0 vectors do not add rows because they
don't contain data) would be equally valid, and to N>1 people, at least
equally intuitive.

Best,
~G

> hmmm... not sure why ncol(cbind(aa=character(0), AA=character(0))) or
> nrow(rbind(aa=character(0), AA=character(0))) should do anything
> different from what they do.
>
> In my experience, and more generally speaking, the desire to treat
> 0-length vectors as a special case that deviates from the
> non-zero-length case has never been productive.
>
> H.
>
>
> On 5/16/19 13:17, Gabriel Becker wrote:
> > Hi all,
> >
> > Apologies if this has been asked before (a quick google didn't find it
> > for me), and I know this is a case of behaving as documented, but it's
> > so unintuitive (to me at least) that I figured I'd bring it up here
> > anyway. I figure it's probably not going to be changed, but I'm happy
> > to submit a patch if this is something R-core feels can/should change.
> >
> > So I recently got bitten by the fact that
> >
> > > nrow(rbind(character(), character()))
> > [1] 2
> >
> > I was checking whether the result of an rbind call had more than one
> > row, and that unexpectedly returned true, causing all sorts of
> > shenanigans downstream as I'm sure you can imagine.
> >
> > Now I know that from ?rbind
> >
> >   For ‘cbind’ (‘rbind’), vectors of zero length (including ‘NULL’)
> >   are ignored unless the result would have zero rows (columns), for
> >   S compatibility. (Zero-extent matrices do not occur in S3 and are
> >   not ignored in R.)
> >
> > But there's a couple of things here. First, for the rowbind case this
> > reads as "if there would be zero columns, the vectors will not be
> > ignored". This wording implies to me that not ignoring the vectors is
> > a remedy to the "problem" of the potential for a zero-column return,
> > but that's not the case. The result still has 0 columns, it just does
> > not also have zero rows. So even if the behavior is not changed,
> > perhaps this wording can be massaged for clarity?
> >
> > The other issue, which I admit is likely a problem with my intuition,
> > but which I don't think I'm alone in having, is that even if I can't
> > have a 0x0 matrix (which is what I'd prefer) I would have
> > expected/preferred a 1x0 matrix, the reasoning being that if we must
> > avoid a 0x0 return value, we would do the minimum required to avoid
> > it, which is to not ignore the first length 0 vector, to ensure a
> > non-zero-extent matrix, but then ignore the remaining ones as they
> > contain information for 0 new rows.
> >
> > Of course I can program around this now that I know the behavior, but
> > again, it's so unintuitive (even for someone with a fairly well
> > developed intuition for R's sometimes "quirky" behavior) that I
> > figured I'd bring it up.
> >
> > Thoughts?
> >
> > Best,
> > ~G
> >
> > ______________________________________________
> > R-devel at r-project.org mailing list
> > https://urldefense.proofpoint.com/v2/url?u=https-3A__stat.ethz.ch_mailman_listinfo_r-2Ddevel&d=DwIFaQ&c=eRAMFD45gAfqt84VtBcfhQ&r=BK7q3XeAvimeWdGbWY_wJYbW0WYiZvSXAJJKaaPhzWA&m=WzRf-6PuyYeprM0v55lLX2U-_hYGf__5yf3h6JNdJH0&s=nn76KQtp4viR66768zoSNcH7WpG77Pp8LyhOwYOs674&e
>
> --
> Hervé Pagès
>
> Program in Computational Biology
> Division of Public Health Sciences
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N, M1-B514
> P.O. Box 19024
> Seattle, WA 98109-1024
>
> E-mail: hpages at fredhutch.org
> Phone: (206) 667-5791
> Fax: (206) 667-1319
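The behavior discussed in this message can be reproduced with a few lines of R. The sketch below also shows one way to "program around" it. `rbind_nonempty` is a hypothetical helper written for illustration only, not part of base R, and it assumes all inputs are plain character vectors. It follows the poster's stated preference for a 0x0 result when every input is empty.

```r
# Base R behaviour reported in the thread: two zero-length vectors
# still produce two rows (and zero columns).
m <- rbind(character(), character())
dim(m)

# A zero-length vector mixed with a non-empty one is silently ignored,
# so only one row comes back.
nrow(rbind(aa = c("a", "b", "c"), AA = character()))

# Hypothetical helper (an assumption, not base R): drop every zero-length
# argument before binding; if nothing is left, return a 0 x 0 character
# matrix instead of an n x 0 one.
rbind_nonempty <- function(...) {
  args <- list(...)
  keep <- lengths(args) > 0L
  if (!any(keep)) {
    return(matrix(character(), nrow = 0L, ncol = 0L))
  }
  do.call(rbind, args[keep])
}

dim(rbind_nonempty(character(), character()))
nrow(rbind_nonempty(aa = c("a", "b", "c"), AA = character()))
```

With a helper along these lines, a check such as `nrow(x) > 1` is no longer true when all inputs are empty, which is the situation that caused the downstream problems described above.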
Pages, Herve
2019-May-17 02:41 UTC
[Rd] nrow(rbind(character(), character())) returns 2 (as documented but very unintuitive, IMHO)
On 5/16/19 17:48, Gabriel Becker wrote:

> Sure, but
>
> > nrow(rbind(aa = c("a", "b", "c"), AA = c("a", "b", "c")))
> [1] 2
> > nrow(rbind(aa = c("a", "b", "c"), AA = "a"))
> [1] 2
> > nrow(rbind(aa = c("a", "b", "c"), AA = character()))
> [1] 1

Ah, I see now.

But:

> data.frame(aa = c("a", "b", "c"), AA = character())
Error in data.frame(aa = c("a", "b", "c"), AA = character()) :
  arguments imply differing number of rows: 3, 0

and

> mapply(`*`, 1:5, integer(0))
Error in mapply(`*`, 1:5, integer(0)) :
  zero-length inputs cannot be mixed with those of non-zero length

So I would declare rbind(aa = c("a", "b", "c"), AA = character())
inconsistent rather than making the case that
rbind(aa = character(), AA = character()) needs to change.

Cheers,

H.

--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fredhutch.org
Phone: (206) 667-5791
Fax: (206) 667-1319
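The contrast drawn in this message can be checked without stopping the session at the first error. The following is a minimal sketch; `signals_error` is a hypothetical helper introduced here for illustration. The `mapply` result is the one reported in this 2019 thread and may depend on the R version in use, so it is shown but not relied upon.

```r
# TRUE if evaluating `expr` signals an error. The argument is a promise,
# so it is only evaluated inside try().
signals_error <- function(expr) {
  inherits(try(expr, silent = TRUE), "try-error")
}

# data.frame() refuses to mix a length-3 column with a length-0 column.
signals_error(data.frame(aa = c("a", "b", "c"), AA = character()))

# Reported as an error in the thread; the outcome may differ between
# R versions, so treat this line as version-dependent.
signals_error(mapply(`*`, 1:5, integer(0)))

# rbind() accepts the same mix and silently ignores the empty vector.
signals_error(rbind(aa = c("a", "b", "c"), AA = character()))
```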
Jan Gorecki
2019-May-17 03:48 UTC
[Rd] nrow(rbind(character(), character())) returns 2 (as documented but very unintuitive, IMHO)
Hi Gabriel,

> Personally, no I wouldn't. I would consider m==0 a degenerate case, where
> there is no data, but I personally find matrices (or data.frames) with
> rows but no columns a very strange concept.

This distinction between matrix and data.frames is the crux in this case.
From the dimensional modelling point of view, matrix can have non-zero
rows and zero columns, but data.frame (assuming it maps to database
table structure) should never have non-zero rows and zero columns.

This kind of issue was raised before in our issue tracker:
https://github.com/Rdatatable/data.table/issues/2422
You should find that discussion useful.

Best,
Jan Gorecki
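For reference, the two shapes contrasted in this message can be constructed directly. This is a small sketch of what base R itself currently accepts. It says nothing about what other packages do or what the modelling point of view recommends.

```r
# A matrix with three rows and zero columns.
m <- matrix(character(), nrow = 3L, ncol = 0L)
dim(m)

# Base R also lets a data.frame carry row names without any columns,
# which gives the "rows but no columns" shape discussed above.
df <- data.frame(row.names = 1:3)
dim(df)
```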
Abby Spurdle
2019-May-17 05:09 UTC
[Rd] nrow(rbind(character(), character())) returns 2 (as documented but very unintuitive, IMHO)
Herve Pages wrote:

> In my experience, and more generally speaking, the desire to treat
> 0-length vectors as a special case that deviates from the
> non-zero-length case has never been productive.

Good idea.

Gabriel Becker wrote:

> > nrow(rbind(aa = c("a", "b", "c"), AA = character()))
> [1] 1
>
> By rights of the invariance that you and Hadley are advocating, as far as
> I understand it, the last should give 2 rows, one of which is all NAs,
> rather than giving only one row as it currently does (and, I assume?,
> always has).

I think, ideally, this example should generate an error or a warning.
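The suggestion that the mixed case should raise an error can be prototyped in user code. The wrapper below is a sketch under stated assumptions: `rbind_strict` is a hypothetical name, it only considers plain vectors (not matrices or data frames), and the error message simply borrows the wording of the `mapply` error quoted earlier in the thread.

```r
# Hypothetical wrapper (not base R): refuse to mix zero-length and
# non-zero-length vectors, and otherwise defer to rbind().
rbind_strict <- function(...) {
  args <- list(...)
  lens <- lengths(args)
  if (any(lens == 0L) && any(lens > 0L)) {
    stop("zero-length inputs cannot be mixed with those of non-zero length")
  }
  do.call(rbind, args)
}

# All inputs non-empty: behaves like rbind().
nrow(rbind_strict(aa = c("a", "b", "c"), AA = "a"))

# All inputs empty: still the documented 2 x 0 result.
dim(rbind_strict(character(), character()))

# Mixed inputs: an error instead of a silently dropped row.
res <- try(rbind_strict(aa = c("a", "b", "c"), AA = character()), silent = TRUE)
inherits(res, "try-error")
```

Replacing `stop()` with `warning()` would give the milder variant mentioned in the message, with the row still being dropped.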