Hervé Pagès
2020-May-24 21:22 UTC
[Rd] paste(character(0), collapse="", recycle0=FALSE) should be ""
On 5/24/20 00:26, Gabriel Becker wrote:
> On Sat, May 23, 2020 at 9:59 PM Hervé Pagès <hpages at fredhutch.org> wrote:
>
> > On 5/23/20 17:45, Gabriel Becker wrote:
> > > Maybe my intuition is just different but when I collapse multiple
> > > character vectors together, I expect all the characters from each
> > > of those vectors to be in the resulting collapsed one.
> >
> > Yes I'd expect that too. But the **collapse** operation in paste() has
> > never been about collapsing **multiple** character vectors together.
> > What it does is collapse the **single** character vector that comes out
> > of the 'sep' operation.
>
> I understand what it does, I broke it down the same way in my post
> earlier in the thread. The fact remains that it is a single function,
> which significantly muddies the waters. So you can say
>
>     paste0(x, y, collapse=",", recycle0=TRUE)
>
> is not a collapse operation on multiple vectors, and of course there's a
> sense in which you're not wrong (again I understand what these functions
> do), but it sure looks like one in the invocation, doesn't it?
>
> Honestly the thing that this whole discussion has shown me most clearly
> is that, imho, collapse (accepting ONLY one data vector) and
> paste (accepting multiple) should never have been a single function to
> begin with. But that ship sailed long long ago.

Yes :-(

> > So
> >
> >     paste(x, y, z, sep="", collapse=",")
> >
> > is analogous to
> >
> >     sum(x + y + z)
>
> Honestly, I'd be significantly more comfortable if
>
>     1:10 + integer(0) + 5
>
> were an error too.

This is actually the recycling scheme used by mapply():

    > mapply(function(x, y, z) c(x, y, z), 1:10, integer(0), 5)
    Error in mapply(FUN = FUN, ...) :
      zero-length inputs cannot be mixed with those of non-zero length

AFAIK base R uses 3 different recycling schemes for n-ary operations:

(1) The recycling scheme used by arithmetic and comparison operations
    (Arith, Compare, Logic group generics).

(2) The recycling scheme used by classic paste().

(3) The recycling scheme used by mapply().

Having a core mechanism like recycling be inconsistent across base R is
sad. It makes it really hard to predict how a given n-ary function will
recycle its arguments unless you spend some time trying it yourself with
several combinations of vector lengths. It is of course the source of
numerous latent bugs. I wish there was only one but that's just a dream.

None of these 3 recycling schemes is perfect. IMO (2) is by far the
worst. (3) is too restrictive and would need to be refined if we wanted
to make it a good universal recycling scheme.

Anyway I don't think it makes sense to introduce a 4th recycling scheme
at this point even though it would be a nice item to put on the wish
list for R 7.0.0 with the ultimate goal that it will be universally
adopted in R 11.0.0 ;-)

So if we have to make do with what we have, IMO (1) is the scheme that
makes most sense, although I agree that it can do some surprising things
for some unusual combinations of vector lengths. It's the scheme I adhere
to in my own binary operations, e.g. in S4Vectors::pcompare().

The modest proposal of the 'recycle0' argument is only to let the user
switch from recycling scheme (2) to (1) if they're not happy with scheme
(2) (I'm one of them). Switching to scheme (3) or to a new custom scheme
would be a completely different proposal.

> At least I'm consistent right?

Yes :-)

Anyway discussing recycling schemes is interesting but not directly
related to what the OP brought up (behavior of the 'collapse' operation).

Cheers,
H.

> ~G

-- 
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpages at fredhutch.org
Phone: (206) 667-5791
Fax: (206) 667-1319
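For readers who want to try the three recycling schemes listed above, here is a minimal sketch in R. It is only an illustration, not part of the original message: the variable names are made up, paste0() is used so that the separator does not obscure the result, and the last call assumes an R version that has the 'recycle0' argument (R >= 4.0.1) with 'collapse' left at NULL.

```r
# Illustrative names only; requires R >= 4.0.1 for the 'recycle0' argument.
x <- 1:10
empty <- integer(0)

# Scheme (1), arithmetic: a zero-length operand gives a zero-length result.
arith <- x + empty + 5

# Scheme (2), classic paste(): the zero-length argument is treated as an
# empty string, and the result is as long as the longest argument.
classic <- paste0(x, character(0), "z")

# Scheme (3), mapply(): mixing zero-length and non-zero-length inputs fails.
# The error message is captured instead of stopping the script.
mapply_msg <- tryCatch(
  mapply(function(x, y, z) c(x, y, z), x, empty, 5),
  error = function(e) conditionMessage(e)
)

# recycle0 = TRUE (with collapse = NULL) switches paste0() to scheme (1).
switched <- paste0(x, character(0), "z", recycle0 = TRUE)

print(length(arith))
print(classic)
print(mapply_msg)
print(switched)
```

Running the same three lengths (10, 0, 1) through each function makes the inconsistency described above visible in one place.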
Martin Maechler
2020-May-26 13:24 UTC
[Rd] paste(character(0), collapse="", recycle0=FALSE) should be ""
>>>>> Hervé Pagès
>>>>>     on Sun, 24 May 2020 14:22:37 -0700 writes:

> On 5/24/20 00:26, Gabriel Becker wrote:
> > On Sat, May 23, 2020 at 9:59 PM Hervé Pagès <hpages at fredhutch.org> wrote:
> >
> > > On 5/23/20 17:45, Gabriel Becker wrote:
> > > > Maybe my intuition is just different but when I collapse multiple
> > > > character vectors together, I expect all the characters from each
> > > > of those vectors to be in the resulting collapsed one.
> > >
> > > Yes I'd expect that too. But the **collapse** operation in paste() has
> > > never been about collapsing **multiple** character vectors together.
> > > What it does is collapse the **single** character vector that comes out
> > > of the 'sep' operation.
> >
> > I understand what it does, I broke it down the same way in my post
> > earlier in the thread. The fact remains that it is a single function,
> > which significantly muddies the waters. So you can say
> >
> >     paste0(x, y, collapse=",", recycle0=TRUE)
> >
> > is not a collapse operation on multiple vectors, and of course there's a
> > sense in which you're not wrong (again I understand what these functions
> > do), but it sure looks like one in the invocation, doesn't it?
> >
> > Honestly the thing that this whole discussion has shown me most clearly
> > is that, imho, collapse (accepting ONLY one data vector) and
> > paste (accepting multiple) should never have been a single function to
> > begin with. But that ship sailed long long ago.
>
> Yes :-(
>
> > > So
> > >
> > >     paste(x, y, z, sep="", collapse=",")
> > >
> > > is analogous to
> > >
> > >     sum(x + y + z)
> >
> > Honestly, I'd be significantly more comfortable if
> >
> >     1:10 + integer(0) + 5
> >
> > were an error too.
>
> This is actually the recycling scheme used by mapply():
>
>     > mapply(function(x, y, z) c(x, y, z), 1:10, integer(0), 5)
>     Error in mapply(FUN = FUN, ...) :
>       zero-length inputs cannot be mixed with those of non-zero length
>
> AFAIK base R uses 3 different recycling schemes for n-ary operations:
>
> (1) The recycling scheme used by arithmetic and comparison operations
>     (Arith, Compare, Logic group generics).
>
> (2) The recycling scheme used by classic paste().
>
> (3) The recycling scheme used by mapply().
>
> Having a core mechanism like recycling be inconsistent across base R is
> sad. It makes it really hard to predict how a given n-ary function will
> recycle its arguments unless you spend some time trying it yourself with
> several combinations of vector lengths. It is of course the source of
> numerous latent bugs. I wish there was only one but that's just a dream.
>
> None of these 3 recycling schemes is perfect. IMO (2) is by far the
> worst. (3) is too restrictive and would need to be refined if we wanted
> to make it a good universal recycling scheme.
>
> Anyway I don't think it makes sense to introduce a 4th recycling scheme
> at this point even though it would be a nice item to put on the wish
> list for R 7.0.0 with the ultimate goal that it will be universally
> adopted in R 11.0.0 ;-)
>
> So if we have to make do with what we have, IMO (1) is the scheme that
> makes most sense, although I agree that it can do some surprising things
> for some unusual combinations of vector lengths. It's the scheme I adhere
> to in my own binary operations, e.g. in S4Vectors::pcompare().
>
> The modest proposal of the 'recycle0' argument is only to let the user
> switch from recycling scheme (2) to (1) if they're not happy with scheme
> (2) (I'm one of them).

Yes, indeed. This was the purpose of introducing 'recycle0'.

Now, with collapse = <string> (in R, "string" := character vector of
length 1), we clearly see different interpretations of what is desirable
for recycle0 = TRUE. All of you (Suharto, Bill, Hervé, Gabe) assert that
the behavior should be different than now, and should either error
(possibly, per Gabe) or return a single string (possibly with a warning),
i.e., the collapse = <string> behavior should not be influenced by (or
possibly conflict with) recycle0=TRUE.

Within R core, some believe the current recycle0=TRUE behavior to be the
correct one. Personally, I see reasons for both.

What about remaining back-compatible, not only to R 3.y.z with
default recycle0=FALSE, but also to R 4.0.0 with recycle0=TRUE
*and* add a new option for the Suharto-Bill-Hervé-Gabe behavior,
e.g., recycle0="sep.only" or just recycle0="sep" ?

As (for back-compatibility reasons) you have to specify
'recycle0 = ..' anyway, you would get what makes most sense to
you by using such a third option.

(WDYT ?)

Martin

> Switching to scheme (3) or to a new custom scheme
> would be a completely different proposal.
>
> > At least I'm consistent right?
>
> Yes :-)
>
> Anyway discussing recycling schemes is interesting but not directly
> related to what the OP brought up (behavior of the 'collapse' operation).
>
> Cheers,
> H.
>
> > ~G
Hervé Pagès
2020-May-26 19:38 UTC
[Rd] paste(character(0), collapse="", recycle0=FALSE) should be ""
Hi Martin,

On 5/26/20 06:24, Martin Maechler wrote:
...
> What about remaining back-compatible, not only to R 3.y.z with
> default recycle0=FALSE, but also to R 4.0.0 with recycle0=TRUE

What back-compatibility with R 4.0.0 are we talking about? The
'recycle0' arg was added **after** the R 4.0.0 release and has never
been part of an official release yet. This is the time to fix it.

> *and* add a new option for the Suharto-Bill-Hervé-Gabe behavior,
> e.g., recycle0="sep.only" or just recycle0="sep" ?

OMG!

> As (for back-compatibility reasons) you have to specify
> 'recycle0 = ..' anyway, you would get what makes most sense to
> you by using such a third option.
>
> (WDYT ?)

Don't bother. I'd rather use

    paste(paste(x, y, z, sep="#", recycle0=TRUE), collapse=",")

i.e. explicitly break down the 2 operations (sep and collapse). Might be
slightly less efficient but I find it way more readable than

    paste(x, y, z, sep="#", collapse=",", recycle0="sep.only")

BTW I appreciate you trying to accommodate everybody's taste. That
doesn't sound like an easy task ;-)

I'll just reiterate my earlier comment that controlling the collapse
operation via an argument named 'recycle0' doesn't make sense (collapse
involves NO recycling). So I don't know if the current 'recycle0=TRUE'
behavior is "the correct one" but at the very least the name of the
argument is a misnomer and misleading.

More generally speaking, using the same argument to control 2 distinct
operations is not good API design. A better design is to use 2
arguments. Then the 2 arguments can generally be made orthogonal (like
in this case), i.e. all possible combinations are valid (4 combinations
in this case).

Thanks,
H.

> Martin
>
> > Switching to scheme (3) or to a new custom scheme
> > would be a completely different proposal.
> >
> > > At least I'm consistent right?
> >
> > Yes :-)
> >
> > Anyway discussing recycling schemes is interesting but not directly
> > related to what the OP brought up (behavior of the 'collapse' operation).
> >
> > Cheers,
> > H.
> >
> > > ~G

-- 
Hervé Pagès
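The explicit two-step form suggested in this message can be tried with a small sketch like the one below. The wrapper name paste_then_collapse() is hypothetical (it is not part of base R and was not proposed in the thread), the input vectors are made up, and the code assumes an R version that has the 'recycle0' argument (R >= 4.0.1).

```r
# Hypothetical helper, not part of base R; requires R >= 4.0.1 for 'recycle0'.
paste_then_collapse <- function(..., sep = " ", collapse = ",") {
  # Step 1: the 'sep' operation, using recycling scheme (1) via recycle0 = TRUE.
  pieces <- paste(..., sep = sep, recycle0 = TRUE)
  # Step 2: collapse the single resulting character vector into one string.
  paste(pieces, collapse = collapse)
}

x <- c("a", "b")
y <- c("1", "2")
z <- "c"

# All inputs non-empty: elements are joined with "#", then collapsed with ",".
full <- paste_then_collapse(x, y, z, sep = "#")

# One zero-length input: step 1 yields character(0), step 2 yields "".
with_empty <- paste_then_collapse(x, character(0), z, sep = "#")

print(full)
print(with_empty)
```

Keeping the two steps in separate calls means each one is controlled by its own argument, which is the orthogonality argued for above.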