Hervé Pagès
2020-May-22 22:16 UTC
[Rd] paste(character(0), collapse="", recycle0=FALSE) should be ""
Gabe,

It's the current behavior of paste() that is a major source of bugs:

    ## Add "rs" prefix to SNP ids and collapse them in a
    ## comma-separated string.
    collapse_snp_ids <- function(snp_ids)
        paste("rs", snp_ids, sep="", collapse=",")

    snp_groups <- list(
        group1=c(55, 22, 200),
        group2=integer(0),
        group3=c(99, 550)
    )

    vapply(snp_groups, collapse_snp_ids, character(1))
    #            group1            group2            group3
    # "rs55,rs22,rs200"              "rs"      "rs99,rs550"

This has hit me so many times!

Now with 'recycle0=TRUE', we finally have the opportunity to make it do
the right thing. Let's not miss that opportunity.

Cheers,
H.

On 5/22/20 11:26, Gabriel Becker wrote:
> I understand that this is consistent but it also strikes me as an
> enormous 'gotcha' of a magnitude that 'we' are trying to avoid/smooth
> over at this point in user-facing R space.
>
> For the record I'm not suggesting it should return something other than
> "", and in particular I'm not arguing that any call to paste /that does
> not return an error/ with non-NULL collapse should return a character
> vector of length one.
>
> Rather I'm pointing out that it could (perhaps should, imo) simply be an
> error, which is also consistent, in the strict sense, with
> previous behavior in that it is the developer simply declining to extend
> the recycle0 argument to the full parameter space (there is no rule that
> says we must do so, arguments whose use is incompatible with other
> arguments can be reasonable and called for).
>
> I don't feel super strongly that returning "" in this and similar
> cases is horrible and should never happen, but I'd bet dollars to donuts
> that to the extent that behavior occurs it will be a disproportionately
> major source of bugs, and I think that's at least worth considering in
> addition to pure consistency.
>
> ~G
>
> On Fri, May 22, 2020 at 9:50 AM William Dunlap <wdunlap at tibco.com> wrote:
>
>> I agree with Herve, processing collapse happens last so
>> collapse=non-NULL always leads to a single character string being
>> returned, the same as paste(collapse=""). See the altPaste function
>> I posted yesterday.
>>
>> Bill Dunlap
>> TIBCO Software
>> wdunlap tibco.com
>>
>> On Fri, May 22, 2020 at 9:12 AM Hervé Pagès <hpages at fredhutch.org> wrote:
>>
>>> I think that
>>>
>>>     paste(c("a", "b"), NULL, c("c", "d"), sep = " ", collapse = ",",
>>>           recycle0=TRUE)
>>>
>>> should just return an empty string and don't see why it needs to
>>> emit a warning or raise an error. To me it does exactly what the user
>>> is asking for, which is to change how the 3 arguments are recycled
>>> **before** the 'sep' operation.
>>>
>>> The 'recycle0' argument has no business in the 'collapse' operation
>>> (which comes after the 'sep' operation): this operation still behaves
>>> like it always had.
>>>
>>> That's all there is to it.
>>>
>>> H.
>>>
>>> On 5/22/20 03:00, Gabriel Becker wrote:
>>>> Hi Martin et al,
>>>>
>>>> On Thu, May 21, 2020 at 9:42 AM Martin Maechler
>>>> <maechler at stat.math.ethz.ch> wrote:
>>>>
>>>>> Hervé Pagès, on Fri, 15 May 2020 13:44:28 -0700, writes:
>>>>>
>>>>>> There is still the situation where **both** 'sep' and 'collapse'
>>>>>> are specified:
>>>>>>
>>>>>>     > paste(integer(0), "nth", sep="", collapse=",")
>>>>>>     [1] "nth"
>>>>>>
>>>>>> In that case 'recycle0' should **not** be ignored i.e.
>>>>>>
>>>>>>     paste(integer(0), "nth", sep="", collapse=",", recycle0=TRUE)
>>>>>>
>>>>>> should return the empty string (and not character(0) like it
>>>>>> does at the moment).
>>>>>>
>>>>>> In other words, 'recycle0' should only control the first
>>>>>> operation (the operation controlled by 'sep'). Which makes plenty
>>>>>> of sense: the 1st operation is binary (or n-ary) while the
>>>>>> collapse operation is unary. There is no concept of recycling in
>>>>>> the context of unary operations.
>>>>>
>>>>> Interesting, ..., and sounding somewhat convincing.
>>>>>
>>>>>> On 5/15/20 11:25, Gabriel Becker wrote:
>>>>>>> Hi all,
>>>>>>>
>>>>>>> This makes sense to me, but I would think that recycle0 and
>>>>>>> collapse should actually be incompatible and paste should throw
>>>>>>> an error if recycle0 were TRUE and collapse were declared in the
>>>>>>> same call. I don't think the value of recycle0 should be
>>>>>>> silently ignored if it is actively specified.
>>>>>>>
>>>>>>> ~G
>>>>>
>>>>> Just to summarize what I think we should know and agree (or be
>>>>> "disproven") and where this comes from ...
>>>>>
>>>>> 1) recycle0 is a new R 4.0.0 option in paste() / paste0() which by
>>>>>    default (recycle0 = FALSE) should (and *does* AFAIK) not change
>>>>>    anything, hence paste() / paste0() behave completely
>>>>>    back-compatible if recycle0 is kept to FALSE.
>>>>>
>>>>> 2) recycle0 = TRUE is meant to give different behavior, notably
>>>>>    0-length arguments (among '...') should result in 0-length
>>>>>    results.
>>>>>
>>>>>    The above does not specify what this means in detail, see 3)
>>>>>
>>>>> 3) The current R 4.0.0 implementation (for which I'm primarily
>>>>>    responsible) and help(paste) are in accordance.
>>>>>    Notably the help page (Arguments -> 'recycle0' ; Details 1st
>>>>>    para ; Examples) says and shows how the 4.0.0 implementation
>>>>>    has been meant to work.
>>>>>
>>>>> 4) Several provenly smart members of the R community argue that
>>>>>    both the implementation and the documentation of
>>>>>    'recycle0 = TRUE' should be changed to be more logical /
>>>>>    coherent / sensical ..
>>>>>
>>>>> Is the above all correct in your view?
>>>>>
>>>>> Assuming yes, I read basically two proposals, both agreeing
>>>>> that recycle0 = TRUE should only ever apply to the action of 'sep'
>>>>> but not the action of 'collapse'.
>>>>>
>>>>> 1) Bill and Hervé (I think) propose that 'recycle0' should have
>>>>>    no effect whenever 'collapse = <string>'
>>>>>
>>>>> 2) Gabe proposes that 'collapse = <string>' and 'recycle0 = TRUE'
>>>>>    should be declared incompatible and error. If going in that
>>>>>    direction, I could also see them to give a warning (and
>>>>>    continue as if recycle0 = FALSE).
>>>>
>>>> Herve makes a good point about when sep and collapse are both set.
>>>> That said, if the user explicitly sets recycle0, personally, I don't
>>>> think it should be silently ignored under any configuration of other
>>>> arguments.
>>>>
>>>> If all of the arguments are to go into effect, the question then
>>>> becomes one of ordering, I think.
>>>>
>>>> Consider
>>>>
>>>>     paste(c("a", "b"), NULL, c("c", "d"), sep = " ", collapse = ",",
>>>>           recycle0=TRUE)
>>>>
>>>> Currently that returns character(0), because the logic is
>>>> essentially (in pseudo-code)
>>>>
>>>>     collapse(paste(c("a", "b"), NULL, c("c", "d"), sep = " ",
>>>>                    recycle0=TRUE), collapse = ", ", recycle0=TRUE)
>>>>
>>>>     -> collapse(character(0), collapse = ", ", recycle0=TRUE)
>>>>
>>>>     -> character(0)
>>>>
>>>> Now Bill Dunlap argued, fairly convincingly I think, that
>>>> paste(..., collapse=<string>) should /always/ return a character
>>>> vector of length exactly one. With recycle0, though, it will return
>>>> "" via the progression
>>>>
>>>>     paste(c("a", "b"), NULL, c("c", "d"), sep = " ", collapse = ",",
>>>>           recycle0=TRUE)
>>>>
>>>>     -> collapse(character(0), collapse = ", ")
>>>>
>>>>     -> ""
>>>>
>>>> because recycle0 is still applied to the sep-based operation which
>>>> occurs before collapse, thus leaving a vector of length 0 to
>>>> collapse.
>>>>
>>>> That is consistent but seems unlikely to be what the user wanted,
>>>> imho. I think if it does this there should be at least a warning
>>>> when paste collapses to "" this way, if it is allowed at all (ie if
>>>> mixing collapse=<string> and recycle0=TRUE is not simply made an
>>>> error).
>>>>
>>>> I would like to hear others' thoughts as well though. @Pages, Herve
>>>> @William Dunlap is "" what you envision as the desired and useful
>>>> behavior there?
>>>>
>>>> Best,
>>>> ~G
>>>>
>>>>> I have not yet made my mind up but would tend to agree to "you
>>>>> guys", but I think that other R Core members should chime in, too.
>>>>>
>>>>> Martin
>>>>>
>>>>>>> On Fri, May 15, 2020 at 11:05 AM Hervé Pagès
>>>>>>> <hpages at fredhutch.org> wrote:
>>>>>>>
>>>>>>>> Totally agree with that.
>>>>>>>>
>>>>>>>> H.
>>>>>>>>
>>>>>>>> On 5/15/20 10:34, William Dunlap via R-devel wrote:
>>>>>>>>> I agree: paste(collapse="something", ...) should always
>>>>>>>>> return a single character string, regardless of the value of
>>>>>>>>> recycle0. This would be similar to when there are no non-NULL
>>>>>>>>> arguments to paste; collapse="." gives a single empty string
>>>>>>>>> and collapse=NULL gives a zero long character vector.
>>>>>>>>>
>>>>>>>>>     > paste()
>>>>>>>>>     character(0)
>>>>>>>>>     > paste(collapse=", ")
>>>>>>>>>     [1] ""
>>>>>>>>>
>>>>>>>>> Bill Dunlap
>>>>>>>>> TIBCO Software
>>>>>>>>> wdunlap tibco.com
>>>>>>>>>
>>>>>>>>> On Thu, Apr 30, 2020 at 9:56 PM suharto_anggono--- via R-devel
>>>>>>>>> <r-devel at r-project.org> wrote:
>>>>>>>>>
>>>>>>>>>> Without 'collapse', 'paste' pastes (concatenates) its
>>>>>>>>>> arguments elementwise (separated by 'sep', " " by default).
>>>>>>>>>> New in R devel and R patched, specifying recycle0 = TRUE
>>>>>>>>>> makes mixing zero-length and nonzero-length arguments result
>>>>>>>>>> in length zero. The result of
>>>>>>>>>> paste(n, "th", sep = "", recycle0 = TRUE) always has the same
>>>>>>>>>> length as 'n'. Previously, the result is still as long as the
>>>>>>>>>> longest argument, with the zero-length argument treated like
>>>>>>>>>> "". If all of the arguments have length zero, 'recycle0'
>>>>>>>>>> doesn't matter.
>>>>>>>>>>
>>>>>>>>>> As far as I understand, 'paste' with 'collapse' as a
>>>>>>>>>> character string is supposed to put together elements of a
>>>>>>>>>> vector into a single character string. I think 'recycle0'
>>>>>>>>>> shouldn't change it.
>>>>>>>>>>
>>>>>>>>>> In current R devel and R patched,
>>>>>>>>>> paste(character(0), collapse = "", recycle0 = TRUE) is
>>>>>>>>>> character(0). I think it should be "", like
>>>>>>>>>> paste(character(0), collapse="").
>>>>>>>>>>
>>>>>>>>>>     paste(c("4", "5"), "th", sep = "", collapse = ", ", recycle0 = TRUE)
>>>>>>>>>> is
>>>>>>>>>>     "4th, 5th".
>>>>>>>>>>     paste(c("4"     ), "th", sep = "", collapse = ", ", recycle0 = TRUE)
>>>>>>>>>> is
>>>>>>>>>>     "4th".
>>>>>>>>>> I think
>>>>>>>>>>     paste(c(        ), "th", sep = "", collapse = ", ", recycle0 = TRUE)
>>>>>>>>>> should be
>>>>>>>>>>     "",
>>>>>>>>>> not character(0).
>>>>>>>>>>
>>>>>>>>>> ______________________________________________
>>>>>>>>>> R-devel at r-project.org mailing list
>>>>>>>>>> https://stat.ethz.ch/mailman/listinfo/r-devel

--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fredhutch.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319
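
The two-step reading argued for in this message ('sep' first, honouring 'recycle0'; then a unary 'collapse') can be sketched as follows. This is only an illustration under stated assumptions: paste_two_step is a hypothetical helper, not part of base R and not Bill Dunlap's altPaste, and it assumes R >= 4.0.0 so that paste() accepts 'recycle0'. Because the collapse step is a separate call, the result does not depend on how a given R version combines the two arguments.

```r
# Hypothetical helper (not base R): apply 'sep' first, then 'collapse'.
# Assumes R >= 4.0.0, where paste() has the 'recycle0' argument.
paste_two_step <- function(..., sep = " ", collapse = NULL, recycle0 = FALSE) {
  # Step 1: element-wise pasting; only this step honours recycle0.
  pasted <- paste(..., sep = sep, recycle0 = recycle0)
  if (is.null(collapse)) return(pasted)
  # Step 2: collapsing is a unary operation on the step-1 result,
  # so it always yields exactly one string ("" for zero-length input).
  paste(pasted, collapse = collapse)
}

# The SNP example from the message, rewritten on top of the helper.
collapse_snp_ids <- function(snp_ids)
  paste_two_step("rs", snp_ids, sep = "", collapse = ",", recycle0 = TRUE)

snp_groups <- list(
  group1 = c(55, 22, 200),
  group2 = integer(0),
  group3 = c(99, 550)
)

res <- vapply(snp_groups, collapse_snp_ids, character(1))
print(res)
```

With this sketch the empty group collapses to "" instead of the stray "rs", which is the behavior the message asks paste() itself to provide.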
brodie gaslam
2020-May-23 01:12 UTC
[Rd] paste(character(0), collapse="", recycle0=FALSE) should be ""
> On Friday, May 22, 2020, 6:16:45 PM EDT, Hervé Pagès <hpages at fredhutch.org> wrote:
>
> Gabe,
>
> It's the current behavior of paste() that is a major source of bugs:
>
>     ## Add "rs" prefix to SNP ids and collapse them in a
>     ## comma-separated string.
>     collapse_snp_ids <- function(snp_ids)
>         paste("rs", snp_ids, sep="", collapse=",")
>
>     snp_groups <- list(
>         group1=c(55, 22, 200),
>         group2=integer(0),
>         group3=c(99, 550)
>     )
>
>     vapply(snp_groups, collapse_snp_ids, character(1))
>     #            group1            group2            group3
>     # "rs55,rs22,rs200"              "rs"      "rs99,rs550"
>
> This has hit me so many times!
>
> Now with 'recycle0=TRUE', we finally have the opportunity to make it do
> the right thing. Let's not miss that opportunity.
>
> Cheers,
> H.

FWIW what convinces me is consistency with other aggregating functions applied
to zero length inputs:

    sum(numeric(0))
    ## [1] 0

> On 5/22/20 11:26, Gabriel Becker wrote:
>> I understand that this is consistent but it also strikes me as an
>> enormous 'gotcha' of a magnitude that 'we' are trying to avoid/smooth
>> over at this point in user-facing R space.
>>
>> For the record I'm not suggesting it should return something other than
>> "", and in particular I'm not arguing that any call to paste /that does
>> not return an error/ with non-NULL collapse should return a character
>> vector of length one.
Hervé Pagès
2020-May-23 08:57 UTC
[Rd] paste(character(0), collapse="", recycle0=FALSE) should be ""
On 5/22/20 18:12, brodie gaslam wrote:
> FWIW what convinces me is consistency with other aggregating functions applied
> to zero length inputs:
>
>     sum(numeric(0))
>     ## [1] 0

Right. And 1 is the identity element of multiplication:

    > prod(numeric(0))
    [1] 1

And the empty string is the identity element of string aggregation by
concatenation.

H.
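
The identity-element argument can be checked directly. The sketch below uses only base R calls with their default arguments, so it does not depend on 'recycle0'; collapse_all is a name introduced here purely for illustration. It also shows why returning the identity matters: aggregating a vector in one go and aggregating its pieces separately then agree, even when one piece is empty.

```r
# Aggregating a zero-length input returns the identity element of the operation.
empty_sum  <- sum(numeric(0))                      # additive identity
empty_prod <- prod(numeric(0))                     # multiplicative identity
empty_cat  <- paste(character(0), collapse = "")   # identity of concatenation

# Illustrative wrapper (not base R): concatenate all elements of x.
collapse_all <- function(x) paste(x, collapse = "")

x <- c("a", "b")
y <- character(0)

# Collapsing everything at once ...
whole <- collapse_all(c(x, y))
# ... matches collapsing each piece and concatenating the results,
# precisely because the empty piece contributes "".
parts <- paste0(collapse_all(x), collapse_all(y))

print(list(empty_sum, empty_prod, empty_cat, whole, parts))
```

If collapsing an empty vector returned character(0) instead, paste0() on the pieces would no longer line up with the one-shot result in general, which is the consistency point made above.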
Gabriel Becker
2020-May-24 00:45 UTC
[Rd] paste(character(0), collapse="", recycle0=FALSE) should be ""
Herve (et al.),

On Fri, May 22, 2020 at 3:16 PM Hervé Pagès <hpages at fredhutch.org> wrote:
> Gabe,
>
> It's the current behavior of paste() that is a major source of bugs:
>
>     ## Add "rs" prefix to SNP ids and collapse them in a
>     ## comma-separated string.
>     collapse_snp_ids <- function(snp_ids)
>         paste("rs", snp_ids, sep="", collapse=",")
>
>     snp_groups <- list(
>         group1=c(55, 22, 200),
>         group2=integer(0),
>         group3=c(99, 550)
>     )
>
>     vapply(snp_groups, collapse_snp_ids, character(1))
>     #            group1            group2            group3
>     # "rs55,rs22,rs200"              "rs"      "rs99,rs550"
>
> This has hit me so many times!
>
> Now with 'recycle0=TRUE', we finally have the opportunity to make it do
> the right thing. Let's not miss that opportunity.

I see what you're saying, but I don't know. Maybe my intuition is just
different, but when I collapse multiple character vectors together, I expect
all the characters from each of those vectors to be in the resulting collapsed
one. In your example it's a string literal to be added elementwise as the
prefix, but what if it is another vector of length > 1? Wouldn't it be strange
that all those values are wiped and absent from the resulting string?

Maybe it's just me. Like for

    paste(x, y, z, sep = "", collapse = ", ", recycle0 = TRUE)

if length(y) is 0, it literally makes no difference what x and z are.

I seem to be largely outvoted anyway though, so we will see what Martin and
others who may pop up might think, but I raised the points I wanted to raise
so we'll see where things ultimately fall.

~G

> Cheers,
> H.
>
> On 5/22/20 11:26, Gabriel Becker wrote:
>> I understand that this is consistent but it also strikes me as an
>> enormous 'gotcha' of a magnitude that 'we' are trying to avoid/smooth
>> over at this point in user-facing R space.
> > > > For the record I'm not suggesting it should return something other than > > "", and in particular I'm not arguing that any call to paste /that does > > not return an error/ with non-NULL collapse should return a character > > vector of length one. > > > > Rather I'm pointing out that it could (perhaps should, imo) simply be an > > error, which is also consistent, in the strict sense, with > > previous behavior in that it is the developer simply declining to extend > > the recycle0 argument to the full parameter space (there is no rule that > > says we must do so, arguments whose use is incompatible with other > > arguments can be reasonable and called for). > > > > I don't feel feel super strongly that reeturning "" in this and similar > > cases horrible and should never happen, but i'd bet dollars to donuts > > that to the extent that behavior occurs it will be a disproportionately > > major source of bugs, and i think thats at least worth considering in > > addition to pure consistency. > > > > ~G > > > > On Fri, May 22, 2020 at 9:50 AM William Dunlap <wdunlap at tibco.com > > <mailto:wdunlap at tibco.com>> wrote: > > > > I agree with Herve, processing collapse happens last so > > collapse=non-NULL always leads to a single character string being > > returned, the same as paste(collapse=""). See the altPaste function > > I posted yesterday. > > > > Bill Dunlap > > TIBCO Software > > wdunlap tibco.com > > < > https://urldefense.proofpoint.com/v2/url?u=http-3A__tibco.com&d=DwMFaQ&c=eRAMFD45gAfqt84VtBcfhQ&r=BK7q3XeAvimeWdGbWY_wJYbW0WYiZvSXAJJKaaPhzWA&m=Z1o-HO3_OqxOR9LaRguGvnG7X4vF_z1_q13I7zmjcfY&s=7ZT1IjmexPqsDBhrV3NspPTr8M8XiMweEwJWErgAlqw&e> > > > > > > > On Fri, May 22, 2020 at 9:12 AM Herv? 
Pag?s <hpages at fredhutch.org > > <mailto:hpages at fredhutch.org>> wrote: > > > > I think that > > > > paste(c("a", "b"), NULL, c("c", "d"), sep = " ", collapse > > = ",", > > recycle0=TRUE) > > > > should just return an empty string and don't see why it needs to > > emit a > > warning or raise an error. To me it does exactly what the user > > is asking > > for, which is to change how the 3 arguments are recycled > > **before** the > > 'sep' operation. > > > > The 'recycle0' argument has no business in the 'collapse' > operation > > (which comes after the 'sep' operation): this operation still > > behaves > > like it always had. > > > > That's all there is to it. > > > > H. > > > > > > On 5/22/20 03:00, Gabriel Becker wrote: > > > Hi Martin et al, > > > > > > > > > > > > On Thu, May 21, 2020 at 9:42 AM Martin Maechler > > > <maechler at stat.math.ethz.ch > > <mailto:maechler at stat.math.ethz.ch> > > <mailto:maechler at stat.math.ethz.ch > > <mailto:maechler at stat.math.ethz.ch>>> wrote: > > > > > > >>>>> Herv? Pag?s > > > >>>>> on Fri, 15 May 2020 13:44:28 -0700 writes: > > > > > > > There is still the situation where **both** 'sep' > and > > > 'collapse' are > > > > specified: > > > > > > >> paste(integer(0), "nth", sep="", collapse=",") > > > > [1] "nth" > > > > > > > In that case 'recycle0' should **not** be ignored > i.e. > > > > > > > paste(integer(0), "nth", sep="", collapse=",", > > recycle0=TRUE) > > > > > > > should return the empty string (and not > > character(0) like it > > > does at the > > > > moment). > > > > > > > In other words, 'recycle0' should only control the > > first > > > operation (the > > > > operation controlled by 'sep'). Which makes plenty > > of sense: > > > the 1st > > > > operation is binary (or n-ary) while the collapse > > operation > > > is unary. > > > > There is no concept of recycling in the context of > > unary > > > operations. > > > > > > Interesting, ..., and sounding somewhat convincing. 
> > > > > > > On 5/15/20 11:25, Gabriel Becker wrote: > > > >> Hi all, > > > >> > > > >> This makes sense to me, but I would think that > > recycle0 and > > > collapse > > > >> should actually be incompatible and paste should > > throw an > > > error if > > > >> recycle0 were TRUE and collapse were declared in > > the same > > > call. I don't > > > >> think the value of recycle0 should be silently > > ignored if it > > > is actively > > > >> specified. > > > >> > > > >> ~G > > > > > > Just to summarize what I think we should know and agree > > (or be > > > be "disproven") and where this comes from ... > > > > > > 1) recycle0 is a new R 4.0.0 option in paste() / paste0() > > which by > > > default > > > (recycle0 = FALSE) should (and *does* AFAIK) not > > change anything, > > > hence paste() / paste0() behave completely > > back-compatible > > > if recycle0 is kept to FALSE. > > > > > > 2) recycle0 = TRUE is meant to give different behavior, > > notably > > > 0-length arguments (among '...') should result in > > 0-length results. > > > > > > The above does not specify what this means in detail, > > see 3) > > > > > > 3) The current R 4.0.0 implementation (for which I'm > > primarily > > > responsible) > > > and help(paste) are in accordance. > > > Notably the help page (Arguments -> 'recycle0' ; > > Details 1st > > > para ; Examples) > > > says and shows how the 4.0.0 implementation has been > > meant to work. > > > > > > 4) Several provenly smart members of the R community > > argue that > > > both the implementation and the documentation of > > 'recycle0 > > > TRUE' should be changed to be more logical / > > coherent / sensical .. > > > > > > Is the above all correct in your view? > > > > > > Assuming yes, I read basically two proposals, both > agreeing > > > that recycle0 = TRUE should only ever apply to the > > action of 'sep' > > > but not the action of 'collapse'. > > > > > > 1) Bill and Herv? 
(I think) propose that 'recycle0' > > should have > > > no effect whenever 'collapse = <string>' > > > > > > 2) Gabe proposes that 'collapse = <string>' and 'recycle0 > > = TRUE' > > > should be declared incompatible and error. If going > > in that > > > direction, I could also see them to give a warning > (and > > > continue as if recycle = FALSE). > > > > > > > > > Herve makes a good point about when sep and collapse are both > > set. That > > > said, if the user explicitly sets recycle0, Personally, I > > don't think it > > > should be silently ignored under any configuration of other > > arguments. > > > > > > If all of the arguments are to go into effect, the question > > then becomes > > > one of ordering, I think. > > > > > > Consider > > > > > > paste(c("a", "b"), NULL, c("c", "d"), sep = " ", > > collapse = ",", > > > recycle0=TRUE) > > > > > > Currently that returns character(0), becuase the logic is > > > essenttially (in pseudo-code) > > > > > > collapse(paste(c("a", "b"), NULL, c("c", "d"), sep = " > ", > > > recycle0=TRUE), collapse = ", ", recycle0=TRUE) > > > > > > -> collapse(character(0), collapse = ", " recycle0=TRUE) > > > > > > -> character(0) > > > > > > Now Bill Dunlap argued, fairly convincingly I think, that > > paste(..., > > > collapse=<string>) should /always/ return a character vector > > of length > > > exactly one. With recycle0, though, it will return "" via > > the progression > > > > > > paste(c("a", "b"), NULL, c("c", "d"), sep = " ", > > collapse = ",", > > > recycle0=TRUE) > > > > > > -> collapse(character(0), collapse = ", ") > > > > > > -> "" > > > > > > > > > because recycle0 is still applied to the sep-based operation > > which > > > occurs before collapse, thus leaving a vector of length 0 to > > collapse. > > > > > > That is consistent but seems unlikely to be what the user > > wanted, imho. 
> > > I think if it does this there should be at least a warning > > when paste > > > collapses to "" this way, if it is allowed at all (ie if > mixing > > > collapse=<string>and recycle0=TRUEis not simply made an > error). > > > > > > I would like to hear others' thoughts as well though. @Pages, > > Herve > > > <mailto:hpages at fredhutch.org <mailto:hpages at fredhutch.org>> > > @William Dunlap > > > <mailto:wdunlap at tibco.com <mailto:wdunlap at tibco.com>> is "" > > what you envision as thee desired and > > > useful behavior there? > > > > > > Best, > > > ~G > > > > > > > > > > > > I have not yet my mind up but would tend to agree to "you > > guys", > > > but I think that other R Core members should chime in, > too. > > > > > > Martin > > > > > > >> On Fri, May 15, 2020 at 11:05 AM Herv? Pag?s > > > <hpages at fredhutch.org <mailto:hpages at fredhutch.org> > > <mailto:hpages at fredhutch.org <mailto:hpages at fredhutch.org>> > > > >> <mailto:hpages at fredhutch.org > > <mailto:hpages at fredhutch.org> <mailto:hpages at fredhutch.org > > <mailto:hpages at fredhutch.org>>>> > > > wrote: > > > >> > > > >> Totally agree with that. > > > >> > > > >> H. > > > >> > > > >> On 5/15/20 10:34, William Dunlap via R-devel > wrote: > > > >> > I agree: paste(collapse="something", ...) > > should always > > > return a > > > >> single > > > >> > character string, regardless of the value of > > recycle0. > > > This would be > > > >> > similar to when there are no non-NULL arguments > > to paste; > > > >> collapse="." > > > >> > gives a single empty string and collapse=NULL > > gives a zero > > > long > > > >> character > > > >> > vector. 
> > > >     > paste()
> > > >     character(0)
> > > >     > paste(collapse=", ")
> > > >     [1] ""
> > > >
> > > > Bill Dunlap
> > > > TIBCO Software
> > > > wdunlap tibco.com
> > > >
> > > > On Thu, Apr 30, 2020 at 9:56 PM suharto_anggono--- via R-devel
> > > > <r-devel at r-project.org> wrote:
> > > >
> > > > > Without 'collapse', 'paste' pastes (concatenates) its
> > > > > arguments elementwise (separated by 'sep', " " by default).
> > > > > New in R devel and R patched, specifying recycle0 = FALSE
> > > > > makes mixing zero-length and nonzero-length arguments result
> > > > > in length zero. The result of paste(n, "th", sep = "",
> > > > > recycle0 = FALSE) always has the same length as 'n'.
> > > > > Previously, the result was still as long as the longest
> > > > > argument, with the zero-length argument treated like "".
> > > > > If all of the arguments have length zero, 'recycle0' doesn't
> > > > > matter.
> > > > >
> > > > > As far as I understand, 'paste' with 'collapse' as a
> > > > > character string is supposed to put together elements of a
> > > > > vector into a single character string. I think 'recycle0'
> > > > > shouldn't change it.
> > > > >
> > > > > In current R devel and R patched, paste(character(0),
> > > > > collapse = "", recycle0 = FALSE) is character(0). I think it
> > > > > should be "", like paste(character(0), collapse="").
> > > > >
> > > > >     paste(c("4", "5"), "th", sep = "", collapse = ", ", recycle0 = FALSE)
> > > > > is
> > > > >     "4th, 5th".
> > > > >     paste(c("4"     ), "th", sep = "", collapse = ", ", recycle0 = FALSE)
> > > > > is
> > > > >     "4th".
> > > > > I think
> > > > >     paste(c(        ), "th", sep = "", collapse = ", ", recycle0 = FALSE)
> > > > > should be
> > > > >     "",
> > > > > not character(0).
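The ordering argument quoted above (the 'sep' operation first, with zero-length recycling, then the 'collapse' operation on whatever is left) can be sketched in plain R without relying on the 'recycle0' argument at all. The helper name paste_two_stage is hypothetical and is not part of base R; it is only a minimal illustration of the two-stage model under discussion, not the implementation used by paste().

```r
# Hypothetical helper (not base R): a sketch of the two-stage model.
# Stage 1 pastes element-wise with 'sep'; stage 2 collapses the result.
paste_two_stage <- function(..., sep = " ", collapse = NULL) {
  args <- list(...)
  # Zero-length recycling: any zero-length argument empties stage 1.
  if (length(args) == 0L || any(lengths(args) == 0L)) {
    pasted <- character(0)
  } else {
    pasted <- do.call(paste, c(args, list(sep = sep)))
  }
  # Stage 2 behaves as paste(collapse=) always has:
  # collapsing a zero-length vector yields a single empty string.
  if (is.null(collapse)) pasted else paste(pasted, collapse = collapse)
}

res_empty <- paste_two_stage(c("a", "b"), NULL, c("c", "d"),
                             sep = " ", collapse = ",")
res_full <- paste_two_stage(c("a", "b"), c("c", "d"),
                            sep = " ", collapse = ",")
res_nocollapse <- paste_two_stage(c("a", "b"), NULL, c("c", "d"))
```

Under this model a non-NULL 'collapse' always produces a character vector of length one, and the zero-length case produces "" rather than character(0), which is the outcome Bill and Hervé argue for.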
Gabriel Becker
2020-May-24 00:48 UTC
[Rd] paste(character(0), collapse="", recycle0=FALSE) should be ""
Brodie,

A good point, but more analogous to what I'm concerned with is

    > sum(5, numeric(0))
    [1] 5

Not 0 (the analogue of Hervé's desired behavior).

Best,
~G

PS Brodie sorry for the double.

On Fri, May 22, 2020 at 6:12 PM brodie gaslam <brodie.gaslam at yahoo.com> wrote:

> On Friday, May 22, 2020, 6:16:45 PM EDT, Hervé Pagès
> <hpages at fredhutch.org> wrote:
>
> > Gabe,
> >
> > It's the current behavior of paste() that is a major source of bugs:
> >
> >   ## Add "rs" prefix to SNP ids and collapse them in a
> >   ## comma-separated string.
> >   collapse_snp_ids <- function(snp_ids)
> >       paste("rs", snp_ids, sep="", collapse=",")
> >
> >   snp_groups <- list(
> >       group1=c(55, 22, 200),
> >       group2=integer(0),
> >       group3=c(99, 550)
> >   )
> >
> >   vapply(snp_groups, collapse_snp_ids, character(1))
> >   #            group1            group2            group3
> >   # "rs55,rs22,rs200"              "rs"      "rs99,rs550"
> >
> > This has hit me so many times!
> >
> > Now with 'collapse0=TRUE', we finally have the opportunity to make
> > it do the right thing. Let's not miss that opportunity.
> >
> > Cheers,
> > H.
>
> FWIW what convinces me is consistency with other aggregating
> functions applied to zero length inputs:
>
>   sum(numeric(0))
>   ## [1] 0
>
> > On 5/22/20 11:26, Gabriel Becker wrote:
> > > I understand that this is consistent but it also strikes me as an
> > > enormous 'gotcha' of a magnitude that 'we' are trying to
> > > avoid/smooth over at this point in user-facing R space.
> > >
> > > For the record I'm not suggesting it should return something
> > > other than "", and in particular I'm not arguing that any call to
> > > paste /that does not return an error/ with non-NULL collapse
> > > should return a character vector of length one.
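The two analogies in this exchange, and the SNP example they refer to, can be checked side by side. The following is only a sketch in base R that avoids 'recycle0' entirely; the function name collapse_snp_ids_guarded is an assumption introduced here for illustration and does not come from the thread.

```r
# Aggregating a zero-length input gives the identity of the operation.
sum_empty <- sum(numeric(0))                            # Brodie's analogy
sum_mixed <- sum(5, numeric(0))                         # Gabe's analogy
collapse_empty <- paste(character(0), collapse = ",")   # long-standing behavior

# Without 'recycle0', the zero-length argument is treated as "",
# which is where the stray "rs" in Hervé's example comes from.
stray_prefix <- paste("rs", integer(0), sep = "", collapse = ",")

# Hypothetical guarded variant of collapse_snp_ids(): return "" explicitly
# for an empty group instead of letting the prefix leak through.
collapse_snp_ids_guarded <- function(snp_ids) {
  if (length(snp_ids) == 0L) return("")
  paste("rs", snp_ids, sep = "", collapse = ",")
}

snp_groups <- list(
  group1 = c(55, 22, 200),
  group2 = integer(0),
  group3 = c(99, 550)
)
guarded <- vapply(snp_groups, collapse_snp_ids_guarded, character(1))
```

The guard is the kind of special-casing that the proposed behavior would make unnecessary: if the zero-length recycling happens before the collapse step, the empty group collapses to "" on its own.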
Hervé Pagès
2020-May-24 04:59 UTC
[Rd] paste(character(0), collapse="", recycle0=FALSE) should be ""
On 5/23/20 17:45, Gabriel Becker wrote:

> Maybe my intuition is just different but when I collapse multiple
> character vectors together, I expect all the characters from each of
> those vectors to be in the resulting collapsed one.

Yes I'd expect that too. But the **collapse** operation in paste() has never been about collapsing **multiple** character vectors together. What it does is collapse the **single** character vector that comes out of the 'sep' operation.

So

    paste(x, y, z, sep="", collapse=",")

is analogous to

    sum(x + y + z)

The element-wise addition is analogous to the 'sep' operation. The sum() operation is analogous to the 'collapse' operation.

H.
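The analogy can be made concrete with a few lines of base R. This is a small sketch with made-up values; the variable names are illustrative only.

```r
x <- c(1, 2)
y <- c(10, 20)
z <- c(100, 200)

# Element-wise step first, aggregation second.
elementwise <- x + y + z          # analogous to the 'sep' operation
total <- sum(elementwise)         # analogous to the 'collapse' operation

a <- c("a", "b")
b <- c("c", "d")

# A single paste() call with both 'sep' and 'collapse' ...
one_call <- paste(a, b, sep = "", collapse = ",")
# ... gives the same result as doing the two steps explicitly.
two_calls <- paste(paste(a, b, sep = ""), collapse = ",")

# In arithmetic, a zero-length operand already empties the element-wise
# step, and the aggregation then returns its identity element.
empty_elementwise <- x + numeric(0)
empty_total <- sum(empty_elementwise)
```

In the arithmetic case the zero-length operand empties the element-wise result and sum() then returns 0. The behavior argued for in this thread is the string counterpart: an empty 'sep' step followed by a collapse that returns "".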