On Wed, Jun 17, 2015 at 12:45 PM, William Dunlap <wdunlap at tibco.com> wrote:
>> ... adding the ability to concat
>> strings with '+' would be a relatively simple addition (no pun intended) to
>> the code base I believe. With a lot of other languages supporting this kind
>> of concatenation, this is what surprised me most when first learning R.
>
> Wow! R has a lot of surprising features and I would have thought
> this would be quite a way down the list.

Well, it is hard to guess what users and people in general find
surprising. As '+' is used for string concatenation in essentially all
major scripting (and many other) languages, personally I am not
surprised that this is surprising for people. :)

> How would this new '+' deal with factors, as paste does or as the current
> '+' does?

The same as before. It would not change the behavior for other
classes, only basic characters.

> Would number+string and string+number cause errors (as in current
> '+' in R and python) or coerce both to strings (as in current R:paste and
> in perl's '+').

Would cause errors, exactly as it does right now.

> Having '+' work on all types of data can let improperly imported data
> get further into the system before triggering an error.

Nobody is asking for this. Only characters, not all types of data.

> I see lots of errors reported on this list that are due to read.table
> interpreting text as character strings instead of the numbers that the
> user expected. Detecting that error as early as possible is good.

Isn't that a problem with read.table then? Detecting it there would be
the earliest possible, no?

Gabor

[...]
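For concreteness, the behavior described in the reply above (concatenate only plain character vectors, signal an error for anything else) could be sketched roughly as follows. This is only an illustration under stated assumptions: the operator name `%+%` is hypothetical and stands in for the proposed '+', the error message is made up, and the zero-length handling is one possible reading of "use the recycling rules of arithmetic".

```r
# Hypothetical infix operator standing in for the proposed '+'; not part of base R.
`%+%` <- function(e1, e2) {
  # Only plain character vectors are accepted; nothing is coerced.
  if (!is.character(e1) || !is.character(e2)) {
    stop("non-character argument to string concatenation")
  }
  # Assumption: follow arithmetic-style recycling, so a zero-length
  # operand gives a zero-length result.
  if (length(e1) == 0L || length(e2) == 0L) {
    return(character(0))
  }
  paste0(e1, e2)
}

"foo" %+% "bar"            # "foobar"
c("a", "b") %+% "x"        # "ax" "bx"
character(0) %+% "x"       # character(0)
try("a" %+% 1)             # error: numbers are not coerced
try(factor("a") %+% "b")   # error: a factor is not a plain character vector
```

Note that paste0() recycles silently when the lengths are not multiples of each other, whereas arithmetic operators warn; the sketch does not try to reproduce that warning.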
If '+' and paste don't change their behavior with respect to
factors but you encourage people to use '+' instead of paste
then you will run into problems with data.frame columns because
many people don't notice whether a character-like column is
character or factor. With paste() this is not a problem but with '+'
it is. I think it is good not to make people worry about this much.

As for the recycling issue, consider calls involving NULL arguments,
   > f <- function(n)paste0(n, " test", if(n!=1)"s", " failed")
   > f(1)
   [1] "1 test failed"
   > f(0)
   [1] "0 tests failed"
If paste0 followed the same recycling rules as "+" then f(1) would return
character(0). There is a fair bit of code like that on CRAN.

Consider using sprintf() to get the sort of recycling rules that "+" uses
   > sprintf("%s is %d", c("One","Two"), numeric(0))
   character(0)
   > sprintf("%s is %d", c("One","Two"), 17)
   [1] "One is 17" "Two is 17"
   > sprintf("%s is %d", c("One","Two"), 26:27)
   [1] "One is 26" "Two is 27"

Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Wed, Jun 17, 2015 at 9:56 AM, Gábor Csárdi <csardi.gabor at gmail.com> wrote:
> [...]
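The factor and recycling points in the message above can be checked with a few lines of base R. This is a small sketch, not a statement about how a string '+' would be implemented; the data frame `df` and the variable names are made up for illustration.

```r
# A data.frame column that looks like text but is stored as a factor.
df <- data.frame(id = c("a", "b"), stringsAsFactors = TRUE)

# paste0() converts the factor to its labels, so the storage type does not matter.
labels_pasted <- paste0(df$id, "_x")            # "a_x" "b_x"

# Arithmetic '+' is not meaningful for factors: it warns and returns NA.
plus_on_factor <- suppressWarnings(df$id + 1)   # NA NA

# A zero-length operand makes an arithmetic result zero-length ...
len_arith <- length(1 + NULL)                   # 0

# ... whereas paste0() treats a zero-length argument as "".
len_paste <- length(paste0(1, NULL))            # 1
```

So a '+' that kept the current factor behavior would fail on exactly the columns where paste() works without the user having to know the column type.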
Hi Bill,

On 06/17/2015 12:36 PM, William Dunlap wrote:
> if '+' and paste don't change their behavior with respect to
> factors but you encourage people to use '+' instead of paste
> then you will run into problems with data.frame columns because
> many people don't notice whether a character-like column is
> character or factor. With paste() this is not a problem but with '+'
> it is. I think it is good not to make people worry about this much.
>
> As for the recycling issue, consider calls involving NULL arguments,
>    > f <- function(n)paste0(n, " test", if(n!=1)"s", " failed")
>    > f(1)
>    [1] "1 test failed"
>    > f(0)
>    [1] "0 tests failed"
> If paste0 followed the same recycling rules as "+" then f(1) would return
> character(0). There is a fair bit of code like that on CRAN.

OTOH a very common use case is to use paste (or paste0) to add a given
prefix (or suffix) to a bunch of strings:

   paste0("ID", x)  # buggy! (won't do the right thing if length(x) is 0)

This is like "adding" something to 'x' so it's conceptually no different
from doing:

   x + 5

which does the right thing when 'x' is a numeric(0).

Anyway, I don't think anybody suggested to change the recycling rules
of paste() or paste0() (which would of course break some existing code
that relies on it, but that's a very generic statement right?), only to
adopt the recycling rules of `+` and other binary arithmetic and
comparison operators if `+` was used to concatenate strings.

Cheers,
H.

> [...]
>
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel

-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fredhutch.org
Phone: (206) 667-5791
Fax: (206) 667-1319
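The zero-length prefix case described above is easy to reproduce. The sketch below compares paste0(), arithmetic, and sprintf() on empty input, and adds an explicit length guard; the helper name `add_prefix` is hypothetical and chosen only for illustration.

```r
x <- character(0)                  # e.g. an empty set of identifiers

res_paste   <- paste0("ID", x)     # "ID": a length-1 result from empty input
res_arith   <- numeric(0) + 5      # numeric(0): arithmetic keeps the result empty
res_sprintf <- sprintf("ID%s", x)  # character(0): same rule as arithmetic

# Hypothetical helper: guard the zero-length case explicitly.
add_prefix <- function(prefix, x) {
  if (length(x) == 0L) character(0) else paste0(prefix, x)
}

res_guard <- add_prefix("ID", x)              # character(0)
res_full  <- add_prefix("ID", c("7", "8"))    # "ID7" "ID8"
```

Later versions of R (4.0.1 and up) added a `recycle0` argument to paste() and paste0() that addresses the same case without changing the default behavior.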
At the risk of unnecessarily (annoyingly?) prolonging a conversation that
has died down...

I don't think I've seen the sep or collapse arguments to paste mentioned
as aspects to consider. I don't see any way in which this version of '+'
could offer those arguments.

Hence I would consider this version of '+' to be just a convenience
function, i.e., a function that, for convenience, implements a special
case of a more general function. It would not be a different type of
concatenation, nor would it improve the current methods of string
concatenation.

There is precedent in R for convenience functions. Indeed, I consider
paste0 to be a convenience function for paste with sep=''. read.csv and
several others are convenience functions that implement special cases of
read.table.

Viewed that way, I see no intrinsic conceptual impediment to introducing a
version of '+' that does string concatenation. Of course, those who did
the work would have to decide how it would handle recycling and other
issues that have been raised.

However, whether or not it would be a good idea to do so, or worth the
effort, is not clear. I've never felt that ... it would be nice if R did
something the same way as language X ... is by itself a strong argument
for introducing a new function or capability.

Speaking as a long-time user, I wouldn't ask R core to spend time on it.
Would I use it if it were available? Possibly over time I might migrate
toward using it in simple situations.

-Don

-- 
Don MacQueen

Lawrence Livermore National Laboratory
7000 East Ave., L-627
Livermore, CA 94550
925-423-1062

On 6/17/15, 12:36 PM, "R-devel on behalf of William Dunlap"
<r-devel-bounces at r-project.org on behalf of wdunlap at tibco.com> wrote:
> [...]
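To illustrate the sep and collapse point: a binary operator takes exactly two operands and has no place for extra named arguments, so the calls below have no direct '+' equivalent. This is a short sketch using only base R; the variable names are illustrative.

```r
x <- c("a", "b", "c")

# sep controls what goes between the pieces of each element.
with_sep  <- paste(x, 1:3, sep = "-")                 # "a-1" "b-2" "c-3"

# collapse joins the resulting elements into a single string.
collapsed <- paste(x, collapse = ", ")                # "a, b, c"

# Both at once.
both      <- paste(x, 1:3, sep = "", collapse = "+")  # "a1+b2+c3"

# paste0() is the special case sep = "".
same_as_paste0 <- identical(paste0(x, 1:3), paste(x, 1:3, sep = ""))  # TRUE
```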