I fully agree! General string interpolation opens a gaping security hole and is
accompanied by all kinds of problems and decisions. What I envision instead is
something like this:
f"hello {name}"
Which gets parsed by R to this:
(STRINTERPSXP (CHARSXP (PROMISE nil)))
Basically, a new type of R language construct that still can be processed by
packages (for customized interpolation like in cli etc.), with a default eval
which is basically paste0(). The benefit here would be that this is eagerly
parsed and syntactically checked, and that the promise code could carry a
srcref. And of course, that you could pass an interpolated string expression
lazily between frames without losing the environment, etc. For more advanced
applications, a low level string interpolation expression constructor could be
provided (that could either parse a general string, at the user's risk, or
build it directly from expressions).
- Taras
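
To make the idea a bit more concrete, here is a rough sketch in plain R of how such an object could behave. Nothing here exists in R: interp(), interp_eval() and the "interp_string" class are made-up names, an ordinary list stands in for the proposed STRINTERPSXP, and srcrefs are not modelled at all. The only point is that the {...} parts are parsed when the object is built, and evaluated later in the environment where it was built.

```r
# Hypothetical constructor: split the template into literal text and {expr}
# holes, parse every hole eagerly, and remember the caller's environment.
interp <- function(template, env = parent.frame()) {
  force(env)
  m <- gregexpr("\\{[^{}]*\\}", template)
  holes <- regmatches(template, m)[[1]]
  literals <- regmatches(template, m, invert = TRUE)[[1]]
  exprs <- lapply(holes, function(h) {
    # Strip the braces; a syntax error in the hole fails here, not at eval time.
    parse(text = substr(h, 2, nchar(h) - 1))[[1]]
  })
  structure(list(literals = literals, exprs = exprs, env = env),
            class = "interp_string")
}

# Hypothetical default evaluation: evaluate each hole in the stored
# environment and glue the pieces together, roughly what paste0() would do.
interp_eval <- function(x) {
  values <- vapply(x$exprs,
                   function(e) paste0(eval(e, x$env), collapse = ""),
                   character(1))
  # There is always one more literal piece than there are holes.
  paste0(x$literals, c(values, ""), collapse = "")
}

make_greeting <- function() {
  name <- "World"          # only visible inside this frame
  interp("hello {name}")   # nothing is evaluated yet
}

g <- make_greeting()
print(interp_eval(g))
```

A package wanting its own interpolation rules could walk the stored literals and expressions instead of calling the default evaluator.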
> On 7 Dec 2021, at 12:06, Simon Urbanek <simon.urbanek at R-project.org> wrote:
>
>
>
>> On Dec 7, 2021, at 22:09, Taras Zakharko <taras.zakharko at uzh.ch> wrote:
>>
>> Great summary, Avi.
>>
>> String concatenation could be trivially added to R, but it probably should
>> not be. You will notice that modern languages tend not to use "+" for
>> string concatenation (they either have a custom operator or a special kind
>> of pattern to do it) due to the practical issues such an approach brings
>> (implicit type casting, lack of commutativity, performance, etc.). These
>> issues will be felt even more so in R, with its weak typing, idiosyncratic
>> casting behavior and NAs.
>>
>> As others have pointed out, any kind of behavior one wants from string
>> concatenation can be implemented by custom operators as needed. This is
>> not something that needs to be in base R. I would rather like the efforts
>> to be directed at improving string formatting (such as glue-style built-in
>> string interpolation).
>>
>
> This is getting OT, but there is a very good reason why string interpolation
> is not in core R. As I recall it has been considered some time ago, but it is
> very dangerous as it implies evaluation on constants, which opens a huge
> security hole and has questionable semantics (where you evaluate, etc.).
> Hence it's much easier to ban a package than to hack it out of R ;).
>
> Cheers,
> Simon
>
>
>> - Taras
>>
>>
>>> On 7 Dec 2021, at 02:27, Avi Gross via R-devel <r-devel at r-project.org> wrote:
>>>
>>> After seeing what others are saying, it is clear that you need to carefully
>>> think things out before designing any implementation of a more native
>>> concatenation operator, whether it is called "+" or anything else. There may
>>> not be any ONE right solution, but unlike a function version like paste()
>>> there is nowhere to place any options that specify what you mean.
>>>
>>> You can obviously expand paste() to accept arguments like replace.NA="" or
>>> replace.NA="<NA>" and similar arguments on what to do if you see a NaN, an
>>> Inf or -Inf, a NULL or even an NA_character_ and so on. Heck, you might tell
>>> it to make other substitutions as in substitute=list(100=99, D=F) or any
>>> other nonsense you can come up with.
>>>
>>> But you have nowhere to put options when saying:
>>>
>>> c <- a + b
>>>
>>> Sure, you could set various global options before the addition and maybe
>>> reset them after, but that is not a way I like to go for something this
>>> basic.
>>>
>>> And enough such tinkering makes me wonder if it is easier to ask a user to
>>> use a slightly different function like this:
>>>
>>> paste.no.na <- function(...) do.call(paste, Filter(Negate(is.na), list(...)))
>>>
>>> The above one-line function removes any NA from the argument list to make a
>>> potentially shorter list before calling the real paste() using it.
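
A quick check of what the one-liner above does, assuming every argument has length one. Filter() expects the predicate to return a single TRUE or FALSE per list element, so vector arguments that contain NA would need a different approach.

```r
# One-liner from the message above: drop NA arguments, then call paste().
paste.no.na <- function(...) do.call(paste, Filter(Negate(is.na), list(...)))

with_na <- paste("a", NA, "b")           # plain paste() turns NA into "NA"
without_na <- paste.no.na("a", NA, "b")  # the NA argument is dropped first

print(with_na)
print(without_na)
```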
>>>
>>> Variations can, of course, be made that allow functionality as above.
>>>
>>> If R was a true object-oriented language in the same sense as others like
>>> Python, operator overloading of "+" might be doable in more complex ways, but
>>> we can only work with what we have. I tend to agree with others that in some
>>> places R is so lenient that all kinds of errors can happen because it makes
>>> a guess on how to correct it. Generally, if you really want to mix numeric
>>> and character, many languages require you to transform any arguments to make
>>> them all of compatible types. The paste() function is clearly stated to
>>> coerce all arguments to be of type character for you, whereas a+b makes no
>>> such promises and also is not properly defined even if a and b are both of
>>> type character. Sure, we can expand the language, but it may still do things
>>> some find not to be quite what they wanted, as in "2"+"3" becoming "23"
>>> rather than 5. Right now, I can use as.numeric("2")+as.numeric("3") and get
>>> the intended result after making very clear to anyone reading the code that
>>> I wanted strings converted to floating point before the addition.
>>>
>>> As has been pointed out, the plus operator if used to concatenate does not
>>> have a cognate for other operations like -*/ and R has used most other
>>> special symbols for other purposes. So, sure, we can use something like ....
>>> (4 periods) if it is not already being used for something, but using + here
>>> is a tad confusing. Having said that, the makers of Python did make that
>>> choice.
>>>
>>> -----Original Message-----
>>> From: R-devel <r-devel-bounces at r-project.org> On Behalf Of Gabriel Becker
>>> Sent: Monday, December 6, 2021 7:21 PM
>>> To: Bill Dunlap <williamwdunlap at gmail.com>
>>> Cc: Radford Neal <radford at cs.toronto.edu>; r-devel <r-devel at r-project.org>
>>> Subject: Re: [Rd] string concatenation operator (revisited)
>>>
>>> As I recall, there was a large discussion related to that which resulted in
>>> the recycle0 argument being added (but defaulting to FALSE) for
>>> paste/paste0.
>>>
>>> I think a lot of these things ultimately mean that if there were to be a
>>> string concatenation operator, it probably shouldn't have behavior identical
>>> to paste0. Was that what you were getting at as well, Bill?
>>>
>>> ~G
>>>
>>> On Mon, Dec 6, 2021 at 4:11 PM Bill Dunlap <williamwdunlap at gmail.com> wrote:
>>>
>>>> Should paste0(character(0), c("a","b")) give character(0)?
>>>> There is a fair bit of code that assumes that paste("X",NULL) gives "X"
>>>> but c(1,2)+NULL gives numeric(0).
>>>>
>>>> -Bill
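
For reference, the zero-length cases mentioned above can be checked directly in base R. This is only a small illustration, assuming R 4.0.1 or later, where paste() and paste0() have the recycle0 argument.

```r
# By default a zero-length argument is treated like "" and the rest survives.
default_paste0 <- paste0(character(0), c("a", "b"))

# With recycle0 = TRUE a zero-length argument gives a zero-length result.
recycled_paste0 <- paste0(character(0), c("a", "b"), recycle0 = TRUE)

# paste() ignores NULL, while arithmetic with NULL collapses to length zero.
paste_null <- paste("X", NULL)
plus_null <- c(1, 2) + NULL

print(default_paste0)
print(recycled_paste0)
print(paste_null)
print(plus_null)
```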
>>>>
>>>> On Mon, Dec 6, 2021 at 1:32 PM Duncan Murdoch <murdoch.duncan at gmail.com> wrote:
>>>>
>>>>> On 06/12/2021 4:21 p.m., Avraham Adler wrote:
>>>>>> Gabe, I agree that missingness is important to factor in. To somewhat
>>>>>> abuse the terminology, NA is often used to represent missingness. Perhaps
>>>>>> concatenating character something with character something missing
>>>>>> should result in the original character?
>>>>>
>>>>> I think that's a bad idea. If you wanted to represent an empty
>>>>> string, you should use "" or NULL, not NA.
>>>>>
>>>>> I'd agree with Gabe, paste0("abc", NA) shouldn't give "abcNA", it
>>>>> should give NA.
>>>>>
>>>>> Duncan Murdoch
>>>>>
>>>>>>
>>>>>> Avi
>>>>>>
>>>>>> On Mon, Dec 6, 2021 at 3:35 PM Gabriel Becker <gabembecker at gmail.com> wrote:
>>>>>>
>>>>>>> Hi All,
>>>>>>>
>>>>>>> Seeing this and the other thread (and admittedly not having clicked
>>>>>>> through to the linked r-help thread), I wonder about NAs.
>>>>>>>
>>>>>>> Should NA <concat> "hi there" not result in NA_character_? This is not
>>>>>>> what any of the paste functions do, but in my opinion, NA +
>>>>>>> <non_na_value> seems like it should be NA (not "NA"), particularly if we
>>>>>>> are talking about `+` overloading, but potentially even in the case of
>>>>>>> a distinct concatenation operator?
>>>>>>>
>>>>>>> I guess what I'm saying is that in my head missingness propagation rules
>>>>>>> should take priority in such an operator (i.e., NA + <anything> should
>>>>>>> *always* be NA).
>>>>>>>
>>>>>>> Is that something others disagree with, or has it just not come up yet
>>>>>>> in (the parts I have read of) this discussion?
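
As an illustration of the NA-propagating behavior described above, here is a minimal sketch of a user-defined operator. The name %concat% is made up for this example and is not part of base R or of any proposal in this thread. It assumes the usual recycling rules of paste0() are acceptable for everything except NA.

```r
# Hypothetical operator: concatenate like paste0(), but let NA win.
`%concat%` <- function(a, b) {
  out <- paste0(a, b)
  # Wherever either side is missing, the result is missing too.
  out[is.na(a) | is.na(b)] <- NA_character_
  out
}

scalar_result <- NA %concat% "hi there"
vector_result <- c("a", NA, "c") %concat% "x"

print(scalar_result)
print(vector_result)
```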
>>>>>>>
>>>>>>> Best,
>>>>>>> ~G
>>>>>>>
>>>>>>> On Mon, Dec 6, 2021 at 10:03 AM Radford Neal <radford at cs.toronto.edu> wrote:
>>>>>>>
>>>>>>>>>> In pqR (see pqR-project.org), I have implemented ! and !! as
>>>>>>>>>> binary string concatenation operators, equivalent to paste0 and
>>>>>>>>>> paste, respectively.
>>>>>>>>>>
>>>>>>>>>> For instance,
>>>>>>>>>>
>>>>>>>>>> > "hello" ! "world"
>>>>>>>>>> [1] "helloworld"
>>>>>>>>>> > "hello" !! "world"
>>>>>>>>>> [1] "hello world"
>>>>>>>>>> > "hello" !! 1:4
>>>>>>>>>> [1] "hello 1" "hello 2" "hello 3" "hello 4"
>>>>>>>>>
>>>>>>>>> I'm curious about the details:
>>>>>>>>>
>>>>>>>>> Would `1 ! 2` convert both to strings?
>>>>>>>>
>>>>>>>> They're equivalent to paste0 and paste, so 1 ! 2 produces "12",
>>>>>>>> just like paste0(1,2) does. Of course, they wouldn't have to be
>>>>>>>> exactly equivalent to paste0 and paste - one could impose
>>>>>>>> stricter requirements if that seemed better for error detection.
>>>>>>>> Off hand, though, I think automatically converting is more in
>>>>>>>> keeping with the rest of R. Explicitly converting with as.character
>>>>>>>> could be tedious.
>>>>>>>>
>>>>>>>> I suppose disallowing logical arguments might make sense to guard
>>>>>>>> against typos where ! was meant to be the unary-not operator, but
>>>>>>>> ended up being a binary operator, after some sort of typo. I
>>>>>>>> doubt that this would be a common error, though.
>>>>>>>>
>>>>>>>> (Note that there's no ambiguity when there are no typos, except
>>>>>>>> that when negation is involved a space may be needed - so, for
>>>>>>>> example, "x" ! !TRUE is "xFALSE", but "x"!!TRUE is "x TRUE".
>>>>>>>> Existing uses of double negation are still fine - e.g., a <- !!TRUE
>>>>>>>> still sets a to TRUE. Parsing of operators is greedy, so
>>>>>>>> "x"!!!TRUE is "x FALSE", not "xTRUE".)
>>>>>>>>
>>>>>>>>> Where does the binary ! fit in the operator priority? E.g. how is
>>>>>>>>>
>>>>>>>>> a ! b > c
>>>>>>>>>
>>>>>>>>> parsed?
>>>>>>>>
>>>>>>>> As (a ! b) > c.
>>>>>>>>
>>>>>>>> Their precedence is between that of + and - and that of < and >.
>>>>>>>> So "x" ! 1+2 evaluates to "x3" and "x" ! 1+2 < "x4" is TRUE.
>>>>>>>>
>>>>>>>> (Actually, pqR also has a .. operator that fixes the problems
>>>>>>>> with generating sequences with the : operator, and it has
>>>>>>>> precedence lower than + and - and higher than ! and !!, but
>>>>>>>> that's not relevant if you don't have the .. operator.)
>>>>>>>>
>>>>>>>> Radford Neal
>>>>>>>>
>>>>>>>> ______________________________________________
>>>>>>>> R-devel at r-project.org mailing list
>>>>>>>> https://stat.ethz.ch/mailman/listinfo/r-devel