Andreas Kersting
2017-Jun-14  11:45 UTC
[Rd] [WISH / PATCH] possibility to split string literals across multiple lines
On Wed, 14 Jun 2017 06:12:09 -0500, Duncan Murdoch <murdoch.duncan at gmail.com> wrote:

> On 14/06/2017 5:58 AM, Andreas Kersting wrote:
> > Hi,
> >
> > I would really like to have a way to split long string literals across
> > multiple lines in R.
>
> I don't understand why you require the string to be a literal.  Why not
> construct the long string in an expression like
>
>   paste0("aaa",
>          "bbb")
>
> ?  Surely the execution time of the paste0 call is negligible.
>
> Duncan Murdoch

Actually "execution time" is precisely one of the reasons why I would like to see this feature as - depending on the context (e.g. in a tight loop) - the execution time of paste0 (or probably also glue, thanks Gabor) is not necessarily insignificant.

The other reason is style: I think it is cleaner if we can construct such a long string literal without the need for a function call.

Andreas

> > Currently, if a string literal spans multiple lines, there is no way to
> > inhibit the introduction of newline characters:
> >
> >  > "aaa
> > + bbb"
> > [1] "aaa\nbbb"
> >
> > If a line ends with a backslash, it is just ignored:
> >
> >  > "aaa\
> > + bbb"
> > [1] "aaa\nbbb"
> >
> > We could use this fact to implement string splitting in a fairly
> > backward-compatible way, since currently such trailing backslashes
> > should hardly be used as they do not have any effect. The attached patch
> > makes the parser ignore a newline character directly following a backslash:
> >
> >  > "aaa\
> > + bbb"
> > [1] "aaabbb"
> >
> > I personally would also prefer if leading blanks (spaces and tabs) in
> > the second line are ignored to allow for proper indentation:
> >
> >  >   "aaa \
> > +    bbb"
> > [1] "aaa bbb"
> >
> >  >   "aaa\
> > +    \ bbb"
> > [1] "aaa bbb"
> >
> > This is also implemented by this patch.
> >
> > An alternative approach could be to have something like
> >
> > ("aaa "
> > "bbb")
> >
> > or
> >
> > ("aaa ",
> > "bbb")
> >
> > be interpreted as "aaa bbb".
> >
> > I don't know the ins and outs of the parser of R (hence: please very
> > carefully review the attached patch), but I guess this would be more
> > work to implement!?
> >
> > What do you think? Is there anybody else who is missing this feature in
> > the first place?
> >
> > Regards,
> > Andreas
> >
> > ______________________________________________
> > R-devel at r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-devel
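For readers who want to see the intended result without patching R, the joining rule described above can be imitated at run time. This is only a sketch: `join_continuations` is a hypothetical helper, not part of R or of the attached patch, and it works on raw text rather than in the parser, which is the opposite of what the proposal is after. It also ignores escaped backslashes before the newline and the `\ ` escape shown in the last example.

```r
# Hypothetical helper for illustration; not part of R or of the patch.
join_continuations <- function(x) {
  # Drop every backslash-newline pair together with the spaces and tabs
  # that follow it, so the two source lines are glued together.
  gsub("\\\\\n[ \t]*", "", x)
}

# The R strings below stand for the raw text  aaa\<newline>bbb  and
# aaa \<newline>    bbb  (each "\\" is one backslash in the raw text).
plain    <- join_continuations("aaa\\\nbbb")
indented <- join_continuations("aaa \\\n    bbb")

print(plain)     # "aaabbb"
print(indented)  # "aaa bbb"
```

Because this runs as an ordinary function call, it has the same run-time cost concern as paste0(); it only shows what the joined string would look like.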
Mark van der Loo
2017-Jun-14  12:00 UTC
[Rd] [WISH / PATCH] possibility to split string literals across multiple lines
Having some line-breaking character for string literals would have benefits,
as string literals could then be constructed at parse time rather than at
run time. I have run into this myself a few times as well. One way to at
least emulate something like that is the following.

`%+%` <- function(x,y) paste0(x,y)

"hello" %+%
  " pretty" %+%
  " world"

-Mark
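A quick way to check at which stage such a string is assembled is to look at the unevaluated expression. The sketch below uses only base R; the variable names `expr` and `lit` are chosen here for illustration.

```r
`%+%` <- function(x, y) paste0(x, y)

# The operator form is stored as a call and is only evaluated at run time.
expr <- quote("hello" %+% " pretty" %+% " world")

# A true string literal is already a character constant after parsing.
lit <- quote("hello pretty world")

print(is.call(expr))                # TRUE
print(is.character(lit))            # TRUE
print(identical(eval(expr), lit))   # TRUE
```

So the emulation gives the same value, but the concatenation still happens each time the expression is evaluated.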
Joris Meys
2017-Jun-14  12:18 UTC
[Rd] [WISH / PATCH] possibility to split string literals across multiple lines
Mark, that's actually a fair statement, although your extra operator
doesn't cause construction at parse time. You still call paste0(), but just
add an extra layer on top of it.

I also doubt that even in gigantic loops the benefit is going to be
significant. Take the following example:
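library(compiler)    # provides cmpfun()
library(rbenchmark)  # provides benchmark()

# String assembled at run time with paste0()
atestfun <- function(x){
  y <- paste0("a very long",
              "string for testing")
  grep(x, y)
}
# String written as one literal (with an embedded newline)
atestfun2 <- function(x){
  y <- "a very long
string for testing"
  grep(x, y)
}
# Byte-compiled versions of both functions
cfun <- cmpfun(atestfun)
cfun2 <- cmpfun(atestfun2)
benchmark(atestfun("a"),
          atestfun2("a"),
          cfun("a"),
          cfun2("a"),
          replications = 100000)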
Which gives after 100,000 replications:
            test replications elapsed relative
1  atestfun("a")       100000    0.83    1.339
2 atestfun2("a")       100000    0.62    1.000
3      cfun("a")       100000    0.81    1.306
4     cfun2("a")       100000    0.62    1.000
The patch can in principle make similar code marginally faster, but I'm not
convinced it is going to make any real difference except in some very
specific and exotic cases. What's more, calling a function like the ones in
the example inside the loop is the only situation I can come up with where
this might be a problem. If you just construct the string inside the loop,
there are two possibilities:
- the string does not need to change, and then you had better construct it
outside of the loop
- the string does need to change, and then you need paste() or paste0()
anyway
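The first case can be sketched as follows. The names `msg`, `patterns` and `hits` are made up for this illustration, and the patterns are arbitrary.

```r
# The string never changes, so it is built once, before the loop.
msg <- paste0("a very long ",
              "string for testing")

patterns <- c("a", "z")
hits <- integer(0)
for (p in patterns) {
  # Only the search runs per iteration; paste0() is not called again.
  hits <- c(hits, length(grep(p, msg, fixed = TRUE)))
}

print(hits)  # 1 0
```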
I'm not against incorporating the patch, as it would eliminate a few
keystrokes. It's a neat idea, but I don't expect any other noticeable
advantage from it.
my humble 2 cents
Cheers
Joris
-- 
Joris Meys
Statistical consultant
Ghent University
Faculty of Bioscience Engineering
Department of Mathematical Modelling, Statistics and Bio-Informatics
tel :  +32 (0)9 264 61 79
Joris.Meys at Ugent.be
Duncan Murdoch
2017-Jun-14  13:36 UTC
[Rd] [WISH / PATCH] possibility to split string literals across multiple lines
On 14/06/2017 6:45 AM, Andreas Kersting wrote:

> Actually "execution time" is precisely one of the reasons why I would like
> to see this feature as - depending on the context (e.g. in a tight loop) -
> the execution time of paste0 (or probably also glue, thanks Gabor) is not
> necessarily insignificant.

You also need to consider implementation time.  This is not just changes to R itself; trailing backslashes *are* used in some packages (e.g. geoparser), so those packages would need to be identified and modified and resubmitted to CRAN.

Core changes to existing behaviour need really strong arguments, and I'm just not seeing those here.

Duncan Murdoch
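Identifying affected code could start with a simple scan of source files for lines that end in a backslash. The sketch below is a rough first pass under stated assumptions: `find_trailing_backslashes` is a hypothetical helper, it does not check whether the backslash sits inside a string literal, and the temporary file merely stands in for a package source file.

```r
# Hypothetical helper for illustration; not an existing R or CRAN tool.
find_trailing_backslashes <- function(path) {
  lines <- readLines(path, warn = FALSE)
  hits <- grep("\\\\$", lines)  # lines whose last character is a backslash
  data.frame(line = hits, text = lines[hits], stringsAsFactors = FALSE)
}

# Stand-in for a package source file with one affected line.
tmp <- tempfile(fileext = ".R")
writeLines(c('x <- "aaa\\', 'bbb"', 'y <- 1'), tmp)

res <- find_trailing_backslashes(tmp)
print(res)
```

Each reported line would still need a manual look to decide whether the proposed change alters its meaning.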
Andreas Kersting
2017-Jun-14  15:05 UTC
[Rd] [WISH / PATCH] possibility to split string literals across multiple lines
-------- Original Message --------
From: Duncan Murdoch [mailto:murdoch.duncan at gmail.com]
Sent: Wednesday, Jun 14, 2017 1:36 PM GMT
To: Andreas Kersting
Cc: r-devel
Subject: [Rd] [WISH / PATCH] possibility to split string literals across multiple lines

> You also need to consider implementation time.  This is not just changes
> to R itself; trailing backslashes *are* used in some packages (e.g.
> geoparser), so those packages would need to be identified and modified
> and resubmitted to CRAN.

I am totally with you on this "runtime vs. implementation-time" issue. That is why I proposed the patch as I did: It seemed to require only minor changes to base R and I didn't see how it could be incompatible with existing code.

Actually, I still cannot see how a package could have potentially *used* backslashes immediately followed by newlines up to now, since those backslashes were just ignored by the parser (and changes to the function StringValue are just about the parser, aren't they?).

Of course I cannot rule out the possibility that there is code like

var <- "aaa\
bbb"

around, but this would be based on the undocumented(?) features that "backslash newline" is a valid escape sequence and that it is treated as "newline".

Maybe it's a good idea to show some more examples of how the patched parser behaves. There should only be a difference from the current implementation if a string literal spans multiple lines and a line ends in an odd number of backslashes (see last example):

> "aaa\\
+ bbb"
[1] "aaa\\\nbbb"
> "aaa\\nbbb"
[1] "aaa\\nbbb"
> "aaa\\\nbbb"
[1] "aaa\\\nbbb"
> "aaa\\"
[1] "aaa\\"
> "aaa\\\n"
[1] "aaa\\\n"
> "aaa\\\\"
[1] "aaa\\\\"
> "aaa\\\\\n"
[1] "aaa\\\\\n"
> "aaa\\\\
+ bbb"
[1] "aaa\\\\\nbbb"
> "aaa\\\
+ bbb"
[1] "aaa\\bbb"

Andreas

> Core changes to existing behaviour need really strong arguments, and I'm
> just not seeing those here.
>
> Duncan Murdoch
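The odd/even rule from the examples above can be written down as a small check on a raw source line. This is a sketch only: `count_trailing_backslashes` and `would_continue` are hypothetical names, and the code looks at text lines rather than at the parser's StringValue function.

```r
# Hypothetical helpers for illustration; neither exists in base R.
count_trailing_backslashes <- function(line) {
  # Length of the run of backslashes at the very end of the line
  nchar(line) - nchar(sub("\\\\+$", "", line))
}

would_continue <- function(line) {
  # An odd run means the final backslash is unescaped and directly
  # precedes the newline, so the patched parser would join the lines.
  count_trailing_backslashes(line) %% 2 == 1
}

# Raw source lines ending in 1, 2, 3 and 0 backslashes
# (each "\\" below is one backslash in the source text).
src <- c('"aaa\\', '"aaa\\\\', '"aaa\\\\\\', '"aaa')
print(would_continue(src))  # TRUE FALSE TRUE FALSE
```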