Hi all

I have a question regarding differences in the way gregexpr works in R 2.3.0 and R 2.4.0.

In R 2.3.0, this is what happens:

> gregexpr(" [a-z] [a-z] ", " a b c d e f ", perl=T)
[[1]]
[1] 1 3 5 7 9
attr(,"match.length")
[1] 5 5 5 5 5

... while in R 2.4.0, this is what happens:

> gregexpr(" [a-z] [a-z] ", " a b c d e f ", perl=T)
[[1]]
[1] 1 7
attr(,"match.length")
[1] 5 5

Looking at the archives, I came across these posts where the reverse issue has been discussed before:

http://finzi.psych.upenn.edu/R/Rhelp02a/archive/75843.html
http://finzi.psych.upenn.edu/R/Rhelp02a/archive/76815.html
http://finzi.psych.upenn.edu/R/Rhelp02a/archive/75846.html

From there, it seems that the first result was considered undesirable (apparently because it differs from Perl's output, if not also for other reasons), and R. Gentleman wrote that "[t]his has been reverted in R-devel, so you should get the old behavior in it." However,

(i) I could not find any announcement of that change in the change log (the NEWS file at <https://svn.r-project.org/R/trunk/NEWS> or at <http://cran.r-project.org/src/base/NEWS>), so I am still not sure whether this change of behavior is in fact due to changes by the R Development Core Team or not. So, first question: is this change intended or not? (My system has not changed otherwise.)

(ii) Since for some applications of mine the first behavior above was exactly what I needed, I now have the same (second) question as Thomas Girke before: is there a way to get the first of the two results now in R 2.4.0 (on a Windows XP machine)?

Thanks a lot,
STG

On 10/6/06, Stefan Th. Gries <stgries_lists at arcor.de> wrote:
> (ii) Since for some applications of mine the first behavior above was exactly what I needed, I now have the same (second) question as Thomas Girke before: is there a way to get the first of the two results now in R 2.4.0 (on a Windows XP machine)?

You can get that by using zero-width lookahead assertions. They must match, but they do not consume any characters, so the next match is not forced to start past them. See ?regex and http://www.regular-expressions.info/lookaround.html for more.

gregexpr(" [a-z](?= [a-z] )", " a b c d e f ", perl = TRUE)

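A minimal sketch of the lookahead approach suggested above, using the example string from the original question. One thing to be aware of: because the lookahead part is not consumed, the reported match.length is 2 (the space and the first letter), not 5 as in the R 2.3.0 output. If the full five-character stretches are needed, they can be rebuilt from the start positions. The names `width` and `windows` are introduced here for illustration only.

```r
x <- " a b c d e f "

# Only " [a-z]" is consumed; the following " [a-z] " must be present
# but stays available as the beginning of the next match.
m <- gregexpr(" [a-z](?= [a-z] )", x, perl = TRUE)[[1]]

starts <- as.integer(m)            # start positions, attributes dropped
lens   <- attr(m, "match.length")  # length of the consumed part only

# Assumption for this example: every overlapping match spans 5 characters,
# as in the pattern " [a-z] [a-z] " from the original question.
width   <- 5L
windows <- substring(x, starts, starts + width - 1L)

print(starts)
print(lens)
print(windows)
```

The start positions agree with the 1 3 5 7 9 that R 2.3.0 reported; only the match lengths differ, which is why the substrings are extracted with an explicit width here.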
On 10/6/2006 10:00 PM, Stefan Th. Gries wrote:
> (i) I could not find any announcement of that change in the change log (the news file at <https://svn.r-project.org/R/trunk/NEWS> or at <http://cran.r-project.org/src/base/NEWS>) so I am still not sure whether this change of behavior is in fact due to changes by the R Development Core Team or not. So, first question: is this change intended or not? (My system has not changed otherwise.)

If you really want to be sure where a change occurred, you should look in the Subversion log (on developer.r-project.org). I think the changes here were likely made in revisions 37228 on February 1, 2006 and 38145 on May 20, 2006. Both were made to the trunk, but at the time of the first one the trunk was what became 2.3.0, and at the time of the second it was what became 2.4.0.

If these changed behaviour there should have been an entry in the NEWS file, but apparently that was overlooked.

> (ii) Since for some applications of mine the first behavior above was exactly what I needed, I now have the same (second) question as Thomas Girke before: is there a way to get the first of the two results now in R 2.4.0 (on a Windows XP machine)?

I don't know.

Duncan Murdoch