The interpretation of regular expressions with repetition quantifiers in the 'gregexpr' function seems to have changed between R Version 2.2.0 and 2.3.0. The 'gsub' function, however, gives the same results in R Versions 2.2.0 and 2.3.0. Below is an example that demonstrates the version differences of the 'gregexpr' function. I am not sure whether this new behavior is an intended change or represents a bug. Personally, I found the old behavior of this function more useful, since it is consistent with Perl regular expressions.

Here are my questions:
(1) Is there a way to obtain from 'gregexpr' the old output of R version 2.2.0 when using regular expressions with repetition quantifiers?

(2) How can one be informed about changes to regular expressions and the associated functions in new versions of R?

Here is the example code to demonstrate the version difference of the 'gregexpr' function between versions 2.3.0 and 2.2.0:

# Example string
x <- "xaaaaxaaaax"

# gregexpr in Version 2.2.0 (2005-10-06 r35749)
gregexpr("[a]{1,}", as.character(x), perl=T)
[[1]]
[1] 2 7
attr(,"match.length")
[1] 4 4

# gregexpr in Version 2.3.0 (2006-04-24)
gregexpr("[a]{1,}", as.character(x), perl=T)
[[1]]
[1] 2 3 4 5 7 8 9 10
attr(,"match.length")
[1] 4 3 2 1 4 3 2 1

# gsub gives expected output in Versions 2.2.0 & 2.3.0
gsub("[a]{1,}", "_", as.character(x), perl=T)
[1] "x_x_x"

Thanks in advance for your help.

Thomas

--
Thomas Girke, Ph.D.
1008 Noel T. Keen Hall
Center for Plant Cell Biology (CEPCEB)
University of California
Riverside, CA 92521

E-mail: thomas.girke at ucr.edu
Ph: 951-827-2469
Fax: 951-827-4437
Gabor Grothendieck
2006-May-06 18:21 UTC
[R] regular expression change in R version 2.3.0?
If you need a workaround, this works the same in R 2.2.1 and R 2.3.0 patched:

x <- "xaaaaxaaaax"
gregexpr("[^a]a+", paste("", x))
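The workaround above pads the string with a leading space and requires one non-'a' character in front of each run, so the reported match lengths are one longer than the runs themselves. The following is a minimal sketch, not part of the original posts, of how its output might be mapped back to the R 2.2.0 result. The variable names (padded, starts, run_len, runs) are illustrative assumptions.

```r
# Example string from the thread
x <- "xaaaaxaaaax"

# Workaround from the thread: paste("", x) prepends a space, and the
# pattern consumes one non-'a' character before each run of 'a'.
padded <- paste("", x)
m <- gregexpr("[^a]a+", padded)[[1]]

# The padding shifts every position by +1 and the leading non-'a'
# character shifts the match start by -1, so the start positions in
# 'padded' equal the run starts in the original 'x'.
starts <- as.integer(m)

# Each match includes the leading non-'a' character; subtract it.
run_len <- attr(m, "match.length") - 1L

# Extract the matched runs from the original string.
runs <- substring(x, starts, starts + run_len - 1L)
```

With the example string, starts should hold 2 and 7 and run_len should hold 4 and 4, which are the values that gregexpr reported in R 2.2.0.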
Hi,

This has been reverted in R-devel, so you should get the old behavior there. Gabor Grothendieck posted a workaround for R 2.3.x.

Best wishes,
Robert

--
Robert Gentleman, PhD
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
PO Box 19024
Seattle, Washington 98109-1024
206-667-7700
rgentlem at fhcrc.org