Hello, I think I've found a bug in the C function do_grep located in src/main/grep.c. It seems to affect both the latest revisions of R-2-13-branch and trunk when compiling R without optimizations and with it's own version of pcre located in src/extra, at least on ubuntu 10.04. According to the pcre_exec API (I presume the later versions), the ovecsize argument must be a multiple of 3 , and the ovector argument must point to a location that can hold at least ovecsize integers. All the pcre_exec calls made by do_grep, save one, honors this. That one call seems to overwrite areas of the stack it shouldn't. Here's the smallest example I found that tickles the bug:> grep("[^[:blank][:cntrl]]","\\n",perl=TRUE)Error in grep("[^[:blank][:cntrl]]", "\\n", perl = TRUE) : negative length vectors are not allowed As described above, this error occurs on ubuntu 10.04 when R is compiled without optimizations ( I typically use CFLAGS="-ggdb" CXXFLAGS="-ggdb" FFLAGS="-ggdb" ./configure --enable-R-shlib), and the pcre_exec call executed from do_get overwrites the integer nmatches and sets it to -1. This has the effect of making do_grep try and allocate a results vector of length -1, which of course causes the error message above. I'd be interested to know if this bug happens on other platforms. Below is my simple fix for R-2-13-branch (a similar fix works for trunk as well). Jeff $ svn diff main/grep.c Index: main/grep.c ==================================================================--- main/grep.c (revision 57110) +++ main/grep.c (working copy) @@ -723,7 +723,7 @@ { SEXP pat, text, ind, ans; regex_t reg; - int i, j, n, nmatches = 0, ov, rc; + int i, j, n, nmatches = 0, ov[3], rc; int igcase_opt, value_opt, perl_opt, fixed_opt, useBytes, invert; const char *spat = NULL; pcre *re_pcre = NULL /* -Wall */; @@ -882,7 +882,7 @@ if (fixed_opt) LOGICAL(ind)[i] = fgrep_one(spat, s, useBytes, use_UTF8, NULL) >= 0; else if (perl_opt) { - if (pcre_exec(re_pcre, re_pe, s, strlen(s), 0, 0, &ov, 0) >= 0) + if (pcre_exec(re_pcre, re_pe, s, strlen(s), 0, 0, ov, 3) >= 0) INTEGER(ind)[i] = 1; } else { if (!use_WC)
On Thu, Sep 29, 2011 at 4:00 PM, Jeffrey Horner <jeffrey.horner at gmail.com> wrote:> Hello, > > I think I've found a bug in the C function do_grep located in > src/main/grep.c. It seems to affect both the latest revisions of > R-2-13-branch and trunk when compiling R without optimizations and > with it's own version of pcre located in src/extra, at least on ubuntu > 10.04. > > ?According to the pcre_exec API (I presume the later versions), the > ovecsize argument must be a multiple of 3 , and the ovector argument > must point to a location that can hold at least ovecsize integers. All > the pcre_exec calls made by do_grep, save one, honors this. That one > call seems to overwrite areas of the stack it shouldn't. Here's the > smallest example I found that tickles the bug: > >> grep("[^[:blank][:cntrl]]","\\n",perl=TRUE) > Error in grep("[^[:blank][:cntrl]]", "\\n", perl = TRUE) : > ?negative length vectors are not allowedAs many of you know, that regex is invalid. It's just the one I happened upon that tickled the bug. It actually came from an error that occurred when building R itself. Here's a snippet of my make log: make[1]: Leaving directory `/home/hornerj/R-sources/branches/R-2-13-branch/po' you should 'make docs' now ... make[1]: Entering directory `/home/hornerj/R-sources/branches/R-2-13-branch/doc' Error in grep("[^[:blank:][:cntrl:]]", unlist(Rd[sections == "TEXT"]), : negative length vectors are not allowed Calls: saveRDS -> <Anonymous> -> prepare2_Rd -> grep Execution halted make[1]: *** [NEWS.rds] Error 1> > As described above, this error occurs on ubuntu 10.04 when R is > compiled without optimizations ( I typically use CFLAGS="-ggdb" > CXXFLAGS="-ggdb" FFLAGS="-ggdb" ./configure --enable-R-shlib), and the > pcre_exec call executed from do_get overwrites the integer nmatches > and sets it to -1. This has the effect of making do_grep try and > allocate a results vector of length -1, which of course causes the > error message above. > > I'd be interested to know if this bug happens on other platforms. > > Below is my simple fix for R-2-13-branch (a similar fix works for > trunk as well). > > Jeff > > $ svn diff main/grep.c > Index: main/grep.c > ==================================================================> --- main/grep.c (revision 57110) > +++ main/grep.c (working copy) > @@ -723,7 +723,7 @@ > ?{ > ? ? SEXP pat, text, ind, ans; > ? ? regex_t reg; > - ? ?int i, j, n, nmatches = 0, ov, rc; > + ? ?int i, j, n, nmatches = 0, ov[3], rc; > ? ? int igcase_opt, value_opt, perl_opt, fixed_opt, useBytes, invert; > ? ? const char *spat = NULL; > ? ? pcre *re_pcre = NULL /* -Wall */; > @@ -882,7 +882,7 @@ > ? ? ? ? ? ?if (fixed_opt) > ? ? ? ? ? ? ? ?LOGICAL(ind)[i] = fgrep_one(spat, s, useBytes, use_UTF8, NULL) >= 0; > ? ? ? ? ? ?else if (perl_opt) { > - ? ? ? ? ? ? ? if (pcre_exec(re_pcre, re_pe, s, strlen(s), 0, 0, &ov, 0) >= 0) > + ? ? ? ? ? ? ? if (pcre_exec(re_pcre, re_pe, s, strlen(s), 0, 0, ov, 3) >= 0) > ? ? ? ? ? ? ? ? ? ?INTEGER(ind)[i] = 1; > ? ? ? ? ? ?} else { > ? ? ? ? ? ? ? ?if (!use_WC) >
On Thu, Sep 29, 2011 at 2:00 PM, Jeffrey Horner <jeffrey.horner at gmail.com> wrote:> Hello, > > I think I've found a bug in the C function do_grep located in > src/main/grep.c. It seems to affect both the latest revisions of > R-2-13-branch and trunk when compiling R without optimizations and > with it's own version of pcre located in src/extra, at least on ubuntu > 10.04. > > ?According to the pcre_exec API (I presume the later versions), the > ovecsize argument must be a multiple of 3 , and the ovector argument > must point to a location that can hold at least ovecsize integers. All > the pcre_exec calls made by do_grep, save one, honors this. That one > call seems to overwrite areas of the stack it shouldn't. Here's the > smallest example I found that tickles the bug: > >> grep("[^[:blank][:cntrl]]","\\n",perl=TRUE) > Error in grep("[^[:blank][:cntrl]]", "\\n", perl = TRUE) : > ?negative length vectors are not allowed > > As described above, this error occurs on ubuntu 10.04 when R is > compiled without optimizations ( I typically use CFLAGS="-ggdb" > CXXFLAGS="-ggdb" FFLAGS="-ggdb" ./configure --enable-R-shlib), and the > pcre_exec call executed from do_get overwrites the integer nmatches > and sets it to -1. This has the effect of making do_grep try and > allocate a results vector of length -1, which of course causes the > error message above. > > I'd be interested to know if this bug happens on other platforms.With R devel (2011-09-28 r57099) and R v2.13.1 patched (2011-09-05 r56953) on Windows 7 64-bit you get:> grep("[^[:blank][:cntrl]]","\\n",perl=TRUE)integer(0) /Henrik> > Below is my simple fix for R-2-13-branch (a similar fix works for > trunk as well). > > Jeff > > $ svn diff main/grep.c > Index: main/grep.c > ==================================================================> --- main/grep.c (revision 57110) > +++ main/grep.c (working copy) > @@ -723,7 +723,7 @@ > ?{ > ? ? SEXP pat, text, ind, ans; > ? ? regex_t reg; > - ? ?int i, j, n, nmatches = 0, ov, rc; > + ? ?int i, j, n, nmatches = 0, ov[3], rc; > ? ? int igcase_opt, value_opt, perl_opt, fixed_opt, useBytes, invert; > ? ? const char *spat = NULL; > ? ? pcre *re_pcre = NULL /* -Wall */; > @@ -882,7 +882,7 @@ > ? ? ? ? ? ?if (fixed_opt) > ? ? ? ? ? ? ? ?LOGICAL(ind)[i] = fgrep_one(spat, s, useBytes, use_UTF8, NULL) >= 0; > ? ? ? ? ? ?else if (perl_opt) { > - ? ? ? ? ? ? ? if (pcre_exec(re_pcre, re_pe, s, strlen(s), 0, 0, &ov, 0) >= 0) > + ? ? ? ? ? ? ? if (pcre_exec(re_pcre, re_pe, s, strlen(s), 0, 0, ov, 3) >= 0) > ? ? ? ? ? ? ? ? ? ?INTEGER(ind)[i] = 1; > ? ? ? ? ? ?} else { > ? ? ? ? ? ? ? ?if (!use_WC) > > ______________________________________________ > R-devel at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-devel >
Jeff, this is really a bug in PCRE since the length (0) is a multiple of 3 as documented so PCRE should not be writing anything. Anyway, this has been now fixed (by Brian). Cheers, Simon On Sep 29, 2011, at 5:00 PM, Jeffrey Horner wrote:> Hello, > > I think I've found a bug in the C function do_grep located in > src/main/grep.c. It seems to affect both the latest revisions of > R-2-13-branch and trunk when compiling R without optimizations and > with it's own version of pcre located in src/extra, at least on ubuntu > 10.04. > > According to the pcre_exec API (I presume the later versions), the > ovecsize argument must be a multiple of 3 , and the ovector argument > must point to a location that can hold at least ovecsize integers. All > the pcre_exec calls made by do_grep, save one, honors this. That one > call seems to overwrite areas of the stack it shouldn't. Here's the > smallest example I found that tickles the bug: > >> grep("[^[:blank][:cntrl]]","\\n",perl=TRUE) > Error in grep("[^[:blank][:cntrl]]", "\\n", perl = TRUE) : > negative length vectors are not allowed > > As described above, this error occurs on ubuntu 10.04 when R is > compiled without optimizations ( I typically use CFLAGS="-ggdb" > CXXFLAGS="-ggdb" FFLAGS="-ggdb" ./configure --enable-R-shlib), and the > pcre_exec call executed from do_get overwrites the integer nmatches > and sets it to -1. This has the effect of making do_grep try and > allocate a results vector of length -1, which of course causes the > error message above. > > I'd be interested to know if this bug happens on other platforms. > > Below is my simple fix for R-2-13-branch (a similar fix works for > trunk as well). > > Jeff > > $ svn diff main/grep.c > Index: main/grep.c > ==================================================================> --- main/grep.c (revision 57110) > +++ main/grep.c (working copy) > @@ -723,7 +723,7 @@ > { > SEXP pat, text, ind, ans; > regex_t reg; > - int i, j, n, nmatches = 0, ov, rc; > + int i, j, n, nmatches = 0, ov[3], rc; > int igcase_opt, value_opt, perl_opt, fixed_opt, useBytes, invert; > const char *spat = NULL; > pcre *re_pcre = NULL /* -Wall */; > @@ -882,7 +882,7 @@ > if (fixed_opt) > LOGICAL(ind)[i] = fgrep_one(spat, s, useBytes, use_UTF8, NULL) >= 0; > else if (perl_opt) { > - if (pcre_exec(re_pcre, re_pe, s, strlen(s), 0, 0, &ov, 0) >= 0) > + if (pcre_exec(re_pcre, re_pe, s, strlen(s), 0, 0, ov, 3) >= 0) > INTEGER(ind)[i] = 1; > } else { > if (!use_WC) > > ______________________________________________ > R-devel at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-devel > >