Dan Tenenbaum
2015-Jan-07 23:25 UTC
[Rd] gsub with perl=TRUE results in 'this version of PCRE is not compiled with Unicode property support' in R-devel
The following code:

res <- gsub("(*UCP)\\b(i)\\b", "", "nhgrimelanomaclass", perl = TRUE)

results in:

Error in gsub(sprintf("(*UCP)\\b(%s)\\b", "i"), "", "nhgrimelanomaclass", :
  invalid regular expression '(*UCP)\b(i)\b'
In addition: Warning message:
In gsub(sprintf("(*UCP)\\b(%s)\\b", "i"), "", "nhgrimelanomaclass", :
  PCRE pattern compilation error
  'this version of PCRE is not compiled with Unicode property support'
  at '(*UCP)\b(i)\b'

on

R Under development (unstable) (2015-01-01 r67290)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.9.5 (Mavericks)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats graphics grDevices utils datasets methods base

And also on the same version of R-devel on Snow Leopard, Windows, and Linux.

But it does not produce an error on

R version 3.1.2 (2014-10-31)
Platform: x86_64-apple-darwin13.4.0 (64-bit)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats graphics grDevices utils datasets methods base

Dan
Prof Brian Ripley
2015-Jan-08 12:06 UTC
[Rd] gsub with perl=TRUE results in 'this version of PCRE is not compiled with Unicode property support' in R-devel
Why are you reporting that your PCRE library does not have something which the R-admin manual says it should preferably have? To wit, footnote 37 says

'and not PCRE2, which started at version 10.0. PCRE must be built with UTF-8 support (not the default) and support for Unicode properties is assumed by some R packages. Neither are tested by configure. JIT support is desirable.'

That certainly does not fail on my Linux, Windows and OS X builds of R-devel. (Issues about pre-built binaries, if that is what you used, should be reported to their maintainers, not here.)

And the help does say in ?regex

In UTF-8 mode, some Unicode properties may be supported via '\p{xx}' and '\P{xx}' which match characters with and without property 'xx' respectively.

Note the 'may'.

-- 
Brian D. Ripley, ripley at stats.ox.ac.uk
Emeritus Professor of Applied Statistics, University of Oxford
1 South Parks Road, Oxford OX1 3TG, UK
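Since, as noted above, configure does not test for Unicode property support, one way to find out what a given build offers is to probe it at run time. The following is only a minimal sketch: has_unicode_properties() is a hypothetical helper name (not part of base R), and it assumes that a rejected (*UCP) pattern surfaces as a warning or an error, as in the report above. The pcre_config() call is guarded because that function exists only in later R releases and may be absent from the build at hand.

```r
# Hypothetical helper (not part of base R): returns TRUE if the PCRE library
# used by this R session accepts a (*UCP) pattern, FALSE if the pattern is
# rejected with a warning or an error.
has_unicode_properties <- function() {
  tryCatch(
    grepl("(*UCP)\\w", "a", perl = TRUE),
    warning = function(w) FALSE,
    error   = function(e) FALSE
  )
}

ok <- has_unicode_properties()
cat("Unicode property support:", ok, "\n")

# Later R releases provide pcre_config(); guard the call because it may not
# exist in the R version being tested.
if (exists("pcre_config", mode = "function")) {
  print(pcre_config())
}
```

A FALSE result points at how the PCRE library was built rather than at the pattern itself.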
Kasper Daniel Hansen
2015-Jan-08 14:57 UTC
[Rd] gsub with perl=TRUE results in 'this version of PCRE is not compiled with Unicode property support' in R-devel
Dan, for OS X, there is a new pcre library posted at http://r.research.att.com/libs/ with a date stamp of Dec 28. This fixes this problem.

You can test for this by running make check after compilation. It'll bang out with a failure if this is not in order. (And I know that all of this is described in R-admin.)

It would be helpful (time saving) if a message were posted to r-sig-mac whenever a new library, or a new version of one, is added to http://r.research.att.com/libs/

I know this adds more work for the helpful people who are doing all the heavy lifting.

Kasper
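For package code that has to run on builds where the PCRE library has not been updated, one possible workaround is to retry the substitution without the (*UCP) verb. This is only a sketch under stated assumptions: strip_word() is a hypothetical helper, it treats any error from the first gsub() call as meaning that (*UCP) was rejected, and the fallback pattern uses ASCII-only word boundaries, so results can differ for non-ASCII text.

```r
# Hypothetical helper: remove whole-word occurrences of `word` from `x`,
# preferring Unicode-aware word boundaries when PCRE supports them.
# Note: `word` is pasted into the pattern unescaped, so it must not contain
# regex metacharacters.
strip_word <- function(x, word) {
  ucp_pattern   <- sprintf("(*UCP)\\b(%s)\\b", word)
  plain_pattern <- sprintf("\\b(%s)\\b", word)
  tryCatch(
    # suppressWarnings() hides the PCRE compilation warning that
    # accompanies the error on builds without Unicode property support.
    suppressWarnings(gsub(ucp_pattern, "", x, perl = TRUE)),
    # Assumption: any error here means (*UCP) was rejected; retry without it.
    error = function(e) gsub(plain_pattern, "", x, perl = TRUE)
  )
}

res1 <- strip_word("nhgrimelanomaclass", "i")  # no standalone "i": unchanged
res2 <- strip_word("a i b", "i")               # standalone "i" is removed
print(res1)
print(res2)
```

Installing a PCRE library built with Unicode property support, as described above, remains the actual fix; the fallback only keeps the call from failing.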