Prof Brian Ripley
2003-Oct-24 08:45 UTC
[Rd] Versions of PCRE, documenting what grep etc do.
A couple of weeks back there was some discussion about documenting the regular expressions as used in R. Several years ago the problem was that this was OS-dependent, and to plug that problem we incorporated regexp code from a version of GNU grep, later updated to grep-2.4.2 in R 1.2.0.

I have been looking at documenting what grep(perl=TRUE) does, and we have a similar problem in that the current PCRE, 4.4, implements rather more of Perl's regexps than 3.9 (which is in 1.8.0 if the OS does not supply it, and RH8.0 has PCRE 3.9. Whichever version of Debian is on franz has PCRE 3.4).

I could add a configure check for PCRE >= 4.0, and I think probably should do that. However, my inclination is to always use the version of PCRE in the R sources and thereby ensure that all builds of R have the same version, the one I will document. Comments, please.

For PCRE 4.4 there is a long man page that I will use as a basis for the documentation. I am inclined just to include either a text or PDF version of the man page -- any preferences for which form?

For the non-Perl regexps it is harder, as I am unsure exactly what patterns the GNU regex we have accepts. (From a problem which occurred with some Sweave regexps, I think it accepts more than it is intended to.) One fairly good docu source is the GNU grep man page: does anyone know a better one? I had thought of writing a regexp.Rd help page to which grep.Rd could refer.

None of this is imminent (I am too busy) but is intended for the next minor release (which may be called 1.9.0 or 2.0.0, I gather).

Brian

--
Brian D. Ripley, ripley@stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595
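The configure check for PCRE >= 4.0 mentioned in the message above could look roughly like the following shell sketch. It is only an illustration, not R's actual configure code: the helper name `version_ge` is hypothetical, and the sketch assumes that the system provides the `pcre-config` script and that versions are reported in major.minor form.

```shell
# Hypothetical helper: succeed (exit status 0) if dotted version $1 >= $2.
# Only the major and minor components are compared.
version_ge() {
    have_major=${1%%.*}
    have_rest=${1#*.}
    have_minor=${have_rest%%.*}
    want_major=${2%%.*}
    want_rest=${2#*.}
    want_minor=${want_rest%%.*}
    if [ "$have_major" -gt "$want_major" ]; then return 0; fi
    if [ "$have_major" -lt "$want_major" ]; then return 1; fi
    [ "$have_minor" -ge "$want_minor" ]
}

# Assumption: pcre-config is on PATH when a system PCRE is installed.
if command -v pcre-config >/dev/null 2>&1 &&
   version_ge "$(pcre-config --version)" 4.0; then
    echo "system PCRE is new enough"
else
    echo "using the PCRE bundled in the R sources"
fi
```

With a check of this kind, a system carrying PCRE 3.9 or 3.4 would fall through to the bundled copy, while one carrying 4.3 or 4.4 would pass.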
Dirk Eddelbuettel
2003-Oct-24 14:26 UTC
[Rd] Versions of PCRE, documenting what grep etc do.
On Fri, Oct 24, 2003 at 07:46:41AM +0100, Prof Brian Ripley wrote:
> I have been looking at documenting what grep(perl=TRUE) does, and we
> have a similar problem in that the current PCRE, 4.4, implements rather
> more of Perl's regexps than 3.9 (which is in 1.8.0 if the OS does not
> supply it, and RH8.0 has PCRE 3.9. Whichever version of Debian is on franz
> has PCRE 3.4).

FWIW the current line of R (>= 1.8.0) in Debian unstable has

    Depends: [....] libpcre3 (>= 4.0) [...]

by virtue of the fact that the pcre libraries in Debian unstable are currently at version 4.3.

Dirk

--
Those are my principles, and if you don't like them... well, I have others.
-- Groucho Marx

>>>>> Prof Brian Ripley writes:

> A couple of weeks back there was some discussion about documenting the
> regular expressions as used in R. Several years ago the problem was
> that this was OS-dependent, and to plug that problem we incorporated
> regexp code from a version of GNU grep, later updated to grep-2.4.2 in
> R 1.2.0.

> I have been looking at documenting what grep(perl=TRUE) does, and we
> have a similar problem in that the current PCRE, 4.4, implements
> rather more of Perl's regexps than 3.9 (which is in 1.8.0 if the OS
> does not supply it, and RH8.0 has PCRE 3.9. Whichever version of
> Debian is on franz has PCRE 3.4).

> I could add a configure check for PCRE >= 4.0, and I think probably
> should do that. However, my inclination is to always use the version
> of PCRE in the R sources and thereby ensure that all builds of R have
> the same version, the one I will document. Comments, please.

I think we should in any case allow maintainers of binary packages on platforms with advanced package management systems to force the use of shared libraries the system can provide. (So the binary maintainers would need to verify that the system package provides the right libs and headers.)

Not sure about the default: we typically try to use available system resources, unless this is bound to cause problems, and regex was of the latter type, afaicr.

> For PCRE 4.4 there is a long man page that I will use as a basis for
> the documentation. I am inclined just to include either a text or PDF
> version of the man page -- any preferences for which form?

Depends on where you would put the docs, I think. Btw, where can 4.4 be found?

> For the non-Perl regexps it is harder, as I am unsure exactly what
> patterns the GNU regex we have accepts. (From a problem which
> occurred with some Sweave regexps, I think it accepts more than it is
> intended to.) One fairly good docu source is the GNU grep man page:
> does anyone know a better one? I had thought of writing a regexp.Rd
> help page to which grep.Rd could refer.

That would be great. Linux has a regex(7) purported to be "taken from Henry Spencer's regex package", which might be used as a start. The old GNU regex .tar.gz has a texinfo file, but does not help for what we need, I think. [I recently looked for available regexp docs, but was not too successful.]

> None of this is imminent (I am too busy) but is intended for the next
> minor release (which may be called 1.9.0 or 2.0.0, I gather).

Too bad :-(

Best
-k