William Dunlap
2009-Apr-13 18:56 UTC
[Rd] should sub(perl=TRUE) also handle \E in replacement, to complement \U and \L?
Currently sub(perl=TRUE) allows you to specify \U and \L in the replacement argument so that the rest of the subpatterns in the line (the \\<digit> things) will be converted to upper or lower case, respectively. Perl also has a \E operator to end these case conversions for the rest of the subpatterns (so they retain whatever case they had in the original text). For symmetry's sake I think it would be nice if R supported that also. E.g., capitalizing the first and last letters of every word, leaving the case of the interior letters alone, could be done with:

> gsub("(\\w)(\\w*)(\\w)", "\\U\\1\\E\\2\\U\\3", "useRs may fly into JFK or laGuardia", perl=TRUE)
[1] "UseRS MaY FlY IntO JFK OR LaGuardiA"
> sub("(\\w)(\\w*)(\\w)", "\\U\\1\\E\\2\\U\\3", "useRs may fly into JFK or laGuardia", perl=TRUE)
[1] "UseRS may fly into JFK or laGuardia"

A question regarding this came up in r-help today.

Bill Dunlap
TIBCO Software Inc - Spotfire Division
wdunlap tibco.com

Index: src/library/base/man/grep.Rd
===================================================================
--- src/library/base/man/grep.Rd (revision 48319)
+++ src/library/base/man/grep.Rd (working copy)
@@ -73,7 +73,7 @@
   \code{"\\9"} to parenthesized subexpressions of \code{pattern}.  For
   \code{perl = TRUE} only, it can also contain \code{"\\U"} or
   \code{"\\L"} to convert the rest of the replacement to upper or
-  lower case.
+  lower case, or \code{"\\E"} to end such case conversion.
 }
 }
 \details{
Index: src/main/pcre.c
===================================================================
--- src/main/pcre.c (revision 48319)
+++ src/main/pcre.c (working copy)
@@ -90,6 +90,9 @@
             } else if (p[1] == 'L') {
                 p++; n -= 2;
                 upper = FALSE; lower = TRUE;
+            } else if (p[1] == 'E') { /* end case modification */
+                p++; n -= 2;
+                upper = FALSE; lower = FALSE;
             } else if (p[1] == 0) {
                 /* can't escape the final '\0' */
                 n--;
@@ -168,6 +171,9 @@
             } else if (p[1] == 'L') {
                 p += 2;
                 upper = FALSE; lower = TRUE;
+            } else if (p[1] == 'E') { /* end case modification */
+                p += 2;
+                upper = FALSE; lower = FALSE;
             } else if (p[1] == 0) {
                 p += 1;
             } else {

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: pcre.diff.txt
URL: <https://stat.ethz.ch/pipermail/r-devel/attachments/20090413/db64170b/attachment.txt>
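The case-tracking idea behind the patch can be tried in isolation. The following is only a minimal sketch and not the code in src/main/pcre.c: the function name expand_replacement, its signature, and the fixed-size output buffer are all assumptions made for illustration. It walks a replacement template, keeps the same pair of upper/lower flags that the patch manipulates, and applies them to the \1..\9 substrings only (literal text in the template is copied unchanged, matching the behaviour described in the message above).

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of R): expand the template `repl` into
 * `out`, substituting groups[0..ngroups-1] for \1..\9.  \U and \L switch
 * case conversion on for subsequent groups, and \E switches it off.
 * Returns the number of bytes written (excluding the terminating NUL),
 * or -1 if `out` (capacity `cap`) is too small. */
static int expand_replacement(const char *repl, const char *const groups[],
                              int ngroups, char *out, size_t cap)
{
    int upper = 0, lower = 0;   /* same two flags the patch resets on \E */
    size_t n = 0;
    const char *p = repl;

    assert(repl != NULL && out != NULL && cap > 0);

    while (*p) {
        if (*p != '\\') {               /* ordinary literal character */
            if (n + 1 >= cap) return -1;
            out[n++] = *p++;
            continue;
        }
        char c = p[1];
        if (c >= '1' && c <= '9') {     /* backreference \1..\9 */
            int k = c - '1';
            if (k < ngroups && groups[k] != NULL) {
                for (const char *g = groups[k]; *g; g++) {
                    unsigned char ch = (unsigned char)*g;
                    if (n + 1 >= cap) return -1;
                    out[n++] = (char)(upper ? toupper(ch)
                                     : lower ? tolower(ch) : ch);
                }
            }
            p += 2;
        } else if (c == 'U') {          /* start upper-casing */
            upper = 1; lower = 0; p += 2;
        } else if (c == 'L') {          /* start lower-casing */
            upper = 0; lower = 1; p += 2;
        } else if (c == 'E') {          /* end case modification */
            upper = 0; lower = 0; p += 2;
        } else if (c == '\0') {         /* trailing backslash: drop it */
            p += 1;
        } else {                        /* any other escaped character */
            if (n + 1 >= cap) return -1;
            out[n++] = c;
            p += 2;
        }
    }
    out[n] = '\0';
    return (int)n;
}
```

Called with the template "\\U\\1\\E\\2\\U\\3" (written as a C string literal) and the groups "u", "seR", "s", this sketch fills the buffer with "UseRS", the first word of the gsub() result quoted above. Resetting both flags on \E, rather than introducing a third state, mirrors the way the patch reuses the existing upper and lower variables.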
Martin Maechler
2009-Apr-14 15:51 UTC
[Rd] should sub(perl=TRUE) also handle \E in replacement, to complement \U and \L?
>>>>> "WD" == William Dunlap <wdunlap at tibco.com>
>>>>>     on Mon, 13 Apr 2009 11:56:51 -0700 writes:

    WD> Currently sub(perl=TRUE) allows you to specify \U and \L
    WD> in the replacement argument so that the rest of the
    WD> subpatterns in the line (the \\<digit> things) will be
    WD> converted to upper or lower case, respectively. Perl
    WD> also has a \E operator to end these case
    WD> conversions for the rest of the subpatterns (so they
    WD> retain whatever case they had in the original text).
    WD> For symmetry's sake I think it would be nice if R
    WD> supported that also. E.g., to capitalize the first and
    WD> last letters of every word, leaving the case of the
    WD> interior letters alone, could be done with:

    >> gsub("(\\w)(\\w*)(\\w)", "\\U\\1\\E\\2\\U\\3", "useRs may fly into JFK or laGuardia", perl=TRUE)
    WD> [1] "UseRS MaY FlY IntO JFK OR LaGuardiA"
    >> sub("(\\w)(\\w*)(\\w)", "\\U\\1\\E\\2\\U\\3", "useRs may fly into JFK or laGuardia", perl=TRUE)
    WD> [1] "UseRS may fly into JFK or laGuardia"

    WD> A question regarding this came up in r-help today.

    WD> Bill Dunlap TIBCO Software Inc - Spotfire Division
    WD> wdunlap tibco.com

Thanks a lot, Bill, for your patch!
I have applied and committed it (after testing) to R-devel [rev. 48321].

Best regards,
Martin Maechler, ETH Zurich