christian.buchta at wu-wien.ac.at
2008-Mar-17 20:55 UTC
[Rd] Inconsistency in gsub in R.2.6.2 (PR#10978)
Hi,

Could this be an oversight?

R version 2.6.2 Patched (2008-03-13 r44783)
Copyright (C) 2008 The R Foundation for Statistical Computing
ISBN 3-900051-07-0

...

> x <- "ab?"
> Encoding(x)
[1] "latin1"
> Encoding(gsub("?","", x))
[1] "unknown"
> Encoding(gsub("?","", x, perl = TRUE))
[1] "latin1"

The code in src/main/pcre.c (see also do_tolower and do_strsplit in
src/main/character.c) suggests the attached patch. With it applied:

> x <- "ab?"
> Encoding(gsub("?","", x))
[1] "latin1"

Happy Easter

Christian

--
Christian Buchta -> Institute for Tourism and Leisure Studies ->
Vienna University of Economics and Business Administration -> Vienna
-> Austria -> Europe. Visit us on http://www.wu-wien.ac.at/itf/.
ripley at stats.ox.ac.uk
2008-Mar-18 06:50 UTC
[Rd] Inconsistency in gsub in R.2.6.2 (PR#10978)
This has already been corrected in R-devel.

It was wrong to set the encoding to that of the element of 'x': gsub
will have changed it (to native or UTF-8).

On Mon, 17 Mar 2008, christian.buchta at wu-wien.ac.at wrote:
> (attached: patch_44783)
>
> Index: src/main/character.c
> ===================================================================
> --- src/main/character.c (revision 44783)
> +++ src/main/character.c (working copy)
> @@ -1281,7 +1281,7 @@
>  strcat(u, t);
>  } while(global && (st = fgrep_one_bytes(spat, s, useBytes)) >= 0);
>  strcat(u, s);
> - SET_STRING_ELT(ans, i, mkChar(cbuf));
> + SET_STRING_ELT(ans, i, markKnown(cbuf, STRING_ELT(vec, i)));
>  Free(cbuf);
>  }
>  } else {
> @@ -1337,7 +1337,7 @@
>  for (j = offset ; s[j] ; j++)
>  *u++ = s[j];
>  *u = '\0';
> - SET_STRING_ELT(ans, i, mkChar(cbuf));
> + SET_STRING_ELT(ans, i, markKnown(cbuf, STRING_ELT(vec, i)));
>  Free(cbuf);
>  }
>  }

--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595