Hi there.

I'm working with some UTF-8 encoded CSV files, which give me data frames with UTF-8 encoded headers. This means that when I write things like
dat$proporción
in an R script and then source it, I have to make sure the R script is encoded using UTF-8 (and not latin1), and I also have to explicitly tell R that the encoding is UTF-8 every time I source the file; that is, I need to type
source("sr.R", encoding="utf-8").

Sure, I could eliminate accents and so forth from the headers by renaming the data frame columns, and I have done this and still do, but I shouldn't be required to do it just to avoid encoding issues. We're living in the 21st century, and imho Unicode-based encodings should be the de facto standard these days. I'm aware that R is pretty clever: it stores the encoding along with the string value in all character objects and then converts on the fly as necessary. However, almost everything I work with is in UTF-8 or ASCII (which is compatible with UTF-8 anyway), so I'd like R to behave as though it does everything natively in UTF-8 so I don't have to worry about it.

Is there something in Rprofile.site or the user Rprofile, or an environment variable I can set, or some other way to instruct R to always assume that input stream encodings will be UTF-8 unless otherwise specified? This way, I would only ever have to supply an encoding or fileEncoding argument to specify "latin1" if I happen to encounter it.

Many thanks,
Andrew.
Hi Andrew,

If you look at ?source, you will see that the default value of its encoding argument is picked up from getOption("encoding"). Couldn't you just set this option in your Rprofile?

HTH,
Eric

On Wed, Sep 21, 2022 at 5:34 PM Andrew Hart via R-help <r-help at r-project.org> wrote:
> Is there something in Rprofile.site or the user Rprofile or an
> environment variable I can set or some other way to instruct R to always
> assume that input stream encodings will be utf-8 unless otherwise
> specified?
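To check what Eric describes, the short sketch below inspects the declared default of source()'s encoding argument and the current value of the option it refers to. It uses only base R. In a fresh session the option is normally "native.enc", unless a startup file has already changed it.

```r
# The default for source()'s encoding argument, as declared in its signature.
default_arg <- formals(source)$encoding
print(default_arg)

# The current value of the option that this default refers to.
print(getOption("encoding"))
```

If the second value is "native.enc", source() reads scripts in the session's native encoding, which would explain why a UTF-8 script needs an explicit encoding argument in a latin1 session.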
______________________________________________
R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
See ?options.

options(encoding = "utf-8")

in a startup file or function should presumably do it. See ?Startup.

Bert
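As a minimal sketch of Bert's suggestion, the line below could go in the user .Rprofile or in Rprofile.site (see ?Startup for where R looks for these files). One point to verify on your own setup: the same option is also the default encoding for connections created by file() and url(), so it changes more than source(). For instance, read.csv() called without fileEncoding would then also assume UTF-8 input. The file name "legacy.R" in the comment is hypothetical.

```r
# In ~/.Rprofile or Rprofile.site: make UTF-8 the default input encoding
# for source() and for connections opened without an explicit encoding.
options(encoding = "UTF-8")

# A Latin-1 file can still be handled by naming its encoding explicitly, e.g.
# source("legacy.R", encoding = "latin1")   # "legacy.R" is a hypothetical file
```

Because the setting is global, it is worth trying it in an interactive session first and checking that your usual workflow (package installation, reading data, report generation) still behaves as expected before making it permanent.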