Hi there. I'm working with some utf-8 encoded csv files, which give me
data frames with utf-8 encoded headers. This means that when I write things like
dat$proporción
in an R script and then source it, I have to make sure the R script is
encoded using utf-8 (and not latin1), and then I also have to explicitly
tell R that the encoding is utf-8 every time I source the file; that is,
I need to type
source("sr.R", encoding="utf-8").
Sure, I could eliminate accents and so forth from the headers by
renaming the data frame columns, and I have done so and still do, but I
shouldn't be required to do this just to avoid encoding issues.
We're living in the 21st century and imho Unicode-based encodings should
be the de facto standard these days. I'm aware that R is pretty clever
and stores the encoding along with the string value in all character
objects and then converts on the fly as necessary. However, almost
everything I work with is in utf-8 or ASCII (which is compatible with
utf-8 anyway), so I'd like R to behave as though it does everything
natively in utf-8 so I don't have to worry about it.
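To make the point about per-string encodings concrete, here is a minimal sketch using only base R (Encoding, iconv, enc2utf8). The variable names are illustrative and not taken from any real script.

```r
# The same word held in two different encodings.
x <- "proporci\u00f3n"                        # \u escape: marked as UTF-8
y <- iconv(x, from = "UTF-8", to = "latin1")  # same text, latin1 bytes

Encoding(x)       # "UTF-8"
Encoding(y)       # "latin1"
Encoding("abc")   # "unknown": plain ASCII carries no mark

# Comparison translates as needed, so the two still compare equal.
x == y                      # TRUE
identical(enc2utf8(y), x)   # TRUE
```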
Is there something in Rprofile.site or the user Rprofile or an
environment variable I can set or some other way to instruct R to always
assume that input stream encodings will be utf-8 unless otherwise
specified? This way, I would only ever have to supply an encoding or
fileEncoding argument to specify "latin1" if I happen to encounter
it.
Many thanks,
Andrew.
Hi Andrew,
If you look at ?source you see its default value for encoding is
picked up from getOption("encoding"). Couldn't you just set this
option in your Rprofile?
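
A minimal sketch of that suggestion, assuming a user-level startup file such as ~/.Rprofile (which file R actually reads is described in ?Startup) and assuming every script you source is saved as utf-8:

```r
# Candidate line for a startup file, e.g. ~/.Rprofile (an assumption;
# see ?Startup for the files R really consults).
options(encoding = "UTF-8")

# source() takes its default from this option, so no encoding argument
# should be needed afterwards:
deparse(formals(source)$encoding)   # the default expression of source()
getOption("encoding")               # the value it will now pick up
```

Note that file(), and therefore functions such as read.csv() that open files through it, take their default from the same option, so this changes more than source(). It may be worth checking that nothing else in your workflow relies on the native encoding.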
HTH,
Eric
On Wed, Sep 21, 2022 at 5:34 PM Andrew Hart via R-help
<r-help at r-project.org> wrote:
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
?options

options(encoding = "utf-8")

in a startup file or function should presumably do it. See ?Startup

Bert
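
If a utf-8 default is set as suggested, the occasional latin1 file could still be handled per call, as the original question anticipates. A rough sketch, assuming the session runs in a UTF-8 locale; the temporary file and the column names are made up for illustration:

```r
# Build a small latin1-encoded CSV so the example is self-contained.
csv <- tempfile(fileext = ".csv")
header <- iconv("proporci\u00f3n,n", from = "UTF-8", to = "latin1")
con <- file(csv, open = "w", encoding = "native.enc")  # no re-encoding on write
writeLines(c(header, "0.25,10"), con, useBytes = TRUE)
close(con)

# Override the encoding for this one file only.
dat <- read.csv(csv, fileEncoding = "latin1", check.names = FALSE)
names(dat)

# The equivalent override for a script (file name is hypothetical):
# source("legacy.R", encoding = "latin1")

unlink(csv)
```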