thr3ads.net - search: "enc2utf8"

Displaying 20 results from an estimated 46 matches for "enc2utf8".

special latin1 do not print as glyphs in current devel on windows

2017 Sep 14

...009e" "\u009a" "?" "?", "?", and "?" are printed as (incorrect) unicode escapes. "€" for example should be \u20ac not \u0080. (In R 3.4.1, print(x) shows the glyphs and not the unicode escapes. Apparently, as of v3.5, print() calls enc2utf8() (or its equivalent in C (translateCharUTF8?))?) > print("\u20ac") [1] "€" The characters in x are marked as "latin1". > Encoding(x) [1] "latin1" "latin1" "latin1" "latin1" Looking at the CP1252 table (e.g. link abo...

reg-tests-1d.R fails in r72721

2017 May 24

...character is representable. In my locale it is not, so x1 > is converted to UTF-8, and everything compares equal. > > An explicit conversion of x1 to UTF-8 should fix this, i.e. replace > > x1 <- path.expand(paste0("~/", filename)) > > with > > x1 <- enc2utf8(path.expand(paste0("~/", filename))) > > Could you try this and see if it helps? Nope: > ## path.expand shouldn't translate to local encoding PR#17120 > filename <- "\U9b3c.R" > > x11 <- path.expand(paste0("~/", filename)) > print(Enc...

special latin1 do not print as glyphs in current devel on windows

2017 Sep 14

..." "?" > > "?", "?", and "?" are printed as (incorrect) unicode escapes. "€" for > example should be \u20ac not \u0080. > (In R 3.4.1, print(x) shows the glyphs and not the unicode escapes. > Apparently, as of v3.5, print() calls enc2utf8() (or its equivalent in > C (translateCharUTF8?))?) > > > print("\u20ac") > [1] "€" > > The characters in x are marked as "latin1". > > > Encoding(x) > [1] "latin1" "latin1" "latin1" "latin1" >...

special latin1 do not print as glyphs in current devel on windows

2017 Aug 01

...interested in preserving encodings. What I am worried about is that the encoding is not marked anymore, i.e. that Encoding() returns "unknown". In cp1252 encoding on Windows (note that I am using the cp1252 escape "\x80" and not the Unicode "\u20AC") > x_utf8 <- enc2utf8(c("€", "\x80")) > Encoding(x_utf8) [1] "UTF-8" "UTF-8" > x_nat <- enc2native(x_utf8) > Encoding(x_nat) [1] "unknown" "unknown" See also Kirill's message to this list: "ASCII strings are marked as ASCII internally, but...
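The concern about encoding marks in the snippet above can be checked with a short sketch. The values below are illustrative and not taken from the original message; the mark returned after enc2native() depends on the platform and locale, so it is printed rather than asserted.

```r
# Illustrative input (an assumption, not from the original post):
# the Euro sign written with a Unicode escape.
x <- "\u20ac"

x_utf8 <- enc2utf8(x)
x_nat  <- enc2native(x_utf8)

# The mark on x_nat varies with the native encoding, so just inspect it.
print(Encoding(c(x_utf8, x_nat)))

# The UTF-8 representation of the Euro sign is the byte sequence e2 82 ac.
print(charToRaw(x_utf8))

# Pure ASCII strings are never marked, whichever enc2* function is applied.
ascii <- enc2utf8("abc")
print(Encoding(ascii))
```

Comparing charToRaw() output is usually more reliable than comparing printed glyphs, because printing itself may re-encode the string.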

special latin1 do not print as glyphs in current devel on windows

2017 Aug 01

..."0002", etc. see http://www.cp1252.com/. The exception is the cp1252 "80" to "9F" code range. E.g. the Euro sign is "80" in cp1252 but "20AC" in Unicode, endash "96" in cp1252, "2013" in Unicode. The same error seems to happen with enc2utf8(x) Now with iconv() the result is as expected. iconv(x, to = "UTF-8") The second problem IMO is that encoding markers get lost with the enc2* functions x_utf8 <- enc2utf8(x) Encoding(x_utf8) x_nat <- enc2native(x_utf8) Encoding(x_nat) Again, this is not the case with iconv()...
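The cp1252 mapping described above can be sketched with iconv() and an explicit source encoding. This is a minimal illustration, assuming the platform's iconv knows the name "CP1252"; the variable names are hypothetical.

```r
# Build the two bytes from raw values so the example does not depend on
# the encoding of the source file: 0x80 (Euro sign) and 0x96 (en dash).
x <- c(rawToChar(as.raw(0x80)), rawToChar(as.raw(0x96)))

# State explicitly that the input bytes are cp1252.
x_utf8 <- iconv(x, from = "CP1252", to = "UTF-8")

# Compare Unicode code points instead of printed glyphs.
code_points <- vapply(x_utf8, utf8ToInt, integer(1), USE.NAMES = FALSE)
print(sprintf("U+%04X", code_points))

# iconv() marks results converted to UTF-8.
print(Encoding(x_utf8))
```

Naming the source encoding in `from` avoids relying on how the string happens to be marked, which is the part that differs between R versions in this thread.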

Help: Error in `colnames<-`(`*tmp*`, value = c(

2014 Jul 22

..."", exe, "\" \"", pdf2, "\"", sep = ""), wait = F) > txt1<-sub(".pdf", ".txt", pdf1) > txt2<-sub(".pdf", ".txt", pdf2) > d1<-readLines(txt1, encoding="UTF-8") > d1<-iconv(enc2utf8(d1), sub = "byte") > d2<-readLines(txt2, encoding="UTF-8") > d2<-iconv(enc2utf8(d2), sub = "byte") > df<-c(d1,d2) > corpus<-Corpus(VectorSource(df)) > d<-tm_map(corpus, content_transformer(tolower)) > d<-tm_map(d, stripWhitespace) >...

reg-tests-1d.R fails in r72721

2017 May 24

Hi, I am failing make check in r72721 at the end of reg-tests-1d.R. The relevant block of code is ## path.expand shouldn't translate to local encoding PR#17120 filename <- "\U9b3c.R" print(Encoding(filename)) x1 <- path.expand(paste0("~/", filename)) print(Encoding(x1)) x2 <- paste0(path.expand("~/"), filename) print(Encoding(x2)) stopifnot(identical(

readLines interaction with gsub different in R-dev

2018 Feb 17

...r ?gsub > replacement > ... For perl = TRUE only, it can also contain "\U" or "\L" to convert the rest of the replacement to upper or lower case and "\E" to end case conversion. However, the following code runs differently: tempf <- tempfile() writeLines(enc2utf8("author: Amélie"), con = tempf, useBytes = TRUE) entry <- readLines(tempf, encoding = "UTF-8") gsub("(\\w)", "\\U\\1", entry, perl = TRUE) "AUTHOR: AMÉLIE" # R-3.4.3 "A" # R-dev Best, Hugh Parsonage.

reg-tests-1d.R fails in r72721

2017 May 24

...my locale it is not, so x1 >> is converted to UTF-8, and everything compares equal. >> >> An explicit conversion of x1 to UTF-8 should fix this, i.e. replace >> >> x1 <- path.expand(paste0("~/", filename)) >> >> with >> >> x1 <- enc2utf8(path.expand(paste0("~/", filename))) >> >> Could you try this and see if it helps? > > Nope: Okay, how about if we weaken the test? Instead of stopifnot(identical(path.expand(paste0("~/", filename)), paste0(path.expand("~/"), f...

Character Encoding: Why are valid Windows-1252 characters encoded as invalid ISO-8859-1 characters?

2013 Mar 20

...ndows-1252 but NOT VALID in ISO8859-1 > Encoding(x) # R has chosen to encode it as 'latin1' which seems to be a synonym for ISO8859-1 [1] "latin1" > x # Even though the character is invalid in latin1, it renders as if it were the valid windows-1252 character [1] "’" > enc2utf8(x) # Encoding as UTF-8 gives us, not a valid UTF-8 'right quote' (\u2019), but the undefined unicode character 'PRIVATE USE TWO' [1] "\u0092" > enc2native(enc2utf8(x)) # Moving the UTF-8 back to the native encoding correctly shows that it can't render the 'P...

readLines interaction with gsub different in R-dev

2018 Feb 17

...... For perl = TRUE only, it can also contain "\U" or "\L" to convert the rest of the replacement to upper or lower case and "\E" to end case conversion. > | > | However, the following code runs differently: > | > | tempf <- tempfile() > | writeLines(enc2utf8("author: Amélie"), con = tempf, useBytes = TRUE) > | entry <- readLines(tempf, encoding = "UTF-8") > | gsub("(\\w)", "\\U\\1", entry, perl = TRUE) > | > | > | "AUTHOR: AMÉLIE" # R-3.4.3 > | > | "A"...

Writing UTF8 on Windows

2014 Oct 19

...or socket. RFC7159 prescribes json must be encoded as unicode; ISO-8859 (including latin1) is invalid. Hence I would like R to write strings as utf8, irrespective of the type of connection, platform or locale. Implementing this turns out to be unsurprisingly difficult on windows. > string <- enc2utf8("Zürich") > Encoding(string) [1] "UTF-8" For example when writing the utf8 string to a binary utf8 connection, the output seems to be latin1: > con <- file("test1.txt", open="wb", encoding = "UTF-8") > writeLines(string, con) >...
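One way to check what actually lands in the file is to write with useBytes = TRUE and read the raw bytes back. This is a minimal sketch, not the poster's solution; the temporary file and variable names are assumptions made for illustration.

```r
# "Zürich" written with a Unicode escape so the source stays ASCII.
string <- enc2utf8("Z\u00fcrich")
path <- tempfile(fileext = ".txt")   # hypothetical output location

# Open in binary mode and ask writeLines() not to re-encode the string.
con <- file(path, open = "wb")
writeLines(string, con, useBytes = TRUE)
close(con)

# Read the file back as raw bytes; u-umlaut is c3 bc in UTF-8 and
# would be the single byte fc in latin1.
bytes <- readBin(path, what = "raw", n = file.size(path))
print(bytes)
```

With useBytes = TRUE the string's bytes are passed through unchanged, so the result should not depend on the locale of the session that wrote the file.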

gsub() with unicode and escape character

2011 Jul 17

...that a data frame cannot have unicode codes, cf. e.g. > data.frame(animals=c("d\u0254g","w\u0254lf","cat"))->my.data.2 > my.data.2$animals [1] d?g w?lf cat Levels: cat d<U+0254>g w<U+0254>lf I've done the best I can based on what ?gsub and ?enc2utf8 tell me, but I haven't found a solution. Unrelated to that problem, but related to gsub() is that I can't find a way for gsub() to interpret the backslash as a character. In regular expression, \\ should represent "the character \", but gsub() doesn't: > data.frame(animal...

odfWeave UTF-8 error and latin characters

2010 Sep 17

...teste na doação de sangue: 0.9379 0.7874 0e+00 Resultado do reteste da doação: 0.9317 0.6607 2e-04 Indicação médica para investigação: 0.6957 0.5556 1e-04 Considering some suggestions from other lists, I tried to encode the table using enc2utf8 and descr::toUTF8 such as <<tabela2, echo = FALSE, results = xml>>= odfTable(enc2utf8(tabela2),useRowNames=T,name ='Tabela 2') @ OR <<tabela2, echo = FALSE, results = xml>>= enc2utf8(odfTable(tabela2,useRowNames=T,name ='Tabela 2')) @ OR <<tabe...

Native characterset is wrong for unicode builds for Windows

2015 Feb 26

When I send some outlandish characters through enc2native (or format) in R 3.1.2 on Ubuntu trusty it works quite well: > "?????" [1] "?????" > enc2native("?????") [1] "?????" > Encoding(enc2native("?????")) [1] "UTF-8" In Windows the result is different: > "?????" [1] "?????" >

post_processor in rmarkdown not working

2017 Sep 06

..., text) #nolint text <- c( text[1:(maketitle - 1)], "\\begin{fmtext}", text[(maketitle + 1):(end_first_page - 1)], "\\end{fmtext}", "\\maketitle", text[(end_first_page + 1):length(text)] ) writeLines(enc2utf8(text), output_file, useBytes = TRUE) } output_file } output_format( knitr = knitr_options( opts_knit = list( width = 60, concordance = TRUE ), opts_chunk = opts_chunk, knit_hooks = knit_hooks ), pandoc = pandoc_options( to = "...

special latin1 do not print as glyphs in current devel on windows

2017 Aug 01

.../www.cp1252.com/. > The exception is the cp1252 "80" to "9F" code range. E.g. the Euro sign is > "80" in cp1252 but "20AC" in Unicode, endash "96" in cp1252, "2013" in > Unicode. > The same error seems to happen with > > enc2utf8(x) > > Now with iconv() the result is as expected. > > iconv(x, to = "UTF-8") > > > The second problem IMO is that encoding markers get lost with the enc2* > functions As you are changing encodings, you do not want to preserve encoding! > x_utf8 <- enc2...

reg-tests-1d.R fails in r72721

2017 May 24

...> y1 <- paste0("~/", filename) > print(Encoding(y1)) [1] "UTF-8" > > y2 <- path.expand(y1) > print(Encoding(y2)) [1] "unknown" > > y3a <- iconv(y2, to="UTF-8") > print(Encoding(y3a)) [1] "unknown" > > y3b <- enc2utf8(y2) > print(Encoding(y3b)) [1] "unknown" > > Encoding(y2) <- "UTF-8" > print(Encoding(y2)) [1] "unknown" > h. -- +--- | Hiroyuki Kawakatsu | Business School, Dublin City University | Dublin 9, Ireland. Tel +353 (0)1 700 7496

readLines interaction with gsub different in R-dev

2018 Feb 17

...lso contain "\U" or "\L" to > convert the rest of the replacement to upper or lower case and "\E" to end > case conversion. > > | > > | However, the following code runs differently: > > | > > | tempf <- tempfile() > > | writeLines(enc2utf8("author: Amélie"), con = tempf, useBytes = TRUE) > > | entry <- readLines(tempf, encoding = "UTF-8") > > | gsub("(\\w)", "\\U\\1", entry, perl = TRUE) > > | > > | > > | "AUTHOR: AMÉLIE" # R-3.4.3 > > | > >...

wordcloud and word table

2014 Jul 28

...(info.cor, content_transformer(tolower)) > > info.cor.cl<-tm_map(info.cor.cl, stripWhitespace) > > info.cor.cl<-tm_map(info.cor.cl,removePunctuation) > > sw<-readLines("C:/Users/d_2/Documents/StopWords.txt", encoding="UTF-8") > > sw<-iconv(enc2utf8(sw), sub = "byte") > > info.cor.cl<-tm_map(info.cor.cl, removeWords, stopwords("spanish")) > > info.tdm<-TermDocumentMatrix(info.cor.cl) > > result<-list(name = informes, tdm= info.tdm) > > } > >>tdm<-lapply(informes, TDM, path =...
