search for: enc2utf8

Displaying 20 results from an estimated 46 matches for "enc2utf8".

2017 Sep 14
2
special latin1 do not print as glyphs in current devel on windows
...009e" "\u009a" "?" "?", "?", and "?" are printed as (incorrect) unicode escapes. "?" for example should be \u20ac not \u0080. (In R 3.4.1, print(x) shows the glyphs and not the unicode escapes. Apparently, as of v3.5, print() calls enc2utf8() (or its equivalent in C (translateCharUTF8?))?) > print("\u20ac") [1] "?" The characters in x are marked as "latin1". > Encoding(x) [1] "latin1" "latin1" "latin1" "latin1" Looking at the CP1252 table (e.g. link abo...
2017 May 24
2
reg-tests-1d.R fails in r72721
...character is representable. In my locale it is not, so x1 > is converted to UTF-8, and everything compares equal. > > An explicit conversion of x1 to UTF-8 should fix this, i.e. replace > > x1 <- path.expand(paste0("~/", filename)) > > with > > x1 <- enc2utf8(path.expand(paste0("~/", filename))) > > Could you try this and see if it helps? Nope: > ## path.expand shouldn't translate to local encoding PR#17120 > filename <- "\U9b3c.R" > > x11 <- path.expand(paste0("~/", filename)) > print(Enc...
2017 Sep 14
0
special latin1 do not print as glyphs in current devel on windows
..." "?" > > "?", "?", and "?" are printed as (incorrect) unicode escapes. "?" for > example should be \u20ac not \u0080. > (In R 3.4.1, print(x) shows the glyphs and not the unicode escapes. > Apparently, as of v3.5, print() calls enc2utf8() (or its equivalent in > C (translateCharUTF8?))?) > > > print("\u20ac") > [1] "?" > > The characters in x are marked as "latin1". > > > Encoding(x) > [1] "latin1" "latin1" "latin1" "latin1" >...
2017 Aug 01
2
special latin1 do not print as glyphs in current devel on windows
...interested in preserving encodings. What I am worried about is that the encoding is not marked anymore, i.e. that Encoding() returns "unknown". In cp1252 encoding on Windows (note that I am using the cp1252 escape "\x80" and not the Unicode "\u20AC") > x_utf8 <- enc2utf8(c("?", "\x80")) > Encoding(x_utf8) [1] "UTF-8" "UTF-8" > x_nat <- enc2native(x_utf8) > Encoding(x_nat) [1] "unknown" "unknown" See also Kirill's message to this list: "ASCII strings are marked as ASCII internally, but...
2017 Aug 01
3
special latin1 do not print as glyphs in current devel on windows
..."0002", etc. see http://www.cp1252.com/. The exception is the cp1252 "80" to "9F" code range. E.g. the Euro sign is "80" in cp1252 but "20AC" in Unicode, endash "96" in cp1252, "2013" in Unicode. The same error seems to happen with enc2utf8(x) Now with iconv() the result is as expected. iconv(x, to = "UTF-8") The second problem IMO is that encoding markers get lost with the enc2* functions x_utf8 <- enc2utf8(x) Encoding(x_utf8) x_nat <- enc2native(x_utf8) Encoding(x_nat) Again, this is not the case with iconv()...
2014 Jul 22
2
Help: Error in `colnames<-`(`*tmp*`, value = c(
..."", exe, "\" \"", pdf2, "\"", sep = ""), wait = F) > txt1<-sub(".pdf", ".txt", pdf1) > txt2<-sub(".pdf", ".txt", pdf2) > d1<-readLines(txt1, encoding="UTF-8") > d1<-iconv(enc2utf8(d1), sub = "byte") > d2<-readLines(txt2, encoding="UTF-8") > d2<-iconv(enc2utf8(d2), sub = "byte") > df<-c(d1,d2) > corpus<-Corpus(VectorSource(df)) > d<-tm_map(corpus, content_transformer(tolower)) > d<-tm_map(d, stripWhitespace) >...
2017 May 24
2
reg-tests-1d.R fails in r72721
Hi, I am failing make check in r72721 at the end of reg-tests-1d.R. The relevant block of code is ## path.expand shouldn't translate to local encoding PR#17120 filename <- "\U9b3c.R" print(Encoding(filename)) x1 <- path.expand(paste0("~/", filename)) print(Encoding(x1)) x2 <- paste0(path.expand("~/"), filename) print(Encoding(x2)) stopifnot(identical(
2018 Feb 17
2
readLines interaction with gsub different in R-dev
...r ?gsub > replacement > ... For perl = TRUE only, it can also contain "\U" or "\L" to convert the rest of the replacement to upper or lower case and "\E" to end case conversion. However, the following code runs differently: tempf <- tempfile() writeLines(enc2utf8("author: Amélie"), con = tempf, useBytes = TRUE) entry <- readLines(tempf, encoding = "UTF-8") gsub("(\\w)", "\\U\\1", entry, perl = TRUE) "AUTHOR: AMÉLIE" # R-3.4.3 "A" # R-dev Best, Hugh Parsonage.
2017 May 24
0
reg-tests-1d.R fails in r72721
...my locale it is not, so x1 >> is converted to UTF-8, and everything compares equal. >> >> An explicit conversion of x1 to UTF-8 should fix this, i.e. replace >> >> x1 <- path.expand(paste0("~/", filename)) >> >> with >> >> x1 <- enc2utf8(path.expand(paste0("~/", filename))) >> >> Could you try this and see if it helps? > > Nope: Okay, how about if we weaken the test? Instead of stopifnot(identical(path.expand(paste0("~/", filename)), paste0(path.expand("~/"), f...
2013 Mar 20
0
Character Encoding: Why are valid Windows-1252 characters encoded as invalid ISO-8859-1 characters?
...ndows-1252 but NOT VALID in ISO8859-1 > Encoding(x) # R has chosen to encode it as 'latin1' which seems to be a synonym for ISO8859-1 [1] "latin1" > x # Even though the character is invalid in latin1, it renders as if it were the valid windows-1252 character [1] "’" > enc2utf8(x) # Encoding as UTF-8 gives us, not a valid UTF-8 'right quote' (/u2019), but the undefined unicode character 'PRIVATE USE TWO' [1] "\u0092" > enc2native(enc2utf8(x)) # Moving the UTF-8 back to the native encoding correctly shows that it can't render the 'P...
2018 Feb 17
2
readLines interaction with gsub different in R-dev
...... For perl = TRUE only, it can also contain "\U" or "\L" to convert the rest of the replacement to upper or lower case and "\E" to end case conversion. > | > | However, the following code runs differently: > | > | tempf <- tempfile() > | writeLines(enc2utf8("author: Amélie"), con = tempf, useBytes = TRUE) > | entry <- readLines(tempf, encoding = "UTF-8") > | gsub("(\\w)", "\\U\\1", entry, perl = TRUE) > | > | > | "AUTHOR: AMÉLIE" # R-3.4.3 > | > | "A"...
2014 Oct 19
1
Writing UTF8 on Windows
...or socket. RFC7159 prescribes json must be encoded as unicode; ISO-8859 (including latin1) is invalid. Hence I would like R to write strings as utf8, irrespective of the type of connection, platform or locale. Implementing this turns out to be unsurprisingly difficult on windows. > string <- enc2utf8("Zürich") > Encoding(string) [1] "UTF-8" For example when writing the utf8 string to a binary utf8 binary connection, the output seems to be latin1: > con <- file("test1.txt", open="wb", encoding = "UTF-8") > writeLines(string, con) >...
2011 Jul 17
3
gsub() with unicode and escape character
...that a data frame cannot have unicode codes, cf. e.g. > data.frame(animals=c("d\u0254g","w\u0254lf","cat"))->my.data.2 > my.data.2$animals [1] d?g w?lf cat Levels: cat d<U+0254>g w<U+0254>lf I've done the best I can based on what ?gsub and ?enc2utf8 tell me, but I haven't found a solution. Unrelated to that problem, but related to gsub() is that I can't find a way for gsub() to interpret the backslash as a character. In regular expression, \\ should represent "the character \", but gsub() doesn't: > data.frame(animal...
2010 Sep 17
1
odfWeave UTF-8 error and latin characters
...teste na doação de sangue: 0.9379 0.7874 0e+00 Resultado do reteste da doação: 0.9317 0.6607 2e-04 Indicação médica para investigação: 0.6957 0.5556 1e-04 Considering some suggestions from other lists, I tried to encode the table using enc2utf8 and descr::toUTF8 such as <<tabela2, echo = FALSE, results = xml>>= odfTable(enc2utf8(tabela2),useRowNames=T,name ='Tabela 2') @ OR <<tabela2, echo = FALSE, results = xml>>= enc2utf8(odfTable(tabela2,useRowNames=T,name ='Tabela 2')) @ OR <<tabe...
2015 Feb 26
4
Native characterset is wrong for unicode builds for Windows
When I send some outlandish characters through enc2native (or format) in R 3.1.2 on Ubuntu trusty it works quite well: > "?????" [1] "?????" > enc2native("?????") [1] "?????" > Encoding(enc2native("?????")) [1] "UTF-8" In Windows the result is different: > "?????" [1] "?????" >
2017 Sep 06
4
post_processor in rmarkdown not working
...", text) #nolint text <- c( text[1:(maketitle - 1)], "\\begin{fmtext}", text[(maketitle + 1):(end_first_page - 1)], "\\end{fmtext}", "\\maketitle", text[(end_first_page + 1):length(text)] ) writeLines(enc2utf8(text), output_file, useBytes = TRUE) } output_file } output_format( knitr = knitr_options( opts_knit = list( width = 60, concordance = TRUE ), opts_chunk = opts_chunk, knit_hooks = knit_hooks ), pandoc = pandoc_options( to = "...
2017 Aug 01
0
special latin1 do not print as glyphs in current devel on windows
.../www.cp1252.com/. > The exception is the cp1252 "80" to "9F" code range. E.g. the Euro sign is > "80" in cp1252 but "20AC" in Unicode, endash "96" in cp1252, "2013" in > Unicode. > The same error seems to happen with > > enc2utf8(x) > > Now with iconv() the result is as expected. > > iconv(x, to = "UTF-8") > > > The second problem IMO is that encoding markers get lost with the enc2* > functions As you are changing encodings, you do not want to preserve encoding! > x_utf8 <- enc2...
2017 May 24
1
reg-tests-1d.R fails in r72721
...> y1 <- paste0("~/", filename) > print(Encoding(y1)) [1] "UTF-8" > > y2 <- path.expand(y1) > print(Encoding(y2)) [1] "unknown" > > y3a <- iconv(y2, to="UTF-8") > print(Encoding(y3a)) [1] "unknown" > > y3b <- enc2utf8(y2) > print(Encoding(y3b)) [1] "unknown" > > Encoding(y2) <- "UTF-8" > print(Encoding(y2)) [1] "unknown" > h. -- +--- | Hiroyuki Kawakatsu | Business School, Dublin City University | Dublin 9, Ireland. Tel +353 (0)1 700 7496
2018 Feb 17
0
readLines interaction with gsub different in R-dev
...lso contain "\U" or "\L" to > convert the rest of the replacement to upper or lower case and "\E" to end > case conversion. > > | > > | However, the following code runs differently: > > | > > | tempf <- tempfile() > > | writeLines(enc2utf8("author: Amélie"), con = tempf, useBytes = TRUE) > > | entry <- readLines(tempf, encoding = "UTF-8") > > | gsub("(\\w)", "\\U\\1", entry, perl = TRUE) > > | > > | > > | "AUTHOR: AMÉLIE" # R-3.4.3 > > | > >...
2014 Jul 28
2
wordcloud and word table
...(info.cor, content_transformer(tolower)) > > info.cor.cl<-tm_map(info.cor.cl, stripWhitespace) > > info.cor.cl<-tm_map(info.cor.cl,removePunctuation) > > sw<-readLines("C:/Users/d_2/Documents/StopWords.txt", encoding="UTF-8") > > sw<-iconv(enc2utf8(sw), sub = "byte") > > info.cor.cl<-tm_map(info.cor.cl, removeWords, stopwords("spanish")) > > info.tdm<-TermDocumentMatrix(info.cor.cl) > > result<-list(name = informes, tdm= info.tdm) > > } > >>tdm<-lapply(informes, TDM, path =...