thr3ads.net - search: "usebytes"

Displaying 20 results from an estimated 110 matches for "usebytes".

suggestion/request: install.packages and unnecessary file modifications

2009 Mar 10

...ckage.URLs'. Adding the commented lines below, near the end of the function, avoids the unnecessary rewrite. Mark Bravington CSIRO Hobart Australia for (f in files) { page <- readLines(f) old.page <- page # MVB page <- gsub(olddoc, doc, page, fixed = TRUE, useBytes = TRUE) page <- gsub(oldbase, base, page, fixed = TRUE, useBytes = TRUE) page <- gsub(oldutils, utils, page, fixed = TRUE, useBytes = TRUE) page <- gsub(oldgraphics, graphics, page, fixed = TRUE, useBytes = TRUE) page <- gsub(oldstats, stats,...

Errors on Windows with grep(fixed=TRUE) on UTF-8 strings

2015 Mar 02

On Windows, grep(fixed=TRUE) throws errors with some UTF-8 strings. Here's an example (must be run on Windows to reproduce the error): Sys.setlocale("LC_CTYPE", "chinese") y <- rawToChar(as.raw(c(0xe6, 0xb8, 0x97))) Encoding(y) <- "UTF-8" y # [1] "渗" grep("\n", y, fixed = TRUE) # Error in grep("\n", y, fixed = TRUE) : invalid

regular expressions, sub

2006 Jan 27

...or (i3 in c(TRUE,FALSE)) for (i4 in c(TRUE,FALSE)) print(paste(ii<-ii+1,ifelse(i1," "," ~"),"ext",ifelse(i2," "," ~"),"perl",ifelse(i3," "," ~"),"fixed ",ifelse(i4," "," ~"),"useBytes: ", try(sub(s,t,e, extended=i1, perl=i2, fixed=i3, useBytes=i4)),sep=""));invisible(0) } trysub("I(log(N)^2)","ln n^2",fu) # A: desired result for cases 5,6,13..16, the rest unsubstituted trysub("log(","ln ",fu) # B: no substitut...

Errors on Windows with grep(fixed=TRUE) on UTF-8 strings

2015 Mar 04

...("a", y, fixed = TRUE) : invalid multibyte string at '<97>' ======================= I believe the problem is in the main/grep.c file, in the fgrep_one function. It tests for a multi-byte character string locale `mbcslocale`, and then for the `use_UTF8`, like so: if (!useBytes && mbcslocale) { ... } else if (!useBytes && use_UTF8) { ... } else ... This can be seen at https://github.com/wch/r-source/blob/e92b4c1cba05762480cd3898335144e5dd111cb7/src/main/grep.c#L668-L692 A similar pattern occurs in the fgrep_one_bytes function, at...

invert argument in grep

2006 Nov 09

Hello, What about an `invert` argument in grep, to return elements that are *not* matching a regular expression : R> grep("pink", colors(), invert = TRUE, value = TRUE) would essentially return the same as : R> colors() [ - grep("pink", colors()) ] I'm attaching the files that I modified (against today's tarball) for that purpose. Cheers, Romain --
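A small sketch of the equivalence proposed in this message, using a made-up vector `x` in place of `colors()`. Current versions of R's `grep()` do accept `invert`. One caveat worth flagging: the negative-index form only agrees with `invert = TRUE` when at least one element matches, because `-integer(0)` selects nothing.

```r
# Hypothetical example data (not from the original post).
x <- c("pink", "red", "deeppink", "blue")

# Elements that do NOT match the pattern.
inverted <- grep("pink", x, invert = TRUE, value = TRUE)

# Negative-index form; agrees here because at least one element matches.
indexed <- x[-grep("pink", x)]

# With no matches, grep() returns integer(0), so the two forms diverge.
none_inverted <- grep("green", x, invert = TRUE, value = TRUE)  # keeps every element
none_indexed <- x[-grep("green", x)]                            # keeps nothing
```

The second pair is why `invert = TRUE` tends to be the safer choice when the pattern may match nothing.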

writeLines argument useBytes = TRUE still making conversions

2018 Feb 15

I think this behavior is inconsistent with the documentation: tmp <- 'é' tmp <- iconv(tmp, to = 'UTF-8') print(Encoding(tmp)) print(charToRaw(tmp)) tmpfilepath <- tempfile() writeLines(tmp, con = file(tmpfilepath, encoding = 'UTF-8'), useBytes = TRUE) [1] "UTF-8" [1] c3 a9 Raw text as hex: c3 83 c2 a9 If I switch to useBytes = FALSE, then the variable is written correctly as c3 a9. Any thoughts? This behavior is related to this issue: https://github.com/yihui/knitr/issues/1509 [[alternative HTML version deleted]]

Writing UTF8 on Windows

2014 Oct 19

...Lines(string, con) > close(con) > system("file test1.txt") test1.txt: ISO-8859 text > readLines("test1.txt", encoding="UTF-8") [1] "Z\xfcrich" I am not quite sure if this is a bug or expected. To avoid this and other problems, jsonlite uses the `useBytes` argument, which is supposed to suppress re-encoding when writing to the connection. This is exactly what we need: use enc2utf8 to convert our string to utf8 and then pass it byte-by-byte to the connection: > con <- file("test2.txt", open="wb", encoding = "UTF-8"...

writeLines argument useBytes = TRUE still making conversions

2018 Feb 15

...mentation correctly) file(..., encoding = "native.enc") means "assume that strings are in the native encoding, and hence translation is unnecessary". Note that it does not mean "attempt to translate strings to the native encoding". Also note that writeLines(..., useBytes = FALSE) will explicitly translate to the current encoding before sending bytes to the requested connection. In other words, there are two locations where translation might occur in your example: 1) In the call to writeLines(), 2) When characters are passed to the connection. In your case,...
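The two translation points named in this reply can both be made byte-preserving. A minimal sketch, assuming a connection opened in binary mode with the default `encoding = "native.enc"` (so the connection does not re-encode) together with `useBytes = TRUE` (so `writeLines()` does not translate). The string and temporary file are illustrative only and do not come from the thread.

```r
# A UTF-8 string: "\u00e9" is e-acute, encoded as bytes c3 a9.
s <- enc2utf8("\u00e9")

path <- tempfile()

# Binary mode, default encoding = "native.enc": no re-encoding by the connection.
con <- file(path, open = "wb")

# useBytes = TRUE: writeLines() passes the bytes through untranslated.
writeLines(s, con, useBytes = TRUE)
close(con)

# Read the file back as raw bytes to see what was actually written.
written <- readBin(path, what = "raw", n = 16L)
```

Inspecting `written` shows the original two bytes followed by the newline separator, rather than the double-encoded four-byte sequence reported in the thread.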

regular expression help

2017 Jun 08

...lar expression is: parties_present_start_1= regexpr("\n.*Present.*\n.*\n",my_text,ignore.case=TRUE,perl=T) parties_present_start_2= regexpr("\n.*Present.*\n.*\n",my_text,ignore.case=TRUE) > parties_present_start_1 [1] 138 attr(,"match.length") [1] 123 attr(,"useBytes") [1] TRUE > parties_present_start_2 [1] 20 attr(,"match.length") [1] 949 attr(,"useBytes") [1] TRUE > Why do I see the correct result only in the first case? Best Regards, Ashim [[alternative HTML version deleted]]

writeLines argument useBytes = TRUE still making conversions

2018 Feb 17

...>> If all that is true I think ?file needs some attention. I've read it >> several times now and I just don't see how it can be interpreted as >> you've described it. >> >> Best, >> Ista >> >>> >>> Also note that writeLines(..., useBytes = FALSE) will explicitly >>> translate to the current encoding before sending bytes to the >>> requested connection. In other words, there are two locations where >>> translation might occur in your example: >>> >>> 1) In the call to writeLines(), >...

regular expression help

2017 Jun 08

...> regexpr("\n.*Present.*\n.*\n",my_text,ignore.case=TRUE,perl=T) > > parties_present_start_2= > regexpr("\n.*Present.*\n.*\n",my_text,ignore.case=TRUE) > >> parties_present_start_1 > [1] 138 > attr(,"match.length") > [1] 123 > attr(,"useBytes") > [1] TRUE >> parties_present_start_2 > [1] 20 > attr(,"match.length") > [1] 949 > attr(,"useBytes") > [1] TRUE >> > > Why do I see the correct result only in the first case? > > Best Regards, > Ashim > In Perl, '.'...
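The reply is truncated, but it appears to concern how `.` treats newlines. As a hedged illustration (my own example, not taken from the thread): with `perl = TRUE` the dot does not match a newline unless the `(?s)` flag is set, whereas R's default engine lets it match one, so `.*` can run across lines.

```r
# Hypothetical two-line input (not from the original post).
x <- "a\nb"

# Default (TRE) engine: "." is allowed to match the newline.
default_hit <- regexpr("a.b", x)

# PCRE engine: "." excludes the newline by default, so there is no match.
perl_hit <- regexpr("a.b", x, perl = TRUE)

# PCRE with the (?s) "dotall" flag: "." matches the newline again.
perl_dotall <- regexpr("(?s)a.b", x, perl = TRUE)
```

This would be consistent with the much longer `match.length` reported for the non-Perl call in the question.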

error handling in strcapture

2016 Sep 21

...the pattern is compatible with the prototype or it isn't, it does not depend on the text input. E.g., > regexec("^(([[:alpha:]]+)|([[:digit:]]+))$", c("Twelve", "12", "Z280")) [[1]] [1] 1 1 1 0 attr(,"match.length") [1] 6 6 6 0 attr(,"useBytes") [1] TRUE [[2]] [1] 1 1 0 1 attr(,"match.length") [1] 2 2 0 2 attr(,"useBytes") [1] TRUE [[3]] [1] -1 attr(,"match.length") [1] -1 attr(,"useBytes") [1] TRUE Second, an error message like 'some lines were bad' is not very helpful. Should it p...

writeLines argument useBytes = TRUE still making conversions

2018 Feb 15

...mean "attempt to > translate strings to the native encoding". If all that is true I think ?file needs some attention. I've read it several times now and I just don't see how it can be interpreted as you've described it. Best, Ista > > Also note that writeLines(..., useBytes = FALSE) will explicitly > translate to the current encoding before sending bytes to the > requested connection. In other words, there are two locations where > translation might occur in your example: > > 1) In the call to writeLines(), > 2) When characters are passed to th...

writeLines argument useBytes = TRUE still making conversions

2018 Feb 17

...to the native encoding". > > If all that is true I think ?file needs some attention. I've read it > several times now and I just don't see how it can be interpreted as > you've described it. > > Best, > Ista > >> >> Also note that writeLines(..., useBytes = FALSE) will explicitly >> translate to the current encoding before sending bytes to the >> requested connection. In other words, there are two locations where >> translation might occur in your example: >> >> 1) In the call to writeLines(), >> 2) When cha...

Feature request: non-dropping regmatches/strextract

2019 Aug 15

...ted function strextract in utils which is very similar to stringr::str_extract. It would be great if this function, once exported, were to include a drop argument to prevent dropping positions with no matches. An example solution (last option): strextract <- function(pattern, x, perl = FALSE, useBytes = FALSE, drop = T) { m <- regexec(pattern, x, perl=perl, useBytes=useBytes) result <- regmatches(x, m) if(isTRUE(drop)){ unlist(result) } else if(isFALSE(drop)) { unlist({result[lengths(result)==0] <- NA_character_; result}) } else { stop("Invalid argument for `drop`")...

Load R data files

2017 Sep 12

...") datahs0csv <- read.table("hs0.csv", header=T, sep=",") attach(datahs0csv) detach(datahs0csv) rm(list=ls()) Then I tried to reload the data, but I got this error message. I am not sure what was wrong. *> load("datahs0csv.rda")* Error in readChar(con, 5L, useBytes = TRUE) : cannot open the connection In addition: Warning message: In readChar(con, 5L, useBytes = TRUE) : cannot open compressed file 'datahs0csv.rda', probable reason 'No such file or directory' Any help will be appreciated. with thanks abou ______________________ AbouEl-Ma...

error handling in strcapture

2016 Sep 21

...end > > on the text input. E.g., > > > >> regexec("^(([[:alpha:]]+)|([[:digit:]]+))$", c("Twelve", "12", "Z280")) > > [[1]] > > [1] 1 1 1 0 > > attr(,"match.length") > > [1] 6 6 6 0 > > attr(,"useBytes") > > [1] TRUE > > > > [[2]] > > [1] 1 1 0 1 > > attr(,"match.length") > > [1] 2 2 0 2 > > attr(,"useBytes") > > [1] TRUE > > > > [[3]] > > [1] -1 > > attr(,"match.length") > > [1] -1 >...

readLines interaction with gsub different in R-dev

2018 Feb 17

...UE only, it can also contain "\U" or "\L" to convert the rest of the replacement to upper or lower case and "\E" to end case conversion. However, the following code runs differently: tempf <- tempfile() writeLines(enc2utf8("author: Amélie"), con = tempf, useBytes = TRUE) entry <- readLines(tempf, encoding = "UTF-8") gsub("(\\w)", "\\U\\1", entry, perl = TRUE) "AUTHOR: AMÉLIE" # R-3.4.3 "A" # R-dev Best, Hugh Parsonage.

readLines interaction with gsub different in R-dev

2018 Feb 17

...ry different to 'A', R> gsub("^(\\w+?): (\\w)", "\\U\\1\\E: \\2", entry, perl = TRUE) "AUTHOR" # Where did everything after the first group go? I should note the following example too: R> gsub("(\\w)", "\\U\\1", entry, perl = TRUE, useBytes = TRUE) [1] "AUTHOR: AM??LIE" # latin1 encoding A call to `readLines` (possibly `scan()` and `read.table` and friends) is essential. On 18 February 2018 at 02:15, Dirk Eddelbuettel <edd at debian.org> wrote: > > On 17 February 2018 at 21:10, Hugh Parsonage wrote: > |...

Bug report: POSIX regular expression doesn't match for somewhat higher values of upper bound

2017 Apr 04

Dear Sirs, while > regexpr('(.{1,2})\\1', 'foo') [1] 2 attr(,"match.length") [1] 2 attr(,"useBytes") [1] TRUE yields the correct match, an incremented upper bound in > regexpr('(.{1,3})\\1', 'foo') [1] -1 attr(,"match.length") [1] -1 attr(,"useBytes") [1] TRUE incorrectly yields no match. R versions tested: 2.11.1 on i486-pc-linux-gnu 2.15.1 on x86...