search for: usebytes

Displaying 20 results from an estimated 106 matches for "usebytes".

2009 Mar 10
1
suggestion/request: install.packages and unnecessary file modifications
...ckage.URLs'. Adding the commented lines below, near the end of the function, avoids the unnecessary rewrite. Mark Bravington CSIRO Hobart Australia for (f in files) { page <- readLines(f) old.page <- page # MVB page <- gsub(olddoc, doc, page, fixed = TRUE, useBytes = TRUE) page <- gsub(oldbase, base, page, fixed = TRUE, useBytes = TRUE) page <- gsub(oldutils, utils, page, fixed = TRUE, useBytes = TRUE) page <- gsub(oldgraphics, graphics, page, fixed = TRUE, useBytes = TRUE) page <- gsub(oldstats, stats,...
2015 Mar 02
2
Errors on Windows with grep(fixed=TRUE) on UTF-8 strings
On Windows, grep(fixed=TRUE) throws errors with some UTF-8 strings. Here's an example (must be run on Windows to reproduce the error): Sys.setlocale("LC_CTYPE", "chinese") y <- rawToChar(as.raw(c(0xe6, 0xb8, 0x97))) Encoding(y) <- "UTF-8" y # [1] "渗" grep("\n", y, fixed = TRUE) # Error in grep("\n", y, fixed = TRUE) : invalid
2006 Jan 27
4
regular expressions, sub
...or (i3 in c(TRUE,FALSE)) for (i4 in c(TRUE,FALSE)) print(paste(ii<-ii+1,ifelse(i1," "," ~"),"ext",ifelse(i2," "," ~"),"perl",ifelse(i3," "," ~"),"fixed ",ifelse(i4," "," ~"),"useBytes: ", try(sub(s,t,e, extended=i1, perl=i2, fixed=i3, useBytes=i4)),sep=""));invisible(0) } trysub("I(log(N)^2)","ln n^2",fu) # A: desired result for cases 5,6,13..16, the rest unsubstituted trysub("log(","ln ",fu) # B: no substitut...
2015 Mar 04
0
Errors on Windows with grep(fixed=TRUE) on UTF-8 strings
...("a", y, fixed = TRUE) : invalid multibyte string at '<97>' ======================= I believe the problem is in the main/grep.c file, in the fgrep_one function. It tests for a multi-byte character string locale `mbcslocale`, and then for the `use_UTF8`, like so: if (!useBytes && mbcslocale) { ... } else if (!useBytes && use_UTF8) { ... } else ... This can be seen at https://github.com/wch/r-source/blob/e92b4c1cba05762480cd3898335144e5dd111cb7/src/main/grep.c#L668-L692 A similar pattern occurs in the fgrep_one_bytes function, at...
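The branch structure quoted in this snippet suggests that `useBytes = TRUE` skips both multibyte branches of `fgrep_one`. A minimal sketch, assuming that reading is right, of the byte-wise call on the same string; whether it is an acceptable workaround for the Windows error is not confirmed by the thread excerpt:

```r
# Build the same three-byte UTF-8 string used in the report.
y <- rawToChar(as.raw(c(0xe6, 0xb8, 0x97)))
Encoding(y) <- "UTF-8"

# Byte-wise fixed matching: no re-encoding or multibyte validation is requested.
hits <- grep("\n", y, fixed = TRUE, useBytes = TRUE)

# The string contains no newline byte, so no element matches.
print(hits)
```

Byte-wise matching is only safe when pattern and text share an encoding, so this is a sketch for ASCII patterns such as "\n".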
2006 Nov 09
1
invert argument in grep
Hello, What about an `invert` argument in grep, to return elements that are *not* matching a regular expression : R> grep("pink", colors(), invert = TRUE, value = TRUE) would essentially return the same as : R> colors() [ - grep("pink", colors()) ] I'm attaching the files that I modified (against today's tarball) for that purpose. Cheers, Romain --
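Current versions of R do provide `grep(..., invert = TRUE)`. A short sketch on a made-up vector (the vector `x` is an assumption for illustration) showing that the two forms agree when something matches, and where negative indexing differs when nothing does:

```r
x <- c("pink", "red", "hotpink", "blue")

# Both forms drop the elements that match "pink".
kept_invert <- grep("pink", x, invert = TRUE, value = TRUE)
kept_index  <- x[-grep("pink", x)]
stopifnot(identical(kept_invert, kept_index))

# Edge case: with no match, grep() returns integer(0),
# and x[-integer(0)] selects nothing rather than everything.
none_invert <- grep("zzz", x, invert = TRUE, value = TRUE)
none_index  <- x[-grep("zzz", x)]
print(none_invert)
print(none_index)
```

The edge case is one reason to prefer `invert = TRUE` over negative indexing.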
2018 Feb 15
2
writeLines argument useBytes = TRUE still making conversions
I think this behavior is inconsistent with the documentation: tmp <- 'é' tmp <- iconv(tmp, to = 'UTF-8') print(Encoding(tmp)) print(charToRaw(tmp)) tmpfilepath <- tempfile() writeLines(tmp, con = file(tmpfilepath, encoding = 'UTF-8'), useBytes = TRUE) [1] "UTF-8" [1] c3 a9 Raw text as hex: c3 83 c2 a9 If I switch to useBytes = FALSE, then the variable is written correctly as c3 a9. Any thoughts? This behavior is related to this issue: https://github.com/yihui/knitr/issues/1509
2014 Oct 19
1
Writing UTF8 on Windows
...Lines(string, con) > close(con) > system("file test1.txt") test1.txt: ISO-8859 text > readLines("test1.txt", encoding="UTF-8") [1] "Z\xfcrich" I am not quite sure if this is a bug or expected. To avoid this and other problems, jsonlite uses the 'useBytes' argument, which is supposed to suppress re-encoding when writing to the connection. This is exactly what we need: use enc2utf8 to convert our string to utf8 and then pass it byte-by-byte to the connection: > con <- file("test2.txt", open="wb", encoding = "UTF-8"...
2018 Feb 15
0
writeLines argument useBytes = TRUE still making conversions
...mentation correctly) file(..., encoding = "native.enc") means "assume that strings are in the native encoding, and hence translation is unnecessary". Note that it does not mean "attempt to translate strings to the native encoding". Also note that writeLines(..., useBytes = FALSE) will explicitly translate to the current encoding before sending bytes to the requested connection. In other words, there are two locations where translation might occur in your example: 1) In the call to writeLines(), 2) When characters are passed to the connection. In your case,...
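The two translation points described in this reply can be kept out of the way by converting once up front and leaving the connection at its default encoding. A minimal sketch, assuming a plain file path as the connection (so no `encoding` argument is set) and an illustrative one-character string:

```r
# Convert explicitly to UTF-8 first; "\u00e9" is e-acute.
s <- enc2utf8("\u00e9")

# Writing to a path uses the default connection encoding ("native.enc"),
# and useBytes = TRUE asks writeLines() not to translate the string.
path <- tempfile()
writeLines(s, path, useBytes = TRUE)

# Inspect the bytes on disk; the line ending that follows is platform dependent.
bytes <- readBin(path, what = "raw", n = 16L)
print(bytes)
```

Setting `encoding = "UTF-8"` on the connection as well would reintroduce the second translation step that the thread discusses.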
2017 Jun 08
2
regular expression help
...lar expression is: parties_present_start_1= regexpr("\n.*Present.*\n.*\n",my_text,ignore.case=TRUE,perl=T) parties_present_start_2= regexpr("\n.*Present.*\n.*\n",my_text,ignore.case=TRUE) > parties_present_start_1 [1] 138 attr(,"match.length") [1] 123 attr(,"useBytes") [1] TRUE > parties_present_start_2 [1] 20 attr(,"match.length") [1] 949 attr(,"useBytes") [1] TRUE > Why do I see the correct result only in the first case? Best Regards, Ashim
2018 Feb 17
1
writeLines argument useBytes = TRUE still making conversions
...>> If all that is true I think ?file needs some attention. I've read it >> several times now and I just don't see how it can be interpreted as >> you've described it. >> >> Best, >> Ista >> >>> >>> Also note that writeLines(..., useBytes = FALSE) will explicitly >>> translate to the current encoding before sending bytes to the >>> requested connection. In other words, there are two locations where >>> translation might occur in your example: >>> >>> 1) In the call to writeLines(), >...
2017 Jun 08
0
regular expression help
...> regexpr("\n.*Present.*\n.*\n",my_text,ignore.case=TRUE,perl=T) > > parties_present_start_2= > regexpr("\n.*Present.*\n.*\n",my_text,ignore.case=TRUE) > >> parties_present_start_1 > [1] 138 > attr(,"match.length") > [1] 123 > attr(,"useBytes") > [1] TRUE >> parties_present_start_2 > [1] 20 > attr(,"match.length") > [1] 949 > attr(,"useBytes") > [1] TRUE >> > > Why do I see the correct result only in the first case? > > Best Regards, > Ashim > In Perl, '.'...
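The differing match lengths in this thread are consistent with how `.` treats newlines in R's two regex engines. A small sketch on a made-up string (not the poster's `my_text`), offered as an illustration rather than a reconstruction of the truncated reply:

```r
txt <- "a\nb"

# Default (TRE) engine: "." can match the newline between "a" and "b".
default_hit <- regexpr("a.b", txt)

# PCRE (perl = TRUE): "." does not match a newline by default.
perl_hit <- regexpr("a.b", txt, perl = TRUE)

# PCRE with the inline (?s) flag: "." matches newlines again.
perl_dotall <- regexpr("(?s)a.b", txt, perl = TRUE)

print(c(default = default_hit, perl = perl_hit, perl_dotall = perl_dotall))
```

With `.*` in a multi-line text, this difference decides whether a match can run across line breaks.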
2016 Sep 21
2
error handling in strcapture
...the pattern is compatible with the prototype or it isn't, it does not depend on the text input. E.g., > regexec("^(([[:alpha:]]+)|([[:digit:]]+))$", c("Twelve", "12", "Z280")) [[1]] [1] 1 1 1 0 attr(,"match.length") [1] 6 6 6 0 attr(,"useBytes") [1] TRUE [[2]] [1] 1 1 0 1 attr(,"match.length") [1] 2 2 0 2 attr(,"useBytes") [1] TRUE [[3]] [1] -1 attr(,"match.length") [1] -1 attr(,"useBytes") [1] TRUE Second, an error message like 'some lines were bad' is not very helpful. Should it p...
2018 Feb 15
2
writeLines argument useBytes = TRUE still making conversions
...mean "attempt to > translate strings to the native encoding". If all that is true I think ?file needs some attention. I've read it several times now and I just don't see how it can be interpreted as you've described it. Best, Ista > > Also note that writeLines(..., useBytes = FALSE) will explicitly > translate to the current encoding before sending bytes to the > requested connection. In other words, there are two locations where > translation might occur in your example: > > 1) In the call to writeLines(), > 2) When characters are passed to th...
2018 Feb 17
0
writeLines argument useBytes = TRUE still making conversions
...to the native encoding". > > If all that is true I think ?file needs some attention. I've read it > several times now and I just don't see how it can be interpreted as > you've described it. > > Best, > Ista > >> >> Also note that writeLines(..., useBytes = FALSE) will explicitly >> translate to the current encoding before sending bytes to the >> requested connection. In other words, there are two locations where >> translation might occur in your example: >> >> 1) In the call to writeLines(), >> 2) When cha...
2019 Aug 15
4
Feature request: non-dropping regmatches/strextract
...ted function strextract in utils which is very similar to stringr::str_extract. It would be great if this function, once exported, were to include a drop argument to prevent dropping positions with no matches. An example solution (last option): strextract <- function(pattern, x, perl = FALSE, useBytes = FALSE, drop = T) { m <- regexec(pattern, x, perl=perl, useBytes=useBytes) result <- regmatches(x, m) if(isTRUE(drop)){ unlist(result) } else if(isFALSE(drop)) { unlist({result[lengths(result)==0] <- NA_character_; result}) } else { stop("Invalid argument for `drop`")...
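The non-dropping behaviour requested here can also be sketched with `regexpr()` instead of `regexec()`. The helper name `extract_first` is hypothetical and is not the `strextract` function from the post; it returns one value per input element, with `NA` where nothing matched:

```r
# Hypothetical helper: first match per element, NA where there is none.
extract_first <- function(pattern, x, perl = FALSE, useBytes = FALSE) {
  m <- regexpr(pattern, x, perl = perl, useBytes = useBytes)
  out <- rep(NA_character_, length(x))
  # regmatches() drops non-matching elements, so fill only the hits.
  hit <- !is.na(m) & m != -1L
  out[hit] <- regmatches(x, m)
  out
}

res <- extract_first("[0-9]+", c("a1", "b", "c22", NA))
print(res)
```

Keeping the output aligned with the input makes the result safe to assign as a data frame column.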
2017 Sep 12
3
Load R data files
...") datahs0csv <- read.table("hs0.csv", header=T, sep=",") attach(datahs0csv) detach(datahs0csv) rm(list=ls()) Then I tried to reload the data, but I got this error message. I am not sure what was wrong. *> load("datahs0csv.rda")* Error in readChar(con, 5L, useBytes = TRUE) : cannot open the connection In addition: Warning message: In readChar(con, 5L, useBytes = TRUE) : cannot open compressed file 'datahs0csv.rda', probable reason 'No such file or directory' Any help will be appreciated. with thanks abou ______________________ AbouEl-Ma...
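The warning in this post says the `.rda` file was not found, so `load()` had nothing to open. A minimal round trip, assuming a stand-in data frame and a temporary directory (neither is from the post), that saves before clearing and checks for the file before loading:

```r
# Stand-in for the data read from hs0.csv.
datahs0csv <- data.frame(id = 1:3)

# Save to an explicit path before removing the object.
path <- file.path(tempdir(), "datahs0csv.rda")
save(datahs0csv, file = path)
rm(datahs0csv)

# Guard the load so a missing file gives a readable message.
if (file.exists(path)) {
  load(path)
} else {
  message("No such file: ", path, " (working directory: ", getwd(), ")")
}
print(datahs0csv)
```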
2016 Sep 21
2
error handling in strcapture
...end > > on the text input. E.g., > > > >> regexec("^(([[:alpha:]]+)|([[:digit:]]+))$", c("Twelve", "12", "Z280")) > > [[1]] > > [1] 1 1 1 0 > > attr(,"match.length") > > [1] 6 6 6 0 > > attr(,"useBytes") > > [1] TRUE > > > > [[2]] > > [1] 1 1 0 1 > > attr(,"match.length") > > [1] 2 2 0 2 > > attr(,"useBytes") > > [1] TRUE > > > > [[3]] > > [1] -1 > > attr(,"match.length") > > [1] -1 >...
2018 Feb 17
2
readLines interaction with gsub different in R-dev
...UE only, it can also contain "\U" or "\L" to convert the rest of the replacement to upper or lower case and "\E" to end case conversion. However, the following code runs differently: tempf <- tempfile() writeLines(enc2utf8("author: Amélie"), con = tempf, useBytes = TRUE) entry <- readLines(tempf, encoding = "UTF-8") gsub("(\\w)", "\\U\\1", entry, perl = TRUE) "AUTHOR: AMÉLIE" # R-3.4.3 "A" # R-dev Best, Hugh Parsonage.
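For reference, the documented `\U` and `\E` behaviour on a pure-ASCII string, where encoding plays no part. The input string is an assumption for illustration and deliberately avoids the accented character that triggers the difference reported in this thread:

```r
entry_ascii <- "author: amy"

# \U upper-cases the rest of the replacement, applied match by match.
upper_all <- gsub("(\\w)", "\\U\\1", entry_ascii, perl = TRUE)

# \E ends the case conversion, so only the first group is upper-cased.
upper_key <- gsub("^(\\w+): (\\w)", "\\U\\1\\E: \\2", entry_ascii, perl = TRUE)

print(upper_all)
print(upper_key)
```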
2018 Feb 17
2
readLines interaction with gsub different in R-dev
...ry different to 'A', R> gsub("^(\\w+?): (\\w)", "\\U\\1\\E: \\2", entry, perl = TRUE) "AUTHOR" # Where did everything after the first group go? I should note the following example too: R> gsub("(\\w)", "\\U\\1", entry, perl = TRUE, useBytes = TRUE) [1] "AUTHOR: AM??LIE" # latin1 encoding A call to `readLines` (possibly `scan()` and `read.table` and friends) is essential. On 18 February 2018 at 02:15, Dirk Eddelbuettel <edd at debian.org> wrote: > > On 17 February 2018 at 21:10, Hugh Parsonage wrote: > |...
2017 Apr 04
2
Bug report: POSIX regular expression doesn't match for somewhat higher values of upper bound
Dear Sirs, while > regexpr('(.{1,2})\\1', 'foo') [1] 2 attr(,"match.length") [1] 2 attr(,"useBytes") [1] TRUE yields the correct match, an incremented upper bound in > regexpr('(.{1,3})\\1', 'foo') [1] -1 attr(,"match.length") [1] -1 attr(,"useBytes") [1] TRUE incorrectly yields no match. R versions tested: 2.11.1 on i486-pc-linux-gnu 2.15.1 on x86...