Displaying 20 results from an estimated 110 matches for "usebytes".
2009 Mar 10
1
suggestion/request: install.packages and unnecessary file modifications
...ckage.URLs'. Adding the commented lines below, near the end of the function, avoids the unnecessary rewrite.
Mark Bravington
CSIRO
Hobart
Australia
for (f in files) {
page <- readLines(f)
old.page <- page # MVB
page <- gsub(olddoc, doc, page, fixed = TRUE, useBytes = TRUE)
page <- gsub(oldbase, base, page, fixed = TRUE, useBytes = TRUE)
page <- gsub(oldutils, utils, page, fixed = TRUE, useBytes = TRUE)
page <- gsub(oldgraphics, graphics, page, fixed = TRUE,
useBytes = TRUE)
page <- gsub(oldstats, stats,...
2015 Mar 02
2
Errors on Windows with grep(fixed=TRUE) on UTF-8 strings
On Windows, grep(fixed=TRUE) throws errors with some UTF-8 strings.
Here's an example (must be run on Windows to reproduce the error):
Sys.setlocale("LC_CTYPE", "chinese")
y <- rawToChar(as.raw(c(0xe6, 0xb8, 0x97)))
Encoding(y) <- "UTF-8"
y
# [1] "渗"
grep("\n", y, fixed = TRUE)
# Error in grep("\n", y, fixed = TRUE) : invalid
2006 Jan 27
4
regular expressions, sub
...or (i3 in
c(TRUE,FALSE)) for (i4 in c(TRUE,FALSE))
print(paste(ii<-ii+1,ifelse(i1," "," ~"),"ext",ifelse(i2," ","
~"),"perl",ifelse(i3," "," ~"),"fixed ",ifelse(i4," "," ~"),"useBytes:
", try(sub(s,t,e, extended=i1, perl=i2, fixed=i3,
useBytes=i4)),sep=""));invisible(0) }
trysub("I(log(N)^2)","ln n^2",fu) # A: desired result for cases
5,6,13..16, the rest unsubstituted
trysub("log(","ln ",fu) # B: no substitut...
2015 Mar 04
0
Errors on Windows with grep(fixed=TRUE) on UTF-8 strings
...("a", y, fixed = TRUE) : invalid multibyte string at '<97>'
=======================
I believe the problem is in the main/grep.c file, in the fgrep_one
function. It tests for a multi-byte character string locale
`mbcslocale`, and then for the `use_UTF8`, like so:
if (!useBytes && mbcslocale) {
...
} else if (!useBytes && use_UTF8) {
...
} else ...
This can be seen at
https://github.com/wch/r-source/blob/e92b4c1cba05762480cd3898335144e5dd111cb7/src/main/grep.c#L668-L692
A similar pattern occurs in the fgrep_one_bytes function, at...
2006 Nov 09
1
invert argument in grep
Hello,
What about an `invert` argument in grep, to return elements that are
*not* matching a regular expression :
R> grep("pink", colors(), invert = TRUE, value = TRUE)
would essentially return the same as :
R> colors() [ - grep("pink", colors()) ]
I'm attaching the files that I modified (against today's tarball) for
that purpose.
Cheers,
Romain
--
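For reference, a minimal sketch of the equivalence described in this request, assuming an R version in which grep() accepts an `invert` argument. The toy vector `x` is illustrative only and is not taken from the post. The sketch also shows one case where negative indexing is not a safe substitute: when nothing matches.

```r
x <- c("pink", "hotpink", "blue", "green")

# Elements that do NOT match, using the invert argument
a <- grep("pink", x, invert = TRUE, value = TRUE)

# Same result via negative indexing; only valid when at least one element matches
b <- x[-grep("pink", x)]
print(identical(a, b))

# With no match, grep() returns integer(0), and x[-integer(0)] selects nothing
none_invert <- grep("purple", x, invert = TRUE, value = TRUE)  # all of x
none_index  <- x[-grep("purple", x)]                           # character(0)
print(none_invert)
print(none_index)
```

The second half is the practical argument for `invert`: the negative-index idiom silently returns an empty vector when the pattern matches nothing.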
2018 Feb 15
2
writeLines argument useBytes = TRUE still making conversions
I think this behavior is inconsistent with the documentation:
tmp <- 'é'
tmp <- iconv(tmp, to = 'UTF-8')
print(Encoding(tmp))
print(charToRaw(tmp))
tmpfilepath <- tempfile()
writeLines(tmp, con = file(tmpfilepath, encoding = 'UTF-8'), useBytes = TRUE)
[1] "UTF-8"
[1] c3 a9
Raw text as hex: c3 83 c2 a9
If I switch to useBytes = FALSE, then the variable is written correctly as c3 a9.
Any thoughts? This behavior is related to this issue: https://github.com/yihui/knitr/issues/1509
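A minimal sketch of one way to get the UTF-8 bytes onto disk unchanged, under the assumption that the connection is opened without an `encoding` argument so that only writeLines() could translate. The reported bytes c3 83 c2 a9 are what results when the two bytes c3 a9 are treated as latin1 characters and converted to UTF-8 a second time. The variable names are illustrative.

```r
x <- enc2utf8("\u00e9")   # "é", stored as the UTF-8 bytes c3 a9
path <- tempfile()

# No encoding argument: the connection itself does not re-encode.
# useBytes = TRUE stops writeLines() from translating the string.
con <- file(path, open = "wb")
writeLines(x, con, useBytes = TRUE)
close(con)

# Inspect what was actually written
bytes <- readBin(path, what = "raw", n = 16)
print(bytes)
```

The printed bytes should be c3 a9 followed by the newline byte 0a. Combining `useBytes = TRUE` with `file(..., encoding = "UTF-8")`, as in the post, leaves the connection free to convert again.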
2014 Oct 19
1
Writing UTF8 on Windows
...Lines(string, con)
> close(con)
> system("file test1.txt")
test1.txt: ISO-8859 text
> readLines("test1.txt", encoding="UTF-8")
[1] "Z\xfcrich"
I am not quite sure if this is a bug or expected. To avoid this and
other problems, jsonlite uses the `useBytes` argument, which is
supposed to suppress re-encoding when writing to the connection. This
is exactly what we need: use enc2utf8 to convert our string to utf8
and then pass it byte-by-byte to the connection:
> con <- file("test2.txt", open="wb", encoding = "UTF-8"...
2018 Feb 15
0
writeLines argument useBytes = TRUE still making conversions
...mentation correctly)
file(..., encoding = "native.enc")
means "assume that strings are in the native encoding, and hence
translation is unnecessary". Note that it does not mean "attempt to
translate strings to the native encoding".
Also note that writeLines(..., useBytes = FALSE) will explicitly
translate to the current encoding before sending bytes to the
requested connection. In other words, there are two locations where
translation might occur in your example:
1) In the call to writeLines(),
2) When characters are passed to the connection.
In your case,...
2017 Jun 08
2
regular expression help
...lar expression is:
parties_present_start_1=
regexpr("\n.*Present.*\n.*\n",my_text,ignore.case=TRUE,perl=T)
parties_present_start_2=
regexpr("\n.*Present.*\n.*\n",my_text,ignore.case=TRUE)
> parties_present_start_1
[1] 138
attr(,"match.length")
[1] 123
attr(,"useBytes")
[1] TRUE
> parties_present_start_2
[1] 20
attr(,"match.length")
[1] 949
attr(,"useBytes")
[1] TRUE
>
Why do I see the correct result only in the first case?
Best Regards,
Ashim
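One difference between the two calls that can produce results like these, offered as an assumption about this case rather than as the content of any reply: R's default regex engine lets "." match a newline, while the PCRE engine used with perl = TRUE does not unless dotall mode is switched on. A small sketch with a made-up string:

```r
s <- "a\nb"

# Default (TRE) engine: "." also matches the newline
tre <- regexpr("a.b", s)

# PCRE engine: by default "." does not match a newline
pcre <- regexpr("a.b", s, perl = TRUE)

# (?s) turns on dotall mode in PCRE, so "." matches the newline again
pcre_dotall <- regexpr("(?s)a.b", s, perl = TRUE)

# c() drops the match attributes and keeps only the start positions
print(c(tre = tre, pcre = pcre, pcre_dotall = pcre_dotall))
```

With a greedy ".*" in the default engine, a pattern can therefore run across many lines, which would fit the much longer match.length in the second result.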
2018 Feb 17
1
writeLines argument useBytes = TRUE still making conversions
...>> If all that is true I think ?file needs some attention. I've read it
>> several times now and I just don't see how it can be interpreted as
>> you've described it.
>>
>> Best,
>> Ista
>>
>>>
>>> Also note that writeLines(..., useBytes = FALSE) will explicitly
>>> translate to the current encoding before sending bytes to the
>>> requested connection. In other words, there are two locations where
>>> translation might occur in your example:
>>>
>>> 1) In the call to writeLines(),
>...
2017 Jun 08
0
regular expression help
...> regexpr("\n.*Present.*\n.*\n",my_text,ignore.case=TRUE,perl=T)
>
> parties_present_start_2=
> regexpr("\n.*Present.*\n.*\n",my_text,ignore.case=TRUE)
>
>> parties_present_start_1
> [1] 138
> attr(,"match.length")
> [1] 123
> attr(,"useBytes")
> [1] TRUE
>> parties_present_start_2
> [1] 20
> attr(,"match.length")
> [1] 949
> attr(,"useBytes")
> [1] TRUE
>>
>
> Why do I see the correct result only in the first case?
>
> Best Regards,
> Ashim
>
In Perl, '.'...
2016 Sep 21
2
error handling in strcapture
...the
pattern is compatible with the prototype or it isn't, it does not depend
on the text input. E.g.,
> regexec("^(([[:alpha:]]+)|([[:digit:]]+))$", c("Twelve", "12", "Z280"))
[[1]]
[1] 1 1 1 0
attr(,"match.length")
[1] 6 6 6 0
attr(,"useBytes")
[1] TRUE
[[2]]
[1] 1 1 0 1
attr(,"match.length")
[1] 2 2 0 2
attr(,"useBytes")
[1] TRUE
[[3]]
[1] -1
attr(,"match.length")
[1] -1
attr(,"useBytes")
[1] TRUE
Second, an error message like 'some lines were bad' is not very helpful.
Should it p...
2018 Feb 15
2
writeLines argument useBytes = TRUE still making conversions
...mean "attempt to
> translate strings to the native encoding".
If all that is true I think ?file needs some attention. I've read it
several times now and I just don't see how it can be interpreted as
you've described it.
Best,
Ista
>
> Also note that writeLines(..., useBytes = FALSE) will explicitly
> translate to the current encoding before sending bytes to the
> requested connection. In other words, there are two locations where
> translation might occur in your example:
>
> 1) In the call to writeLines(),
> 2) When characters are passed to th...
2018 Feb 17
0
writeLines argument useBytes = TRUE still making conversions
...to the native encoding".
>
> If all that is true I think ?file needs some attention. I've read it
> several times now and I just don't see how it can be interpreted as
> you've described it.
>
> Best,
> Ista
>
>>
>> Also note that writeLines(..., useBytes = FALSE) will explicitly
>> translate to the current encoding before sending bytes to the
>> requested connection. In other words, there are two locations where
>> translation might occur in your example:
>>
>> 1) In the call to writeLines(),
>> 2) When cha...
2019 Aug 15
4
Feature request: non-dropping regmatches/strextract
...ted function strextract in utils which is very similar to stringr::str_extract. It would be great if this function, once exported, were to include a drop argument to prevent dropping positions with no matches.
An example solution (last option):
strextract <- function(pattern, x, perl = FALSE, useBytes = FALSE, drop = T) {
m <- regexec(pattern, x, perl=perl, useBytes=useBytes)
result <- regmatches(x, m)
if(isTRUE(drop)){
unlist(result)
} else if(isFALSE(drop)) {
unlist({result[lengths(result)==0] <- NA_character_; result})
} else {
stop("Invalid argument for `drop`")...
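A separate minimal sketch of the non-dropping behaviour being requested. It is not the poster's function: it uses regexpr() rather than regexec(), so it returns only the first whole match per element and ignores capture groups. The name `extract_first` is hypothetical.

```r
# Hypothetical helper: one result per input element, NA where nothing matched.
extract_first <- function(pattern, x, perl = FALSE, useBytes = FALSE) {
  m <- regexpr(pattern, x, perl = perl, useBytes = useBytes)
  pos <- as.vector(m)                 # start positions without attributes
  hit <- !is.na(pos) & pos > -1L      # TRUE where a match was found
  out <- rep(NA_character_, length(x))
  out[hit] <- regmatches(x, m)        # regmatches() returns matched elements only
  out
}

res <- extract_first("[0-9]+", c("a1", "b", "c22"))
print(res)
```

Pre-filling the result with NA_character_ and assigning only at matched positions keeps the output aligned with the input, which is the point of the proposed `drop = FALSE` option.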
2017 Sep 12
3
Load R data files
...")
datahs0csv <- read.table("hs0.csv", header=T, sep=",")
attach(datahs0csv)
detach(datahs0csv)
rm(list=ls())
Then I tried to reload the data, but I got this error message. I am not
sure what was wrong.
> load("datahs0csv.rda")
Error in readChar(con, 5L, useBytes = TRUE) : cannot open the connection
In addition: Warning message:
In readChar(con, 5L, useBytes = TRUE) :
cannot open compressed file 'datahs0csv.rda', probable reason 'No such
file or directory'
Any help will be appreciated.
with thanks
abou
______________________
AbouEl-Ma...
2016 Sep 21
2
error handling in strcapture
...end
> > on the text input. E.g.,
> >
> >> regexec("^(([[:alpha:]]+)|([[:digit:]]+))$", c("Twelve", "12", "Z280"))
> > [[1]]
> > [1] 1 1 1 0
> > attr(,"match.length")
> > [1] 6 6 6 0
> > attr(,"useBytes")
> > [1] TRUE
> >
> > [[2]]
> > [1] 1 1 0 1
> > attr(,"match.length")
> > [1] 2 2 0 2
> > attr(,"useBytes")
> > [1] TRUE
> >
> > [[3]]
> > [1] -1
> > attr(,"match.length")
> > [1] -1
>...
2018 Feb 17
2
readLines interaction with gsub different in R-dev
...UE only, it can also contain "\U" or "\L" to convert the rest of the replacement to upper or lower case and "\E" to end case conversion.
However, the following code runs differently:
tempf <- tempfile()
writeLines(enc2utf8("author: Amélie"), con = tempf, useBytes = TRUE)
entry <- readLines(tempf, encoding = "UTF-8")
gsub("(\\w)", "\\U\\1", entry, perl = TRUE)
"AUTHOR: AMÉLIE" # R-3.4.3
"A" # R-dev
Best,
Hugh Parsonage.
2018 Feb 17
2
readLines interaction with gsub different in R-dev
...ry different to 'A',
R> gsub("^(\\w+?): (\\w)", "\\U\\1\\E: \\2", entry, perl = TRUE)
"AUTHOR" # Where did everything after the first group go?
I should note the following example too:
R> gsub("(\\w)", "\\U\\1", entry, perl = TRUE, useBytes = TRUE)
[1] "AUTHOR: AM??LIE" # latin1 encoding
A call to `readLines` (possibly `scan()` and `read.table` and friends)
is essential.
On 18 February 2018 at 02:15, Dirk Eddelbuettel <edd at debian.org> wrote:
>
> On 17 February 2018 at 21:10, Hugh Parsonage wrote:
> |...
2017 Apr 04
2
Bug report: POSIX regular expression doesn't match for somewhat higher values of upper bound
Dear Sirs,
while
> regexpr('(.{1,2})\\1', 'foo')
[1] 2
attr(,"match.length")
[1] 2
attr(,"useBytes")
[1] TRUE
yields the correct match, an incremented upper bound in
> regexpr('(.{1,3})\\1', 'foo')
[1] -1
attr(,"match.length")
[1] -1
attr(,"useBytes")
[1] TRUE
incorrectly yields no match.
R versions tested:
2.11.1 on i486-pc-linux-gnu
2.15.1 on x86...