Displaying 20 results from an estimated 300 matches similar to: "Errors on Windows with grep(fixed=TRUE) on UTF-8 strings"
2015 Mar 04
0
Errors on Windows with grep(fixed=TRUE) on UTF-8 strings
After a bit more investigation, I think I've found the cause of the bug,
and I have a patch.
This bug happens with grep(), when:
* Running on Windows.
* The search uses fixed=TRUE.
* The search pattern is a single byte.
* The current locale has a multibyte encoding.
=======================
Here's an example that demonstrates the bug:
# First, create a 3-byte UTF-8 character
y <-
2007 Jun 24
2
problem gsub in the locale of CP932 and SJIS (PR#9751)
Full_Name: Ei-ji Nakama
Version: R-2.5.0
OS: any
Submission from: (NULL) (219.117.236.5)
gsub() misbehaves in the CP932 and SJIS locales on characters whose
second (trail) byte is 0x5c.
--- R-2.5.0.orig/src/main/character.c 2007-04-03 11:05:05.000000000 +0900
+++ R-2.5.0/src/main/character.c 2007-06-24 22:31:06.000000000 +0900
@@ -986,6 +986,17 @@
2009 Apr 09
3
type.convert (PR#13646)
Full_Name: Stefan Raberger
Version: 2.8.1
OS: Windows XP
Submission from: (NULL) (213.185.163.242)
Hi there,
I recently noticed some strange behaviour of the command "type.convert",
depending on the startup mode used. But there also seems to be different
behaviour on different PCs (all running the same OS and the same version of R).
On PC1:
When I start R in SDI mode (RGui --no-save
2013 May 01
1
Windows, format.POSIXct and character encodings
Hi all,
In what encoding does format.POSIXct return its output? It doesn't
seem to be UTF-8:
Sys.setlocale("LC_ALL", "Japanese_Japan.932")
times <- c("1970-01-01 01:00:00 UTC", "1970-02-02 22:00:00 UTC")
ampm <- format(as.POSIXct(times), format = "%p")
x <- gsub(">", "*", paste(ampm, collapse =
2011 Sep 29
3
grep and PCRE fun
Hello,
I think I've found a bug in the C function do_grep located in
src/main/grep.c. It seems to affect both the latest revisions of
R-2-13-branch and trunk when compiling R without optimizations and
with its own version of PCRE located in src/extra, at least on Ubuntu
10.04.
According to the pcre_exec API (I presume the later versions), the
ovecsize argument must be a multiple of 3,
2011 Aug 04
1
slightly speeding up readChar()
Hi,
I was trying to have R read files faster with readChar(). That was before I noticed that readChar() is not that bad! In any case, below I suggest a few simple changes that will make readChar slightly faster.
I traced the code path of readChar(useBytes=T) and tried to identify all O(N) operations, where N is the size of the file. The assumption is that for LARGE files we want to avoid any O(N) operations,
2005 Jul 20
1
(PR#8017) build of REventLoop package crashes with 2.1 due
In what way is this a bug in R? It looks like a bug in the package, and
as Defn.h is not part of R's API, any package using it is `at risk' (and
cannot be installed in a binary-only installation, or even an installed
version of R).
In particular, Defn.h depends on config.h, and it seems you installed a
binary version of R and used separate sources. I would suggest building
R from
2017 Mar 19
2
[PATCH] Improve utf8clen and remove utf8_table4
Given a char `c' which should be the start byte of a utf8 character,
the utf8clen function returns the byte length of the utf8 character.
Before this patch, the utf8clen function would return either:
* 1 if `c' was an ascii character or a utf8 continuation byte
* An int in the range [2, 6] indicating the byte length of the utf8
character
With this patch, the utf8clen function
2023 Jan 31
1
Sys.getenv(): Error in substring(x, m + 1L) : invalid multibyte string at '<ff>' if an environment variable contains \xFF
Can we use the "bytes" encoding for such environment variables invalid
in the current locale? The following patch preserves CE_NATIVE for
strings valid in the current UTF-8 or multibyte locale (or
non-multibyte strings) but sets CE_BYTES for those that are invalid:
Index: src/main/sysutils.c
===================================================================
--- src/main/sysutils.c
2005 Jul 19
0
build of REventLoop package crashes with 2.1 due to syntax error in Defn.h (PR#8017)
Full_Name: Richard Boyce
Version: 2.1.-1
OS: Debian testing/unstable
Submission from: (NULL) (128.95.123.29)
While building a custom package using a modified version of Duncan's REventLoop
with R version 2.1 (Debian packages r-base, r-base-dev) and the R 2.1.1 source
from apt-get source, I get the following error:
$ R CMD build vjREventLoop
* checking for file
2008 Mar 17
1
Inconsistency in gsub in R.2.6.2 (PR#10978)
Hi,
Could this be an oversight?
R version 2.6.2 Patched (2008-03-13 r44783)
Copyright (C) 2008 The R Foundation for Statistical Computing
ISBN 3-900051-07-0
...
> x <- "ab?"
> Encoding(x)
[1] "latin1"
> Encoding(gsub("?","", x))
[1] "unknown"
> Encoding(gsub("?","", x, perl = TRUE))
[1] "latin1"
2018 Mar 29
2
Possible `substr` bug in UTF-8 Corner Case
I think there is a memory bug in `substr` that is triggered by a UTF-8 corner case: an incomplete UTF-8 byte sequence at the end of a string. With a valgrind level 2 instrumented build of R-devel I get:
> string <- "abc\xEE"    # \xEE indicates the start of a 3-byte UTF-8 sequence
> Encoding(string) <- "UTF-8"
> substr(string, 1, 10)
==15375== Invalid read of
2023 Jan 30
2
Sys.getenv(): Error in substring(x, m + 1L) : invalid multibyte string at '<ff>' if an environment variable contains \xFF
Hello.
SUMMARY:
$ BOOM=$'\xFF' LC_ALL=en_US.UTF-8 Rscript --vanilla -e "Sys.getenv()"
Error in substring(x, m + 1L) : invalid multibyte string at '<ff>'
$ BOOM=$'\xFF' LC_ALL=en_US.UTF-8 Rscript --vanilla -e "Sys.getenv('BOOM')"
[1] "\xff"
BACKGROUND:
I launch R through a Son of Grid Engine (SGE) scheduler, where the R
2009 Mar 18
1
sprintf("%d", integer(0)) aborts
In R's sprintf(), if any of the arguments has length 0,
the function aborts. E.g.,
> sprintf("%d", integer(0))
Error in sprintf("%d", integer(0)) : zero-length argument
> sprintf(character(), integer(0))
Error in sprintf(character(), integer(0)) :
'fmt' is not a non-empty character vector
This comes up in code like
x[nchar(x)==0] <-
2016 Jan 19
6
FWD: [patch] scp + UTF-8
Hi,
Martijn sent the following patch to me in private and agreed that I could
post it here.
In any other program in OpenBSD base, I'd probably agree with the
basic approach. Regarding OpenSSH, however, I worry whether wcwidth(3)
can be used. While wcwidth(3) is POSIX, it is not ISO C. Does
OpenSSH target platforms that don't provide wcwidth(3)? If so,
do you think the problem can be solved
2023 Jan 31
1
Sys.getenv(): Error in substring(x, m + 1L) : invalid multibyte string at '<ff>' if an environment variable contains \xFF
On 1/31/23 09:48, Ivan Krylov wrote:
> Can we use the "bytes" encoding for such environment variables invalid
> in the current locale? The following patch preserves CE_NATIVE for
> strings valid in the current UTF-8 or multibyte locale (or
> non-multibyte strings) but sets CE_BYTES for those that are invalid:
>
> Index: src/main/sysutils.c
>
2006 Oct 17
1
crash in edit()
Dear all,
I am new to R. When I tried to edit data read from a csv file, R
crashed and I got the following error message:
> edit(data)
*** buffer overflow detected ***: /usr/lib/R/bin/exec/R terminated
======= Backtrace: =========
/lib/libc.so.6(__chk_fail+0x41)[0x49d020b1]
/lib/libc.so.6[0x49d034a2]
/usr/lib/R/modules//R_X11.so[0x33ed7a]
/usr/lib/R/modules//R_X11.so[0x34050d]
2016 Jan 27
2
rstan warning messages
Confirmed that gcc-gfortran is installed
Package gcc-gfortran-4.4.7-16.el6.x86_64 already installed and latest version
What could I check next?
I do not have the following installed; I will install them and test again.
libcurl-devel
libidn-devel
Thanks,
Larry
-----Original Message-----
From: Tom Callaway [mailto:tcallawa at redhat.com]
Sent: Wednesday, January 27, 2016
2023 Jan 31
2
Sys.getenv(): Error in substring(x, m + 1L) : invalid multibyte string at '<ff>' if an environment variable contains \xFF
>>>>> Tomas Kalibera
>>>>> on Tue, 31 Jan 2023 10:53:21 +0100 writes:
> On 1/31/23 09:48, Ivan Krylov wrote:
>> Can we use the "bytes" encoding for such environment variables invalid
>> in the current locale? The following patch preserves CE_NATIVE for
>> strings valid in the current UTF-8 or multibyte locale (or