similar to: Errors on Windows with grep(fixed=TRUE) on UTF-8 strings

Displaying 20 results from an estimated 300 matches similar to: "Errors on Windows with grep(fixed=TRUE) on UTF-8 strings"

2015 Mar 04
0
Errors on Windows with grep(fixed=TRUE) on UTF-8 strings
After a bit more investigation, I think I've found the cause of the bug, and I have a patch. This bug happens with grep() when:
* Running on Windows.
* The search uses fixed=TRUE.
* The search pattern is a single byte.
* The current locale has a multibyte encoding.
Here's an example that demonstrates the bug:
# First, create a 3-byte UTF-8 character y <-
2007 Jun 24
2
problem gsub in the locale of CP932 and SJIS (PR#9751)
Full_Name: Ei-ji Nakama Version: R-2.5.0 OS: any Submission from: (NULL) (219.117.236.5) gsub() misbehaves in the CP932 and SJIS locales for characters whose byte after the first (the trail byte) is 0x5c. --- R-2.5.0.orig/src/main/character.c 2007-04-03 11:05:05.000000000 +0900 +++ R-2.5.0/src/main/character.c 2007-06-24 22:31:06.000000000 +0900 @@ -986,6 +986,17 @@
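The sketch below illustrates the failure mode the report describes. It is not the patch itself. In Shift_JIS/CP932 a trail byte can be 0x5c, the byte for `\`. For example, 表 is encoded as 0x95 0x5C. A byte-oriented search therefore "finds" a backslash inside that character. The function names `find_byte_naive` and `find_byte_sjis` are hypothetical. The lead-byte ranges assumed here (0x81-0x9F and 0xE0-0xFC) are the CP932 ones.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: nonzero if c is the lead byte of a two-byte CP932 character. */
static int sjis_is_lead(unsigned char c)
{
    return (c >= 0x81 && c <= 0x9F) || (c >= 0xE0 && c <= 0xFC);
}

/* Plain byte search: matches b even when it is the trail byte of a character. */
const char *find_byte_naive(const char *s, size_t n, unsigned char b)
{
    return memchr(s, b, n);
}

/* Character-aware search: steps over both bytes of a double-byte character,
   so a trail byte equal to b is never reported as a match. */
const char *find_byte_sjis(const char *s, size_t n, unsigned char b)
{
    assert(s != NULL || n == 0);
    size_t i = 0;
    while (i < n) {
        unsigned char c = (unsigned char)s[i];
        if (sjis_is_lead(c) && i + 1 < n) {
            i += 2;            /* skip lead byte and trail byte together */
            continue;
        }
        if (c == b)
            return s + i;
        i += 1;                /* ASCII or single-byte half-width katakana */
    }
    return NULL;
}
```

Given the bytes 0x95 0x5C, `find_byte_naive` reports a match at offset 1, but `find_byte_sjis` reports none. Only a real backslash after the character would match.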
2009 Apr 09
3
type.convert (PR#13646)
Full_Name: Stefan Raberger Version: 2.8.1 OS: Windows XP Submission from: (NULL) (213.185.163.242) Hi there, I recently noticed some strange behaviour of the command "type.convert", depending on the startup mode used. But there also seems to be different behaviour on different PCs (all running the same OS and the same version of R). On PC1: When I start R in SDI mode (RGui --no-save
2013 May 01
1
Windows, format.POSIXct and character encodings
Hi all, In what encoding does format.POSIXct return its output? It doesn't seem to be utf-8: Sys.setlocale("LC_ALL", "Japanese_Japan.932") times <- c("1970-01-01 01:00:00 UTC", "1970-02-02 22:00:00 UTC") ampm <- format(as.POSIXct(times), format = "%p") x <- gsub(">", "*", paste(ampm, collapse =
2011 Sep 29
3
grep and PCRE fun
Hello, I think I've found a bug in the C function do_grep located in src/main/grep.c. It seems to affect both the latest revisions of R-2-13-branch and trunk when compiling R without optimizations and with its own version of PCRE located in src/extra, at least on Ubuntu 10.04. According to the pcre_exec API (I presume the later versions), the ovecsize argument must be a multiple of 3,
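To illustrate the constraint mentioned above: for the classic PCRE API, the ovector passed to pcre_exec() holds one (start, end) offset pair for the whole match and one for each capturing subpattern. PCRE also uses an extra third of the vector as workspace, so the size is naturally 3 × (captures + 1). The helper below is a hypothetical sketch of that arithmetic. In real code the capture count would come from pcre_fullinfo() with PCRE_INFO_CAPTURECOUNT. The sketch does not link against PCRE.

```c
#include <assert.h>

/* Hypothetical helper: number of ints to allocate and pass as ovecsize to
   pcre_exec() for a pattern with capture_count capturing subpatterns.
   Two thirds hold offset pairs (whole match + each capture); the last
   third is PCRE's workspace. The result is always a multiple of 3. */
int pcre_ovecsize(int capture_count)
{
    assert(capture_count >= 0);
    return 3 * (capture_count + 1);
}
```

A pattern with two capturing groups, for instance, would use an `int ovector[9]` and pass 9 as ovecsize.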
2011 Aug 04
1
slightly speeding up readChar()
Hi, I was trying to have R read files faster with readChar(). That was before I noticed that readChar() is not that bad! In any case, below I suggest a few simple changes that will make readChar slightly faster. I followed readChar(useBytes=T), and tried to identify all O(N) operations, where N is the size of the file. The assumption is that for LARGE files we want to avoid any O(N) operations,
2005 Jul 20
1
(PR#8017) build of REventLoop package crashes with 2.1 due
In what way is this a bug in R? It looks like a bug in the package, and as Defn.h is not part of R's API any package using it is `at risk' (and cannot be installed in a binary-only installation, or even an installed version of R). In particular, Defn.h depends on config.h, and it seems you installed a binary version of R and used separate sources. I would suggest building R from
2017 Mar 19
2
[PATCH] Improve utf8clen and remove utf8_table4
Given a char `c' which should be the start byte of a utf8 character, the utf8clen function returns the byte length of the utf8 character. Before this patch, the utf8clen function would return either: * 1 if `c' was an ascii character or a utf8 continuation byte * An int in the range [2, 6] indicating the byte length of the utf8 character With this patch, the utf8clen function
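As a sketch of the pre-patch behaviour described above, `utf8clen_sketch` returns 1 for ASCII or continuation bytes and 2 to 6 for lead bytes. It follows the original, pre-RFC 3629 scheme, which allowed 5- and 6-byte sequences. This is a hypothetical reimplementation from that description, not R's actual code in util.c, and it does not reproduce the `utf8_table4` lookup the patch removes. The companion `utf8_next` is likewise hypothetical. It shows the typical way such a function is used to step through a string.

```c
#include <assert.h>
#include <stddef.h>

/* Byte length of the UTF-8 character whose first byte is c, per the
   behaviour described above: 1 for ASCII (0x00-0x7F) or a continuation
   byte (0x80-0xBF), otherwise 2..6 according to the lead-byte range. */
int utf8clen_sketch(unsigned char c)
{
    if (c < 0xC0) return 1;   /* ASCII or continuation byte */
    if (c < 0xE0) return 2;   /* 110xxxxx */
    if (c < 0xF0) return 3;   /* 1110xxxx */
    if (c < 0xF8) return 4;   /* 11110xxx */
    if (c < 0xFC) return 5;   /* 111110xx (obsolete form) */
    return 6;                 /* 1111110x and 0xFE/0xFF */
}

/* Hypothetical: index of the character after the one starting at s[i],
   clamped so it never steps past n. */
size_t utf8_next(const unsigned char *s, size_t n, size_t i)
{
    assert(i < n);
    size_t len = (size_t)utf8clen_sketch(s[i]);
    return (i + len <= n) ? i + len : n;
}
```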
2023 Jan 31
1
Sys.getenv(): Error in substring(x, m + 1L) : invalid multibyte string at '<ff>' if an environment variable contains \xFF
Can we use the "bytes" encoding for such environment variables invalid in the current locale? The following patch preserves CE_NATIVE for strings valid in the current UTF-8 or multibyte locale (or non-multibyte strings) but sets CE_BYTES for those that are invalid: Index: src/main/sysutils.c =================================================================== --- src/main/sysutils.c
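A minimal sketch of the decision the proposed patch makes, assuming a UTF-8 locale only. Other multibyte locales would need their own validity check. The enum `enc_tag` and its values are hypothetical stand-ins for R's CE_NATIVE and CE_BYTES. The validator follows RFC 3629, so 0xFF, which can never appear in UTF-8, is classified as bytes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tags standing in for R's CE_NATIVE / CE_BYTES. */
typedef enum { TAG_NATIVE, TAG_BYTES } enc_tag;

/* Strict UTF-8 validation (RFC 3629) of s[0..n). Returns 1 if valid. */
int utf8_valid(const unsigned char *s, size_t n)
{
    assert(s != NULL || n == 0);
    size_t i = 0;
    while (i < n) {
        unsigned char c = s[i];
        size_t len;
        unsigned char lo = 0x80, hi = 0xBF;    /* allowed range of 2nd byte */
        if (c < 0x80) { i++; continue; }       /* ASCII */
        if (c >= 0xC2 && c <= 0xDF) len = 2;
        else if (c >= 0xE0 && c <= 0xEF) {
            len = 3;
            if (c == 0xE0) lo = 0xA0;          /* reject overlong forms */
            if (c == 0xED) hi = 0x9F;          /* reject UTF-16 surrogates */
        } else if (c >= 0xF0 && c <= 0xF4) {
            len = 4;
            if (c == 0xF0) lo = 0x90;          /* reject overlong forms */
            if (c == 0xF4) hi = 0x8F;          /* reject code points > U+10FFFF */
        } else
            return 0;                          /* 0x80-0xC1 or 0xF5-0xFF */
        if (len > n - i) return 0;             /* sequence truncated */
        if (s[i + 1] < lo || s[i + 1] > hi) return 0;
        for (size_t k = 2; k < len; k++)
            if (s[i + k] < 0x80 || s[i + k] > 0xBF) return 0;
        i += len;
    }
    return 1;
}

/* Keep valid values as native strings; mark invalid ones as raw bytes. */
enc_tag tag_for_env_value(const unsigned char *s, size_t n)
{
    return utf8_valid(s, n) ? TAG_NATIVE : TAG_BYTES;
}
```

With this split, a value such as `BOOM=$'\xFF'` would be kept as bytes instead of making every later string operation on the whole environment fail.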
2005 Jul 19
0
build of REventLoop package crashes with 2.1 due to syntax error in Defn.h (PR#8017)
Full_Name: Richard Boyce Version: 2.1.-1 OS: Debian testing/unstable Submission from: (NULL) (128.95.123.29) While building a custom package using a modified version of Duncan's REventLoop with R version 2.1 (Debian package r-base, r-base-dev) and R source from apt-get source 2.1.1 I get the following error: $ R CMD build vjREventLoop * checking for file
2008 Mar 17
1
Inconsistency in gsub in R.2.6.2 (PR#10978)
Hi, Could this be an oversight? R version 2.6.2 Patched (2008-03-13 r44783) Copyright (C) 2008 The R Foundation for Statistical Computing ISBN 3-900051-07-0 ... > x <- "ab?" > Encoding(x) [1] "latin1" > Encoding(gsub("?","", x)) [1] "unknown" > Encoding(gsub("?","", x, perl = TRUE)) [1] "latin1"
2018 Mar 29
2
Possible `substr` bug in UTF-8 Corner Case
I think there is a memory bug in `substr` that is triggered by a UTF-8 corner case: an incomplete UTF-8 byte sequence at the end of a string. With a valgrind level 2 instrumented build of R-devel I get: > string <- "abc\xEE"    # \xEE indicates the start of a 3-byte UTF-8 sequence > Encoding(string) <- "UTF-8" > substr(string, 1, 10) ==15375== Invalid read of
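The sketch below shows the general defence against this kind of invalid read. It is not R's actual fix. When a lead byte such as 0xEE announces a 3-byte sequence but fewer bytes remain, the sequence length is clamped to the bytes actually present. The function names `lead_len` and `count_chars_bounded` are hypothetical. Only the RFC 3629 lead-byte ranges are assumed.

```c
#include <assert.h>
#include <stddef.h>

/* Sequence length announced by lead byte c (RFC 3629 forms only);
   continuation and invalid lead bytes are treated as length 1. */
static size_t lead_len(unsigned char c)
{
    if (c >= 0xC2 && c <= 0xDF) return 2;
    if (c >= 0xE0 && c <= 0xEF) return 3;
    if (c >= 0xF0 && c <= 0xF4) return 4;
    return 1;
}

/* Counts characters in s[0..n) without ever reading past s[n-1].
   A final sequence cut short is counted once and *truncated is set. */
size_t count_chars_bounded(const unsigned char *s, size_t n, int *truncated)
{
    assert((s != NULL || n == 0) && truncated != NULL);
    size_t i = 0, count = 0;
    *truncated = 0;
    while (i < n) {
        size_t len = lead_len(s[i]);
        if (len > n - i) {       /* sequence runs off the end of the buffer */
            len = n - i;
            *truncated = 1;
        }
        i += len;
        count++;
    }
    return count;
}
```

For the 4-byte input "abc\xEE", this counts 4 characters and flags the last one as truncated. A caller could then raise an "invalid multibyte string" error instead of reading beyond the buffer.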
2023 Jan 30
2
Sys.getenv(): Error in substring(x, m + 1L) : invalid multibyte string at '<ff>' if an environment variable contains \xFF
Hello. SUMMARY: $ BOOM=$'\xFF' LC_ALL=en_US.UTF-8 Rscript --vanilla -e "Sys.getenv()" Error in substring(x, m + 1L) : invalid multibyte string at '<ff>' $ BOOM=$'\xFF' LC_ALL=en_US.UTF-8 Rscript --vanilla -e "Sys.getenv('BOOM')" [1] "\xff" BACKGROUND: I launch R through a Son of Grid Engine (SGE) scheduler, where the R
2009 Mar 18
1
sprintf("%d", integer(0)) aborts
In R's sprintf() if any of the arguments has length 0 the function aborts. E.g., > sprintf("%d", integer(0)) Error in sprintf("%d", integer(0)) : zero-length argument > sprintf(character(), integer(0)) Error in sprintf(character(), integer(0)) : 'fmt' is not a non-empty character vector This comes up in code like x[nchar(x)==0] <-
2016 Jan 19
6
FWD: [patch] scp + UTF-8
Hi, Martijn sent the following patch to me in private and agreed that i post it here. In any other program in OpenBSD base, i'd probably agree with the basic approach. Regarding OpenSSH, however, i worry whether wcwidth(3) can be used. While wcwidth(3) is POSIX, it is not ISO C. Does OpenSSH target platforms that don't provide wcwidth(3)? If so, do you think the problem can be solved
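For context on the portability question, the sketch below computes terminal display width using the approach under discussion. It decodes with mbrtowc(), which is ISO C, and measures with wcwidth(), which is POSIX/XSI but not ISO C and needs a feature-test macro such as `_XOPEN_SOURCE`. This is not Martijn's patch. The function name `display_width` is hypothetical. The caller is assumed to have selected a locale with `setlocale(LC_CTYPE, "")` beforehand.

```c
#define _XOPEN_SOURCE 700   /* wcwidth() is XSI, not ISO C */
#include <assert.h>
#include <string.h>
#include <wchar.h>

/* Display columns of a multibyte string in the current LC_CTYPE locale,
   or -1 on an invalid/truncated sequence or a non-printable character. */
int display_width(const char *s)
{
    assert(s != NULL);
    mbstate_t st;
    memset(&st, 0, sizeof st);
    size_t len = strlen(s);
    int cols = 0;
    while (len > 0) {
        wchar_t wc;
        size_t n = mbrtowc(&wc, s, len, &st);
        if (n == (size_t)-1 || n == (size_t)-2)
            return -1;          /* invalid or incomplete multibyte sequence */
        if (n == 0)
            break;              /* defensive: a NUL cannot occur within strlen(s) */
        int w = wcwidth(wc);
        if (w < 0)
            return -1;          /* non-printable wide character */
        cols += w;
        s += n;
        len -= n;
    }
    return cols;
}
```

On a platform without wcwidth(), the call would need a fallback (for example, treating every printable character as one column), which is the trade-off raised in the message.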
2023 Jan 31
1
Sys.getenv(): Error in substring(x, m + 1L) : invalid multibyte string at '<ff>' if an environment variable contains \xFF
On 1/31/23 09:48, Ivan Krylov wrote: > Can we use the "bytes" encoding for such environment variables invalid > in the current locale? The following patch preserves CE_NATIVE for > strings valid in the current UTF-8 or multibyte locale (or > non-multibyte strings) but sets CE_BYTES for those that are invalid: > > Index: src/main/sysutils.c >
2006 Oct 17
1
crash in edit()
Dear all, I am new to R. When I tried to edit data read from a CSV file, R crashed with the following error message: > edit(data) *** buffer overflow detected ***: /usr/lib/R/bin/exec/R terminated ======= Backtrace: ========= /lib/libc.so.6(__chk_fail+0x41)[0x49d020b1] /lib/libc.so.6[0x49d034a2] /usr/lib/R/modules//R_X11.so[0x33ed7a] /usr/lib/R/modules//R_X11.so[0x34050d]
2016 Jan 27
2
rstan warning messages
Confirmed that gcc-gfortran is installed: Package gcc-gfortran-4.4.7-16.el6.x86_64 already installed and latest version. What could I check next? I do not have the following installed and will get that done and tested again: libcurl-devel libidn-devel Thanks, Larry -----Original Message----- From: Tom Callaway [mailto:tcallawa at redhat.com] Sent: Wednesday, January 27, 2016
2023 Jan 31
2
Sys.getenv(): Error in substring(x, m + 1L) : invalid multibyte string at '<ff>' if an environment variable contains \xFF
>>>>> Tomas Kalibera >>>>> on Tue, 31 Jan 2023 10:53:21 +0100 writes: > On 1/31/23 09:48, Ivan Krylov wrote: >> Can we use the "bytes" encoding for such environment variables invalid >> in the current locale? The following patch preserves CE_NATIVE for >> strings valid in the current UTF-8 or multibyte locale (or