thr3ads.net - similar to: "Errors on Windows with grep(fixed=TRUE) on UTF-8 strings"

Displaying 20 results from an estimated 300 matches similar to: "Errors on Windows with grep(fixed=TRUE) on UTF-8 strings"

Errors on Windows with grep(fixed=TRUE) on UTF-8 strings

2015 Mar 04

After a bit more investigation, I think I've found the cause of the bug, and I have a patch. This bug happens with grep() when:

* Running on Windows.
* The search uses fixed=TRUE.
* The search pattern is a single byte.
* The current locale has a multibyte encoding.

Here's an example that demonstrates the bug:

# First, create a 3-byte UTF-8 character
y <-

problem gsub in the locale of CP932 and SJIS (PR#9751)

2007 Jun 24

Full_Name: Ei-ji Nakama Version: R-2.5.0 OS: any Submission from: (NULL) (219.117.236.5) gsub misbehaves in the CP932 and SJIS locales. The problematic characters are those whose code has 0x5c as the byte after the first byte. --- R-2.5.0.orig/src/main/character.c 2007-04-03 11:05:05.000000000 +0900 +++ R-2.5.0/src/main/character.c 2007-06-24 22:31:06.000000000 +0900 @@ -986,6 +986,17 @@
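The 0x5c issue in the report above can be illustrated with a small sketch. It is not the patch from PR#9751 (the diff is truncated here); it only shows, under the usual CP932 lead-byte ranges (0x81-0x9F and 0xE0-0xFC), why a byte-wise scan mistakes the trail byte of a double-byte character for a backslash. The function names `sjis_is_lead` and `sjis_count_backslashes` are hypothetical.

```c
#include <stddef.h>

/* Assumed lead-byte ranges for Shift-JIS/CP932 double-byte characters. */
int sjis_is_lead(unsigned char c)
{
    return (c >= 0x81 && c <= 0x9F) || (c >= 0xE0 && c <= 0xFC);
}

/* Count 0x5c bytes that are genuine backslashes, i.e. not the trail
   byte of a double-byte character. Hypothetical helper, not R code. */
size_t sjis_count_backslashes(const unsigned char *s, size_t n)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (sjis_is_lead(s[i]) && i + 1 < n) {
            i++;                 /* skip the trail byte, even if it is 0x5c */
        } else if (s[i] == 0x5C) {
            count++;             /* a real single-byte backslash */
        }
    }
    return count;
}
```

For the two bytes 0x95 0x5C (a single double-byte character), a naive byte scan would report one backslash, while this character-aware scan reports none.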

type.convert (PR#13646)

2009 Apr 09

Full_Name: Stefan Raberger Version: 2.8.1 OS: Windows XP Submission from: (NULL) (213.185.163.242) Hi there, I recently noticed some strange behaviour of the command "type.convert", depending on the startup mode used. But there also seems to be different behaviour on different PCs (all running the same OS and the same version of R). On PC1: When I start R in SDI mode (RGui --no-save

Windows, format.POSIXct and character encodings

2013 May 01

Hi all, In what encoding does format.POSIXct return its output? It doesn't seem to be utf-8: Sys.setlocale("LC_ALL", "Japanese_Japan.932") times <- c("1970-01-01 01:00:00 UTC", "1970-02-02 22:00:00 UTC") ampm <- format(as.POSIXct(times), format = "%p") x <- gsub(">", "*", paste(ampm, collapse =

grep and PCRE fun

2011 Sep 29

Hello, I think I've found a bug in the C function do_grep located in src/main/grep.c. It seems to affect both the latest revisions of R-2-13-branch and trunk when compiling R without optimizations and with its own version of pcre located in src/extra, at least on Ubuntu 10.04. According to the pcre_exec API (I presume the later versions), the ovecsize argument must be a multiple of 3,
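As a rough illustration of the multiple-of-3 constraint mentioned above: in the classic PCRE (pcre_exec) interface, two thirds of the output vector hold offset pairs and the last third is workspace, so a pattern with n capture groups needs (n + 1) * 3 ints. The sketch below is plain arithmetic and does not call PCRE or reproduce do_grep; both function names are hypothetical.

```c
/* Smallest ovector size (in ints) that holds the whole match plus
   `captures` capture groups. Hypothetical helper, not part of PCRE or R. */
int ovector_size_for_captures(int captures)
{
    return (captures + 1) * 3;
}

/* Largest multiple of 3 that does not exceed a given size. */
int ovecsize_round_down(int ovecsize)
{
    return ovecsize - (ovecsize % 3);
}
```

A caller could size its array with `ovector_size_for_captures` and pass the same value as ovecsize, which keeps the argument a multiple of 3 by construction.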

slightly speeding up readChar()

2011 Aug 04

Hi, I was trying to have R read files faster with readChar(). That was before I noticed that readChar() is not that bad! In any case, below I suggest a few simple changes that will make readChar slightly faster. I followed readChar(useBytes=T), and tried to identify all O(N) operations, where N is the size of the file. The assumption is that for LARGE files we want to avoid any O(N) operations,

(PR#8017) build of REventLoop package crashes with 2.1 due

2005 Jul 20

In what way is this a bug in R? It looks like a bug in the package, and as Defn.h is not part of R's API any package using it is `at risk' (and cannot be installed in a binary-only installation, or even an installed version of R). In particular, Defn.h depends on config.h, and it seems you installed a binary version of R and used separate sources. I would suggest building R from

[PATCH] Improve utf8clen and remove utf8_table4

2017 Mar 19

Given a char `c' which should be the start byte of a utf8 character, the utf8clen function returns the byte length of the utf8 character. Before this patch, the utf8clen function would return either: * 1 if `c' was an ascii character or a utf8 continuation byte * An int in the range [2, 6] indicating the byte length of the utf8 character With this patch, the utf8clen function
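The pre-patch behavior described above (1 for an ASCII or continuation byte, otherwise a length in [2, 6]) can be sketched as follows. This is not R's utf8clen and does not use utf8_table4; the name `utf8_lead_len` and the threshold comparisons are assumptions chosen to match the description, including the legacy 5- and 6-byte forms.

```c
/* Byte length implied by a UTF-8 lead byte, following the pre-patch
   behavior described in the post. Hypothetical helper, not R's code. */
int utf8_lead_len(unsigned char c)
{
    if (c < 0xC0) return 1;  /* ASCII (0x00-0x7F) or continuation (0x80-0xBF) */
    if (c < 0xE0) return 2;  /* 110xxxxx */
    if (c < 0xF0) return 3;  /* 1110xxxx */
    if (c < 0xF8) return 4;  /* 11110xxx */
    if (c < 0xFC) return 5;  /* 111110xx, legacy form, invalid in modern UTF-8 */
    return 6;                /* 1111110x; 0xFE and 0xFF also land here in this sketch */
}
```

Note that a continuation byte and an ASCII byte are indistinguishable from the return value alone, which is the ambiguity the post points out.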

Sys.getenv(): Error in substring(x, m + 1L) : invalid multibyte string at '<ff>' if an environment variable contains \xFF

2023 Jan 31

Can we use the "bytes" encoding for such environment variables invalid in the current locale? The following patch preserves CE_NATIVE for strings valid in the current UTF-8 or multibyte locale (or non-multibyte strings) but sets CE_BYTES for those that are invalid: Index: src/main/sysutils.c =================================================================== --- src/main/sysutils.c

build of REventLoop package crashes with 2.1 due to syntax error in Defn.h (PR#8017)

2005 Jul 19

Full_Name: Richard Boyce Version: 2.1.-1 OS: Debian testing/unstable Submission from: (NULL) (128.95.123.29) While building a custom package using a modified version of Duncan's REventLoop with R version 2.1 (Debian package r-base, r-base-dev) and R source from apt-get source 2.1.1 I get the following error: $ R CMD build vjREventLoop * checking for file

Inconsistency in gsub in R.2.6.2 (PR#10978)

2008 Mar 17

Hi, May this be an oversight?

R version 2.6.2 Patched (2008-03-13 r44783)
Copyright (C) 2008 The R Foundation for Statistical Computing
ISBN 3-900051-07-0
...
> x <- "ab?"
> Encoding(x)
[1] "latin1"
> Encoding(gsub("?","", x))
[1] "unknown"
> Encoding(gsub("?","", x, perl = TRUE))
[1] "latin1"

Possible `substr` bug in UTF-8 Corner Case

2018 Mar 29

I think there is a memory bug in `substr` that is triggered by a UTF-8 corner case: an incomplete UTF-8 byte sequence at the end of a string. With a valgrind level 2 instrumented build of R-devel I get:

> string <- "abc\xEE"  # \xEE indicates the start of a 3 byte UTF-8 sequence
> Encoding(string) <- "UTF-8"
> substr(string, 1, 10)
==15375== Invalid read of
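The corner case above, a lead byte that promises more bytes than the string contains, can be guarded against by never advancing past the known end of the buffer. The sketch below is one way to do that and is not the fix applied in R; the name `utf8_bounded_len` is hypothetical, and only lengths up to 4 bytes are considered.

```c
#include <stddef.h>

/* Number of bytes to consume for the character starting at s, without
   reading beyond `remaining` bytes. Hypothetical helper, not R's code. */
size_t utf8_bounded_len(const unsigned char *s, size_t remaining)
{
    if (remaining == 0) return 0;

    /* Length promised by the lead byte. */
    size_t expected = 1;
    if (s[0] >= 0xF0)      expected = 4;
    else if (s[0] >= 0xE0) expected = 3;
    else if (s[0] >= 0xC0) expected = 2;

    /* Accept continuation bytes only while they exist and are in bounds. */
    size_t len = 1;
    while (len < expected && len < remaining && (s[len] & 0xC0) == 0x80)
        len++;
    return len;
}
```

For the string in the report, the final byte \xEE promises three bytes but only one remains, so the helper returns 1 instead of stepping over the terminator.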

Sys.getenv(): Error in substring(x, m + 1L) : invalid multibyte string at '<ff>' if an environment variable contains \xFF

2023 Jan 30

Hello.

SUMMARY:

$ BOOM=$'\xFF' LC_ALL=en_US.UTF-8 Rscript --vanilla -e "Sys.getenv()"
Error in substring(x, m + 1L) : invalid multibyte string at '<ff>'

$ BOOM=$'\xFF' LC_ALL=en_US.UTF-8 Rscript --vanilla -e "Sys.getenv('BOOM')"
[1] "\xff"

BACKGROUND: I launch R through a Son of Grid Engine (SGE) scheduler, where the R

sprintf("%d", integer(0)) aborts

2009 Mar 18

In R's sprintf() if any of the arguments has length 0 the function aborts. E.g., > sprintf("%d", integer(0)) Error in sprintf("%d", integer(0)) : zero-length argument > sprintf(character(), integer(0)) Error in sprintf(character(), integer(0)) : 'fmt' is not a non-empty character vector This comes up in code like x[nchar(x)==0] <-

FWD: [patch] scp + UTF-8

2016 Jan 19

Hi, Martijn sent the following patch to me in private and agreed that i post it here. In any other program in OpenBSD base, i'd probably agree with the basic approach. Regarding OpenSSH, however, i worry whether wcwidth(3) can be used. While wcwidth(3) is POSIX, it is not ISO C. Does OpenSSH target platforms that don't provide wcwidth(3)? If so, do you think the problem can be solved

Sys.getenv(): Error in substring(x, m + 1L) : invalid multibyte string at '<ff>' if an environment variable contains \xFF

2023 Jan 31

On 1/31/23 09:48, Ivan Krylov wrote: > Can we use the "bytes" encoding for such environment variables invalid > in the current locale? The following patch preserves CE_NATIVE for > strings valid in the current UTF-8 or multibyte locale (or > non-multibyte strings) but sets CE_BYTES for those that are invalid: > > Index: src/main/sysutils.c >

crush in edit()

2006 Oct 17

Dear all, I am new to the R system. When I tried to edit data read from a csv file, R crashed and I got the following error message: > edit(data) *** buffer overflow detected ***: /usr/lib/R/bin/exec/R terminated ======= Backtrace: ========= /lib/libc.so.6(__chk_fail+0x41)[0x49d020b1] /lib/libc.so.6[0x49d034a2] /usr/lib/R/modules//R_X11.so[0x33ed7a] /usr/lib/R/modules//R_X11.so[0x34050d]

rstan warning messages

2016 Jan 27

Confirmed that gcc-gfortran is installed Package gcc-gfortran-4.4.7-16.el6.x86_64 already installed and latest version What could I check next? I do not have the following installed and will get that done and tested again. libcurl-devel libidn-devel Thanks, Larry -----Original Message----- From: Tom Callaway [mailto:tcallawa at redhat.com] Sent: Wednesday, January 27, 2016

Sys.getenv(): Error in substring(x, m + 1L) : invalid multibyte string at '<ff>' if an environment variable contains \xFF

2023 Jan 31

>>>>> Tomas Kalibera >>>>> on Tue, 31 Jan 2023 10:53:21 +0100 writes: > On 1/31/23 09:48, Ivan Krylov wrote: >> Can we use the "bytes" encoding for such environment variables invalid >> in the current locale? The following patch preserves CE_NATIVE for >> strings valid in the current UTF-8 or multibyte locale (or