Displaying 6 results from an estimated 6 matches for "ce_bytes".
2023 Jan 31
1
Sys.getenv(): Error in substring(x, m + 1L) : invalid multibyte string at '<ff>' if an environment variable contains \xFF
Can we use the "bytes" encoding for such environment variables invalid
in the current locale? The following patch preserves CE_NATIVE for
strings valid in the current UTF-8 or multibyte locale (or
non-multibyte strings) but sets CE_BYTES for those that are invalid:
Index: src/main/sysutils.c
===================================================================
--- src/main/sysutils.c (revision 83731)
+++ src/main/sysutils.c (working copy)
@@ -393,8 +393,16 @@
char **e;
for (i = 0, e = environ; *e != NULL; i++, e++);
PROTECT(an...
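The rule described in this message (keep the native tag for strings that are valid in the locale, mark the rest as bytes) can be sketched in standalone C. This is only an illustration, not the patch itself: it assumes a UTF-8 locale, and `enc_tag`, `is_valid_utf8` and `classify_env_entry` are hypothetical names rather than R internals. The real patch would also have to cover other multibyte locales.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the CE_NATIVE / CE_BYTES tags; not R's definitions. */
typedef enum { ENC_NATIVE, ENC_BYTES } enc_tag;

/* Return 1 if the n bytes at s are well-formed UTF-8, 0 otherwise. */
static int is_valid_utf8(const unsigned char *s, size_t n)
{
    size_t i = 0;
    while (i < n) {
        unsigned char c = s[i];
        size_t len;
        unsigned long cp, min;
        if (c < 0x80) { i++; continue; }                 /* ASCII */
        else if ((c & 0xE0) == 0xC0) { len = 2; cp = c & 0x1F; min = 0x80; }
        else if ((c & 0xF0) == 0xE0) { len = 3; cp = c & 0x0F; min = 0x800; }
        else if ((c & 0xF8) == 0xF0) { len = 4; cp = c & 0x07; min = 0x10000; }
        else return 0;            /* stray continuation byte, or 0xF8..0xFF */
        if (n - i < len) return 0;                       /* truncated sequence */
        for (size_t k = 1; k < len; k++) {
            if ((s[i + k] & 0xC0) != 0x80) return 0;     /* not a continuation */
            cp = (cp << 6) | (s[i + k] & 0x3F);
        }
        if (cp < min) return 0;                          /* overlong encoding */
        if (cp > 0x10FFFF) return 0;                     /* beyond Unicode */
        if (cp >= 0xD800 && cp <= 0xDFFF) return 0;      /* surrogate */
        i += len;
    }
    return 1;
}

/* Tag one NUL-terminated "NAME=value" entry: native if valid UTF-8, else bytes. */
enc_tag classify_env_entry(const char *entry)
{
    assert(entry != NULL);
    return is_valid_utf8((const unsigned char *) entry, strlen(entry))
        ? ENC_NATIVE : ENC_BYTES;
}
```

With this sketch, an entry such as `BOOM=\xFF` would be tagged as bytes, while plain ASCII and well-formed UTF-8 entries keep the native tag.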
2023 Jan 31
1
Sys.getenv(): Error in substring(x, m + 1L) : invalid multibyte string at '<ff>' if an environment variable contains \xFF
On 1/31/23 09:48, Ivan Krylov wrote:
> Can we use the "bytes" encoding for such environment variables invalid
> in the current locale? The following patch preserves CE_NATIVE for
> strings valid in the current UTF-8 or multibyte locale (or
> non-multibyte strings) but sets CE_BYTES for those that are invalid:
>
> Index: src/main/sysutils.c
> ===================================================================
> --- src/main/sysutils.c (revision 83731)
> +++ src/main/sysutils.c (working copy)
> @@ -393,8 +393,16 @@
> char **e;
> for (i = 0, e = env...
2018 Mar 29
2
Possible `substr` bug in UTF-8 Corner Case
...p; str < end; i++) {
        int used = utf8clen(*str);
        if (i < sa - 1) { str += used; continue; }
-        for (j = 0; j < used; j++) *buf++ = *str++;
+        for (j = 0; j < used && str < end; j++) *buf++ = *str++;
    }
     } else if (ienc == CE_LATIN1 || ienc == CE_BYTES) {
    for (str += (sa - 1), i = sa; i <= so; i++) *buf++ = *str++;
The change above removed the valgrind error for me. I re-built R with the change and ran "make check" which seemed to work fine. I also ran some simple checks on UTF-8 strings and things seem to work okay.
I have ve...
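The guard added in the diff above (`str < end` in the inner copy loop) can be tried in isolation with a small standalone sketch. It is not R's `substr` code: `lead_len` and `utf8_substr` are hypothetical names, and the extra clamp on the skip path is an assumption added here so the pointer never steps past the end of the input.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: byte length implied by a UTF-8 lead byte
   (1 for ASCII and for any unexpected byte). */
static size_t lead_len(unsigned char c)
{
    if (c < 0x80) return 1;
    if ((c & 0xE0) == 0xC0) return 2;
    if ((c & 0xF0) == 0xE0) return 3;
    if ((c & 0xF8) == 0xF0) return 4;
    return 1;
}

/* Copy characters sa..so (1-based, inclusive) of the len-byte string str
   into buf, which must have room for len + 1 bytes.
   Returns the number of bytes written, excluding the terminating NUL. */
size_t utf8_substr(const char *str, size_t len, size_t sa, size_t so, char *buf)
{
    assert(str != NULL && buf != NULL && sa >= 1);
    const char *end = str + len;
    char *out = buf;
    for (size_t i = 0; i < so && str < end; i++) {
        size_t used = lead_len((unsigned char) *str);
        size_t left = (size_t) (end - str);
        if (used > left) used = left;   /* truncated final sequence: clamp */
        if (i < sa - 1) { str += used; continue; }   /* skip leading chars */
        /* Same guard as in the diff: never read past the end of the input. */
        for (size_t j = 0; j < used && str < end; j++) *out++ = *str++;
    }
    *out = '\0';
    return (size_t) (out - buf);
}
```

The corner case is an input whose last character announces more bytes than remain, for example a three-byte lead followed by only one continuation byte. Without a bounds check, the copy loop would trust the lead byte and read beyond the buffer.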
2023 Jan 30
2
Sys.getenv(): Error in substring(x, m + 1L) : invalid multibyte string at '<ff>' if an environment variable contains \xFF
Hello.
SUMMARY:
$ BOOM=$'\xFF' LC_ALL=en_US.UTF-8 Rscript --vanilla -e "Sys.getenv()"
Error in substring(x, m + 1L) : invalid multibyte string at '<ff>'
$ BOOM=$'\xFF' LC_ALL=en_US.UTF-8 Rscript --vanilla -e "Sys.getenv('BOOM')"
[1] "\xff"
BACKGROUND:
I launch R through a Son of Grid Engine (SGE) scheduler, where the R
2023 Jan 31
2
Sys.getenv(): Error in substring(x, m + 1L) : invalid multibyte string at '<ff>' if an environment variable contains \xFF
...v wrote:
>> Can we use the "bytes" encoding for such environment variables invalid
>> in the current locale? The following patch preserves CE_NATIVE for
>> strings valid in the current UTF-8 or multibyte locale (or
>> non-multibyte strings) but sets CE_BYTES for those that are invalid:
>>
>> Index: src/main/sysutils.c
>> ===================================================================
>> --- src/main/sysutils.c (revision 83731)
.....
>>
>> Here are the potential problems with this approac...
2018 Mar 29
0
Possible `substr` bug in UTF-8 Corner Case
...    int used = utf8clen(*str);
>         if (i < sa - 1) { str += used; continue; }
> -        for (j = 0; j < used; j++) *buf++ = *str++;
> +        for (j = 0; j < used && str < end; j++) *buf++ = *str++;
>     }
>      } else if (ienc == CE_LATIN1 || ienc == CE_BYTES) {
>     for (str += (sa - 1), i = sa; i <= so; i++) *buf++ = *str++;
>
> The change above removed the valgrind error for me. I re-built R with the change and ran "make check" which seemed to work fine. I also ran some simple checks on UTF-8 strings and things seem to work o...