Ivan Krylov
2025-Dec-21 19:53 UTC
[Rd] Vector underflow [-1] in sort(method="radix", na.last=NA)
Hello R-devel,
Some inputs cause sort(method="radix") to try to read vectors at index
-1, which is caught for character vectors on some builds that use clang
-fsanitize=address since r89198:
podman run --rm -it \
  registry.gitlab.com/rdatatable/dockerfiles/r-devel-clang-san \
  R -q -s -e "order(NA_character_, 'c', method = 'radix', na.last = NA)"
# Error in order(NA_character_, "c", method = "radix", na.last = NA) :
#   attempt access index -1/1 in STRING_ELT
Since savetl_end() did not run, some CHARSXPs retain their altered
TRUELENGTHs. The R session is then likely to crash when it tries to
read a negative-numbered hash bucket (usually during install() while
lazy-loading bytecode for another function call, e.g., when wrapping
the order() call in try()).
This seems to be a matter of catching elements already sorted as NA on
a previous pass:
Index: src/main/radixsort.c
===================================================================
--- src/main/radixsort.c (revision 89211)
+++ src/main/radixsort.c (working copy)
@@ -1766,7 +1766,9 @@
// this edge case had to be taken care of
// here.. (see the bottom of this file for
// more explanation)
- switch (TYPEOF(x)) {
+ if (o[i] == 0) { // already sorted as NA
+ isSorted = false;
+ } else switch (TYPEOF(x)) {
case INTSXP:
if (INTEGER(x)[o[i] - 1] == NA_INTEGER) {
isSorted = false;
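
The guard in the patch can be illustrated in isolation. The following is only a minimal sketch, not code from src/main/radixsort.c: the function name order_touches_na() and the SKETCH_NA_INT macro are hypothetical, and it assumes (as the patch comment suggests) that o[] holds 1-based positions with 0 marking an element already sorted as NA.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Stand-in for R's NA_INTEGER (which is INT_MIN); hypothetical name. */
#define SKETCH_NA_INT INT_MIN

/*
 * Hypothetical helper: returns true if the 1-based ordering vector o[]
 * refers to an NA anywhere, either because o[i] == 0 (assumed to mean
 * "already sorted as NA on a previous pass") or because x[o[i] - 1] is NA.
 */
bool order_touches_na(const int *x, const int *o, int n)
{
    for (int i = 0; i < n; i++) {
        /* Precondition of this sketch: entries are 0 or a valid position. */
        assert(o[i] >= 0 && o[i] <= n);

        /* Check for 0 first: o[i] - 1 would otherwise index x[-1]. */
        if (o[i] == 0)
            return true;

        if (x[o[i] - 1] == SKETCH_NA_INT)
            return true;
    }
    return false;
}
```

Testing o[i] == 0 before computing o[i] - 1 is the whole point: without that test the second comparison reads one element before the start of x, which is the out-of-bounds access reported above.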
I don't entirely understand what causes src/main/radixsort.c to call
the non-inlined version of STRING_ELT in some cases.
--
Best regards,
Ivan
luke-tierney at uiowa.edu
2025-Dec-22 21:08 UTC
[Rd] [External] Vector underflow [-1] in sort(method="radix", na.last=NA)
On Sun, 21 Dec 2025, Ivan Krylov via R-devel wrote:

> Hello R-devel,
>
> Some inputs cause sort(method="radix") to try to read vectors at index
> -1, which is caught for character vectors on some builds that use clang
> -fsanitize=address since r89198:
>
> podman run --rm -it \
>   registry.gitlab.com/rdatatable/dockerfiles/r-devel-clang-san \
>   R -q -s -e "order(NA_character_, 'c', method = 'radix', na.last = NA)"
> # Error in order(NA_character_, "c", method = "radix", na.last = NA) :
> #   attempt access index -1/1 in STRING_ELT

With a build configured with --enable-strict-barrier most base calls
will use the non-inlined version, so for my setup

luke at MacBook-Air-102 build% ../barrier/bin/R -q -s -e "order(NA_character_, 'c', method = 'radix', na.last = NA)"
Error in order(NA_character_, "c", method = "radix", na.last = NA) :
  attempt access index -1/1 in STRING_ELT
Execution halted

>
> Since savetl_end() did not run, some CHARSXPs retain their altered
> TRUELENGTHs. The R session is then likely to crash when it tries to
> read a negative-numbered hash bucket (usually during install() while
> lazy-loading bytecode for another function call, e.g., when wrapping
> the order() call in try()).
>
> This seems to be a matter of catching elements already sorted as NA on
> a previous pass:
>
> Index: src/main/radixsort.c
> ===================================================================
> --- src/main/radixsort.c (revision 89211)
> +++ src/main/radixsort.c (working copy)
> @@ -1766,7 +1766,9 @@
> // this edge case had to be taken care of
> // here.. (see the bottom of this file for
> // more explanation)
> - switch (TYPEOF(x)) {
> + if (o[i] == 0) { // already sorted as NA
> + isSorted = false;
> + } else switch (TYPEOF(x)) {
> case INTSXP:
> if (INTEGER(x)[o[i] - 1] == NA_INTEGER) {
> isSorted = false;
>
> I don't entirely understand what causes src/main/radixsort.c to call
> the non-inlined version of STRING_ELT in some cases.

`inline` is only a hint to the compiler; some compilers ignore the hint
more often than others.

This code was originally contributed by data.table. I believe Michael
Lawrence handled the integration at the time. There were a number of
issues like this early on that were resolved on the R side and I
believe contributed back to data.table. If you have the energy it might
be good to compare the two now and see if there are things that should
be ported from one to the other.

Best,

luke

--
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa                  Phone: 319-335-3386
Department of Statistics and        Fax:   319-335-3017
Actuarial Science
241 Schaeffer Hall                  email: luke-tierney at uiowa.edu
Iowa City, IA 52242                 WWW:   http://www.stat.uiowa.edu