Benjamin Schwendinger
2025-Dec-27 02:54 UTC
[Rd] [External] Vector underflow [-1] in sort(method="radix", na.last=NA)
Hello! data.table diverged over 7 years ago from this code when Matt rewrote forder.> This code was originally contributed by data.table. I believe Michael > Lawrence handled the integration at the time. There were a number of > issues like this early on that were resolved on the R side and I > believe contributed back to data.table. If you have the energy it > might be good to compare the two now and see if there are things that > should be ported from one to the other.The original code had the following note which is also mentioned multiple times but not explicitly part of radixsort.c>> Note on challenges implementing `nalast=NA` so that things are under the same perspective when looked at later. To go to the point, there are four challenges in implementing `nalast=NA` with the current form of forder.c (which is excellent!): >> >> 1) Handling the i|d|csorted routines for nalast=NA: >> The current implementation of nalast=NA in *sorted routine is that if all values are NAs, then it returns -2, a different value, so that all values of o/osub can be replacedw with 0. If any values are NA, then we assume that this vector is unsorted and return 0 to be taken care of by the corresponding sort routine. Now, this may very well be questioned/improved. For ex: we could detect if the vector is sorted and also if the first value is NA, and if so, get the group size of these NAs and replace that many indices with 0. This is not yet done, but could very well be implemented later. If there are no NAs in the vector, then things are the same. >> >> 2) Handling cases where n==1 and n==2: >> The current sorting routines don't take care of n<=2, and rightly so, as they can be dealt with directly. But it's not the case for nalast=NA because, when n==2 and all are NAs, point 1 will handle it. But when n==1 and it is NA and the column is not the first column to order upon, with the current implementation, "thisgrpn" is checked for "== 1" and if so, just pushes that group size and continues.. But we've to replace it with 0 in our case. So, nalast==0 is checked for inside that if statement and if so, if the value is NA is checked for all types and if so, replaced with 0. For n==2 and "any" is NA, we've to introduce a special case for this in each sort routine to take care of and not result in an error (which tells this should be already taken care of in *sorted). >> >> 3) o[0] and newo[0] = -1: >> Since nalast already populates NAs with 0 values in o, we can't use the old value of 0 to check if it's already populated or not. Therefore this has been changed to -1. >> >> 4) default cases are untouched as much as possible: >> In order to not compromise in speed for default case (`setkey`), iinsert and dinsert are not touched at all, which means, we've to make sure that they are not run when n<200 and nalast==0, as they don't replace o[x=NA] with 0. And, 'icount', 'iradix', 'dradix' are modified to take care of nalast=NA, to my knowledge.However, case 2) does not cater for Ivans identified corner case example of order(NA_character_, "c", method = "radix", na.last = NA), which is not tied to n==1 or n==2 but rather to thisgrpn==1 and x=c(NA_character_, another_string). Hence, the missing case is indeed to cater for o[i] = 0 with Ivans 1-liner or more explicitly via --- src/main/radixsort.c @@ -1766,7 +1766,12 @@ // this edge case had to be taken care of // here.. (see the bottom of this file for // more explanation) - switch (TYPEOF(x)) { + if (o[i] == 0) { + isSorted = false; + i++; + push(1); + continue; + } switch (TYPEOF(x)) { case INTSXP: if (INTEGER(x)[o[i] - 1] == NA_INTEGER) { isSorted = false; Best, Ben [[alternative HTML version deleted]]
iuke-tier@ey m@iii@g oii uiow@@edu
2025-Dec-27 23:34 UTC
[Rd] [External] Vector underflow [-1] in sort(method="radix", na.last=NA)
Thanks; I've committed Ivan'x fix to R-devel in r89248. Best, luke On Sat, 27 Dec 2025, Benjamin Schwendinger wrote:> Hello! data.table diverged over 7 years ago from this code when Matt > rewrote forder. > >> This code was originally contributed by data.table. I believe Michael >> Lawrence handled the integration at the time. There were a number of >> issues like this early on that were resolved on the R side and I >> believe contributed back to data.table. If you have the energy it >> might be good to compare the two now and see if there are things that >> should be ported from one to the other. > > The original code had the following note which is also mentioned > multiple times but not explicitly part of radixsort.c > >>> Note on challenges implementing `nalast=NA` so that things are under the same perspective when looked at later. To go to the point, there are four challenges in implementing `nalast=NA` with the current form of forder.c (which is excellent!): >>> >>> 1) Handling the i|d|csorted routines for nalast=NA: >>> The current implementation of nalast=NA in *sorted routine is that if all values are NAs, then it returns -2, a different value, so that all values of o/osub can be replacedw with 0. If any values are NA, then we assume that this vector is unsorted and return 0 to be taken care of by the corresponding sort routine. Now, this may very well be questioned/improved. For ex: we could detect if the vector is sorted and also if the first value is NA, and if so, get the group size of these NAs and replace that many indices with 0. This is not yet done, but could very well be implemented later. If there are no NAs in the vector, then things are the same. >>> >>> 2) Handling cases where n==1 and n==2: >>> The current sorting routines don't take care of n<=2, and rightly so, as they can be dealt with directly. But it's not the case for nalast=NA because, when n==2 and all are NAs, point 1 will handle it. But when n==1 and it is NA and the column is not the first column to order upon, with the current implementation, "thisgrpn" is checked for "== 1" and if so, just pushes that group size and continues.. But we've to replace it with 0 in our case. So, nalast==0 is checked for inside that if statement and if so, if the value is NA is checked for all types and if so, replaced with 0. For n==2 and "any" is NA, we've to introduce a special case for this in each sort routine to take care of and not result in an error (which tells this should be already taken care of in *sorted). >>> >>> 3) o[0] and newo[0] = -1: >>> Since nalast already populates NAs with 0 values in o, we can't use the old value of 0 to check if it's already populated or not. Therefore this has been changed to -1. >>> >>> 4) default cases are untouched as much as possible: >>> In order to not compromise in speed for default case (`setkey`), iinsert and dinsert are not touched at all, which means, we've to make sure that they are not run when n<200 and nalast==0, as they don't replace o[x=NA] with 0. And, 'icount', 'iradix', 'dradix' are modified to take care of nalast=NA, to my knowledge. > > However, case 2) does not cater for Ivans identified corner case > example of order(NA_character_, "c", method = "radix", na.last = NA), > which is not tied to n==1 or n==2 but rather to thisgrpn==1 and > x=c(NA_character_, another_string). > > Hence, the missing case is indeed to cater for o[i] = 0 with Ivans > 1-liner or more explicitly via > > --- src/main/radixsort.c > @@ -1766,7 +1766,12 @@ > // this edge case had to be taken care of > // here.. (see the bottom of this file for > // more explanation) > - switch (TYPEOF(x)) { > + if (o[i] == 0) { > + isSorted = false; > + i++; > + push(1); > + continue; > + } switch (TYPEOF(x)) { > case INTSXP: > if (INTEGER(x)[o[i] - 1] == NA_INTEGER) { > isSorted = false; > > Best, Ben > > [[alternative HTML version deleted]] > > ______________________________________________ > R-devel at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-devel >-- Luke Tierney Ralph E. Wareham Professor of Mathematical Sciences University of Iowa Phone: 319-335-3386 Department of Statistics and Fax: 319-335-3017 Actuarial Science 241 Schaeffer Hall email: luke-tierney at uiowa.edu Iowa City, IA 52242 WWW: http://www.stat.uiowa.edu/