tplate@acm.org
2005-Apr-29 19:00 UTC
[Rd] handling of zero and negative indices in src/main/subscript.c:mat2indsub() (PR#7824)
This message contains a description of what looks like a bug, examples of the suspect behavior, a proposed change to the C code to change this behavior, example of behavior with the fix, and suggestions for 3 places to update the documentation to reflect the proposed behavior. It is submitted for consideration for inclusion in R. Comments are requested. Currently, the code for subscripting by matrices checks that values in the matrix are not greater than the dimensions of the array being indexed. However, it does not check for zero or negative indices, and blindly does index computation with them as though they were positive indices (including for negative indices whose absolute value exceeds the dimensions of the array being indexed). Here are examples of indexing by matrices that do not return unequivocally sensible results (in most cases): > x <- matrix(1:6,ncol=2) > dim(x) [1] 3 2 > x [,1] [,2] [1,] 1 4 [2,] 2 5 [3,] 3 6 > x[rbind(c(1,1), c(2,2))] [1] 1 5 > x[rbind(c(1,1), c(2,2), c(0,1))] [1] 1 5 > x[rbind(c(1,1), c(2,2), c(0,0))] Error: only 0's may be mixed with negative subscripts > x[rbind(c(1,1), c(2,2), c(0,2))] [1] 1 5 3 > x[rbind(c(1,1), c(2,2), c(0,3))] Error: subscript out of bounds > x[rbind(c(1,1), c(2,2), c(1,0))] Error: only 0's may be mixed with negative subscripts > x[rbind(c(1,1), c(2,2), c(2,0))] Error: only 0's may be mixed with negative subscripts > x[rbind(c(1,1), c(2,2), c(3,0))] [1] 1 5 > x[rbind(c(1,1), c(2,2), c(1,2))] [1] 1 5 4 > x[rbind(c(1,1), c(2,2), c(-1,2))] [1] 1 5 2 > x[rbind(c(1,1), c(2,2), c(-2,2))] [1] 1 5 1 > x[rbind(c(1,1), c(2,2), c(-3,2))] [1] 1 5 > x[rbind(c(1,1), c(2,2), c(-4,2))] Error: only 0's may be mixed with negative subscripts > x[rbind(c(1,1), c(2,2), c(-1,-1))] Error: subscript out of bounds > > # range checks are not applied to negative indices > x <- matrix(1:6, ncol=3) > dim(x) [1] 2 3 > x[rbind(c(1,1), c(2,2), c(-3,3))] [1] 1 4 1 > x[rbind(c(1,1), c(2,2), c(-4,3))] [1] 1 4 > > version _ platform i386-pc-mingw32 arch i386 os mingw32 system i386, mingw32 status major 2 minor 1.0 year 2005 month 04 day 18 language R > The followed version of mat2indsub() (to replace the one in src/main/subscript.c) allows zero indices and explicitly disallows all negative indices: /* Special Matrix Subscripting: Handles the case x[i] where */ /* x is an n-way array and i is a matrix with n columns. */ /* This code returns a vector containing the integer subscripts */ /* to be extracted when x is regarded as unravelled. */ /* Negative indices are not allowed. */ /* A zero anywhere in a row will cause a zero in the same */ /* position in the result. */ SEXP mat2indsub(SEXP dims, SEXP s) { int tdim, j, i, k, nrs = nrows(s); SEXP rvec; PROTECT(rvec = allocVector(INTSXP, nrs)); s = coerceVector(s, INTSXP); setIVector(INTEGER(rvec), nrs, 0); for (i = 0; i < nrs; i++) { tdim = 1; /* compute 0-based subscripts for a row (0 in the input */ /* gets -1 in the output here) */ for (j = 0; j < LENGTH(dims); j++) { k = INTEGER(s)[i + j * nrs]; if(k == NA_INTEGER) { INTEGER(rvec)[i] = NA_INTEGER; break; } if(k < 0) error(_("cannot have negative values in matrices used as subscripts")); if(k == 0) { INTEGER(rvec)[i] = -1; break; } if (k > INTEGER(dims)[j]) error(_("subscript out of bounds")); INTEGER(rvec)[i] += (k - 1) * tdim; tdim *= INTEGER(dims)[j]; } /* transform to 1 based subscripting (0 in the input */ /* gets 0 in the output here) */ if(INTEGER(rvec)[i] != NA_INTEGER) INTEGER(rvec)[i]++; } UNPROTECT(1); return (rvec); } With this version, the above commands (+ a couple more) produce the following output (with commands that fail suitably wrapped in try() so that they can be included in a test files.) > x <- matrix(1:6,ncol=2) > dim(x) [1] 3 2 > x [,1] [,2] [1,] 1 4 [2,] 2 5 [3,] 3 6 > x[rbind(c(1,1), c(2,2))] [1] 1 5 > x[rbind(c(1,1), c(2,2), c(0,1))] [1] 1 5 > x[rbind(c(1,1), c(2,2), c(0,0))] [1] 1 5 > x[rbind(c(1,1), c(2,2), c(0,2))] [1] 1 5 > x[rbind(c(1,1), c(2,2), c(0,3))] [1] 1 5 > x[rbind(c(1,1), c(2,2), c(1,0))] [1] 1 5 > x[rbind(c(1,1), c(2,2), c(2,0))] [1] 1 5 > x[rbind(c(1,1), c(2,2), c(3,0))] [1] 1 5 > x[rbind(c(1,0), c(0,2), c(3,0))] numeric(0) > x[rbind(c(1,0), c(0,0), c(3,0))] numeric(0) > x[rbind(c(1,1), c(2,2), c(1,2))] [1] 1 5 4 > x[rbind(c(1,1), c(2,NA), c(1,2))] [1] 1 NA 4 > x[rbind(c(1,0), c(2,NA), c(1,2))] [1] NA 4 > try(x[rbind(c(1,1), c(2,2), c(-1,2))]) Error in try(x[rbind(c(1, 1), c(2, 2), c(-1, 2))]) : cannot have negative values in matrices used as subscripts > try(x[rbind(c(1,1), c(2,2), c(-2,2))]) Error in try(x[rbind(c(1, 1), c(2, 2), c(-2, 2))]) : cannot have negative values in matrices used as subscripts > try(x[rbind(c(1,1), c(2,2), c(-3,2))]) Error in try(x[rbind(c(1, 1), c(2, 2), c(-3, 2))]) : cannot have negative values in matrices used as subscripts > try(x[rbind(c(1,1), c(2,2), c(-4,2))]) Error in try(x[rbind(c(1, 1), c(2, 2), c(-4, 2))]) : cannot have negative values in matrices used as subscripts > try(x[rbind(c(1,1), c(2,2), c(-1,-1))]) Error in try(x[rbind(c(1, 1), c(2, 2), c(-1, -1))]) : cannot have negative values in matrices used as subscripts > > # verify that range checks are applied to negative indices > x <- matrix(1:6, ncol=3) > dim(x) [1] 2 3 > try(x[rbind(c(1,1), c(2,2), c(-3,3))]) Error in try(x[rbind(c(1, 1), c(2, 2), c(-3, 3))]) : cannot have negative values in matrices used as subscripts > try(x[rbind(c(1,1), c(2,2), c(-4,3))]) Error in try(x[rbind(c(1, 1), c(2, 2), c(-4, 3))]) : cannot have negative values in matrices used as subscripts > The documentation page for ?Extract could have the following sentences added to the end of the 2nd para in the description of arguments "i, j, ...": Negative indices are not allowed in matrices used as indices. NA and zero values are allowed: rows containing a zero are omitted from the result, and rows containing an NA produce an NA in the result. In "Introduction to R", the same material could be added to the end of the section "Index Arrays" (in the chapter "Arrays and matrices"). In "R Language Definition", the third paragraph of the section "Indexing matrices and arrays" (Section 3.4.2 in my copy) currently reads:> It is possible to use a matrix of integers as an index. In > this case, the number of columns of the matrix should match > the number of dimensions of the structure, and the result > will be a vector with length as the number of rows of the > matrix. The following example shows how to extract the > elements m[1, 1] and m[2, 2] in one operation.It could be changed to the following: It is possible to use a matrix of integers as an index. Negative indices are not allowed in matrices used as indices. NA and zero values are allowed: rows containing a zero are omitted from the result, and rows containing an NA produce an NA in the result. To use a matrix as an index, the number of columns of the matrix should match the number of dimensions of the structure, and the result will be a vector with length as the number of rows of the matrix (except when the matrix contains zeros). The following example shows how to extract the elements m[1, 1] and m[2, 2] in one operation. Additionally, indexing by matrix just goes ahead and uses matrix elements as vector indices in cases where the number of columns in the matrix does not match the number of dimensions in the array. E.g.: > x <- matrix(1:6,ncol=2) > x[rbind(c(1,1,1), c(2,2,2))] [1] 1 2 1 2 1 2 > Would it not be preferable behavior to stop with an error in such cases?