Karolis Koncevičius
2023-May-03 07:36 UTC
[Rd] Inquiry about the behaviour of subsetting and names in matrices
Hello,

I have stumbled upon a few cases where the behaviour of naming and subsetting in matrices seems unintuitive. They all look related, so I wanted to put everything in one message.

1. Why is row/col selection by names with NAs not allowed?

    x <- setNames(1:10, letters[1:10])
    X <- matrix(x, nrow=2, dimnames = list(letters[1:2], LETTERS[1:5]))

    x[c(1, NA, 3)]       # vector: works and adds "NA"
    x[c("a", NA, "c")]   # vector: works and adds "NA"
    X[,c(1, NA, 3)]      # works and selects "NA" column
    X[,c("A", NA, "C")]  # <error>

2. Should setting names() for a matrix be allowed?

    names(X) <- paste0("e", 1:length(X))
    X["e4"]      # works

    # but any operation on a matrix drops the names
    X <- X[,-1]  # all names are gone
    X["e4"]      # <error>

Maybe names() should not be allowed on a matrix?

3. Should selection of non-existent dimension names really be an error?

    x[22]    # works on a vector - gives "NA"
    X[,22]   # <error>

A potentially useful use-case is matching a smaller matrix to a larger one:

    A <- matrix(rnorm(10), nrow=2, dimnames = list(c("a","c")))
    B <- matrix(rnorm(20), nrow=4, dimnames = list(c("a", "b", "c", "d")))

    # matching larger matrix to the smaller one <works>
    B[rownames(A),]

    # matching smaller matrix to the larger one <error>
    A[rownames(B),]

These also don't seem to be documented in '[', 'names', 'rownames'.

I am interested in whether there are specific reasons for this behaviour, or whether it could potentially be adjusted.

Kind regards,
Karolis K.
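As a side note on the matching use-case in point 3: a rough sketch of how the same result can be obtained today, assuming it is acceptable to go through match() and integer indices (which, as point 1 shows, do tolerate NA). The name A_expanded is made up for this illustration, and set.seed() is only there to make the sketch reproducible.

```r
# Sketch only: emulate NA-padded row selection by name with match().
set.seed(1)
A <- matrix(rnorm(10), nrow = 2, dimnames = list(c("a", "c"), NULL))
B <- matrix(rnorm(20), nrow = 4, dimnames = list(c("a", "b", "c", "d"), NULL))

# match() gives NA for every row name of B that is absent from A
idx <- match(rownames(B), rownames(A))

# Integer indices containing NA are accepted by `[` and produce rows of NA
A_expanded <- A[idx, , drop = FALSE]

# Hypothetical convention: label the padded rows with the names from B
rownames(A_expanded) <- rownames(B)

A_expanded
```

Rows "b" and "d" of A_expanded are filled with NA, while rows "a" and "c" carry the values from A. Whether silently padded rows are desirable is exactly the question raised in this thread.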
GILLIBERT, Andre
2023-May-03 08:15 UTC
[Rd] Inquiry about the behaviour of subsetting and names in matrices
Karolis wrote:
> Hello,
> I have stumbled upon a few cases where the behaviour of naming and subsetting in matrices seems unintuitive.
> They all look related, so I wanted to put everything in one message.
>
> 1. Why is row/col selection by names with NAs not allowed?
>
> x <- setNames(1:10, letters[1:10])
> X <- matrix(x, nrow=2, dimnames = list(letters[1:2], LETTERS[1:5]))
>
> x[c(1, NA, 3)]       # vector: works and adds "NA"
> x[c("a", NA, "c")]   # vector: works and adds "NA"
> X[,c(1, NA, 3)]      # works and selects "NA" column
> X[,c("A", NA, "C")]  # <error>

I would state the question the other way around: why are NA integer indices allowed?
In my experience, they are sometimes useful but they often delay the detection of bugs. However, due to backward compatibility, this feature cannot be removed. Adding this feature to character indices would worsen the problem.

I see another reason to keep the current behavior: character indices are most often used with column names, in contexts where they are unlikely to be NA except as a consequence of a bug. In other words, I fear that the valid-use-case/bug ratio would be quite poor with this feature.

> 2. Should setting names() for a matrix be allowed?
>
> names(X) <- paste0("e", 1:length(X))
> X["e4"]      # works
>
> # but any operation on a matrix drops the names
> X <- X[,-1]  # all names are gone
> X["e4"]      # <error>
>
> Maybe names() should not be allowed on a matrix?

Setting names() on a matrix is a rarely used feature that has practically no positive and no negative consequences. I see no incentive to change the behavior and break existing code.

> 3. Should selection of non-existent dimension names really be an error?
>
> x[22]    # works on a vector - gives "NA"
> X[,22]   # <error>

This is very often a bug on vectors and should not have been allowed on vectors in the first place... But for backwards compatibility, it is hard to remove.
Adding this unsafe feature to matrices is a poor idea in my opinion.

> A potentially useful use-case is matching a smaller matrix to a larger one:

This is a valid use-case, but in my opinion, it adds more problems than it solves.

> These also don't seem to be documented in '[', 'names', 'rownames'.

Indeed, the documentation of '[' seems to be unclear on indices out of range. It can be improved.

> I am interested in whether there are specific reasons for this behaviour, or whether it could potentially be adjusted.

In my opinion, adding these features would improve the consistency of R but would add more sources of bugs in an already unsafe language.

Sincerely
André GILLIBERT
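To illustrate the concern about NA and out-of-range indices delaying bug detection on vectors, here is a minimal sketch of a stricter accessor. The function strict_subset() is hypothetical and not part of base R; it only covers positive numeric and character indices, and other index types are passed through unchanged.

```r
# Hypothetical helper (not in base R): vector subsetting that fails early
# instead of silently returning NA for bad indices.
strict_subset <- function(x, i) {
  if (anyNA(i)) {
    stop("index contains NA")
  }
  if (is.character(i)) {
    unknown <- setdiff(i, names(x))
    if (length(unknown) > 0) {
      stop("unknown names: ", paste(unknown, collapse = ", "))
    }
  } else if (is.numeric(i)) {
    if (any(i > length(x))) {
      stop("index out of range")
    }
  }
  x[i]
}

x <- setNames(1:10, letters[1:10])

strict_subset(x, c("a", "c"))   # same result as x[c("a", "c")]

# Each of these would return NA with plain `[`; here they raise an error
res_range <- try(strict_subset(x, 22), silent = TRUE)
res_na    <- try(strict_subset(x, c("a", NA, "c")), silent = TRUE)
res_name  <- try(strict_subset(x, "zz"), silent = TRUE)
```

With such a wrapper the failure surfaces at the point of indexing rather than later, when the NA propagates into a computation.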