Russ Hamilton
2016-Oct-10 14:46 UTC
[Rd] Bug/Inconsistency in merge() with all.x when first nonmatching column in y is matrix
I've noticed inconsistent behavior with merge() when using all.x=TRUE. After some digging, I found the following test cases:

1) The snippet below doesn't work as expected. For rows that are in a but not in b, the columns coming from b take the value from the first matching row instead of being NA:

--- Snip >>>
NUM <- 25
a <- data.frame(id=factor(letters[1:NUM]), qq=rep(NA, NUM), rr=rep(1.0,NUM))
b <- data.frame(id=c("e","a","f","y","x"))
## mm is a 5x1 matrix (outer product with a scalar) and is the first non-key column of b
b$mm <- as.vector(c(1,2,3.1,4.0,NA))%o%3.14
b$nn <- rep("from b", 5)
merge(a,b,by="id",all.x=TRUE)
<<< Snip ---

2) The modified snippet below, which only creates b$nn before b$mm, works as expected:

--- Snip >>>
NUM <- 25
a <- data.frame(id=factor(letters[1:NUM]), qq=rep(NA, NUM), rr=rep(1.0,NUM))
b <- data.frame(id=c("e","a","f","y","x"))
## same columns as above, but now the character column nn comes first
b$nn <- rep("from b", 5)
b$mm <- as.vector(c(1,2,3.1,4.0,NA))%o%3.14
merge(a,b,by="id",all.x=TRUE)
<<< Snip ---

In src/library/base/R/merge.R:154, I see the following:

--- Snip >>>
for(i in seq_along(y)) {
    ## do it this way to invoke methods for e.g. factor
    if(is.matrix(y[[1]])) y[[1]][zap, ] <- NA
    else is.na(y[[i]]) <- zap
}
<<< Snip ---

Changing the '1's in the if statement to 'i's fixes this issue for me, i.e.:

--- Snip >>>
for(i in seq_along(y)) {
    ## do it this way to invoke methods for e.g. factor
    if(is.matrix(y[[i]])) y[[i]][zap, ] <- NA
    else is.na(y[[i]]) <- zap
}
<<< Snip ---

I'm actually not sure whether the "if" statement is even needed (the "else" case seems to handle matrices just fine).

--Russ Hamilton
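The effect of the `y[[1]]` typo can be sketched without calling merge() at all. The example below copies the NA-filling loop into a hypothetical helper, `na_fill_rows()` (not a base R function), and assumes the loop behaves the same outside merge(). A plain list of columns stands in for the data frame, and row 3 plays the part of a row of a with no match in b. It also probes the closing question: because `is.na<-` indexes linearly, the "else" branch happens to work for the one-column matrix in the snippets (`c(...) %o% 3.14` is 5x1), but on a wider matrix it would appear to blank only the first column.

```r
## Hypothetical helper (not part of base R): re-creates the NA-filling loop
## quoted above. 'y' is a list of columns standing in for the data frame,
## 'zap' holds the row indices that must become NA.
## use_i = FALSE mimics the reported code (tests y[[1]]), TRUE the proposed fix.
na_fill_rows <- function(y, zap, use_i = TRUE) {
    for (i in seq_along(y)) {
        j <- if (use_i) i else 1L
        if (is.matrix(y[[j]])) y[[j]][zap, ] <- NA
        else is.na(y[[i]]) <- zap
    }
    y
}

zap <- 3L  # row 3 plays the part of a row of 'a' with no match in 'b'

## Case 1: the one-column matrix comes first, as in the first snippet.
case1 <- list(mm = matrix(c(1, 2, 3.1) * 3.14, ncol = 1),
              nn = rep("from b", 3))
buggy1 <- na_fill_rows(case1, zap, use_i = FALSE)
fixed1 <- na_fill_rows(case1, zap, use_i = TRUE)
buggy1$nn[zap]   # "from b": nn is never blanked, mm is blanked twice
fixed1$nn[zap]   # NA

## Case 2: the matrix comes second, as in the second snippet.
## Even the original loop blanks both columns here.
case2 <- case1[c("nn", "mm")]
buggy2 <- na_fill_rows(case2, zap, use_i = FALSE)
is.na(buggy2$nn[zap]) && is.na(buggy2$mm[zap, 1])   # TRUE

## is.na<- uses linear indices, so on a multi-column matrix only
## column 1 of row 'zap' is blanked; m[zap, ] <- NA blanks the whole row.
m1 <- m2 <- matrix(1:6, nrow = 3)
is.na(m1) <- zap
m2[zap, ] <- NA
m1[zap, ]   # NA 6
m2[zap, ]   # NA NA
```

If that reading is right, the "if" branch still matters for matrices with more than one column, and only the index needs changing.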