ivar.herfindal at bio.ntnu.no
2009-Nov-25 14:30 UTC
[Rd] Possible bug in "unsplit" (PR#14084)
Dear R-bug-people

I have encountered a problem with "unsplit", which I believe may be caused by a bug in the function. However, being inexperienced with bug reports, I apologise if this is merely a user problem rather than a problem within R.

The problem occurs when an object is split by several grouping factors that have levels not occurring in the data, and drop = TRUE is used. This may appear to be a special and hardly relevant case, but I had to split a data frame on several factors, do some analyses on each of the subsets in the split object, and then unsplit it. I had to use drop = TRUE, otherwise my analyses would not run. I have found a fix for unsplit: the cause appears to be that the drop argument is not passed on in the call to unsplit within unsplit. Description and example below.

The problem was found on R version 2.9.0 and 2.10.0 on Windows XP.

> sessionInfo()
R version 2.10.0 (2009-10-26)
i386-pc-mingw32

locale:
[1] LC_COLLATE=Norwegian (Bokmål)_Norway.1252  LC_CTYPE=Norwegian (Bokmål)_Norway.1252
[3] LC_MONETARY=Norwegian (Bokmål)_Norway.1252 LC_NUMERIC=C
[5] LC_TIME=Norwegian (Bokmål)_Norway.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

loaded via a namespace (and not attached):
[1] tools_2.10.0
>

## a reproducible example:
dff <- data.frame(gr1=factor(c(1,1,1,1,1,2,2,2,2,2,2), levels=c(1,2,3,4)),
                  gr2=factor(c(1,2,1,2,1,2,1,2,1,2,3), levels=c(1,2,3,4)),
                  yy=rnorm(11))
# note that the two groups "gr1" and "gr2" have defined levels which do not occur in the data.

dff2 <- split(dff, list(dff$gr1, dff$gr2), drop=TRUE)
# I don't want empty objects, so I use drop=TRUE

# now I want to unsplit it, and expect the following to work:
dff3 <- unsplit(dff2, list(dff$gr1, dff$gr2), drop=TRUE)
Error in `row.names<-.data.frame`(`*tmp*`, value = c("1", "11", "3", "11", :
  duplicate 'row.names' are not allowed
In addition: Warning message:
non-unique values when setting 'row.names': ‘1’, ‘11’, ‘3’, ‘5’
### end

Looking at the unsplit function, we find:

> unsplit
function (value, f, drop = FALSE)
{
    len <- length(if (is.list(f)) f[[1L]] else f)
    if (is.data.frame(value[[1L]])) {
        x <- value[[1L]][rep(NA, len), , drop = FALSE]
        rownames(x) <- unsplit(lapply(value, rownames), f)
    }
    else x <- value[[1L]][rep(NA, len)]
    split(x, f, drop = drop) <- value
    x
}
<environment: namespace:base>
>

Note that if "value" is a data.frame, then the row names for the output x are made by the call:

rownames(x) <- unsplit(lapply(value, rownames), f)

This call to unsplit ignores the drop argument, and in the example above we get from this call:

> unsplit(lapply(dff2, rownames), list(dff$gr1, dff$gr2))
 [1] "1"  "11" "3"  "11" "5"  "1"  "7"  "3"  "9"  "5"  "11"

i.e. the row names for the output x are not unique.

A simple fix is to add drop = drop to that call, so that the updated unsplit (here called unsplit2) looks like this:

unsplit2 <- function (value, f, drop = FALSE)
{
    len <- length(if (is.list(f)) f[[1L]] else f)
    if (is.data.frame(value[[1L]])) {
        x <- value[[1L]][rep(NA, len), , drop = FALSE]
        rownames(x) <- unsplit(lapply(value, rownames), f, drop=drop) # note new "drop=drop"
    }
    else x <- value[[1L]][rep(NA, len)]
    split(x, f, drop = drop) <- value
    x
}

This works fine in the example above, and the original levels in gr1 and gr2 (i.e. they both have four levels) are maintained in the output data frame, so that it has similar attributes to the original dff:

> dff3 <- unsplit2(dff2, list(dff$gr1, dff$gr2), drop=TRUE)
> dff3
   gr1 gr2          yy
1    1   1  2.13749771
2    1   2 -0.02166458
3    1   1  0.45960452
4    1   2  2.72074958
5    1   1 -0.17536995
6    2   2 -0.08909495
7    2   1  0.94260802
8    2   2 -0.09979505
9    2   1  1.22240834
10   2   2 -0.81710781
11   2   3  0.76071130
>

I must admit that I have not had the possibility to check whether such a quick fix conflicts with other uses of unsplit or with other types of data, but I cannot see why it should be a problem.
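The mismatch described above can also be checked without touching the data frame method at all. The following is only a minimal sketch: the names f, pieces, n_pieces, n_combos, without_drop and with_drop are introduced here for illustration, and set.seed() is added only so that the data are repeatable. It compares the number of non-empty groups with the number of level combinations, and then reassembles the row names with and without drop.

```r
# Same data as in the report; set.seed() is an addition for repeatability.
set.seed(1)
dff <- data.frame(gr1 = factor(c(1,1,1,1,1,2,2,2,2,2,2), levels = 1:4),
                  gr2 = factor(c(1,2,1,2,1,2,1,2,1,2,3), levels = 1:4),
                  yy  = rnorm(11))

f      <- list(dff$gr1, dff$gr2)      # the two grouping factors
pieces <- split(dff, f, drop = TRUE)  # only non-empty groups are kept

# Number of pieces actually present versus the number of level
# combinations that split() would use with drop = FALSE.
n_pieces <- length(pieces)
n_combos <- nlevels(interaction(f))

# Reassemble the row names without and with the drop argument.
without_drop <- unsplit(lapply(pieces, rownames), f)
with_drop    <- unsplit(lapply(pieces, rownames), f, drop = TRUE)

anyDuplicated(without_drop) > 0      # TRUE if some row names are repeated
identical(with_drop, rownames(dff))  # TRUE if the original order is recovered
```

Without drop, the pieces are recycled over all level combinations, including the empty ones, so they are assigned to the wrong rows. With drop = TRUE the groups line up with the pieces again.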
Sincerely
Ivar Herfindal

--------------------------------
Centre for Conservation Biology
Norwegian University for Science and Technology
N-7491 Trondheim, Norway
email: ivar.herfindal at bio.ntnu.no