William Dunlap
2011-Mar-21 16:16 UTC
[Rd] split(factor, shortGroupVector) gives incorrect results in R 2.12.2
When split's x argument has a class attribute and the grouping vector, f, is shorter than x then split gives the wrong result. It appears to not extend f to the length of x before doing the split. E.g.,

  > split(factor(letters[1:3]), "Group one") # expect all 3 elements in the single group
  $`Group one`
  [1] a
  Levels: a b c

  > split(factor(letters[1:3]), c("Group one", "Group two")) # expect warning and Group one should contain "a" and "c".
  $`Group one`
  [1] a
  Levels: a b c

  $`Group two`
  [1] b
  Levels: a b c

We expect the above to act like the similar cases where x is a character vector

  > split(letters[1:3], "Group one")
  $`Group one`
  [1] "a" "b" "c"

  > split(letters[1:3], c("Group one", "Group two"))
  $`Group one`
  [1] "a" "c"

  $`Group two`
  [1] "b"

  Warning message:
  In split.default(letters[1:3], c("Group one", "Group two")) :
    data length is not a multiple of split variable

We get a similar problem for other stray classes of x

  > split(structure(letters[1:3],class="no sUch cLaSs"), c("Group one", "Group two"))
  $`Group one`
  [1] "a"

  $`Group two`
  [1] "b"

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
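Until split() itself recycles f for classed x, one possible workaround is to extend the grouping vector by hand before calling split(). The sketch below is only an illustration: split_recycled is a hypothetical helper name, not part of R, and it assumes f is a non-empty atomic vector or factor rather than a list of factors.

```r
# Hypothetical helper (not part of base R): recycle f to the length of x
# explicitly, so the result does not depend on split() doing it for us.
split_recycled <- function(x, f) {
  n <- length(x)
  stopifnot(length(f) > 0L)  # assumption: f is a non-empty vector or factor
  if (n %% length(f) != 0L) {
    # Same wording as the warning shown for the character-vector case
    warning("data length is not a multiple of split variable")
  }
  split(x, rep(f, length.out = n))
}

# Single group: all three elements should land in "Group one"
split_recycled(factor(letters[1:3]), "Group one")

# Two groups: warns, and "Group one" should hold a and c
split_recycled(factor(letters[1:3]), c("Group one", "Group two"))
```

Because f already has the same length as x when split() sees it, this should behave the same whether or not the installed version has the problem reported here.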
Peter Dalgaard
2011-Mar-21 17:13 UTC
[Rd] split(factor, shortGroupVector) gives incorrect results in R 2.12.2
On Mar 21, 2011, at 17:16, William Dunlap wrote:

> > split(factor(letters[1:3]), c("Group one", "Group two"))

Yes, that's a bug (at the very least, it is against documented behavior).

The strong suspicion is that

    ind <- .Internal(split(seq_along(f), f))

should have seq_along(x), not seq_along(f). But would that break for other reasons?

(It would! Surv() objects, to name one case. In general, we seem to be in trouble if "[" and length() methods are not compatible.)

-- 
Peter Dalgaard
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com
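The idea of splitting positions of x rather than positions of f can be sketched at user level as follows. This is not the code in split.default; split_by_index and the toyPairs class are hypothetical names made up for illustration. The toy class stands in for the kind of object mentioned above, where "[" works on rows while length() counts something else.

```r
# Hypothetical helper (not R's actual split.default): split the positions of x
# by f, then subset x group by group with its own "[" method.
split_by_index <- function(x, f) {
  ind <- split(seq_along(x), f)  # f is recycled along the positions of x
  lapply(ind, function(i) x[i])
}

x <- factor(letters[1:3])
# Warns that the data length is not a multiple of the split variable
res <- split_by_index(x, c("Group one", "Group two"))
res

# Toy class (an assumption, for illustration only): matrix-backed, with a "["
# method that selects rows but no matching length() method.
toy <- structure(cbind(time = 1:3, status = c(1, 0, 1)), class = "toyPairs")
`[.toyPairs` <- function(x, i) {
  structure(unclass(x)[i, , drop = FALSE], class = "toyPairs")
}

length(toy)         # 6: counts matrix cells
nrow(unclass(toy))  # 3: the number of elements "[" can actually select
```

For such an object seq_along(x) runs past the last valid row index, which is the incompatibility between "[" and length() that makes the simple change risky.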