William Dunlap
2011-Mar-21 16:16 UTC
[Rd] split(factor, shortGroupVector) gives incorrect results in R 2.12.2
When split's x argument has a class attribute and the
grouping vector, f, is shorter than x then split gives
the wrong result. It appears to not extend f to the length
of x before doing the split. E.g.,
> split(factor(letters[1:3]), "Group one") # expect all 3 elements in the single group
$`Group one`
[1] a
Levels: a b c
> split(factor(letters[1:3]), c("Group one", "Group two")) # expect warning and Group one should contain "a" and "c".
$`Group one`
[1] a
Levels: a b c
$`Group two`
[1] b
Levels: a b c
We expect the above to act like the similar cases where x is
a character vector:
> split(letters[1:3], "Group one")
$`Group one`
[1] "a" "b" "c"
> split(letters[1:3], c("Group one", "Group two"))
$`Group one`
[1] "a" "c"
$`Group two`
[1] "b"
Warning message:
In split.default(letters[1:3], c("Group one", "Group two")) :
data length is not a multiple of split variable
We get a similar problem for other stray classes of x:
> split(structure(letters[1:3],class="no sUch cLaSs"), c("Group one", "Group two"))
$`Group one`
[1] "a"
$`Group two`
[1] "b"
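
Until split itself recycles f for classed x, one possible workaround is to extend f by hand before the call. The sketch below is only an illustration: split_recycled is a hypothetical helper, not part of base R, and it assumes that length(x) agrees with what "[" indexes for the class of x.

```r
# Hypothetical helper (not in base R): recycle the grouping vector f
# to the length of x before handing both to split().
split_recycled <- function(x, f) {
  n <- length(x)
  # Mirror the warning that split.default gives for unclassed x.
  if (length(f) > 0L && n %% length(f) != 0L)
    warning("data length is not a multiple of split variable")
  split(x, rep(f, length.out = n))
}

x <- factor(letters[1:3])

# A single group now receives all three elements.
one <- split_recycled(x, "Group one")

# Two groups: "Group one" receives a and c, "Group two" receives b.
two <- suppressWarnings(split_recycled(x, c("Group one", "Group two")))
print(two)
```

rep(f, length.out = n) is used rather than a newer convenience function so that the sketch also runs on older versions of R.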
Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
peter dalgaard
2011-Mar-21 17:13 UTC
On Mar 21, 2011, at 17:16 , William Dunlap wrote:

>> split(factor(letters[1:3]), c("Group one", "Group two"))

Yes, that's a bug (at the very least, it is against documented behavior).

The strong suspicion is that

    ind <- .Internal(split(seq_along(f), f))

should have seq_along(x), not seq_along(f). But would that break for other reasons?

(It would! Surv() objects, to name one case. In general, we seem to be in trouble if "[" and length() methods are not compatible.)

--
Peter Dalgaard
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd.mes at cbs.dk Priv: PDalgd at gmail.com
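
The difference between indexing over f and indexing over x can be reproduced at the R level. The following is a rough sketch, not the actual source of split.default: split_by_index and its index_over argument are hypothetical names, and base::split() on a plain integer vector stands in for the .Internal call.

```r
# Hypothetical R-level stand-in for the index-based branch used for
# classed x: split a vector of positions by f, then subset x with "[".
split_by_index <- function(x, f, index_over = c("x", "f")) {
  index_over <- match.arg(index_over)
  idx <- if (index_over == "x") seq_along(x) else seq_along(f)
  ind <- split(idx, f)   # positions per group; f is recycled over idx
  lapply(ind, function(i) x[i])
}

x <- factor(letters[1:3])
f <- c("Group one", "Group two")

# Indexing over f only ever sees positions 1 and 2, so "c" is dropped.
buggy <- split_by_index(x, f, "f")

# Indexing over x recycles f (with a warning) and keeps all elements.
fixed <- suppressWarnings(split_by_index(x, f, "x"))
print(fixed)
```

The second form relies on seq_along(x), and therefore length(x), matching the positions that "[" accepts, which is exactly the assumption that fails for classes whose length() and "[" methods disagree.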