I've played a bit with the problem that Jeff Laake reported on R-help:

# create data:
jl <- data.frame(x=rep(1, 3), y=tapply(1:9, rep(c('A','B','C'), each=3), sum))
jl2 <- jl
jl2$y <- as.numeric(jl2$y)

# do the test:
> tapply(jl$y, jl$x, length)
1
3
> tapply(jl2$y, jl2$x, length)
1
3
> by(jl2$y, jl2$x, length)
jl2$x: 1
[1] 3
> by(jl$y, jl$x, length)
INDICES: 1
[1] 1

The result of 'by' on the 1-dimensional array is giving the correct answer to a question that I don't think many of us thought we were asking.

Once upon a time 'by' gave 3 as the answer in both situations. 'by.default' used to be a one-liner, but now decides what to do based on 'length(dim(data))'. This specific problem goes away if the line:

if (length(dim(data)))

is replaced by:

if(length(dim(data)) > 1)

But I don't know what other mischief such a change would make.

Patrick Burns
patrick at burns-stat.com
+44 (0)20 8525 0696
http://www.burns-stat.com
(home of S Poetry and "A Guide for the Unwilling S User")
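The sketch below shows why the two columns are treated differently. It is a minimal illustration and assumes an R version where as.data.frame() on a 1-d array returns a one-column data frame. The tapply() result keeps a one-element dim attribute, so a plain 'length(dim(data))' test counts it as dimensioned. Converted to a data frame it has 3 rows but only 1 column, and length() of a data frame counts columns. That is a plausible source of the 1. The helper is_matrix_like() is hypothetical and only expresses the proposed '> 1' test. It is not part of base R.

```r
# Recreate the 'y' column from the example above.
y_arr <- tapply(1:9, rep(c('A', 'B', 'C'), each = 3), sum)  # 1-d array: A=6, B=15, C=24
y_num <- as.numeric(y_arr)                                     # plain numeric vector, dim is NULL

# A bare 'length(dim(data))' test is TRUE (non-zero) for the 1-d array only.
length(dim(y_arr))   # 1
length(dim(y_num))   # 0

# As a data frame the 1-d array has 3 rows but a single column,
# and length() on a data frame counts columns, not rows.
dd <- as.data.frame(y_arr)
nrow(dd)     # 3
length(dd)   # 1

# Hypothetical helper expressing the proposed test: only data with two or
# more dimensions is treated as matrix-like.
is_matrix_like <- function(x) length(dim(x)) > 1
is_matrix_like(y_arr)             # FALSE
is_matrix_like(y_num)             # FALSE
is_matrix_like(matrix(1:6, 2))    # TRUE
```

Under the '> 1' test a 1-d array falls through to the same branch as a plain vector. Genuine matrices and data frames still take the data-frame route.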