Hi R People:

Several of you pointed out that using "tapply" on a data frame will work on the iris data frame.

I'm still having a problem.

The iris data frame has 150 rows and 5 variables. The first 4 are numeric, while the last is a factor, which has the Species names.

I can use tapply for 1 variable at a time:

> tapply(iris[,1],iris[,5],mean)
    setosa versicolor  virginica
     5.006      5.936      6.588

but if I try to use this for all of the first 4, I get an error:

> tapply(iris[,1:4],iris[,5],mean)
Error in tapply(iris[, 1:4], iris[, 5], mean) :
        arguments must have same length

Any ideas of what I'm doing wrong, please?

Thanks,
Laura Holt
mailto: lauraholt_983 at hotmail.com
R Version 1.9.1 Windows

Laura Holt <lauraholt_983 <at> hotmail.com> writes:

> Hi R People:
>
> Several of you pointed out that using "tapply" on a data frame will work on
> the iris data frame.
>
> I'm still having a problem.
>
> The iris data frame has 150 rows, 5 variables. The first 4 are numeric,
> while the last is a factor, which has the Species names.
>
> I can use tapply for 1 variable at a time:
>
> > tapply(iris[,1],iris[,5],mean)
>     setosa versicolor  virginica
>      5.006      5.936      6.588
>
> but if I try to use this for all of the first 4, I get an error:
>
> > tapply(iris[,1:4],iris[,5],mean)
> Error in tapply(iris[, 1:4], iris[, 5], mean) :
>         arguments must have same length

This is a job for aggregate:

R> data(iris)
R> aggregate(iris[,1:4], list(Species = iris[,5]), mean)
     Species Sepal.Length Sepal.Width Petal.Length Petal.Width
1     setosa        5.006       3.428        1.462       0.246
2 versicolor        5.936       2.770        4.260       1.326
3  virginica        6.588       2.974        5.552       2.026

The by command would also work using colMeans:

R> by(iris[,1:4], list(Species = iris[,5]), colMeans)
Species: setosa
Sepal.Length  Sepal.Width Petal.Length  Petal.Width
       5.006        3.428        1.462        0.246
------------------------------------------------------------
Species: versicolor
Sepal.Length  Sepal.Width Petal.Length  Petal.Width
       5.936        2.770        4.260        1.326
------------------------------------------------------------
Species: virginica
Sepal.Length  Sepal.Width Petal.Length  Petal.Width
       6.588        2.974        5.552        2.026
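
For readers on a more recent R than the 1.9.1 used in this thread, the same per-species means can probably be written more compactly with the formula interface of aggregate. This is a minimal sketch, assuming a version of R in which aggregate accepts a formula (that was not the case in 1.9.1); the name agg is made up for illustration:

```r
# Assumption: a recent R where aggregate() has a formula method.
# "." stands for every column other than Species.
data(iris)
agg <- aggregate(. ~ Species, data = iris, FUN = mean)
print(agg)
```

The result should be a data frame with one row per species and one column per numeric variable, the same layout as the aggregate call above.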

On Fri, Aug 20, 2004 at 11:40:16PM -0500, Laura Holt wrote:

> Hi R People:
>
> Several of you pointed out that using "tapply" on a data frame will work on
> the iris data frame.
>
> I'm still having a problem.
>
> The iris data frame has 150 rows, 5 variables. The first 4 are numeric,
> while the last is a factor, which has the Species names.
>
> I can use tapply for 1 variable at a time:
>
> > tapply(iris[,1],iris[,5],mean)
>     setosa versicolor  virginica
>      5.006      5.936      6.588
>
> but if I try to use this for all of the first 4, I get an error:
>
> > tapply(iris[,1:4],iris[,5],mean)
> Error in tapply(iris[, 1:4], iris[, 5], mean) :
>         arguments must have same length
>
> Any ideas of what I'm doing wrong, please?

You are not reading the help page:

Usage:

     tapply(X, INDEX, FUN = NULL, ..., simplify = TRUE)

Arguments:

       X: an atomic object, typically a vector.

iris[, 1:4] is a data frame of length 4; iris[, 5] is a vector of length 150. You probably need to loop over the first four columns and apply tapply(!) four times, but I'm sure there is a smarter way. Others will tell you.

Göran
--
 Göran Broström                    tel: +46 90 786 5223
 Department of Statistics          fax: +46 90 786 6614
 Umeå University                   http://www.stat.umu.se/egna/gb/
 SE-90187 Umeå, Sweden             e-mail: gb at stat.umu.se
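
The column-by-column loop suggested above can be sketched with sapply, which treats the data frame as a list of columns and calls tapply once per column. This is only one possible way to write it, and the name species_means is made up for illustration:

```r
data(iris)

# length() of a data frame is its number of columns, which is why
# tapply() complains that the arguments differ in length.
length(iris[, 1:4])   # 4 columns
length(iris[, 5])     # 150 values

# Call tapply() once per numeric column; sapply() binds the four
# named result vectors into a matrix.
species_means <- sapply(iris[, 1:4],
                        function(col) tapply(col, iris[, 5], mean))
print(species_means)
```

The result should be a matrix with one row per species and one column per variable, holding the same means that aggregate and by report.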