Hello!
Could you please explain why the first 5 lines work but the last 2 lines don't?
Thank you!

by(data = iris[myvars], INDICES = iris["Species"], FUN = summary)
by(data = iris[myvars], INDICES = iris["Species"], FUN = sum)
by(data = iris[myvars], INDICES = iris["Species"], FUN = var)
by(data = iris[myvars], INDICES = iris["Species"], FUN = max)
by(data = iris[myvars], INDICES = iris["Species"], FUN = min)

by(data = iris[myvars], INDICES = iris["Species"], FUN = sd)
by(data = iris[myvars], INDICES = iris["Species"], FUN = mean)

--
Dimitri Liakhovitski

Hi Dimitri,

I changed this into a reproducible example (we don't know what myvars
is). Assuming length(myvars) > 1, I'm not convinced that your first
five lines "work" either: what do you expect?

I get:

> by(data = iris[, -5], INDICES = iris["Species"], FUN = min)
Species: setosa
[1] 0.1
------------------------------------------------------------------
Species: versicolor
[1] 1
------------------------------------------------------------------
Species: virginica
[1] 1.4

But was expecting:

> aggregate(iris[,-5], by=iris[,"Species", drop=FALSE], FUN=min)
     Species Sepal.Length Sepal.Width Petal.Length Petal.Width
1     setosa          4.3         2.3          1.0         0.1
2 versicolor          4.9         2.0          3.0         1.0
3  virginica          4.9         2.2          4.5         1.4

aggregate(iris[,-5], by=iris[,"Species", drop=FALSE], FUN=sd)
aggregate(iris[,-5], by=iris[,"Species", drop=FALSE], FUN=mean)

provide the answers I would expect. If you want clearer advice, you
need to provide an actually reproducible example, and tell us more
about what you expect to get.

Sarah

On Tue, Dec 8, 2015 at 5:30 PM, Dimitri Liakhovitski
<dimitri.liakhovitski at gmail.com> wrote:
> Could you please explain why the first 5 lines work but the last 2 lines don't?

Sorry, I omitted the first line:

myvars <- c("Sepal.Length", "Sepal.Width")
by(data = iris[myvars], INDICES = iris["Species"], FUN = summary)
by(data = iris[myvars], INDICES = iris["Species"], FUN = sum)
by(data = iris[myvars], INDICES = iris["Species"], FUN = var)
by(data = iris[myvars], INDICES = iris["Species"], FUN = max)
by(data = iris[myvars], INDICES = iris["Species"], FUN = min)

by(data = iris[myvars], INDICES = iris["Species"], FUN = sd)
by(data = iris[myvars], INDICES = iris["Species"], FUN = mean)

The first lines are doing what I expected them to do: for each level
of the factor "Species" they gave me a summary, a sum, a variance, a
max, a min for each of the 2 variables in question (myvars).
I expected by to generate the sd and the mean for the 2 variables in
question for each level of "Species".

On Tue, Dec 8, 2015 at 5:50 PM, Sarah Goslee <sarah.goslee at gmail.com> wrote:
> Assuming length(myvars) > 1, I'm not convinced that your first
> five lines "work" either: what do you expect?

--
Dimitri Liakhovitski

Because you are using by() incorrectly.

"A data frame is split by row into **data frames** subsetted by the
values of one or more factors, and function FUN is applied to each
subset in turn."

So your FUN is applied to a subset of the data frame (which is also a
list). Note that sum, min, and max have "..." as their initial
arguments and so use all the columns in the data frame of each subset
for .... var() takes the covariance matrix of the several columns and
summary.data.frame summarizes each column. mean() and sd() must be fed
a numeric vector as their first argument, which a data frame is not --
ergo the error.

Cheers,
Bert

Bert Gunter

"Data is not information. Information is not knowledge. And knowledge
is certainly not wisdom."
   -- Clifford Stoll

On Tue, Dec 8, 2015 at 2:30 PM, Dimitri Liakhovitski
<dimitri.liakhovitski at gmail.com> wrote:
> Could you please explain why the first 5 lines work but the last 2 lines don't?

______________________________________________
R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

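To make the explanation above concrete, here is a minimal sketch using the built-in iris data and the myvars vector from earlier in the thread. It checks what by() actually passes to FUN, and shows one possible way to get per-variable means and standard deviations: give by() a FUN that works column by column (colMeans, or sapply over the columns). The object names (classes, sums, means, sds, mean_on_df, sd_on_df) are only illustrative. The behavior noted for mean() and sd() on a data frame is what R 3.0 and later are expected to do; older versions may differ.

```r
# Assumes the built-in iris data; myvars is as defined earlier in the thread.
myvars <- c("Sepal.Length", "Sepal.Width")

# by() hands FUN a whole data frame per Species, not a single column.
classes <- by(iris[myvars], iris["Species"], function(d) class(d))
print(classes)

# sum() therefore pools both columns into one number per species.
sums <- by(iris[myvars], iris["Species"], sum)
print(sums)

# mean() and sd() want a numeric vector. Called on a data frame, mean()
# is expected to return NA with a warning and sd() to stop with an error.
mean_on_df <- suppressWarnings(mean(iris[myvars]))
sd_on_df   <- tryCatch(sd(iris[myvars]), error = function(e) e)
print(mean_on_df)
print(inherits(sd_on_df, "error"))

# A column-wise FUN gives one value per variable within each species.
means <- by(iris[myvars], iris["Species"], colMeans)
sds   <- by(iris[myvars], iris["Species"], function(d) sapply(d, sd))

# Bind the per-species results into one matrix per statistic.
print(do.call(rbind, means))
print(do.call(rbind, sds))
```

Whether by() with a column-wise FUN or aggregate() is the better fit depends on the output shape you want: by() returns a list-like "by" object, aggregate() a data frame.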
Sarah:

Note that (as I read them) aggregate() and by() work differently on
data frames. aggregate() computes FUN column by column while by()
feeds the whole (subset) data frame to FUN.

If I am wrong about this, I would greatly appreciate being corrected.

Cheers,
Bert

Bert Gunter

On Tue, Dec 8, 2015 at 3:09 PM, Bert Gunter <bgunter.4567 at gmail.com> wrote:
> Because you are using by() incorrectly.
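
The difference between aggregate() and by() described above can be checked with a short sketch, again assuming the built-in iris data and the myvars vector from earlier in the thread. The names agg, agg_class and by_means are illustrative only. It inspects what each function passes to FUN, then confirms that the two agree once by() is given a column-wise FUN such as colMeans.

```r
# Assumes the built-in iris data; myvars is as defined earlier in the thread.
myvars <- c("Sepal.Length", "Sepal.Width")

# aggregate(): FUN receives one column of one group at a time.
agg_class <- aggregate(iris[myvars], by = iris["Species"],
                       FUN = function(x) class(x))
print(agg_class)

# So a vector function such as mean() works directly.
agg <- aggregate(iris[myvars], by = iris["Species"], FUN = mean)
print(agg)

# by(): FUN receives the whole per-group data frame, so a column-wise
# function such as colMeans is needed to get the same numbers.
by_means <- do.call(rbind, by(iris[myvars], iris["Species"], colMeans))
print(by_means)

# Compare the numeric values, ignoring row and column name attributes.
print(all.equal(as.matrix(agg[myvars]), by_means, check.attributes = FALSE))
```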