Sorry, I omitted the first line:

myvars <- c("Sepal.Length", "Sepal.Width")
by(data = iris[myvars], INDICES = iris["Species"], FUN = summary)
by(data = iris[myvars], INDICES = iris["Species"], FUN = sum)
by(data = iris[myvars], INDICES = iris["Species"], FUN = var)
by(data = iris[myvars], INDICES = iris["Species"], FUN = max)
by(data = iris[myvars], INDICES = iris["Species"], FUN = min)

by(data = iris[myvars], INDICES = iris["Species"], FUN = sd)
by(data = iris[myvars], INDICES = iris["Species"], FUN = mean)

The first lines are doing what I expected them to do: for each level
of the factor "Species" they gave me a summary, a sum, a variance, a
max, a min for each of the 2 variables in question (myvars).
I expected by to generate the sd and the mean for the 2 variables in
question for each level of "Species".

On Tue, Dec 8, 2015 at 5:50 PM, Sarah Goslee <sarah.goslee at gmail.com> wrote:
> Hi Dimitri,
>
> I changed this into a reproducible example (we don't know what myvars
> is). Assuming length(myvars) > 1, I'm not convinced that your first
> five lines "work" either: what do you expect?
>
> I get:
>
>> by(data = iris[, -5], INDICES = iris["Species"], FUN = min)
> Species: setosa
> [1] 0.1
> ------------------------------------------------------------------
> Species: versicolor
> [1] 1
> ------------------------------------------------------------------
> Species: virginica
> [1] 1.4
>
> But was expecting:
>
>> aggregate(iris[,-5], by=iris[,"Species", drop=FALSE], FUN=min)
>      Species Sepal.Length Sepal.Width Petal.Length Petal.Width
> 1     setosa          4.3         2.3          1.0         0.1
> 2 versicolor          4.9         2.0          3.0         1.0
> 3  virginica          4.9         2.2          4.5         1.4
>
> aggregate(iris[,-5], by=iris[,"Species", drop=FALSE], FUN=sd)
> aggregate(iris[,-5], by=iris[,"Species", drop=FALSE], FUN=mean)
>
> provide the answers I would expect. If you want clearer advice, you
> need to provide an actually reproducible example, and tell us more
> about what you expect to get.
>
> Sarah
>
> On Tue, Dec 8, 2015 at 5:30 PM, Dimitri Liakhovitski
> <dimitri.liakhovitski at gmail.com> wrote:
>> Hello!
>> Could you please explain why the first 5 lines work but the last 2 lines don't?
>> Thank you!
>>
>> by(data = iris[myvars], INDICES = iris["Species"], FUN = summary)
>> by(data = iris[myvars], INDICES = iris["Species"], FUN = sum)
>> by(data = iris[myvars], INDICES = iris["Species"], FUN = var)
>> by(data = iris[myvars], INDICES = iris["Species"], FUN = max)
>> by(data = iris[myvars], INDICES = iris["Species"], FUN = min)
>>
>> by(data = iris[myvars], INDICES = iris["Species"], FUN = sd)
>> by(data = iris[myvars], INDICES = iris["Species"], FUN = mean)
>>
>> --
>> Dimitri Liakhovitski

--
Dimitri Liakhovitski

by() calls FUN with a data.frame as the argument. summary(), sum(), etc.
have methods that work on data.frames but sd() and mean() do not.

aggregate() calls its FUN with each column of a data.frame as the argument.

Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Tue, Dec 8, 2015 at 3:08 PM, Dimitri Liakhovitski
<dimitri.liakhovitski at gmail.com> wrote:
> I expected by to generate the sd and the mean for the 2 variables in
> question for each level of "Species".
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
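
The difference described above (by() hands FUN a whole data frame, aggregate() hands FUN one column at a time) can be checked directly. This is only a minimal sketch using base R and the built-in iris data; the names by_gets_df, agg_gets_vec, mean_on_df and sd_fails are illustrative and do not come from the thread, and the exact warning or error text may differ between R versions.

```r
myvars <- c("Sepal.Length", "Sepal.Width")

# by() passes FUN one data frame per group, holding all selected columns.
by_gets_df <- by(iris[myvars], iris["Species"], FUN = is.data.frame)

# aggregate() passes FUN one column of one group at a time, i.e. a plain vector.
agg_gets_vec <- aggregate(iris[myvars], by = iris["Species"], FUN = is.numeric)

# mean() has no data frame method in current R, so on a data frame it
# warns and returns NA instead of column means.
mean_on_df <- suppressWarnings(mean(iris[myvars]))

# sd() tries to coerce its argument to a numeric vector, which is expected
# to fail for a multi-column data frame; try() captures the error.
sd_fails <- inherits(try(sd(iris[myvars]), silent = TRUE), "try-error")

print(by_gets_df)
print(agg_gets_vec)
print(mean_on_df)
print(sd_fails)
```

Note that sum(), max() and min() do accept a data frame, but they then reduce over all of its columns at once, which is why the by() output quoted earlier in the thread shows a single number per species rather than one per variable.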

Got it - thank you, everybody! by() splits the data into one data frame
per group and passes each whole data frame to FUN. Lesson: use aggregate().

On Tue, Dec 8, 2015 at 6:17 PM, William Dunlap <wdunlap at tibco.com> wrote:
> by() calls FUN with a data.frame as the argument. summary(), sum(), etc.
> have methods that work on data.frames but sd() and mean() do not.
>
> aggregate() calls its FUN with each column of a data.frame as the argument.

--
Dimitri Liakhovitski
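
For per-group, per-variable means and standard deviations, the conclusion of the thread can be sketched as below. This is a minimal illustration in base R on the built-in iris data, not something posted in the thread: the second option (keeping by() but giving it a column-wise function such as colMeans() or an sapply() wrapper) is an assumption about what a reader might want, and the object names are illustrative.

```r
myvars <- c("Sepal.Length", "Sepal.Width")

# Option 1: aggregate() applies FUN to each column within each group.
agg_mean <- aggregate(iris[myvars], by = iris["Species"], FUN = mean)
agg_sd   <- aggregate(iris[myvars], by = iris["Species"], FUN = sd)

# Option 2: keep by(), but supply a FUN that works column by column
# on the data frame it receives.
by_mean <- by(iris[myvars], iris["Species"], FUN = colMeans)
by_sd   <- by(iris[myvars], iris["Species"], FUN = function(d) sapply(d, sd))

# Stack the per-group vectors into matrices (one row per species).
mean_mat <- do.call(rbind, by_mean)
sd_mat   <- do.call(rbind, by_sd)

print(agg_mean)
print(agg_sd)
print(mean_mat)
print(sd_mat)
```

aggregate() returns a regular data frame with the grouping column included, which is usually easier to work with downstream; the by() variant is mainly useful when FUN really does need several columns of a group at once.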