dear R experts:

has someone written a function that returns the results of by() as a data frame? of course, this can work only if the output of the function that is an argument to by() is a numerical vector. presumably, what is now names(byobject) would become a column in the data frame, and the by object's list elements would become columns. it's a little bit like flattening the by() output object (so that the name of the list item and its contents become the same row), and having the right names for the columns. I don't know how to do this quickly in the R way. (Doing it slowly, e.g., with a for loop over the list of vectors, is easy, but would not make a nice function for me to use often.)

for example, let's say my by() output is currently

by( indf, indf$charid, function(x) c(m=mean(x), s=sd(x)) )

$`A`
[1] 2 3
$`B`
[1] 4 5

then the revised by() would instead produce

charid  m  s
A       2  3
B       4  5

working with data frames is often more intuitive than working with the output of by(). the R wizards are probably chuckling now about how easy this is...

regards,

/iaw

----
Ivo Welch (ivo.welch at brown.edu, ivo.welch at gmail.com)

Try this:

as.data.frame(by( indf, indf$charid, function(x) c(m=mean(x), s=sd(x)) ))

On Mon, Aug 30, 2010 at 10:19 AM, ivo welch <ivo.welch@gmail.com> wrote:
> has someone written a function that returns the results of by() as a
> data frame?
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

--
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O

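
For readers who want the flattening spelled out in base R, here is a minimal sketch of one way to do it. It is not from any poster in this thread. It borrows the made-up sample data from Bill Dunlap's reply in this thread, applies the summary function to the x column of each piece rather than to the whole piece, and uses a hypothetical helper name, by2df. It assumes a single grouping factor and a function that returns a named numeric vector of the same length for every group.

```r
# Sample data as made up in Bill Dunlap's reply (not the original poster's data)
indf <- data.frame(charid = c("A", "A", "A", "B", "A", "B"), x = 11:16)

# by() yields one named numeric vector per level of charid
res <- by(indf, indf$charid, function(d) c(m = mean(d$x), s = sd(d$x)))

# by2df is a hypothetical helper: stack the per-group vectors as rows,
# then move the group labels from the row names into a proper column.
by2df <- function(byobj, idname = "id") {
  mat <- do.call(rbind, byobj)          # one row per group, one column per statistic
  out <- data.frame(rownames(mat), mat,
                    row.names = NULL, stringsAsFactors = FALSE)
  names(out)[1] <- idname
  out
}

flat <- by2df(res, "charid")
flat
```

The do.call(rbind, ...) step relies on every group returning a vector with the same names; with several grouping factors the by object is a multi-dimensional array and this sketch would need extending.
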
I can definitely recommend the "plyr" package for these sorts of operations.

http://had.co.nz/plyr/

ivo welch wrote:
> has someone written a function that returns the results of by() as a
> data frame?

Have you tried aggregate or plyr's ddply? by() is meant for functions whose return values are so complicated that automatically combining them is not feasible (e.g., lm()). aggregate() works for functions that return scalars or simple vectors and returns a data.frame. ddply is part of a family of apply functions with a uniform interface.

I didn't notice any sample data so I made up some, and your by() call didn't work with what I made up, so perhaps you need something else.

> indf<-data.frame(charid=c("A","A","A","B","A","B"), x= 11:16)
> by(indf$x, indf$charid, function(x)c(m=mean(x),s=sd(x)))
indf$charid: A
        m         s
12.750000  1.707825
------------------------------------------------------------
indf$charid: B
        m         s
15.000000  1.414214
> library(plyr)
> ddply(indf, .variables=.(charid), .fun=function(df)c(m=mean(df$x),s=sd(df$x)))
  charid     m        s
1      A 12.75 1.707825
2      B 15.00 1.414214
> str(.Last.value)
'data.frame':   2 obs. of  3 variables:
 $ charid: Factor w/ 2 levels "A","B": 1 2
 $ m     : num  12.8 15
 $ s     : num  1.71 1.41

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com

> -----Original Message-----
> From: r-help-bounces at r-project.org
> [mailto:r-help-bounces at r-project.org] On Behalf Of ivo welch
> Sent: Monday, August 30, 2010 6:19 AM
> To: r-help
> Subject: [R] different interface to by (tapply)?

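
The reply above mentions aggregate() without showing it, so here is a minimal sketch using the same made-up data. One wrinkle worth knowing: when the summary function returns a vector, aggregate() stores the statistics in a single matrix column, and a further step is needed to spread them into ordinary columns. The names agg and flat are illustrative only.

```r
# Same made-up data as in the session above
indf <- data.frame(charid = c("A", "A", "A", "B", "A", "B"), x = 11:16)

# aggregate() with a vector-valued function: the result has two columns,
# charid and a matrix column x whose own columns are m and s
agg <- aggregate(x ~ charid, data = indf,
                 FUN = function(v) c(m = mean(v), s = sd(v)))

# Passing the columns back through data.frame() splits the matrix column
# into separate columns named x.m and x.s
flat <- do.call(data.frame, agg)
names(flat)
```

Renaming x.m and x.s to m and s afterwards is a one-line names(flat) assignment if the shorter names are preferred.
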
On Aug 30, 2010, at 9:19 AM, ivo welch wrote:

> for example, let's say my by() output is currently
>
> by( indf, indf$charid, function(x) c(m=mean(x), s=sd(x)) )
>
> $`A`
> [1] 2 3
> $`B`
> [1] 4 5

Doesn't by() return names for the "m" and "s" values?

> then the revised by() would instead produce
>
> charid  m  s
> A       2  3
> B       4  5

by() shares with table() and tapply() the habit of returning a list or a matrix- or array-like object. I would expect that as.data.frame would return a data frame from such an argument, but my experience is that it generally forces this into a "long" format, and one may need to resort to reshape() or reshape::melt and reshape::cast() to get it back into a rectangular format.

You _could_ have created a richer example that would become the basis for further elaboration.

--
David.

David Winsemius, MD
West Hartford, CT
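
To illustrate the "long" format point, here is a small sketch, assuming the made-up data from Bill Dunlap's reply. It builds a matrix of statistics, shows that as.data.frame() on the table version gives one row per (statistic, group) pair, and then uses base reshape() to get back one row per group. The column names stat, charid and value are chosen here for illustration.

```r
indf <- data.frame(charid = c("A", "A", "A", "B", "A", "B"), x = 11:16)

# A 2 x 2 matrix: rows are the statistics (m, s), columns are the groups (A, B)
stats <- sapply(split(indf$x, indf$charid),
                function(v) c(m = mean(v), s = sd(v)))

# Coercing via as.table() gives the long layout: columns Var1, Var2, Freq
long <- as.data.frame(as.table(stats))
names(long) <- c("stat", "charid", "value")

# reshape() to wide: one row per charid, one column per statistic
wide <- reshape(long, idvar = "charid", timevar = "stat", direction = "wide")
names(wide)
```

The wide result carries the prefixed column names value.m and value.s, which can be renamed if plain m and s are wanted.
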