Hello,

I have a dataframe (obtained from read.table()) which looks like

  ExpA ExpB ExpC Size
1   12   23   33    1
2   12   24   29    1
3   10   22   34    1
4   25   50   60    2
5   24   53   62    2
6   21   49   61    2

Now I want to take all rows that have the same value in the "Size" column and apply a function (for example median()) to each column of these rows. The result should be a new dataframe with the medians of the groups, like this:

  ExpA ExpB ExpC Size
1   12   23   34    1
2   24   50   61    2

I tried the functions by() and tapply(), but so far I didn't get the results I wanted, so any help on this would be great!

The reason why I am having this problem (I explain this just to make sure I am not doing something against the nature of R):

I am doing 3 similar experiments, A, B and C, and I change a parameter in the experiment (size). Every experiment is done multiple times, and I need the median or average over all experiments that share the same size. Should I preprocess my data files so that they are laid out completely differently? Or is the current layout fine, and I just overlooked the simple solution to the problem described above?

Regards,
Timo
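For reference, the tapply() route mentioned in the question can be made to work in base R. The following is only a sketch under stated assumptions: the data frame is rebuilt inline from the table in the post, and the names DF, value_cols, med and result are made up for illustration. tapply() summarises one vector at a time, so the sketch loops over the experiment columns with sapply().

```r
# Example data from the post, rebuilt inline (DF is a hypothetical name)
DF <- data.frame(
  ExpA = c(12, 12, 10, 25, 24, 21),
  ExpB = c(23, 24, 22, 50, 53, 49),
  ExpC = c(33, 29, 34, 60, 62, 61),
  Size = c(1, 1, 1, 2, 2, 2)
)

# Columns to summarise; Size is the grouping column
value_cols <- c("ExpA", "ExpB", "ExpC")

# tapply() works on a single vector, so apply it to each value column in turn.
# The result is a matrix with one row per Size group and one column per experiment.
med <- sapply(DF[value_cols], function(col) tapply(col, DF$Size, median))

# Turn the matrix back into a data frame and restore Size as a regular column
result <- data.frame(med, Size = as.numeric(rownames(med)), row.names = NULL)
print(result)
```

Replacing median with mean in the tapply() call gives the group averages instead.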
Tena koe Timo

?aggregate

HTH ...

Peter Alspach

> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Timo Schneider
> Sent: Wednesday, 15 July 2009 3:56 p.m.
> To: r-help at r-project.org
> Subject: [R] Grouping data in dataframe
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
Try ?aggregate
Timo Schneider wrote:
>
> now I want to take all rows that have the same value in the "Size"
> column and apply a function to the columns of these rows (for example
> median()). The result should be a new dataframe with the medians of the
> groups
>

Besides the mentioned aggregate, you could use one of the functions in package plyr.

Dieter
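The reply does not say which plyr function to use. One possibility, shown here only as a sketch, is ddply() combined with colwise(). It assumes the plyr package is installed, and it rebuilds the example data inline under the hypothetical name xx.

```r
library(plyr)  # assumption: the plyr package is installed

# Example data from the original post, rebuilt inline (xx is a hypothetical name)
xx <- data.frame(
  ExpA = c(12, 12, 10, 25, 24, 21),
  ExpB = c(23, 24, 22, 50, 53, 49),
  ExpC = c(33, 29, 34, 60, 62, 61),
  Size = c(1, 1, 1, 2, 2, 2)
)

# Split xx by Size, apply median() to every remaining column of each piece,
# and bind the per-group results back into one data frame
med <- ddply(xx, .(Size), colwise(median))
print(med)
```

In the result the grouping column Size comes first, followed by one median column per experiment.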
Another approach is to use the reshape package.

-- Assuming your data.frame is called xx
------------------------------------------
library(reshape)
mm <- melt(xx, id=c("Size")) ; mm
cast(mm, Size ~ variable, median)
--------------------------------------
On Wednesday, 15.07.2009, at 00:42 -0500, markleeds at verizon.net wrote:

Hi!

> Hi: I think aggregate does what you want. You had 34 in one of your
> columns but I think you meant it to be 33.
>
> DF <- read.table(textConnection("ExpA ExpB ExpC Size
> 1 12 23 33 1
> 2 12 24 29 1
> 3 10 22 34 1
> 4 25 50 60 2
> 5 24 53 62 2
> 6 21 49 61 2"),header=TRUE)
>
> print(DF)
> print(str(DF))
>
> aggregate(DF,list(DF$Size),median)

Yes, thanks to you and all the other people who helped! The aggregate function is exactly what I was looking for.

Thanks for the help!

Regards,
Timo
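As a follow-up sketch: the quoted call also aggregates the Size column itself and adds a Group.1 column to the output. The formula interface of aggregate() gives a layout closer to the one asked for in the original post. It may not have been available in the R version current at the time of this thread, so treat it as an assumption about a reasonably recent R. The names med and avg are made up for illustration.

```r
# Example data from the thread, rebuilt inline so no input file is needed
DF <- data.frame(
  ExpA = c(12, 12, 10, 25, 24, 21),
  ExpB = c(23, 24, 22, 50, 53, 49),
  ExpC = c(33, 29, 34, 60, 62, 61),
  Size = c(1, 1, 1, 2, 2, 2)
)

# Formula interface: summarise every other column (".") within each Size group
med <- aggregate(. ~ Size, data = DF, FUN = median)
print(med)

# The same call with a different summary function gives the group averages
avg <- aggregate(. ~ Size, data = DF, FUN = mean)
print(avg)
```

The median of ExpC for Size 1 comes out as 33, which matches the correction pointed out in the quoted reply.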