I have a 43 MB data frame (5 variables) and I'm trying to summarize subsets of the data. I've RTFM (not very clear) and looked at a variety of examples, but I can't seem to figure out how to make these functions work.

A sample of what I want to do would be this:

ids   <- seq(1, 50)
years <- c(rep(5, 10), rep(6, 10), rep(7, 10), rep(8, 20))
data  <- c(rep(23.2, 7), rep(14.2, 17), rep(29.2, 6), rep(13.4, 10), rep(16.3, 5), NA, rep(40, 4))
data2 <- c(rep(22.2, 5), rep(13.2, 8), NA, rep(29.8, 16), rep(12.4, 10), rep(16.3, 5), rep(38, 5))
DF <- data.frame(ids, years, data, data2)

That gives a data frame that is a good analog of what I have. I would like to calculate means (with NAs removed, na.rm) for each level of years:

     data   data2
5    xx     yy
6    xx     yz
7    ...    ...
8    ...    ...

And then things like this:

5-7 :  xx   yy
8   :  xy   zz
Here is one solution to your question:

mean.data  <- with(DF, tapply(data,  years, mean, na.rm = TRUE))
mean.data2 <- with(DF, tapply(data2, years, mean, na.rm = TRUE))
cbind(mean.data, mean.data2)

Another option is to read about the plyr package (which is actually better suited to this job).

For recoding the years, look at either ?cut or ?recode (from the car package).

Best,
Tal

Contact me: Tal.Galili@gmail.com | 972-52-7275845
Read me: www.talgalili.com (Hebrew) | www.biostatistics.co.il (Hebrew) | www.r-statistics.com (English)

On Sun, Apr 25, 2010 at 9:29 AM, steven mosher <moshersteven@gmail.com> wrote:
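Following up on the ?cut pointer, here is a minimal base-R sketch (no extra packages assumed) of the "5-7 versus 8" summary. The break points c(4, 7, 8) and the names `period` and `mean.period` are illustrative choices for this example's years, not something from the original posts.

```r
# Rebuild the example data frame from the original question.
ids   <- seq(1, 50)
years <- c(rep(5, 10), rep(6, 10), rep(7, 10), rep(8, 20))
data  <- c(rep(23.2, 7), rep(14.2, 17), rep(29.2, 6), rep(13.4, 10),
           rep(16.3, 5), NA, rep(40, 4))
data2 <- c(rep(22.2, 5), rep(13.2, 8), NA, rep(29.8, 16), rep(12.4, 10),
           rep(16.3, 5), rep(38, 5))
DF <- data.frame(ids, years, data, data2)

# Bin years with cut(): intervals are right-closed by default,
# so (4, 7] holds years 5, 6, 7 and (7, 8] holds year 8.
# These break points are an assumption fitted to this example's years.
period <- cut(DF$years, breaks = c(4, 7, 8), labels = c("5-7", "8"))

# One tapply() per column, bound side by side into a matrix
# whose rows are the period labels.
mean.period <- cbind(
  data  = with(DF, tapply(data,  period, mean, na.rm = TRUE)),
  data2 = with(DF, tapply(data2, period, mean, na.rm = TRUE))
)
mean.period
```

The same `period` factor can be passed as the grouping variable to aggregate() or by() instead of tapply().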
Here's one way with aggregate():

library(car)  # for recode(); you will probably need to install it first

# Means of data and data2 (columns 3 and 4) for each year
aggregate(DF[, 3:4], by = list(years = DF$years), FUN = mean, na.rm = TRUE)

# recode() syntax, e.g.: recode(x, "c(1,2)='A'; else='B'")
# Collapse years 5, 6 and 7 into one '5-7' group, then aggregate again
DF$years <- recode(DF$years, "c(5,6,7)='5-7'")
aggregate(DF[, 3:4], by = list(years = DF$years), FUN = mean, na.rm = TRUE)

You may also want to have a look at the reshape and plyr packages.

--- On Sun, 4/25/10, steven mosher <moshersteven at gmail.com> wrote:
> From: steven mosher <moshersteven at gmail.com>
> Subject: [R] Noobie question on aggregate tapply and by
> To: "r-help" <r-help at r-project.org>
> Received: Sunday, April 25, 2010, 2:29 AM
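aggregate() also has a formula method in base R, sketched below; `by.year` is an illustrative name. Watch the na.action argument: the formula method defaults to na.omit, which drops a whole row when either column is NA, so the mean of the other column changes too. In this data, row 46 has NA only in data, so with the default year 8's data2 mean would fall from 19.775 to about 18.82. Setting na.action = na.pass keeps those rows and lets na.rm = TRUE handle the NAs column by column.

```r
# Rebuild the example data frame from the original question.
ids   <- seq(1, 50)
years <- c(rep(5, 10), rep(6, 10), rep(7, 10), rep(8, 20))
data  <- c(rep(23.2, 7), rep(14.2, 17), rep(29.2, 6), rep(13.4, 10),
           rep(16.3, 5), NA, rep(40, 4))
data2 <- c(rep(22.2, 5), rep(13.2, 8), NA, rep(29.8, 16), rep(12.4, 10),
           rep(16.3, 5), rep(38, 5))
DF <- data.frame(ids, years, data, data2)

# cbind() on the left-hand side selects the columns to summarise;
# 'years' on the right is the grouping variable.
# na.action = na.pass keeps rows that are NA in only one column;
# na.rm = TRUE is forwarded to mean() through '...'.
by.year <- aggregate(cbind(data, data2) ~ years, data = DF,
                     FUN = mean, na.rm = TRUE, na.action = na.pass)
by.year
```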
Try this:

# Means per year
aggregate(DF[c('data', 'data2')], DF['years'], FUN = mean, na.rm = TRUE)

# Means for years 5-7 combined versus year 8
aggregate(DF[c('data', 'data2')],
          list(years = as.character(factor(DF[, 'years'],
                                           labels = c('5-7', '5-7', '5-7', 8)))),
          FUN = mean, na.rm = TRUE)

--
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O

On Sun, Apr 25, 2010 at 3:29 AM, steven mosher <moshersteven@gmail.com> wrote:
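For completeness, the same per-year means can be had with by(), sketched below. by() hands each sub-data-frame to the function, and colMeans() accepts a data frame plus na.rm, returning one mean per column. The names `res` and `res.mat` are illustrative.

```r
# Rebuild the example data frame from the original question.
ids   <- seq(1, 50)
years <- c(rep(5, 10), rep(6, 10), rep(7, 10), rep(8, 20))
data  <- c(rep(23.2, 7), rep(14.2, 17), rep(29.2, 6), rep(13.4, 10),
           rep(16.3, 5), NA, rep(40, 4))
data2 <- c(rep(22.2, 5), rep(13.2, 8), NA, rep(29.8, 16), rep(12.4, 10),
           rep(16.3, 5), rep(38, 5))
DF <- data.frame(ids, years, data, data2)

# Split the two value columns by year and take column means per group.
res <- by(DF[c("data", "data2")], DF$years, colMeans, na.rm = TRUE)

# res is a list-like "by" object named by year; stack its
# per-group vectors into a matrix with one row per year.
res.mat <- do.call(rbind, res)
res.mat
```

by() is most useful when the per-group function returns something richer than a single number (a model fit, a summary table); for plain means, aggregate() or tapply() give a tidier result directly.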