I have a 43MB data frame (5 variables) and I'm trying to summarize subsets
of the data.
I've RTFM (not very clear) and looked at a variety of samples, but I can't
figure out how to make these functions work.
A sample of what I want to do would be this:
ids<-seq(1,50)
years<-c(rep(5,10),rep(6,10),rep(7,10),rep(8,20))
data<-c(rep(23.2,7),rep(14.2,17),rep(29.2,6),rep(13.4,10),rep(16.3,5), NA,
rep(40,4))
data2<-c(rep(22.2,5),rep(13.2,8),NA, rep(29.8,16),rep(12.4,10),rep(16.3,5),
rep(38,5))
DF<-data.frame(ids,years,data,data2)
That will give you a data frame that is a good analog of what I have. I
would like to calculate means (with NAs removed via na.rm) for each level
of years:
       data   data2
5      xx.    yy.
6      xx     yz
7      ...    ,,,
8      ..     ...
And then things like this:
5-7 :  xx     yy
8   :  xy     zz
Here is one solution for your question:

mean.data <- with(DF, tapply(data, years, mean, na.rm = TRUE))
mean.data2 <- with(DF, tapply(data2, years, mean, na.rm = TRUE))
cbind(mean.data, mean.data2)

Another option is to read about the plyr package (which is actually better
suited to this job).
For recoding the years, look at either ?cut or ?recode (from the car package).

Best,
Tal

----------------Contact Details:----------------
Contact me: Tal.Galili@gmail.com | 972-52-7275845
Read me: www.talgalili.com (Hebrew) | www.biostatistics.co.il (Hebrew) | www.r-statistics.com (English)
------------------------------------------------
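As a rough illustration of the ?cut suggestion above, the sketch below bins the years into the two groups from the question and then takes per-group means. The column name `period`, the break points, and the variable `period_means` are assumptions made for this example, not something from the original posts.

```r
# Rebuild the sample data frame from the question.
ids   <- seq(1, 50)
years <- c(rep(5, 10), rep(6, 10), rep(7, 10), rep(8, 20))
data  <- c(rep(23.2, 7), rep(14.2, 17), rep(29.2, 6), rep(13.4, 10),
           rep(16.3, 5), NA, rep(40, 4))
data2 <- c(rep(22.2, 5), rep(13.2, 8), NA, rep(29.8, 16), rep(12.4, 10),
           rep(16.3, 5), rep(38, 5))
DF <- data.frame(ids, years, data, data2)

# Bin the years: values in (4, 7] become "5-7", values in (7, 8] become "8".
# (hypothetical column name and break points, chosen for this sample)
DF$period <- cut(DF$years, breaks = c(4, 7, 8), labels = c("5-7", "8"))

# Mean of each value column within each bin, ignoring NAs per column.
period_means <- sapply(DF[c("data", "data2")],
                       function(col) tapply(col, DF$period, mean, na.rm = TRUE))
period_means
```

Keeping the bins in a separate column leaves the original years untouched, so the per-year summary can still be computed from the same data frame.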
Here's one way with aggregate():

aggregate(DF[, 3:4], by = list(years = DF$years), FUN = mean, na.rm = TRUE)

To recode the years, use recode() from the car package (you will probably
need to install it). A recode specification looks like
"c(1,2)='A'; else='B'". For your data:

library(car)
DF$years <- recode(DF$years, "c(5,6,7)='5-7'")
DF

You may also want to have a look at the reshape and plyr packages.

--- On Sun, 4/25/10, steven mosher <moshersteven at gmail.com> wrote:
> From: steven mosher <moshersteven at gmail.com>
> Subject: [R] Noobie question on aggregate tapply and by
> To: "r-help" <r-help at r-project.org>
> Received: Sunday, April 25, 2010, 2:29 AM
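Since the subject line also asks about by(), here is a minimal sketch of the same per-year means using by() with colMeans(). The variable name `year_means` is made up for this example, and binding the pieces with do.call(rbind, ...) is just one way to get a matrix back.

```r
# Rebuild the sample data frame from the question.
ids   <- seq(1, 50)
years <- c(rep(5, 10), rep(6, 10), rep(7, 10), rep(8, 20))
data  <- c(rep(23.2, 7), rep(14.2, 17), rep(29.2, 6), rep(13.4, 10),
           rep(16.3, 5), NA, rep(40, 4))
data2 <- c(rep(22.2, 5), rep(13.2, 8), NA, rep(29.8, 16), rep(12.4, 10),
           rep(16.3, 5), rep(38, 5))
DF <- data.frame(ids, years, data, data2)

# by() splits the two value columns by year and applies colMeans() to each
# piece; na.rm = TRUE is passed through to colMeans().
pieces <- by(DF[c("data", "data2")], DF$years, colMeans, na.rm = TRUE)

# Stack the per-year vectors into one matrix, one row per year.
year_means <- do.call(rbind, pieces)
year_means
```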
Try this:

aggregate(DF[c('data', 'data2')], DF['years'], FUN = mean, na.rm = TRUE)

aggregate(DF[c('data', 'data2')],
          list(as.character(factor(DF[, 'years'],
                                   labels = c('5-7', '5-7', '5-7', 8)))),
          FUN = mean, na.rm = TRUE)
--
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O
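One thing worth checking with the aggregate() calls in this thread: they all use the data-frame method, where na.rm = TRUE drops NAs separately for each column. The formula interface behaves differently by default, because its na.action = na.omit removes every row that has an NA in any listed column before the means are taken. The sketch below compares the two on the sample data; the names `by_column`, `by_row`, and `by_formula` are assumptions for this example.

```r
# Rebuild the sample data frame from the question.
ids   <- seq(1, 50)
years <- c(rep(5, 10), rep(6, 10), rep(7, 10), rep(8, 20))
data  <- c(rep(23.2, 7), rep(14.2, 17), rep(29.2, 6), rep(13.4, 10),
           rep(16.3, 5), NA, rep(40, 4))
data2 <- c(rep(22.2, 5), rep(13.2, 8), NA, rep(29.8, 16), rep(12.4, 10),
           rep(16.3, 5), rep(38, 5))
DF <- data.frame(ids, years, data, data2)

# Data-frame method: NAs are dropped per column inside mean().
by_column <- aggregate(DF[c("data", "data2")], DF["years"],
                       FUN = mean, na.rm = TRUE)

# Formula method with defaults: rows with an NA in either column are
# removed first, so data2 for year 8 loses the row where only data is NA.
by_row <- aggregate(cbind(data, data2) ~ years, data = DF, FUN = mean)

# Formula method that keeps all rows and lets mean() handle the NAs.
by_formula <- aggregate(cbind(data, data2) ~ years, data = DF,
                        FUN = mean, na.rm = TRUE, na.action = na.pass)

by_column
by_row
by_formula
```

On this sample the difference shows up only in data2 for year 8, but on a large data frame with scattered NAs the two defaults can diverge in many groups.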