thr3ads.net - R help - [R] averaging between rows with repeated data [Nov 2011]


robgriffin247

2011-Nov-15 10:52 UTC

[R] averaging between rows with repeated data

*The situation (or an example at least!)*

example <- data.frame(Letters = letters[1:10])  # one column holding the letters a to j
example$numb1 <- rnorm(10, 1, 1)  # 10 random draws with mean 1 and sd 1
example$numb2 <- rnorm(10, 1, 1)
example$numb3 <- rnorm(10, 1, 1)
example$id <- c("CG234", "CG232", "CG441", "CG128", "CG125",
                "CG182", "CG232", "CG441", "CG232", "CG125")

*this produces something like this:*
  Letters     numb1      numb2        numb3    id
1        a 0.8139130 -0.9775570 -0.002996244 CG234
2        b 0.8268700  0.4980661  1.647717998 CG232
3        c 0.2384088  1.0249684  0.120663273 CG441
4        d 0.8215922  0.5686534  1.591208307 CG128
5        e 0.7865918  0.5411476  0.838300185 CG125
6        f 2.2385522  1.2668070  1.268005020 CG182
7        g 0.7403965 -0.6224205  1.374641549 CG232
8        h 0.2526634  1.0282978 -0.110449844 CG441
9        i 1.9333444  1.6667486  2.937252363 CG232
10       j 1.6996701  0.5964623  1.967870617 CG125
 
*The Problem:*
Some of these IDs are repeated. I want to average the values for those rows
within each column, but obviously they have different numbers in the numb
columns, and they also have different letters in the Letters column. The
letters are not necessary for my analysis; only the duplicated IDs and the
numb columns are important.
 
I also need to keep the existing data frame, so I would like to build a new
data frame that averages the repeated values and keeps their id. My actual
dataset is much larger (271 × 13890), but a solution for this example should
extend to it, because it just has more columns of numbers and still only one
alphanumeric id to keep. In my example data, id CG232 occurs 3 times, CG441
and CG125 occur twice, and everything else once. So the new data frame (from
this example) would have 3 number columns (numb1, numb2, numb3) and an id,
and the numb column values would be the averages of the rows which had the
same id.
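A quick way to confirm which IDs repeat is base R's table(). This is a minimal sketch; it copies the id vector from the example above so that it runs on its own.

```r
# The id column copied from the example data frame above
ids <- c("CG234", "CG232", "CG441", "CG128", "CG125",
         "CG182", "CG232", "CG441", "CG232", "CG125")

# Count how often each id occurs (table() lists the ids in sorted order)
id_counts <- table(ids)
id_counts

# Keep only the ids that occur more than once
repeated <- names(id_counts)[id_counts > 1]
repeated
```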
 
So, for example, the new data frame would contain an entry for CG125 which
would be something like this:
 
numb1    numb2    numb3    id
1.2431   0.5688   1.403    CG125
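That CG125 entry is just the column-wise mean of rows 5 and 10 of the printed table. A sketch that reproduces it from those printed values (typed in by hand, since rnorm() gives different numbers on every run):

```r
# Rows 5 and 10 of the printed table, the two rows with id CG125
cg125 <- data.frame(
  numb1 = c(0.7865918, 1.6996701),
  numb2 = c(0.5411476, 0.5964623),
  numb3 = c(0.838300185, 1.967870617)
)

# Average each column across the two rows
round(colMeans(cg125), 4)
# numb1 1.2431, numb2 0.5688, numb3 1.4031
```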
 
Just as a thought: all of the IDs start with CG, so could I then use grep (?)
to delete CG and replace it with 0? That way duplicated IDs could be
averaged as a number (they would be the same), but I still don't know how to
produce the new data frame with the averaged rows in it...
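Stripping the prefix is a one-liner with base R's sub(), sketched below, though it is not required: grouping functions such as aggregate() and ave() accept character IDs directly, so rows sharing "CG232" are already grouped together as text.

```r
ids <- c("CG234", "CG232", "CG441", "CG128", "CG125",
         "CG182", "CG232", "CG441", "CG232", "CG125")

# Remove the leading "CG" and convert what remains to a number
num_ids <- as.numeric(sub("^CG", "", ids))
num_ids
# 234 232 441 128 125 182 232 441 232 125
```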
 
I hope this is clear enough! Email me if you need further detail or, even
better, if you have a solution!
Also, sorry to be posting my second question in under 24 hours, but I seem to
have become more than a little stuck; I was making such good progress with
R!

Rob
 
(Also, I'm sorry if this appears more than once on the mailing list. I'm
having some network and Windows Live issues, so I'm not convinced previous
attempts to send this have worked, but I have no way of telling if they are
just milling around in the internet somewhere as we speak and will decide to
come out of hiding later!)

--
View this message in context:
http://r.789695.n4.nabble.com/averaging-between-rows-with-repeated-data-tp4042513p4042513.html
Sent from the R help mailing list archive at Nabble.com.

R. Michael Weylandt

2011-Nov-15 11:46 UTC


[R] averaging between rows with repeated data

Good morning Rob,

First off, thank you for providing a reproducible example. This is one
of those little tasks that R is pretty great at, but there are countless
ways to do it, which can be a little overwhelming for the beginner. Here's
one with the base function ave():

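# ave() averages each numb column within id groups (one value per original row);
unique(data.frame(lapply(example[, 2:4], ave, example$id), id = example$id))

This averages each numeric column within the groups defined by the fifth
column (id) and sticks a copy of the id back on the end. Because ave()
returns one value per original row, unique() then drops the repeated rows,
leaving one row per id, and you are good to go.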
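Why a unique() step is needed with ave(): it returns one averaged value per original element rather than one per group. A tiny sketch with made-up values (the vectors below are illustrative, not from the thread):

```r
values <- c(1, 2, 3, 10)
groups <- c("a", "a", "b", "b")

# Each element is replaced by the mean of its group; the length is unchanged
ave(values, groups)
# 1.5 1.5 6.5 6.5
```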

The base function aggregate can do something similar:

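aggregate(example[, 2:4], by = example[, 5, drop = FALSE], FUN = mean)

Note that you need the little-publicized but super useful drop = FALSE
argument to make this one work: it keeps example[, 5] as a one-column data
frame (that is, a list), which is what the by argument requires.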
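For the full data set (271 × 13890), picking columns by position gets unwieldy. A sketch using aggregate()'s formula method, where . stands for every column except id; it assumes Letters is the only column to exclude, and the set.seed() call is added only to make the random example reproducible:

```r
set.seed(1)  # assumption: any seed; it only makes the random data reproducible

# Rebuild the example data from the original post
example <- data.frame(Letters = letters[1:10])
example$numb1 <- rnorm(10, 1, 1)
example$numb2 <- rnorm(10, 1, 1)
example$numb3 <- rnorm(10, 1, 1)
example$id <- c("CG234", "CG232", "CG441", "CG128", "CG125",
                "CG182", "CG232", "CG441", "CG232", "CG125")

# Drop Letters (column 1); ". ~ id" then averages every remaining column by id
means <- aggregate(. ~ id, data = example[, -1], FUN = mean)
means
```

One caveat with this design: the formula method drops any row containing an NA before averaging (its default na.action is na.omit), whereas the data-frame method passes NAs through to mean().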

There are other ways to do this with the plyr or doBy packages as
well, but this should get you started.
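As a sketch of the plyr route (assuming plyr is installed; the small data frame df is made up for illustration), ddply() splits the rows by id and numcolwise(mean) averages every numeric column within each piece:

```r
# Assumption: the plyr package is installed, e.g. via install.packages("plyr")
if (requireNamespace("plyr", quietly = TRUE)) {
  # Hypothetical toy data: two CG125 rows and one CG232 row
  df <- data.frame(numb1 = c(1, 3, 5),
                   numb2 = c(2, 4, 6),
                   id = c("CG125", "CG125", "CG232"))
  # One output row per id, with the mean of each numeric column
  res <- plyr::ddply(df, "id", plyr::numcolwise(mean))
  print(res)
}
```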

Hope it helps,

Michael

