thr3ads.net - R help - [R] Averaging within a range of values [Jan 2012]

If this information is useful, please help other people find it:
Share via:

doggysaywhat

2012-Jan-13 11:34 UTC

[R] Averaging within a range of values

Hello all.

I have two data frames.  
Group       Start          End
G1	               200	         700
G2	               500	         1000
G3	               2000        3000
G4	               4000        6000
G5	               7000        8000


and

Pos		    C0		    C1
200		    0.9	  	  0.6
500	   	  0.8	   	  0.8
800		    0.9	  	  0.7
1000		  0.7	  	0.6
2000	  	0.6		  0.4
2500	  	1.2	  	0.8
3000	  	0.6	  	1.5
3500	  	0.7	  	0.7
4000	  	0.8		  0.8
4500	  	0.6	  	0.6
5000		  0.9	  	0.9
5500	  	0.7		  0.8
6000	  	0.8	  	0.7
6500	  	0.4	  	0.4
7000		  0.5	  	0.8
7500  		0.7		  0.9
8000	  	0.9	  	0.5
8500	  	0.8	  	0.6
9000	  	0.9	  	0.8


I need to conditionally average all values in columns C0 and C1 based upon
the bins I defined in the first data frame.  For example, for the bin G1 in
the first dataframe, the values are 200 to 700 so i would average the value
at pos 200 (0.9) and 500 (0.8) for C0 and then perform the same thing for
C1.  

I can do this in excel with array formulas but I'm relatively new to R and
would like know if there is a function that will perform the same action.  I
don't know if this will help, but the excel array function I used was
average(if(range>=start)*(range<=end),range)).  Where the range is the
entire pos column.  

Initially I looked at the aggregate function.   I can use aggregate when I
give a single vector to be used for grouping such as (A,B,C) but I'm not
sure how to define grouping as the bin 200-500 and the second bin as
500-1000 etc. and use that as my grouping vector.   

Any help would be greatly appreciated.  


--
View this message in context:
http://r.789695.n4.nabble.com/Averaging-within-a-range-of-values-tp4291958p4291958.html
Sent from the R help mailing list archive at Nabble.com.

Jeff Newmiller

2012-Jan-13 15:05 UTC

head link

[R] Averaging within a range of values

Regarding your last question, read ?cut
---------------------------------------------------------------------------
Jeff Newmiller                        The     .....       .....  Go Live...
DCN:<jdnewmil at dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live
Go...
                                      Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
/Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
--------------------------------------------------------------------------- 
Sent from my phone. Please excuse my brevity.

doggysaywhat <chwhite at ucsd.edu> wrote:
>Hello all.
>
>I have two data frames.  
>Group       Start          End
>G1	               200	         700
>G2	               500	         1000
>G3	               2000        3000
>G4	               4000        6000
>G5	               7000        8000
>
>
>and
>
>Pos		    C0		    C1
>200		    0.9	  	  0.6
>500	   	  0.8	   	  0.8
>800		    0.9	  	  0.7
>1000		  0.7	  	0.6
>2000	  	0.6		  0.4
>2500	  	1.2	  	0.8
>3000	  	0.6	  	1.5
>3500	  	0.7	  	0.7
>4000	  	0.8		  0.8
>4500	  	0.6	  	0.6
>5000		  0.9	  	0.9
>5500	  	0.7		  0.8
>6000	  	0.8	  	0.7
>6500	  	0.4	  	0.4
>7000		  0.5	  	0.8
>7500  		0.7		  0.9
>8000	  	0.9	  	0.5
>8500	  	0.8	  	0.6
>9000	  	0.9	  	0.8
>
>
>I need to conditionally average all values in columns C0 and C1 based
>upon
>the bins I defined in the first data frame.  For example, for the bin
>G1 in
>the first dataframe, the values are 200 to 700 so i would average the
>value
>at pos 200 (0.9) and 500 (0.8) for C0 and then perform the same thing
>for
>C1.  
>
>I can do this in excel with array formulas but I'm relatively new to R
>and
>would like know if there is a function that will perform the same
>action.  I
>don't know if this will help, but the excel array function I used was
>average(if(range>=start)*(range<=end),range)).  Where the range is the
>entire pos column.  
>
>Initially I looked at the aggregate function.   I can use aggregate
>when I
>give a single vector to be used for grouping such as (A,B,C) but I'm
>not
>sure how to define grouping as the bin 200-500 and the second bin as
>500-1000 etc. and use that as my grouping vector.   
>
>Any help would be greatly appreciated.  
>
>
>--
>View this message in context:
>http://r.789695.n4.nabble.com/Averaging-within-a-range-of-values-tp4291958p4291958.html
>Sent from the R help mailing list archive at Nabble.com.
>
>______________________________________________
>R-help at r-project.org mailing list
>https://stat.ethz.ch/mailman/listinfo/r-help
>PLEASE do read the posting guide
>http://www.R-project.org/posting-guide.html
>and provide commented, minimal, self-contained, reproducible code.

doggysaywhat

2012-Jan-13 21:02 UTC

head link

[R] Averaging within a range of values

Hello Jeff, thank you for the reply.  I tried the cut function and I had two
questions.  How do I have the cut function take the first position in the
start column in df1 as the first cut point and the first position in column
2 as the second cut point.  The break variable seems to want a single
vector.  I tried compressing both vectors into one where I had say


200
700
500
1000
etc

then cut gives me the 200-500 range, 500-700, and 700-1000.  In this case I
wanted the range, 200-700, and 500-1000.  

Is there a way to define the first point of each cut as positions along the
START vector and all second points of the cut as positions along the END
vector?

I also had one additional question.  When playing around with this, I
noticed that I had to do this for the Pos column in the second data frame. 
But, when I get the ranges, how do I have it return the values in C0 or C1
in df2 that are in the same rows as those of the ranges?

Thanks again for the help.
 

--
View this message in context:
http://r.789695.n4.nabble.com/Averaging-within-a-range-of-values-tp4291958p4293410.html
Sent from the R help mailing list archive at Nabble.com.

Rui Barradas

2012-Jan-14 02:25 UTC

head link

[R] Averaging within a range of values

Hello,

I believe this works

res <- NULL
for(i in 1:nrow(df1)){
	brk <- df1[i, 2:3]
	ix  <- df2$Pos >= brk$Start & df2$Pos <= brk$End
	res <- rbind(res, apply(df2[ix, -1], 2, mean))
}
res

It's not very pretty but since the ranges can overlap, I'm not finding
anything prettier (without a loop).

Rui Barradas


--
View this message in context:
http://r.789695.n4.nabble.com/Averaging-within-a-range-of-values-tp4291958p4294017.html
Sent from the R help mailing list archive at Nabble.com.

Gabor Grothendieck

2012-Jan-14 17:23 UTC

head link

[R] Averaging within a range of values

On Fri, Jan 13, 2012 at 6:34 AM, doggysaywhat <chwhite at ucsd.edu>
wrote:> Hello all.
>
> I have two data frames.
> Group ? ? ? Start ? ? ? ? ?End
> G1 ? ? ? ? ? ? ? ? ? ? 200 ? ? ? ? ? ? ? 700
> G2 ? ? ? ? ? ? ? ? ? ? 500 ? ? ? ? ? ? ? 1000
> G3 ? ? ? ? ? ? ? ? ? ? 2000 ? ? ? ?3000
> G4 ? ? ? ? ? ? ? ? ? ? 4000 ? ? ? ?6000
> G5 ? ? ? ? ? ? ? ? ? ? 7000 ? ? ? ?8000
>
>
> and
>
> Pos ? ? ? ? ? ? ? ? C0 ? ? ? ? ? ? ?C1
> 200 ? ? ? ? ? ? ? ? 0.9 ? ? ? ? ? 0.6
> 500 ? ? ? ? ? ? ? 0.8 ? ? ? ? ? ? 0.8
> 800 ? ? ? ? ? ? ? ? 0.9 ? ? ? ? ? 0.7
> 1000 ? ? ? ? ? ? ?0.7 ? ? ? ? ? 0.6
> 2000 ? ? ? ? ? ?0.6 ? ? ? ? ? ? ? 0.4
> 2500 ? ? ? ? ? ?1.2 ? ? ? ? ? ? 0.8
> 3000 ? ? ? ? ? ?0.6 ? ? ? ? ? ? 1.5
> 3500 ? ? ? ? ? ?0.7 ? ? ? ? ? ? 0.7
> 4000 ? ? ? ? ? ?0.8 ? ? ? ? ? ? ? 0.8
> 4500 ? ? ? ? ? ?0.6 ? ? ? ? ? ? 0.6
> 5000 ? ? ? ? ? ? ?0.9 ? ? ? ? ? 0.9
> 5500 ? ? ? ? ? ?0.7 ? ? ? ? ? ? ? 0.8
> 6000 ? ? ? ? ? ?0.8 ? ? ? ? ? ? 0.7
> 6500 ? ? ? ? ? ?0.4 ? ? ? ? ? ? 0.4
> 7000 ? ? ? ? ? ? ?0.5 ? ? ? ? ? 0.8
> 7500 ? ? ? ? ? ?0.7 ? ? ? ? ? ? ? 0.9
> 8000 ? ? ? ? ? ?0.9 ? ? ? ? ? ? 0.5
> 8500 ? ? ? ? ? ?0.8 ? ? ? ? ? ? 0.6
> 9000 ? ? ? ? ? ?0.9 ? ? ? ? ? ? 0.8
>
>
> I need to conditionally average all values in columns C0 and C1 based upon
> the bins I defined in the first data frame. ?For example, for the bin G1 in
> the first dataframe, the values are 200 to 700 so i would average the value
> at pos 200 (0.9) and 500 (0.8) for C0 and then perform the same thing for
> C1.
>
> I can do this in excel with array formulas but I'm relatively new to R
and
> would like know if there is a function that will perform the same action.
?I
> don't know if this will help, but the excel array function I used was
> average(if(range>=start)*(range<=end),range)). ?Where the range is
the
> entire pos column.
>
> Initially I looked at the aggregate function. ? I can use aggregate when I
> give a single vector to be used for grouping such as (A,B,C) but I'm
not
> sure how to define grouping as the bin 200-500 and the second bin as
> 500-1000 etc. and use that as my grouping vector.
>
Here is an sqldf solution where the two input data frames are d1 and
d2 (as in Jeff's post).  Note that Group is quoted since its an SQL
keyword:

library(sqldf)

sqldf("select d1.'Group', avg(d2.C0), avg(d2.C1)
   from d1, d2
   where d2.Pos between d1.Start and d1.End
   group by d1.'Group'")

The result is;

  Group avg(d2.C0) avg(d2.C1)
1    G1       0.85  0.7000000
2    G2       0.80  0.7000000
3    G3       0.80  0.9000000
4    G4       0.76  0.7600000
5    G5       0.70  0.7333333

-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com

Reasonably Related Threads

Search for more maybe matching threads

R help - Jan 2012 - Averaging within a range of values

[R] Averaging within a range of values

[R] Averaging within a range of values

[R] Averaging within a range of values

[R] Averaging within a range of values

[R] Averaging within a range of values

Reasonably Related Threads