Hello all. I have two data frames. Group Start End G1 200 700 G2 500 1000 G3 2000 3000 G4 4000 6000 G5 7000 8000 and Pos C0 C1 200 0.9 0.6 500 0.8 0.8 800 0.9 0.7 1000 0.7 0.6 2000 0.6 0.4 2500 1.2 0.8 3000 0.6 1.5 3500 0.7 0.7 4000 0.8 0.8 4500 0.6 0.6 5000 0.9 0.9 5500 0.7 0.8 6000 0.8 0.7 6500 0.4 0.4 7000 0.5 0.8 7500 0.7 0.9 8000 0.9 0.5 8500 0.8 0.6 9000 0.9 0.8 I need to conditionally average all values in columns C0 and C1 based upon the bins I defined in the first data frame. For example, for the bin G1 in the first dataframe, the values are 200 to 700 so i would average the value at pos 200 (0.9) and 500 (0.8) for C0 and then perform the same thing for C1. I can do this in excel with array formulas but I'm relatively new to R and would like know if there is a function that will perform the same action. I don't know if this will help, but the excel array function I used was average(if(range>=start)*(range<=end),range)). Where the range is the entire pos column. Initially I looked at the aggregate function. I can use aggregate when I give a single vector to be used for grouping such as (A,B,C) but I'm not sure how to define grouping as the bin 200-500 and the second bin as 500-1000 etc. and use that as my grouping vector. Any help would be greatly appreciated. -- View this message in context: http://r.789695.n4.nabble.com/Averaging-within-a-range-of-values-tp4291958p4291958.html Sent from the R help mailing list archive at Nabble.com.
Regarding your last question, read ?cut --------------------------------------------------------------------------- Jeff Newmiller The ..... ..... Go Live... DCN:<jdnewmil at dcn.davis.ca.us> Basics: ##.#. ##.#. Live Go... Live: OO#.. Dead: OO#.. Playing Research Engineer (Solar/Batteries O.O#. #.O#. with /Software/Embedded Controllers) .OO#. .OO#. rocks...1k --------------------------------------------------------------------------- Sent from my phone. Please excuse my brevity. doggysaywhat <chwhite at ucsd.edu> wrote:>Hello all. > >I have two data frames. >Group Start End >G1 200 700 >G2 500 1000 >G3 2000 3000 >G4 4000 6000 >G5 7000 8000 > > >and > >Pos C0 C1 >200 0.9 0.6 >500 0.8 0.8 >800 0.9 0.7 >1000 0.7 0.6 >2000 0.6 0.4 >2500 1.2 0.8 >3000 0.6 1.5 >3500 0.7 0.7 >4000 0.8 0.8 >4500 0.6 0.6 >5000 0.9 0.9 >5500 0.7 0.8 >6000 0.8 0.7 >6500 0.4 0.4 >7000 0.5 0.8 >7500 0.7 0.9 >8000 0.9 0.5 >8500 0.8 0.6 >9000 0.9 0.8 > > >I need to conditionally average all values in columns C0 and C1 based >upon >the bins I defined in the first data frame. For example, for the bin >G1 in >the first dataframe, the values are 200 to 700 so i would average the >value >at pos 200 (0.9) and 500 (0.8) for C0 and then perform the same thing >for >C1. > >I can do this in excel with array formulas but I'm relatively new to R >and >would like know if there is a function that will perform the same >action. I >don't know if this will help, but the excel array function I used was >average(if(range>=start)*(range<=end),range)). Where the range is the >entire pos column. > >Initially I looked at the aggregate function. I can use aggregate >when I >give a single vector to be used for grouping such as (A,B,C) but I'm >not >sure how to define grouping as the bin 200-500 and the second bin as >500-1000 etc. and use that as my grouping vector. > >Any help would be greatly appreciated. > > >-- >View this message in context: >http://r.789695.n4.nabble.com/Averaging-within-a-range-of-values-tp4291958p4291958.html >Sent from the R help mailing list archive at Nabble.com. > >______________________________________________ >R-help at r-project.org mailing list >https://stat.ethz.ch/mailman/listinfo/r-help >PLEASE do read the posting guide >http://www.R-project.org/posting-guide.html >and provide commented, minimal, self-contained, reproducible code.
Hello Jeff, thank you for the reply. I tried the cut function and I had two questions. How do I have the cut function take the first position in the start column in df1 as the first cut point and the first position in column 2 as the second cut point. The break variable seems to want a single vector. I tried compressing both vectors into one where I had say 200 700 500 1000 etc then cut gives me the 200-500 range, 500-700, and 700-1000. In this case I wanted the range, 200-700, and 500-1000. Is there a way to define the first point of each cut as positions along the START vector and all second points of the cut as positions along the END vector? I also had one additional question. When playing around with this, I noticed that I had to do this for the Pos column in the second data frame. But, when I get the ranges, how do I have it return the values in C0 or C1 in df2 that are in the same rows as those of the ranges? Thanks again for the help. -- View this message in context: http://r.789695.n4.nabble.com/Averaging-within-a-range-of-values-tp4291958p4293410.html Sent from the R help mailing list archive at Nabble.com.
Hello, I believe this works res <- NULL for(i in 1:nrow(df1)){ brk <- df1[i, 2:3] ix <- df2$Pos >= brk$Start & df2$Pos <= brk$End res <- rbind(res, apply(df2[ix, -1], 2, mean)) } res It's not very pretty but since the ranges can overlap, I'm not finding anything prettier (without a loop). Rui Barradas -- View this message in context: http://r.789695.n4.nabble.com/Averaging-within-a-range-of-values-tp4291958p4294017.html Sent from the R help mailing list archive at Nabble.com.
On Fri, Jan 13, 2012 at 6:34 AM, doggysaywhat <chwhite at ucsd.edu> wrote:> Hello all. > > I have two data frames. > Group ? ? ? Start ? ? ? ? ?End > G1 ? ? ? ? ? ? ? ? ? ? 200 ? ? ? ? ? ? ? 700 > G2 ? ? ? ? ? ? ? ? ? ? 500 ? ? ? ? ? ? ? 1000 > G3 ? ? ? ? ? ? ? ? ? ? 2000 ? ? ? ?3000 > G4 ? ? ? ? ? ? ? ? ? ? 4000 ? ? ? ?6000 > G5 ? ? ? ? ? ? ? ? ? ? 7000 ? ? ? ?8000 > > > and > > Pos ? ? ? ? ? ? ? ? C0 ? ? ? ? ? ? ?C1 > 200 ? ? ? ? ? ? ? ? 0.9 ? ? ? ? ? 0.6 > 500 ? ? ? ? ? ? ? 0.8 ? ? ? ? ? ? 0.8 > 800 ? ? ? ? ? ? ? ? 0.9 ? ? ? ? ? 0.7 > 1000 ? ? ? ? ? ? ?0.7 ? ? ? ? ? 0.6 > 2000 ? ? ? ? ? ?0.6 ? ? ? ? ? ? ? 0.4 > 2500 ? ? ? ? ? ?1.2 ? ? ? ? ? ? 0.8 > 3000 ? ? ? ? ? ?0.6 ? ? ? ? ? ? 1.5 > 3500 ? ? ? ? ? ?0.7 ? ? ? ? ? ? 0.7 > 4000 ? ? ? ? ? ?0.8 ? ? ? ? ? ? ? 0.8 > 4500 ? ? ? ? ? ?0.6 ? ? ? ? ? ? 0.6 > 5000 ? ? ? ? ? ? ?0.9 ? ? ? ? ? 0.9 > 5500 ? ? ? ? ? ?0.7 ? ? ? ? ? ? ? 0.8 > 6000 ? ? ? ? ? ?0.8 ? ? ? ? ? ? 0.7 > 6500 ? ? ? ? ? ?0.4 ? ? ? ? ? ? 0.4 > 7000 ? ? ? ? ? ? ?0.5 ? ? ? ? ? 0.8 > 7500 ? ? ? ? ? ?0.7 ? ? ? ? ? ? ? 0.9 > 8000 ? ? ? ? ? ?0.9 ? ? ? ? ? ? 0.5 > 8500 ? ? ? ? ? ?0.8 ? ? ? ? ? ? 0.6 > 9000 ? ? ? ? ? ?0.9 ? ? ? ? ? ? 0.8 > > > I need to conditionally average all values in columns C0 and C1 based upon > the bins I defined in the first data frame. ?For example, for the bin G1 in > the first dataframe, the values are 200 to 700 so i would average the value > at pos 200 (0.9) and 500 (0.8) for C0 and then perform the same thing for > C1. > > I can do this in excel with array formulas but I'm relatively new to R and > would like know if there is a function that will perform the same action. ?I > don't know if this will help, but the excel array function I used was > average(if(range>=start)*(range<=end),range)). ?Where the range is the > entire pos column. > > Initially I looked at the aggregate function. ? I can use aggregate when I > give a single vector to be used for grouping such as (A,B,C) but I'm not > sure how to define grouping as the bin 200-500 and the second bin as > 500-1000 etc. and use that as my grouping vector. >Here is an sqldf solution where the two input data frames are d1 and d2 (as in Jeff's post). Note that Group is quoted since its an SQL keyword: library(sqldf) sqldf("select d1.'Group', avg(d2.C0), avg(d2.C1) from d1, d2 where d2.Pos between d1.Start and d1.End group by d1.'Group'") The result is; Group avg(d2.C0) avg(d2.C1) 1 G1 0.85 0.7000000 2 G2 0.80 0.7000000 3 G3 0.80 0.9000000 4 G4 0.76 0.7600000 5 G5 0.70 0.7333333 -- Statistics & Software Consulting GKX Group, GKX Associates Inc. tel: 1-877-GKX-GROUP email: ggrothendieck at gmail.com