Hello,

I have a data frame (68,000 rows) of scores (V4) for a series of [genomic] coordinate ranges (V2 to V3).

I also have a data frame (1.2 million rows) of single [genomic] coordinates.

For each genomic coordinate (in coord), I would like to determine the average of all scores whose genomic ranges (in scores) encompass the coordinate (in coord). To accomplish this, I tried:

The function works, but is extremely slow.

It would take about 4 days for this to finish for a single data set, and I have 64 data sets.

Why does the rate at which coordinate averages are calculated increase when coord is smaller, but not when scores is smaller?

How can I accomplish the same thing more efficiently?

Thanks,
Dan

--
View this message in context: http://r.789695.n4.nabble.com/vectorization-with-subset-tp4635156.html
Sent from the R help mailing list archive at Nabble.com.
On Jul 2, 2012, at 12:15 PM, dlv04c wrote:

> Hello,
>
> I have a data frame (68,000 rows) of scores (V4) for a series of [genomic]
> coordinate ranges (V2 to V3).
>
> I also have a data frame (1.2 million rows) of single [genomic] coordinates.
>
> For each genomic coordinate (in coord), I would like to determine the
> average of all scores whose genomic ranges (in scores) encompass the
> coordinate (in coord). To accomplish this, I tried:
>
> The function works, but is extremely slow.
>
> It would take about 4 days for this to finish for a single data set, and I
> have 64 data sets.
>
> Why does the rate at which coordinate averages are calculated increase when
> coord is smaller, but not when scores is smaller?
>
> How can I accomplish the same thing more efficiently?

You probably need to start by reading the vignettes for the IRanges package. It's difficult to be sure, since you did not show the code for what you are currently doing.

--
David Winsemius, MD
West Hartford, CT
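For readers following the thread: the poster's code never reached the list, so what follows is only a sketch of one possible vectorized approach to the problem as described, not the poster's function and not the IRanges route recommended in the reply. It swaps in a plain base R technique (sorting plus cumulative sums, looked up with findInterval) and assumes that all ranges and coordinates lie on a single chromosome, that ranges are closed (V2 <= coordinate <= V3), and that coord is a plain numeric vector. The toy values and the helper name mean_covering_score are hypothetical; only the column names V2, V3 and V4 come from the original post. The left.open argument of findInterval requires R 3.3.0 or later.

```r
# Hypothetical toy data. Column names V2, V3, V4 follow the original post;
# the values and the function name below are made up for illustration.
scores <- data.frame(V2 = c(1, 5, 8), V3 = c(10, 7, 12), V4 = c(2, 4, 6))
coord  <- c(0, 1, 6, 8, 10, 11, 13)

# Mean score of all closed ranges [start, end] that contain each position.
# Assumption: a single chromosome, so positions are directly comparable.
mean_covering_score <- function(pos, start, end, score) {
  by_start <- order(start)
  by_end   <- order(end)

  # Number of ranges with start <= pos, and number with end < pos.
  n_started <- findInterval(pos, start[by_start])
  n_ended   <- findInterval(pos, end[by_end], left.open = TRUE)

  # Running score totals; the leading 0 makes a count of 0 map to a sum of 0.
  sum_started <- c(0, cumsum(score[by_start]))[n_started + 1]
  sum_ended   <- c(0, cumsum(score[by_end]))[n_ended + 1]

  # Ranges covering pos are those already started but not yet ended.
  n_cover <- n_started - n_ended
  ifelse(n_cover > 0, (sum_started - sum_ended) / n_cover, NA_real_)
}

res <- mean_covering_score(coord, scores$V2, scores$V3, scores$V4)
print(data.frame(coord = coord, mean_score = res))
```

The two sorts are done once, and each coordinate then costs only a binary search, so the work does not involve rescanning the whole score table for every coordinate. Coordinates covered by no range come back as NA. Because the result is a difference of cumulative sums, it can pick up small floating-point rounding error when scores vary widely in magnitude; with multiple chromosomes, one option would be to apply the function separately per chromosome.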
The code is in the original post, but here it is again:

Thanks,
Dan

--
View this message in context: http://r.789695.n4.nabble.com/vectorization-with-subset-tp4635156p4635208.html
Sent from the R help mailing list archive at Nabble.com.
On Jul 2, 2012, at 5:16 PM, dlv04c wrote:

> The code is in the original post, but here it is again:
>

No code here or in original posting to rhelp. You are under the delusion that Nabble is R-help. It is not.

> --
> View this message in context: http://r.789695.n4.nabble.com/vectorization-with-subset-tp4635156p4635208.html
> Sent from the R help mailing list archive at Nabble.com.

This is the rhelp mailing list. Not a website.

--
David Winsemius, MD
West Hartford, CT