Alexander Shenkin
2010-May-20 14:02 UTC
[R] finding euclidean proximate points in two datasets
Hello all,

I've been poring through the various spatial packages, but haven't come across the right thing yet.

Given a set of points in 2-d space X, I'm trying to find the subset of points in Y proximate to each point in X. Furthermore, the proximity threshold of each point in X differs (X$threshold). I've constructed this myself already, but it's horrifically slow with a dataset of 40k+ points in one set and 700 in the other.

A very inefficient example of what I'm looking for:

    for (pt in X$idx) {
        proximity[i] = euclidian_dist(X[pt]$x, X[pt]$y, Y$x, Y$y) < X$threshold
        i = i+1
    }

Perhaps crossdist() in spatstat is what I should use, and then code a comparison with X$threshold after the cross-distances are computed. However, I was wondering if there was another tool I should be considering. Any and all thoughts are very welcome. Thanks in advance.

Thanks,
Allie

--
Alexander Shenkin
PhD Candidate
School of Natural Resources and Environment
University of Florida
http://snre.ufl.edu/people/students.asp
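The per-point threshold test described in the post can be sketched in base R without a distance helper. This is only a minimal illustration under assumptions: the toy data are invented, the column names x, y and threshold follow the post, and near_idx is a hypothetical name. It loops over the rows of X and vectorizes over Y, which suits the case where X is the smaller set.

```r
# Hypothetical toy data; the column names x, y, threshold follow the post.
set.seed(1)
X <- data.frame(x = runif(5), y = runif(5), threshold = runif(5, 0.1, 0.4))
Y <- data.frame(x = runif(200), y = runif(200))

# For each point in X, collect the row indices of Y that lie strictly
# within that point's own threshold. Squared distances are compared with
# the squared threshold, so no sqrt() is needed.
near_idx <- lapply(seq_len(nrow(X)), function(i) {
  d2 <- (Y$x - X$x[i])^2 + (Y$y - X$y[i])^2
  which(d2 < X$threshold[i]^2)
})

# near_idx[[i]] holds the rows of Y that are proximate to X[i, ].
lengths(near_idx)
```

The result is a list rather than a matrix because each point in X can have a different number of neighbours in Y.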
David Winsemius
2010-May-20 14:18 UTC
[R] finding euclidean proximate points in two datasets
On May 20, 2010, at 10:02 AM, Alexander Shenkin wrote:

> Hello all,
>
> I've been poring through the various spatial packages, but haven't
> come across the right thing yet.

There is a SIG for such questions.

> Given a set of points in 2-d space X, I'm trying to find the subset of
> points in Y proximate to each point in X. Furthermore, the proximity
> threshold of each point in X differs (X$threshold). I've constructed
> this myself already, but it's horrifically slow with a dataset of 40k+
> points in one set and 700 in the other.
>
> A very inefficient example of what I'm looking for:

Not really a reproducible example. If euclidean_dist is a function, then it is not one in any of the packages I have installed.

> for (pt in X$idx) {
>     proximity[i] = euclidian_dist(X[pt]$x, X[pt]$y, Y$x, Y$y) <
>         X$threshold
>     i = i+1
> }

Have you considered first creating a subset of candidate points that are within "threshold" of each reference point on both coordinates? That might sidestep a lot of calculations on points that are easily eliminated on a single comparison. Then you could calculate distances within that surviving subset of points. On average that should give you an over 50% "hit rate":

    > (4/3)*pi*0.5^3
    [1] 0.5235988

> Perhaps crossdist() in spatstat is what I should use, and then code a
> comparison with X$threshold after the cross-distances are computed.
> However, I was wondering if there was another tool I should be
> considering. Any and all thoughts are very welcome. Thanks in
> advance.
>
> Thanks,
> Allie
> --
> Alexander Shenkin
> PhD Candidate
> School of Natural Resources and Environment
> University of Florida

--
David Winsemius, MD
West Hartford, CT
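The two-stage idea in the reply (a cheap per-coordinate box test first, then exact distances only on the survivors) might look like the sketch below. The function name near_prefilter and the toy data are hypothetical. One note on the expected hit rate: the figure quoted in the reply is the sphere-in-cube ratio for three dimensions, while for the 2-d points in the question the circle-to-square area ratio is pi/4, roughly 0.785, so the "over 50%" claim still holds.

```r
# Hypothetical toy data; the column names x, y, threshold follow the thread.
set.seed(42)
X <- data.frame(x = runif(10), y = runif(10), threshold = runif(10, 0.05, 0.3))
Y <- data.frame(x = runif(1000), y = runif(1000))

near_prefilter <- function(X, Y) {
  lapply(seq_len(nrow(X)), function(i) {
    r <- X$threshold[i]
    # Stage 1: box test. Keep only Y points within r on both coordinates.
    cand <- which(abs(Y$x - X$x[i]) < r & abs(Y$y - X$y[i]) < r)
    # Stage 2: exact squared-distance test on the surviving candidates only.
    d2 <- (Y$x[cand] - X$x[i])^2 + (Y$y[cand] - X$y[i])^2
    cand[d2 < r^2]
  })
}

hits <- near_prefilter(X, Y)

# Area of a circle of radius 0.5 inside the unit square: the expected
# share of box candidates that also pass the exact test in 2-d.
pi * 0.5^2
```

In vectorized R the box test itself touches every point in Y, so whether this beats a plain vectorized distance comparison depends on the data and is worth timing on the real sets.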