I'm trying to do the following:

For each ordered pair of a data frame (D1) containing longitudes, latitudes and unique point IDs, calculate the distance to every point in another data frame (D2) that also contains longitudes, latitudes and point IDs, and return to a new variable in D1 the point ID of the nearest element of D2.

Dramatis personae (mostly self-explanatory):

D1$long
D1$lat
D1$point.id
neighbor.id (to be created; for each ordered pair in D1, the point ID of the nearest ordered pair in D2)
D2$long
D2$lat
D2$point.id
dist.geo (to be created)

I've been attempting this with nested for() loops that step through each ordered pair in D1. For each ordered pair [i] in D1, the code creates a vector (dist.geo) the length of D2$lat (say) that holds the distance from every ordered pair in D2 to the current ordered pair [i] of D1, assigns a value to D1$neighbor.id[i] based on D2$point.id[which.min(dist.geo)], and moves on to the next ordered pair of D1 to create another dist.geo, assign another neighbor.id, etc.

There are no missings/NAs in any of the longs, lats or point.ids, although advice on generalizing this to deal with them would be appreciated.

What I've been trying:

neighbor.id <- vector(length=length(D1$lat))
dist.geo <- vector(length=length(D2$lat))
for(i in 1:length(neighbor.id)){
  for(j in 1:length(dist.geo)){
    dist.geo[j] <- D1$lat[i]-D2$lat[j]}
  # Yes, I know that isn't the right formula, this is just a test
  neighbor.id[i] <- D2$point.id[which.min(dist.geo)]}

What I get is a neighbor.id of the appropriate length, but it consists only of the same value repeated. Should I instead pass which.min(dist.geo) to a variable before exiting the inner (j) loop, and reference that variable in place of which.min(dist.geo) in the last line? Or is this whole approach wrongheaded?

This should be elementary, I know, so I appreciate everyone's forbearance.

Steven Sullivan, Ph.D.
Senior Associate
The QED Group, LLC
1250 Eye St. NW, Suite 802
Washington, DC 20005
ssullivan@qedgroupllc.com
202.898.1910.x15 (v)
202.898.0887 (f)
202.421.8161 (m)
On Wed, 30 Jul 2003, Steve Sullivan wrote:

> I'm trying to do the following:
>
> For each ordered pair of a data frame (D1) containing longitudes and
> latitudes and unique point IDs, calculate the distance to every point in
> another data frame (D2) also containing longitudes, latitudes and point
> IDs, and return to a new variable in D1 the point ID of the nearest
> element of D2.

I think you can get quite a long way with the function rdist.earth() in the fields package:

> library(fields)
> loc1 <- expand.grid(long=seq(-150,150,5), lat=seq(-70,70,5))
> dim(loc1)
[1] 1769    2
> loc2 <- expand.grid(long=seq(-150,150,7.5), lat=seq(-70,70,7.5))
> dim(loc2)
[1] 779   2
> dists <- rdist.earth(loc1, loc2)
> id12 <- apply(dists, 1, which.min)
> length(id12)
[1] 1769
> id21 <- apply(dists, 2, which.min)
> length(id21)
[1] 779

then use id12 and id21 to choose the point.ids if need be:

> loc2$point.id[id12]

Roger

--
Roger Bivand
Economic Geography Section, Department of Economics, Norwegian School of
Economics and Business Administration, Breiviksveien 40, N-5045 Bergen,
Norway. voice: +47 55 95 93 55; fax +47 55 95 93 93
e-mail: Roger.Bivand at nhh.no
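If the fields package isn't at hand, the same "distance matrix, then which.min() per row" pattern can be sketched in base R. This is a hedged stand-in, not rdist.earth() itself: gc_dist_matrix() and nearest_index() are hypothetical helpers, the distance is the haversine great-circle formula on a sphere of radius 6371 km (an assumption; rdist.earth() reports miles by default and its internals may differ), and the point.id column added to loc2 is made up for illustration. nearest_index() also returns NA for a row whose distances are all NA (e.g. a D1 point with a missing coordinate), which touches on the NA question; which.min() on its own already discards NA entries within a row.

```r
# Hypothetical helper: haversine great-circle distances (km) between every
# row of `a` (matrix rows) and every row of `b` (matrix columns).
# Both need columns `long` and `lat` in decimal degrees.
gc_dist_matrix <- function(a, b, radius = 6371) {
  to_rad <- pi / 180
  lat1 <- a$lat * to_rad
  lat2 <- b$lat * to_rad
  dlat <- outer(lat1, lat2, "-")
  dlon <- outer(a$long * to_rad, b$long * to_rad, "-")
  h <- sin(dlat / 2)^2 + outer(cos(lat1), cos(lat2)) * sin(dlon / 2)^2
  s <- sqrt(h)
  s[!is.na(s) & s > 1] <- 1   # guard against rounding just above 1
  2 * radius * asin(s)        # NA coordinates propagate as NA distances
}

# Hypothetical helper: column index of the smallest distance in a row,
# or NA when every distance in that row is NA.
nearest_index <- function(d) if (all(is.na(d))) NA_integer_ else which.min(d)

# Roger's grids; loc2$point.id is a made-up ID column.
loc1 <- expand.grid(long = seq(-150, 150, 5),   lat = seq(-70, 70, 5))
loc2 <- expand.grid(long = seq(-150, 150, 7.5), lat = seq(-70, 70, 7.5))
loc2$point.id <- paste0("p", seq_len(nrow(loc2)))

dists <- gc_dist_matrix(loc1, loc2)   # 1769 x 779 matrix
id12  <- apply(dists, 1, nearest_index)
loc1$neighbor.id <- loc2$point.id[id12]
```

Building the full matrix is simple but uses memory proportional to nrow(D1) * nrow(D2); for much larger sets, looping over D1 rows one at a time keeps only one distance vector in memory.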
Peter Dalgaard BSA
2003-Jul-30 16:58 UTC
[R] nested for() loops for returning a nearest point
"Steve Sullivan" <ssullivan at qedgroupllc.com> writes:

> neighbor.id <- vector(length=length(D1$lat))
> dist.geo <- vector(length=length(D2$lat))
> for(i in 1:length(neighbor.id)){
> for(j in 1:length(dist.geo)){
> dist.geo[j] <- D1$lat[i]-D2$lat[j]}
>
> # Yes, I know that isn't the right formula, this is just a test
>
> neighbor.id[i] <- D2$point.id[which.min(dist.geo)]}
>
> What I get is a neighbor.id of the appropriate length, but which
> consists only of the same value repeated. Should I instead pass the
> which.min(dist.geo) to a variable before exiting the inner (j) loop, and
> reference that variable in place of which.min(dist.geo) in the last
> line? Or is this whole approach wrongheaded?

Wouldn't you want to define dist.geo with an abs()? Otherwise, the North Pole might have the largest negative difference every time... Apart from that, things look sane to me (but the heat is killing me today...). You can vectorize things as in

dist.geo <- abs(D1$lat[i]-D2$lat)

and get rid of the inner loop, but the basic idea looks correct.

--
Peter Dalgaard, Dept. of Biostatistics, University of Copenhagen
Blegdamsvej 3, 2200 Cph. N, Denmark
Ph: (+45) 35327918; FAX: (+45) 35327907
(p.dalgaard at biostat.ku.dk)
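Peter's two points can be checked on a tiny made-up example (the D1/D2 values below are hypothetical). With the signed difference, which.min() picks the D2 point with the largest latitude for every row of D1, which reproduces the "same value repeated" symptom; with abs(), and with the inner j loop replaced by the vectorized expression he suggests, each D1 point gets the D2 point nearest in latitude. Folding the outer loop into sapply() is a stylistic choice made here, not something from the thread; the plain for() loop over i works just as well.

```r
# Hypothetical toy data; point.id kept as character for clarity.
D1 <- data.frame(long = c(10, 20, 30), lat = c(5, 40, 75),
                 point.id = c("a", "b", "c"), stringsAsFactors = FALSE)
D2 <- data.frame(long = c(0, 0, 0), lat = c(0, 45, 80),
                 point.id = c("x", "y", "z"), stringsAsFactors = FALSE)

# Signed difference (the original test): D1$lat[i] - D2$lat is most
# negative at the most northerly D2 point, so which.min() always lands there.
idx.signed <- sapply(seq_len(nrow(D1)),
                     function(i) which.min(D1$lat[i] - D2$lat))

# Absolute difference, with the inner j loop replaced by vector arithmetic.
idx.abs <- sapply(seq_len(nrow(D1)),
                  function(i) which.min(abs(D1$lat[i] - D2$lat)))

# Indexing point.id by the integer positions keeps its type (character or factor).
D1$neighbor.id <- D2$point.id[idx.abs]
```

Here D2$point.id[idx.signed] is "z" for every row, while D1$neighbor.id comes out as "x", "y", "z".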
> From: Steve Sullivan [mailto:ssullivan at qedgroupllc.com]
>
> [...] Should I instead pass the which.min(dist.geo) to a variable before
> exiting the inner (j) loop, and reference that variable in place of
> which.min(dist.geo) in the last line? Or is this whole approach
> wrongheaded?

For finding nearest neighbors, try the following:

set.seed(1)
d1 <- data.frame(long=rnorm(10), lat=rnorm(10), point.id=factor(1:10))
d2 <- data.frame(long=rnorm(5), lat=rnorm(5), point.id=factor(1:5))
## For each point in D1, find nearest neighbor in D2.
library(class)
d1$neighbor.id <- knn1(as.matrix(d2[,1:2]), as.matrix(d1[,1:2]), d2$point.id)

If you really want to, you could modify knn1() (and the C code it calls) so that the distance is also returned. Otherwise, you can just compute the distance "by hand" in R once the nearest neighbors are found.

HTH,
Andy

------------------------------------------------------------------------------
Notice: This e-mail message, together with any attachments, contains information of Merck & Co., Inc. (Whitehouse Station, New Jersey, USA), and/or its affiliates (which may be known outside the United States as Merck Frosst, Merck Sharp & Dohme or MSD) that may be confidential, proprietary copyrighted and/or legally privileged, and is intended solely for the use of the individual or entity named on this message. If you are not the intended recipient, and have received this message in error, please immediately return this by e-mail and then delete it.
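Andy's last suggestion, computing the distance "by hand" once knn1() has found the neighbors, might look like the sketch below. nn_distance() is a hypothetical helper, and it assumes the point.id values in d2 are unique. It uses plain Euclidean distance on (long, lat) because that is the metric knn1() uses to choose the neighbor; for real geographic coordinates a great-circle distance would be the more meaningful number to report.

```r
library(class)   # provides knn1()

# Hypothetical helper: Euclidean distance from each row of d1 to the row of
# d2 whose point.id matches that row's neighbor.id.
nn_distance <- function(d1, d2, neighbor.id) {
  j <- match(as.character(neighbor.id), as.character(d2$point.id))
  sqrt((d1$long - d2$long[j])^2 + (d1$lat - d2$lat[j])^2)
}

set.seed(1)
d1 <- data.frame(long = rnorm(10), lat = rnorm(10), point.id = factor(1:10))
d2 <- data.frame(long = rnorm(5),  lat = rnorm(5),  point.id = factor(1:5))

## For each point in d1, find its nearest neighbor in d2, then that distance.
d1$neighbor.id   <- knn1(as.matrix(d2[, 1:2]), as.matrix(d1[, 1:2]), d2$point.id)
d1$neighbor.dist <- nn_distance(d1, d2, d1$neighbor.id)
```

Because the distance is recomputed from the matched rows rather than taken from knn1(), the same helper works unchanged if the neighbors come from some other method.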