thr3ads.net - R help - [R] nested for() loops for returning a nearest point [Jul 2003]

Steve Sullivan

2003-Jul-30 15:52 UTC

[R] nested for() loops for returning a nearest point

I'm trying to do the following:

 

For each ordered pair of a data frame (D1) containing longitudes and
latitudes and unique point IDs, calculate the distance to every point in
another data frame (D2) also containing longitudes, latitudes and point
IDs, and return to a new variable in D1 the point ID of the nearest
element of D2.

 

Dramatis personae (mostly self-explanatory):

D1$long

D1$lat

D1$point.id

neighbor.id (to be created; for each ordered pair in D1 the point ID of
the nearest ordered pair in D2)

D2$long

D2$lat

D2$point.id

dist.geo (to be created)

 

I've been attempting this with nested for() loops that step through each
ordered pair in D1, and for each ordered pair [i] in D1 create a vector
(dist.geo) the length of D2$lat (say) that contains the distance
calculated from every ordered pair in D2 to the current ordered pair [i]
of D1, assign a value for D1$neighbor.id[i] based on
D2$point.id[which.min(dist.geo)], and move on to the next ordered pair
of D1 to create another dist.geo, assign another neighbor.id, etc.

 

There are no missings/NAs in any of the longs, lats or point.ids,
although advice on generalizing this to deal with them would be
appreciated.

 

What I've been trying:

 

neighbor.id <- vector(length=length(D1$lat))
dist.geo <- vector(length=length(D2$lat))
for(i in 1:length(neighbor.id)){
  for(j in 1:length(dist.geo)){
    # Yes, I know that isn't the right formula, this is just a test
    dist.geo[j] <- D1$lat[i]-D2$lat[j]
  }
  neighbor.id[i] <- D2$point.id[which.min(dist.geo)]
}

 

What I get is a neighbor.id of the appropriate length, but which
consists only of the same value repeated.  Should I instead pass the
which.min(dist.geo) to a variable before exiting the inner (j) loop, and
reference that variable in place of which.min(dist.geo) in the last
line?  Or is this whole approach wrongheaded?

 

This should be elementary, I know, so I appreciate everyone's
forbearance.

 

Steven Sullivan, Ph.D.

Senior Associate

The QED Group, LLC

1250 Eye St. NW, Suite 802

Washington, DC  20005

ssullivan@qedgroupllc.com

202.898.1910.x15 (v)

202.898.0887 (f)

202.421.8161 (m)

 



Roger Bivand

2003-Jul-30 16:47 UTC

On Wed, 30 Jul 2003, Steve Sullivan wrote:
> I'm trying to do the following:
> 
>  
> 
> For each ordered pair of a data frame (D1) containing longitudes and
> latitudes and unique point IDs, calculate the distance to every point in
> another data frame (D2) also containing longitudes, latitudes and point
> IDs, and return to a new variable in D1 the point ID of the nearest
> element of D2.
I think you can get quite a long way with the function rdist.earth() in 
the fields package:
> library(fields)
> loc1 <- expand.grid(long=seq(-150,150,5), lat=seq(-70,70,5))
> dim(loc1)
[1] 1769    2
> loc2 <- expand.grid(long=seq(-150,150,7.5), lat=seq(-70,70,7.5))
> dim(loc2)
[1] 779   2
> dists <- rdist.earth(loc1, loc2)
> id12 <- apply(dists, 1, which.min)
> length(id12)
[1] 1769
> id21 <- apply(dists, 2, which.min)
> length(id21)
[1] 779

Then use id12 and id21 to pick out the point.ids if need be (this assumes
loc2 carries a point.id column, as D2 does):
> loc2$point.id[id12]
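
For readers without the fields package, the same idea can be sketched in base R. This is only an illustration, not what rdist.earth() does internally: it swaps in the haversine formula, and the helper name haversine_km, the toy coordinates, the point IDs, and the Earth radius of 6371 km are all assumptions made for the example.

```r
# Great-circle distance in km via the haversine formula.
# R = 6371 km is an assumed mean Earth radius.
haversine_km <- function(long1, lat1, long2, lat2, R = 6371) {
  to_rad <- pi / 180
  dlat  <- (lat2 - lat1) * to_rad
  dlong <- (long2 - long1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlong / 2)^2
  # pmin() guards against rounding pushing sqrt(a) just above 1.
  2 * R * asin(pmin(1, sqrt(a)))
}

# Hypothetical toy data; column names follow the thread.
loc1 <- data.frame(long = c(0, 10, -75), lat = c(0, 50, 40))
loc2 <- data.frame(long = c(1, 12, -80, 100), lat = c(1, 48, 35, -20),
                   point.id = c("p1", "p2", "p3", "p4"),
                   stringsAsFactors = FALSE)

# dists[i, j] is the distance from row i of loc1 to row j of loc2.
dists <- outer(seq_len(nrow(loc1)), seq_len(nrow(loc2)),
               function(i, j) haversine_km(loc1$long[i], loc1$lat[i],
                                           loc2$long[j], loc2$lat[j]))

# For each row of loc1, the index of the closest row of loc2.
id12 <- apply(dists, 1, which.min)
loc1$neighbor.id <- loc2$point.id[id12]
```

The full distance matrix has nrow(loc1) * nrow(loc2) entries, so this approach suits data sets of moderate size; for very large ones a row-by-row loop would use less memory.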
Roger
-- 
Roger Bivand
Economic Geography Section, Department of Economics, Norwegian School of
Economics and Business Administration, Breiviksveien 40, N-5045 Bergen,
Norway. voice: +47 55 95 93 55; fax +47 55 95 93 93
e-mail: Roger.Bivand at nhh.no

Peter Dalgaard BSA

2003-Jul-30 16:58 UTC

"Steve Sullivan" <ssullivan at qedgroupllc.com> writes:
> neighbor.id <- vector(length=length(D1$lat))
> dist.geo <- vector(length=length(D2$lat))
> for(i in 1:length(neighbor.id)){
> for(j in 1:length(dist.geo)){
> dist.geo[j] <- D1$lat[i]-D2$lat[j]}  
> 
> # Yes, I know that isn't the right formula, this is just a test
> 
> neighbor.id[i] <- D2$point.id[which.min(dist.geo)]}
> 
>  
> 
> What I get is a neighbor.id of the appropriate length, but which
> consists only of the same value repeated.  Should I instead pass the
> which.min(dist.geo) to a variable before exiting the inner (j) loop, and
> reference that variable in place of which.min(dist.geo) in the last
> line?  Or is this whole approach wrongheaded?
Wouldn't you want to define dist.geo with an abs() ? Otherwise, the
North Pole might have the largest negative difference every time...

Apart from that, things look sane to me (but the heat is killing me
today...). You can vectorize things as in 

dist.geo <- abs(D1$lat[i]-D2$lat)

and get rid of the inner loop, but the basic idea looks correct.
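
A small sketch of both points, using made-up data (the values, IDs, and the function name nearest_by_lat are assumptions for illustration; the latitude-only "distance" is still just the test formula from the original post). With signed differences the most northerly D2 point wins every time, which reproduces the repeated value; with abs() each D1 point gets its own neighbour.

```r
# Hypothetical toy data; column names follow the thread.
D1 <- data.frame(long = c(10, 20, 30), lat = c(5, 40, 62),
                 point.id = c("a", "b", "c"), stringsAsFactors = FALSE)
D2 <- data.frame(long = c(12, 18, 33, 25), lat = c(60, 8, 38, 85),
                 point.id = c("w", "x", "y", "z"), stringsAsFactors = FALSE)

nearest_by_lat <- function(D1, D2, use.abs = TRUE) {
  neighbor.id <- character(nrow(D1))
  for (i in seq_len(nrow(D1))) {
    # Vectorized over D2: one subtraction replaces the inner j loop.
    dist.geo <- D1$lat[i] - D2$lat
    if (use.abs) dist.geo <- abs(dist.geo)
    # as.character() keeps IDs readable even if point.id is a factor.
    neighbor.id[i] <- as.character(D2$point.id[which.min(dist.geo)])
  }
  neighbor.id
}

signed   <- nearest_by_lat(D1, D2, use.abs = FALSE)  # "z" "z" "z"
absolute <- nearest_by_lat(D1, D2, use.abs = TRUE)   # "x" "y" "w"
```

seq_len(nrow(D1)) is used instead of 1:length(...) so that the loop body is skipped cleanly when D1 has no rows.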

-- 
   O__  ---- Peter Dalgaard             Blegdamsvej 3  
  c/ /'_ --- Dept. of Biostatistics     2200 Cph. N   
 (*) \(*) -- University of Copenhagen   Denmark      Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)             FAX: (+45) 35327907

Liaw, Andy

2003-Jul-30 17:09 UTC

> From: Steve Sullivan [mailto:ssullivan at qedgroupllc.com] 
> 
> I'm trying to do the following:
> 
> For each ordered pair of a data frame (D1) containing 
> longitudes and latitudes and unique point IDs, calculate the 
> distance to every point in another data frame (D2) also 
> containing longitudes, latitudes and point IDs, and return to 
> a new variable in D1 the point ID of the nearest element of D2.
> 
> Dramatis personae (mostly self-explanatory):
> 
> D1$long
> 
> D1$lat
> 
> D1$point.id
> 
> neighbor.id (to be created; for each ordered pair in D1 the 
> point ID of the nearest ordered pair in D2)
> 
> D2$long
> 
> D2$lat
> 
> D2$point.id
> 
> dist.geo (to be created)
> 
>  
> 
> I've been attempting this with nested for() loops that step 
> through each ordered pair in D1, and for each ordered pair 
> [i] in D1 create a vector
> (dist.geo) the length of D2$lat (say) that contains the 
> distance calculated from every ordered pair in D2 to the 
> current ordered pair [i] of D1, assign a value for 
> D1$neighbor.id[i] based on D2$point.id[which.min(dist.geo)], 
> and move on to the next ordered pair of D1 to create another 
> dist.geo, assign another neighbor.id, etc.
> 
>  
> 
> There are no missings/NAs in any of the longs, lats or 
> point.ids, although advice on generalizing this to deal with 
> them would be appreciated.
> 
>  
> 
> What I've been trying:
> 
>  
> 
> neighbor.id <- vector(length=length(D1$lat))
> dist.geo <- vector(length=length(D2$lat))
> for(i in 1:length(neighbor.id)){
> for(j in 1:length(dist.geo)){
> dist.geo[j] <- D1$lat[i]-D2$lat[j]}  
> 
> # Yes, I know that isn't the right formula, this is just a test
> 
> neighbor.id[i] <- D2$point.id[which.min(dist.geo)]}
> 
>  
> 
> What I get is a neighbor.id of the appropriate length, but 
> which consists only of the same value repeated.  Should I 
> instead pass the
> which.min(dist.geo) to a variable before exiting the inner 
> (j) loop, and reference that variable in place of 
> which.min(dist.geo) in the last line?  Or is this whole 
> approach wrongheaded?
> 
For finding nearest neighbors, try the following:

set.seed(1)
d1 <- data.frame(long=rnorm(10), lat=rnorm(10), point.id=factor(1:10))
d2 <- data.frame(long=rnorm(5), lat=rnorm(5), point.id=factor(1:5))

## For each point in D1, find nearest neighbor in D2.
library(class)
d1$neighbor.id <- knn1(as.matrix(d2[,1:2]), as.matrix(d1[,1:2]),
                       d2$point.id)

If you really want to, you could modify knn1() (and the C code it calls) so
the distance is also returned.  Otherwise, you can just compute the distance
"by hand" in R once the nearest neighbors are found.
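
One way the "by hand" step might look, as a sketch only: it repeats the setup above, looks up each neighbour's row in d2 with match(), and computes a plain Euclidean distance in coordinate units. The column name neighbor.dist and the variable idx are assumptions for the example. Note that knn1() works with Euclidean distance on the raw long/lat values, which is not the same as great-circle distance.

```r
library(class)  # provides knn1()

set.seed(1)
d1 <- data.frame(long = rnorm(10), lat = rnorm(10), point.id = factor(1:10))
d2 <- data.frame(long = rnorm(5), lat = rnorm(5), point.id = factor(1:5))

# For each point in d1, the point.id of its nearest neighbour in d2.
d1$neighbor.id <- knn1(as.matrix(d2[, 1:2]), as.matrix(d1[, 1:2]),
                       d2$point.id)

# Row of d2 that each neighbor.id refers to.
idx <- match(d1$neighbor.id, d2$point.id)

# Euclidean distance (in coordinate units) to that neighbour.
d1$neighbor.dist <- sqrt((d1$long - d2$long[idx])^2 +
                         (d1$lat  - d2$lat[idx])^2)
```

Only one distance per row of d1 is computed here, so the cost grows with nrow(d1) rather than with the size of the full distance matrix.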

HTH,
Andy
