I am wondering if there is a function that will join 2 data.frames by minimum distance, as is done in ArcGIS for example. For people who are not familiar with ArcGIS, here is an explanation:

Suppose you have a data.frame called track with x, y coordinates, and a second data.frame called classif with different x, y coordinates and some other attributes. The track data.frame has a different number of rows than classif. I want to join the rows from classif to track so that each row in track gets only the row from classif whose coordinates are closest to the coordinates in that track row (that is, the minimum distance between the 2 rows), and also add a new column that records this minimum distance. Even though the coordinate columns in the 2 data.frames have the same names, the values are not identical between the data.frames, so a merge by column is not what I am after.

I did an R Site Search but nothing related to this particular type of join emerged.

Thanks,

Monica
Hi Monica:

On Sep 12, 2008, at 11:59 AM, Monica Pisica wrote:
> I am wondering if there is a function which will do a join between 2
> data.frames by minimum distance, as it is done in ArcGIS for example.
> [...]
> I did an R Site Search but nothing related to this particular type
> of join emerged.

Have you looked at the package "fields"? It has distance functions. You would then have to do a little programming to find the minimum distance and add the columns.

HTH,

-Roy M.

Roy Mendelssohn
Supervisory Operations Research Analyst
NOAA/NMFS Environmental Research Division
Southwest Fisheries Science Center
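The "little programming" mentioned above might look like the following minimal sketch. It assumes planar coordinates and plain Euclidean distance; the toy data, the column names x, y and cls, and the object names d, nearest and joined are made up for illustration. The distance matrix is built with outer() so that the example has no package dependency; a distance function from a package such as fields (for example rdist) could be substituted for that step.

```r
# Hypothetical toy data; the column names x, y and cls are assumptions.
track <- data.frame(x = c(0, 5, 10), y = c(0, 5, 10))
classif <- data.frame(x = c(1, 9, 4, 20), y = c(1, 9, 6, 20),
                      cls = c("a", "b", "c", "d"))

# Pairwise Euclidean distances: rows index track, columns index classif.
d <- sqrt(outer(track$x, classif$x, "-")^2 +
          outer(track$y, classif$y, "-")^2)

# For each track row, the column index of the nearest classif row.
nearest <- apply(d, 1, which.min)

# Attach the minimum distance and the matched attributes to track.
joined <- cbind(track,
                min.dist = d[cbind(seq_len(nrow(track)), nearest)],
                classif[nearest, "cls", drop = FALSE])
row.names(joined) <- NULL
joined
```

Each track row ends up with exactly one classif row, so the result has as many rows as track. On ties, which.min() keeps the first of the equally close rows.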
> I am wondering if there is a function which will do a join between 2
> data.frames by minimum distance, as it is done in ArcGIS for example.
> [...]

#-----------------------------------------------------------------------
# Get the distance between two points on the globe.
#
# args:
#   lat1   - latitude of first point (degrees).
#   long1  - longitude of first point (degrees).
#   lat2   - latitude of second point (degrees).
#   long2  - longitude of second point (degrees).
#   radius - average radius of the earth in km
#
# see: http://en.wikipedia.org/wiki/Great_circle_distance
#-----------------------------------------------------------------------
greatCircleDistance <- function(lat1, long1, lat2, long2, radius=6372.795){
    sf <- pi/180                      # degrees to radians
    lat1 <- lat1*sf
    lat2 <- lat2*sf
    long1 <- long1*sf
    long2 <- long2*sf
    lod <- abs(long1-long2)
    radius * atan2(
        sqrt((cos(lat1)*sin(lod))^2 +
             (cos(lat2)*sin(lat1)-sin(lat2)*cos(lat1)*cos(lod))^2),
        sin(lat2)*sin(lat1)+cos(lat2)*cos(lat1)*cos(lod)
    )
}

#-----------------------------------------------------------------------
# Calculate the nearest point using latitude and longitude,
# and attach the other columns and the nearest distance from the
# other data.frame.
#
# args:
#   x        - as you describe 'track'
#   y        - as you describe 'classif'
#   xlongnme - name of longitude variable in x
#   xlatnme  - name of latitude variable in x
#   ylongnme - name of longitude variable in y
#   ylatnme  - name of latitude variable in y
#-----------------------------------------------------------------------
dist.merge <- function(x, y, xlongnme, xlatnme, ylongnme, ylatnme){
    tmp <- t(apply(x[,c(xlongnme, xlatnme)], 1, function(x, y){
        # distance from this point in x to every point in y
        dists <- apply(y, 1, function(x, y)
            greatCircleDistance(x[2], x[1], y[2], y[1]), x)
        # row index and distance of the nearest point (first one on ties)
        c(which.min(dists), min(dists))
    }, y[,c(ylongnme, ylatnme)]))
    # drop=FALSE keeps a data.frame even if y has only one attribute column
    tmp <- cbind(x, min.dist=tmp[,2],
                 y[tmp[,1], -match(c(ylongnme, ylatnme), names(y)), drop=FALSE])
    row.names(tmp) <- NULL
    tmp
}

# demo: longitudes in [0, 360], latitudes in [-90, 90]
track <- data.frame(xt=runif(10,0,360), yt=runif(10,-90,90))
classif <- data.frame(xc=runif(20,0,360), yc=runif(20,-90,90),
                      v1=letters[1:20], v2=1:20)
dist.merge(track, classif, 'xt', 'yt', 'xc', 'yc')
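For larger tables the nested apply() calls above may become slow, because the distance function is called once per pair of points. A minimal vectorised sketch of the same lookup follows; it assumes the same atan2 great-circle formula and default radius as greatCircleDistance, and nearestByGreatCircle is a hypothetical name, not a function from any package.

```r
# Hypothetical helper: for each (lon1, lat1) point, find the nearest
# (lon2, lat2) point by great-circle distance. All inputs in degrees.
nearestByGreatCircle <- function(lon1, lat1, lon2, lat2, radius = 6372.795) {
    rad <- pi / 180
    n1 <- length(lat1)
    n2 <- length(lat2)
    # n1 x n2 matrices: p1 varies down the rows, p2 across the columns
    p1 <- matrix(lat1 * rad, n1, n2)
    p2 <- matrix(lat2 * rad, n1, n2, byrow = TRUE)
    dl <- abs(outer(lon1, lon2, "-")) * rad
    d <- radius * atan2(
        sqrt((cos(p2) * sin(dl))^2 +
             (cos(p1) * sin(p2) - sin(p1) * cos(p2) * cos(dl))^2),
        sin(p1) * sin(p2) + cos(p1) * cos(p2) * cos(dl))
    # max.col on the negated matrix gives the column of each row minimum
    idx <- max.col(-d, ties.method = "first")
    data.frame(index = idx, min.dist = d[cbind(seq_len(n1), idx)])
}

# One point at (0, 0) against two points on the equator at 90 and 10 degrees
res <- nearestByGreatCircle(lon1 = 0, lat1 = 0,
                            lon2 = c(90, 10), lat2 = c(0, 0))
res
```

The index column can then be used to pick rows from classif, in the same way as tmp[,1] in dist.merge. The full distance matrix is held in memory, so this trades memory for speed.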