Hi all, Say I have a matrix A with dimension m x 2 and matrix B with dimension n x 2. I would like to find the row in A that is closest to the each row in B. Here's an example (using a loop): set.seed(1) A <- matrix(runif(12), 6, 2) # 6 x 2 B <- matrix(runif(6), 3, 2) # 3 x 2 m <- vector("numeric", nrow(B)) for(j in 1:nrow(B)) { d <- (A[, 1] - B[j, 1])^2 + (A[, 2] - B[j, 2])^2 m[j] <- which.min(d) } All I need is m[]. I would like to accomplish this without using the loop if possible, since for my real data n > 140K and m > 1K. I hope this makes sense. Thanks, Sundar
Sounds like knn classification. See function knn1 in package class.> knn(A, B, 1:nrow(A))gives the same answers as your loop code, and is just a carefully tuned C equivalent. There are faster ways to do this by preprocessing set A discussed e.g. in my PRNN book but your numbers took only 11s on my PC. On Tue, 27 Jan 2004, Sundar Dorai-Raj wrote:> Hi all, > Say I have a matrix A with dimension m x 2 and matrix B with > dimension n x 2. I would like to find the row in A that is closest to > the each row in B. Here's an example (using a loop): > > set.seed(1) > A <- matrix(runif(12), 6, 2) # 6 x 2 > B <- matrix(runif(6), 3, 2) # 3 x 2 > m <- vector("numeric", nrow(B)) > for(j in 1:nrow(B)) { > d <- (A[, 1] - B[j, 1])^2 + (A[, 2] - B[j, 2])^2 > m[j] <- which.min(d) > } > > All I need is m[]. I would like to accomplish this without using the > loop if possible, since for my real data n > 140K and m > 1K. I hope > this makes sense. > > Thanks, > Sundar > > ______________________________________________ > R-help at stat.math.ethz.ch mailing list > https://www.stat.math.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html > >-- Brian D. Ripley, ripley at stats.ox.ac.uk Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595
Sundar, Have a look at "knn1" from package "class". As I understand what you want is as.integer( knn1(train=A, test=B, cl=1:nrow(A)) ) Best regards Jens Oehlschl?gel> Say I have a matrix A with dimension m x 2 and matrix B with > dimension n x 2. I would like to find the row in A that is closest to > the each row in B. Here's an example (using a loop):> set.seed(1) > A <- matrix(runif(12), 6, 2) # 6 x 2 > B <- matrix(runif(6), 3, 2) # 3 x 2 > m <- vector("numeric", nrow(B)) > for(j in 1:nrow(B)) { > d <- (A[, 1] - B[j, 1])^2 + (A[, 2] - B[j, 2])^2 > m[j] <- which.min(d) > }> All I need is m[]. I would like to accomplish this without using the > loop if possible, since for my real data n > 140K and m > 1K. I hope > this makes sense.--
On Tue, 27 Jan 2004, Sundar Dorai-Raj wrote:> Hi all, > Say I have a matrix A with dimension m x 2 and matrix B with > dimension n x 2. I would like to find the row in A that is closest to > the each row in B. Here's an example (using a loop): > > set.seed(1) > A <- matrix(runif(12), 6, 2) # 6 x 2 > B <- matrix(runif(6), 3, 2) # 3 x 2 > m <- vector("numeric", nrow(B)) > for(j in 1:nrow(B)) { > d <- (A[, 1] - B[j, 1])^2 + (A[, 2] - B[j, 2])^2 > m[j] <- which.min(d) > } > > All I need is m[]. I would like to accomplish this without using the > loop if possible, since for my real data n > 140K and m > 1K. I hope > this makes sense.I think you need a quadtree of the larger set of points, the do lookup for buckets of the smaller one. There is a good deal of information on http://www.cs.umd.edu/~brabec/quadtree/ This isn't an answer within R, the functionality in the gstat contributed package doesn't seem to be at the user level, but it does point to the same site at UMD. Roger> > Thanks, > Sundar > > ______________________________________________ > R-help at stat.math.ethz.ch mailing list > https://www.stat.math.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html >-- Roger Bivand Econonic Geography Section, Department of Economics, Norwegian School of Economics and Business Administration, Breiviksveien 40, N-5045 Bergen, Norway, voice: +47-55959355, fax: +47-55959393; Roger.Bivand at nhh.no
Sundar, I'm not sure how much faster (or slower) this might be (perhaps Professor Ripley will help on this one), but one can "trick" a function from the class package called knn1 into doing this for you, I think. From your example:> set.seed(1) > A <- matrix(runif(12), 6, 2) # 6 x 2 > B <- matrix(runif(6), 3, 2) # 3 x 2 > m <- vector("numeric", nrow(B)) > for(j in 1:nrow(B)) {+ d <- (A[, 1] - B[j, 1])^2 + (A[, 2] - B[j, 2])^2 + m[j] <- which.min(d) + }> m[1] 3 2 3 Now using knn1: > knn1(A,B,seq(1,nrow(A),1)) [1] 3 2 3 Levels: 1 2 3 4 5 6 Sean ----- Original Message ----- From: "Sundar Dorai-Raj" <sundar.dorai-raj at pdf.com> To: "R-help" <r-help at stat.math.ethz.ch> Sent: Tuesday, January 27, 2004 3:00 PM Subject: [R] distance between two matrices> Hi all, > Say I have a matrix A with dimension m x 2 and matrix B with > dimension n x 2. I would like to find the row in A that is closest to > the each row in B. Here's an example (using a loop): > > set.seed(1) > A <- matrix(runif(12), 6, 2) # 6 x 2 > B <- matrix(runif(6), 3, 2) # 3 x 2 > m <- vector("numeric", nrow(B)) > for(j in 1:nrow(B)) { > d <- (A[, 1] - B[j, 1])^2 + (A[, 2] - B[j, 2])^2 > m[j] <- which.min(d) > } > > All I need is m[]. I would like to accomplish this without using the > loop if possible, since for my real data n > 140K and m > 1K. I hope > this makes sense. > > Thanks, > Sundar > > ______________________________________________ > R-help at stat.math.ethz.ch mailing list > https://www.stat.math.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide!http://www.R-project.org/posting-guide.html>
> Hi all, > Say I have a matrix A with dimension m x 2 and matrix B with > dimension n x 2. I would like to find the row in A that is closest to > the each row in B. Here's an example (using a loop): > > set.seed(1) > A <- matrix(runif(12), 6, 2) # 6 x 2 > B <- matrix(runif(6), 3, 2) # 3 x 2 > m <- vector("numeric", nrow(B))make the lines below a function of a vector argument and apply it over the rows of B. ?apply for more info. You'll want to know about apply if you want to avoid loops (which is a good approach).> for(j in 1:nrow(B)) { > d <- (A[, 1] - B[j, 1])^2 + (A[, 2] - B[j, 2])^2 > m[j] <- which.min(d) > } > > All I need is m[]. I would like to accomplish this without using the > loop if possible, since for my real data n > 140K and m > 1K. I hope > this makes sense.Thing is, the above approach requires all data to be in main memory. i hope this is not a problem.
You can try and see if knn1() in the `class' package (part of the VR bundle) can handle the job in one shot. If not, just do it in chunks of B. For your example:> id <- 1:nrow(A) > knn1(A, B, id)[1] 3 2 3 Levels: 1 2 3 4 5 6 (I believe it returns factor, but that can easily be converted back to numeric.) HTH, Andy> From: Sundar Dorai-Raj > > Hi all, > Say I have a matrix A with dimension m x 2 and matrix B with > dimension n x 2. I would like to find the row in A that is closest to > the each row in B. Here's an example (using a loop): > > set.seed(1) > A <- matrix(runif(12), 6, 2) # 6 x 2 > B <- matrix(runif(6), 3, 2) # 3 x 2 > m <- vector("numeric", nrow(B)) > for(j in 1:nrow(B)) { > d <- (A[, 1] - B[j, 1])^2 + (A[, 2] - B[j, 2])^2 > m[j] <- which.min(d) > } > > All I need is m[]. I would like to accomplish this without using the > loop if possible, since for my real data n > 140K and m > 1K. I hope > this makes sense. > > Thanks, > Sundar > >------------------------------------------------------------------------------ Notice: This e-mail message, together with any attachments,...{{dropped}}
A bit cumbersome, but (somewhat) vectorised. dist = apply(B, 1, FUN = function(x, M) {rowSums(sweep(M, 2, x, "-")^2) }, A) m = row(t(dist))[t(dist==apply(dist,1,min))] -----Original Message----- From: Roger Bivand [mailto:Roger.Bivand at nhh.no] Sent: 28 January 2004 10:09 To: Sundar Dorai-Raj Cc: R-help Subject: Re: [R] distance between two matrices On Tue, 27 Jan 2004, Sundar Dorai-Raj wrote:> Hi all, > Say I have a matrix A with dimension m x 2 and matrix B with > dimension n x 2. I would like to find the row in A that is closest to > the each row in B. Here's an example (using a loop): > > set.seed(1) > A <- matrix(runif(12), 6, 2) # 6 x 2 > B <- matrix(runif(6), 3, 2) # 3 x 2 > m <- vector("numeric", nrow(B)) > for(j in 1:nrow(B)) { > d <- (A[, 1] - B[j, 1])^2 + (A[, 2] - B[j, 2])^2 > m[j] <- which.min(d) > } > > All I need is m[]. I would like to accomplish this without using the > loop if possible, since for my real data n > 140K and m > 1K. I hope > this makes sense.I think you need a quadtree of the larger set of points, the do lookup for buckets of the smaller one. There is a good deal of information on http://www.cs.umd.edu/~brabec/quadtree/ This isn't an answer within R, the functionality in the gstat contributed package doesn't seem to be at the user level, but it does point to the same site at UMD. Roger> > Thanks, > Sundar > > ______________________________________________ > R-help at stat.math.ethz.ch mailing list > https://www.stat.math.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide! > http://www.R-project.org/posting-guide.html >-- Roger Bivand Econonic Geography Section, Department of Economics, Norwegian School of Economics and Business Administration, Breiviksveien 40, N-5045 Bergen, Norway, voice: +47-55959355, fax: +47-55959393; Roger.Bivand at nhh.no ______________________________________________ R-help at stat.math.ethz.ch mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
> From: Prof Brian Ripley [mailto:ripley at stats.ox.ac.uk] > On Wed, 28 Jan 2004, "H?sing, Johannes" wrote: > > ?apply for more info. You'll want to know about apply if > > you want to avoid loops (which is a good approach). > > Unfortunately apply() is a wrapper for a for() loop, so will > not help much > (if at all).whoops; so is the choice between both a matter of style? Or is it implementation-specific for R, and not generally true for S? (I know from LISP that if using map, roughly the equivalent to S' apply instead of the do loop, you cannot rely on the elements being executed in their original order, which might buy you efficiency depending on the implementation.)