thr3ads.net - R help - [R] distance between two matrices [Jan 2004]

If this information is useful, please help other people find it:
Share via:

Sundar Dorai-Raj

2004-Jan-27 20:00 UTC

[R] distance between two matrices

Hi all,
   Say I have a matrix A with dimension m x 2 and matrix B with 
dimension n x 2. I would like to find the row in A that is closest to 
the each row in B. Here's an example (using a loop):

set.seed(1)
A <- matrix(runif(12), 6, 2) # 6 x 2
B <- matrix(runif(6), 3, 2)  # 3 x 2
m <- vector("numeric", nrow(B))
for(j in 1:nrow(B)) {
   d <- (A[, 1] - B[j, 1])^2 + (A[, 2] - B[j, 2])^2
   m[j] <- which.min(d)
}

All I need is m[]. I would like to accomplish this without using the 
loop if possible, since for my real data n > 140K and m > 1K. I hope 
this makes sense.

Thanks,
Sundar

Prof Brian Ripley

2004-Jan-28 08:34 UTC

head link

[R] distance between two matrices

Sounds like knn classification.  See function knn1 in package class.
> knn(A, B, 1:nrow(A))
gives the same answers as your loop code, and is just a carefully tuned C 
equivalent.

There are faster ways to do this by preprocessing set A discussed e.g. in 
my PRNN book but your numbers took only 11s on my PC.

On Tue, 27 Jan 2004, Sundar Dorai-Raj wrote:
> Hi all,
>    Say I have a matrix A with dimension m x 2 and matrix B with 
> dimension n x 2. I would like to find the row in A that is closest to 
> the each row in B. Here's an example (using a loop):
> 
> set.seed(1)
> A <- matrix(runif(12), 6, 2) # 6 x 2
> B <- matrix(runif(6), 3, 2)  # 3 x 2
> m <- vector("numeric", nrow(B))
> for(j in 1:nrow(B)) {
>    d <- (A[, 1] - B[j, 1])^2 + (A[, 2] - B[j, 2])^2
>    m[j] <- which.min(d)
> }
> 
> All I need is m[]. I would like to accomplish this without using the 
> loop if possible, since for my real data n > 140K and m > 1K. I hope 
> this makes sense.
> 
> Thanks,
> Sundar
> 
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://www.stat.math.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide!
http://www.R-project.org/posting-guide.html
> 
> 
-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

"Jens Oehlschlägel"

2004-Jan-28 10:05 UTC

head link

[R] distance between two matrices

Sundar,

Have a look at "knn1" from package "class".

As I understand what you want is

as.integer( knn1(train=A, test=B, cl=1:nrow(A)) )


Best regards


Jens Oehlschl?gel

>   Say I have a matrix A with dimension m x 2 and matrix B with 
> dimension n x 2. I would like to find the row in A that is closest to 
> the each row in B. Here's an example (using a loop):
> set.seed(1)
> A <- matrix(runif(12), 6, 2) # 6 x 2
> B <- matrix(runif(6), 3, 2)  # 3 x 2
> m <- vector("numeric", nrow(B))
> for(j in 1:nrow(B)) {
>    d <- (A[, 1] - B[j, 1])^2 + (A[, 2] - B[j, 2])^2
>    m[j] <- which.min(d)
> }
> All I need is m[]. I would like to accomplish this without using the 
> loop if possible, since for my real data n > 140K and m > 1K. I hope 
> this makes sense.


--

Roger Bivand

2004-Jan-28 10:08 UTC

head link

[R] distance between two matrices

On Tue, 27 Jan 2004, Sundar Dorai-Raj wrote:
> Hi all,
>    Say I have a matrix A with dimension m x 2 and matrix B with 
> dimension n x 2. I would like to find the row in A that is closest to 
> the each row in B. Here's an example (using a loop):
> 
> set.seed(1)
> A <- matrix(runif(12), 6, 2) # 6 x 2
> B <- matrix(runif(6), 3, 2)  # 3 x 2
> m <- vector("numeric", nrow(B))
> for(j in 1:nrow(B)) {
>    d <- (A[, 1] - B[j, 1])^2 + (A[, 2] - B[j, 2])^2
>    m[j] <- which.min(d)
> }
> 
> All I need is m[]. I would like to accomplish this without using the 
> loop if possible, since for my real data n > 140K and m > 1K. I hope 
> this makes sense.
I think you need a quadtree of the larger set of points, the do lookup for 
buckets of the smaller one. There is a good deal of information on

http://www.cs.umd.edu/~brabec/quadtree/

This isn't an answer within R, the functionality in the gstat contributed
package doesn't seem to be at the user level, but it does point to the
same site at UMD. 

Roger

> 
> Thanks,
> Sundar
> 
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://www.stat.math.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide!
http://www.R-project.org/posting-guide.html
> 
-- 
Roger Bivand
Econonic Geography Section, Department of Economics, Norwegian School of 
Economics and Business Administration, Breiviksveien 40, N-5045 Bergen, 
Norway, voice: +47-55959355, fax: +47-55959393; Roger.Bivand at nhh.no

Sean Davis

2004-Jan-28 11:16 UTC

head link

[R] distance between two matrices

Sundar,

I'm not sure how much faster (or slower) this might be (perhaps Professor
Ripley will help on this one), but one can "trick" a function from the
class
package called knn1 into doing this for you, I think.  From your example:
> set.seed(1)
> A <- matrix(runif(12), 6, 2) # 6 x 2
> B <- matrix(runif(6), 3, 2)  # 3 x 2
> m <- vector("numeric", nrow(B))
> for(j in 1:nrow(B)) {+    d <- (A[, 1] - B[j, 1])^2 + (A[, 2] - B[j, 2])^2
+    m[j] <- which.min(d)
+ }> m[1] 3 2 3

Now using knn1:

 > knn1(A,B,seq(1,nrow(A),1))
[1] 3 2 3
Levels: 1 2 3 4 5 6

Sean

----- Original Message -----
From: "Sundar Dorai-Raj" <sundar.dorai-raj at pdf.com>
To: "R-help" <r-help at stat.math.ethz.ch>
Sent: Tuesday, January 27, 2004 3:00 PM
Subject: [R] distance between two matrices

> Hi all,
>    Say I have a matrix A with dimension m x 2 and matrix B with
> dimension n x 2. I would like to find the row in A that is closest to
> the each row in B. Here's an example (using a loop):
>
> set.seed(1)
> A <- matrix(runif(12), 6, 2) # 6 x 2
> B <- matrix(runif(6), 3, 2)  # 3 x 2
> m <- vector("numeric", nrow(B))
> for(j in 1:nrow(B)) {
>    d <- (A[, 1] - B[j, 1])^2 + (A[, 2] - B[j, 2])^2
>    m[j] <- which.min(d)
> }
>
> All I need is m[]. I would like to accomplish this without using the
> loop if possible, since for my real data n > 140K and m > 1K. I hope
> this makes sense.
>
> Thanks,
> Sundar
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://www.stat.math.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide!
http://www.R-project.org/posting-guide.html>

"Hüsing, Johannes"

2004-Jan-28 12:42 UTC

head link

[R] distance between two matrices

> Hi all,
>    Say I have a matrix A with dimension m x 2 and matrix B with 
> dimension n x 2. I would like to find the row in A that is closest to 
> the each row in B. Here's an example (using a loop):
> 
> set.seed(1)
> A <- matrix(runif(12), 6, 2) # 6 x 2
> B <- matrix(runif(6), 3, 2)  # 3 x 2
> m <- vector("numeric", nrow(B))
make the lines below a function of a vector argument and 
apply it over the rows of B.

?apply for more info. You'll want to know about apply if
you want to avoid loops (which is a good approach).
> for(j in 1:nrow(B)) {
>    d <- (A[, 1] - B[j, 1])^2 + (A[, 2] - B[j, 2])^2
>    m[j] <- which.min(d)
> }
> 
> All I need is m[]. I would like to accomplish this without using the 
> loop if possible, since for my real data n > 140K and m > 1K. I hope 
> this makes sense.
Thing is, the above approach requires all data to be in main memory.
i hope this is not a problem.

Liaw, Andy

2004-Jan-28 13:10 UTC

head link

[R] distance between two matrices

You can try and see if knn1() in the `class' package (part of the VR bundle)
can handle the job in one shot.  If not, just do it in chunks of B.  For
your example:
> id <- 1:nrow(A)
> knn1(A, B, id)[1] 3 2 3
Levels: 1 2 3 4 5 6

(I believe it returns factor, but that can easily be converted back to
numeric.)

HTH,
Andy
> From: Sundar Dorai-Raj
> 
> Hi all,
>    Say I have a matrix A with dimension m x 2 and matrix B with 
> dimension n x 2. I would like to find the row in A that is closest to 
> the each row in B. Here's an example (using a loop):
> 
> set.seed(1)
> A <- matrix(runif(12), 6, 2) # 6 x 2
> B <- matrix(runif(6), 3, 2)  # 3 x 2
> m <- vector("numeric", nrow(B))
> for(j in 1:nrow(B)) {
>    d <- (A[, 1] - B[j, 1])^2 + (A[, 2] - B[j, 2])^2
>    m[j] <- which.min(d)
> }
> 
> All I need is m[]. I would like to accomplish this without using the 
> loop if possible, since for my real data n > 140K and m > 1K. I hope 
> this makes sense.
> 
> Thanks,
> Sundar
> 
> 

------------------------------------------------------------------------------
Notice:  This e-mail message, together with any attachments,...{{dropped}}

David Khabie-Zeitoune

2004-Jan-28 13:28 UTC

head link

[R] distance between two matrices

A bit cumbersome, but (somewhat) vectorised.

dist = apply(B, 1, FUN = function(x, M) {rowSums(sweep(M, 2, x,
"-")^2)
}, A)
m = row(t(dist))[t(dist==apply(dist,1,min))]

-----Original Message-----
From: Roger Bivand [mailto:Roger.Bivand at nhh.no] 
Sent: 28 January 2004 10:09
To: Sundar Dorai-Raj
Cc: R-help
Subject: Re: [R] distance between two matrices


On Tue, 27 Jan 2004, Sundar Dorai-Raj wrote:
> Hi all,
>    Say I have a matrix A with dimension m x 2 and matrix B with
> dimension n x 2. I would like to find the row in A that is closest to 
> the each row in B. Here's an example (using a loop):
> 
> set.seed(1)
> A <- matrix(runif(12), 6, 2) # 6 x 2
> B <- matrix(runif(6), 3, 2)  # 3 x 2
> m <- vector("numeric", nrow(B))
> for(j in 1:nrow(B)) {
>    d <- (A[, 1] - B[j, 1])^2 + (A[, 2] - B[j, 2])^2
>    m[j] <- which.min(d)
> }
> 
> All I need is m[]. I would like to accomplish this without using the
> loop if possible, since for my real data n > 140K and m > 1K. I hope 
> this makes sense.
I think you need a quadtree of the larger set of points, the do lookup
for 
buckets of the smaller one. There is a good deal of information on

http://www.cs.umd.edu/~brabec/quadtree/

This isn't an answer within R, the functionality in the gstat
contributed package doesn't seem to be at the user level, but it does
point to the same site at UMD. 

Roger

> 
> Thanks,
> Sundar
> 
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list 
> https://www.stat.math.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! 
> http://www.R-project.org/posting-guide.html
> 
-- 
Roger Bivand
Econonic Geography Section, Department of Economics, Norwegian School of

Economics and Business Administration, Breiviksveien 40, N-5045 Bergen, 
Norway, voice: +47-55959355, fax: +47-55959393; Roger.Bivand at nhh.no

______________________________________________
R-help at stat.math.ethz.ch mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide!
http://www.R-project.org/posting-guide.html

"Hüsing, Johannes"

2004-Jan-28 15:25 UTC

head link

[R] distance between two matrices

> From: Prof Brian Ripley [mailto:ripley at stats.ox.ac.uk]
> On Wed, 28 Jan 2004, "H?sing, Johannes" wrote:
> > ?apply for more info. You'll want to know about apply if
> > you want to avoid loops (which is a good approach).
> 
> Unfortunately apply() is a wrapper for a for() loop, so will 
> not help much 
> (if at all).
whoops; so is the choice between both a matter of style? Or is
it implementation-specific for R, and not generally true for S?

(I know from LISP that if using map, roughly the equivalent to
S' apply instead of the do loop, you cannot rely on the elements 
being executed in their original order, which might buy you 
efficiency depending on the implementation.)

Possibly Parallel Threads

Search for more seemingly similar threads

R help - Jan 2004 - distance between two matrices

[R] distance between two matrices

[R] distance between two matrices

[R] distance between two matrices

[R] distance between two matrices

[R] distance between two matrices

[R] distance between two matrices

[R] distance between two matrices

[R] distance between two matrices

[R] distance between two matrices

Possibly Parallel Threads