Several of the methods I use for analyzing large data sets, such as
    WinGamma: determining the level of noise in data
    Relief-F: estimating the influence of variables
depend on finding the k nearest neighbors of a point in a data frame or
matrix efficiently. (For large data sets it is not feasible to compute
the 'dist' matrix anyway.)
Seeing the proposed solution to "[R] distance between two matrices"
last month for finding _one_ nearest neighbor I came up with a solution
'nearest(A, n, k)' as appended.
Still, this is very clumsy and slow if you have to find the 3 nearest
neighbors for 1000 points in a data frame with at least 100000 entries
-- about 2 seconds per data point on my computer, or half an hour for a
real-life application.
Are there better/faster ways to perform this task using R functions?
Even better, is there a free implementation of kd-trees that I could
utilize (the one I found did not conform to the ANSI C standard)?
Someone pointed to 'spdep' of the R-Sig-Geo project, but 'knearneigh'
only accepts 2D data points.
Hans Werner Borchers
ABB Corporate Research, Germany
________________________________________________________________________
require(class)

nearest <- function(X, n, k = 3)
## Find the k nearest neighbors of X[n, ] in the data frame
## or matrix X, calling the function knn1 k times.
{
    N <- nrow(X)
    # inds collects the query index plus the neighbor indices found so far
    inds <- c(n); i <- 0
    while (i < k) {
        # use knn1 to get the index of one neighbor in the reduced matrix...
        j <- as.integer(knn1(X[-inds, ], X[n, ], 1:(N - length(inds))))
        # ...and map it back to a row index of the full matrix
        inds <- c(inds, setdiff(1:N, inds)[j])
        i <- i + 1
    }
    # return the nearest neighbor indices (without n, of course)
    return(inds[-1])
}
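The k successive calls to knn1 above can be replaced, for a single query point, by one vectorized distance computation plus a ranking step. A minimal sketch in base R (the function name 'nearest2' is mine, not part of the original post), assuming plain Euclidean distance on numeric columns:

```r
## Vectorized k-nearest-neighbor lookup for one query point.
## Computes squared Euclidean distances from X[n, ] to every row of X
## in one shot, then picks the k smallest (excluding the point itself).
nearest2 <- function(X, n, k = 3) {
    X <- as.matrix(X)
    # t(X) is p x N, so subtracting the length-p query vector recycles
    # it down each column; colSums then yields the N squared distances.
    # No sqrt needed: sqrt is monotone and we only rank the distances.
    d2 <- colSums((t(X) - X[n, ])^2)
    d2[n] <- Inf                # a point is not its own neighbor
    order(d2)[1:k]              # indices of the k nearest rows
}
```

This does one pass of vectorized arithmetic in compiled code per query, instead of k round trips through knn1 that each rebuild an (N - i) x p training matrix, so it should be considerably faster for the 1000-query workload described above.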
Why not modify the C code of knn? (Not knn1)

On Mon, 2 Feb 2004, Hans W Borchers wrote:
> [original message quoted in full]

--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, 1 South Parks Road, Oxford OX1 3TG, UK
Tel: +44 1865 272861 (self), +44 1865 272866 (PA)
Fax: +44 1865 272595
> Why not modify the C code of knn? (Not knn1)

Because I thought it would not be a good idea to apply 'knn' a thousand
times or more. Instead I was looking for a procedure that returns a
structure which makes it easy to find the nearest neighbors of any point
one wants. For smaller data sets I used 'dist' in this way.

The next step could be to extend nearest neighbors to data frames with
factors, generalizing, for example, the 'daisy' function. But you might
be right that looking at the 'knn' implementation is the faster road for
the moment.

Hans Werner Borchers
ABB Corporate Research, Germany
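The 'dist' route mentioned above can be written down as a small helper that returns the whole neighbor structure at once, which is the kind of reusable result being asked for. A sketch under the assumption that the full N x N distance matrix fits in memory (the helper name 'knn_from_dist' is mine, not from the thread):

```r
## All-points k-nearest-neighbor table from a full distance matrix.
## Row i of the result holds the indices of the k nearest neighbors
## of X[i, ]. Only feasible when the N x N matrix fits in memory.
knn_from_dist <- function(X, k = 3) {
    D <- as.matrix(dist(X))     # full symmetric Euclidean distance matrix
    diag(D) <- Inf              # a point is not its own neighbor
    # for each row, the indices of the k smallest distances
    t(apply(D, 1, function(d) order(d)[1:k]))
}
```

For data frames with factor columns, dist() would have to be replaced by a mixed-type dissimilarity such as the 'daisy' function from the cluster package, along the lines suggested above.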