thr3ads.net - R help - [R] Applying a function on n nearest neighbours [Oct 2009]

Karl Ove Hufthammer

2009-Oct-30 09:28 UTC

[R] Applying a function on n nearest neighbours

I'm having a problem where I have to apply a function to a subset of a 
variable, where the subset is defined by the n nearest neighbours of a 
second variable.

Here's an example applied to the 'iris' dataset:

$ head(iris)
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa

For each row, I look at the value of Sepal.Length. I then figure out the 
n rows where the value of Sepal.Length is closest to that in the 
original row, and apply a function on the values of Sepal.Width to these 
rows (typically returning a scalar).

For example, setting n = 5 and calculating the mean on a slightly 
modified dataset, based on the first row (Sepal.Length ~= 5.1):

$ set.seed(1)
$ iris[,1:4]=iris[,1:4]+runif(150)/100
$ x=iris$Sepal.Length[1]
$ (pos=which(order(abs(iris$Sepal.Length-x)) %in% 2:6))
[1] 18 26 40 42 52
$ mean(iris$Sepal.Width[pos])
[1] 3.086595

Now, I could easily use a 'for' loop or 'sapply' to do this for all 
rows, but I would think there is a better (and perhaps even faster?) 
way. Anyone know of a specific function in a package for this sort of 
thing?
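
One loop-light way to cover all rows at once is to build the full matrix of pairwise distances with outer() and pick the neighbours row by row. The following is only a sketch under some assumptions: nn_apply is a hypothetical helper name (not from any package), it runs on the unmodified 'iris' data, and it excludes each row from its own neighbour set by setting the diagonal to Inf. It needs memory proportional to N^2 and still calls apply() internally, so it is not guaranteed to be faster than sapply.

```r
# Sketch only: nn_apply is a hypothetical helper, not a function from any package.
nn_apply <- function(key, value, n, FUN = mean) {
  d <- abs(outer(key, key, "-"))   # all pairwise distances, N x N
  diag(d) <- Inf                   # a row is never its own neighbour
  # column j of idx holds the row numbers of the n nearest neighbours of row j
  idx <- apply(d, 1, function(r) order(r)[seq_len(n)])
  nbr <- matrix(value[idx], nrow = n)   # neighbour values, one column per row
  apply(nbr, 2, FUN)                    # one result per original row
}

res <- nn_apply(iris$Sepal.Length, iris$Sepal.Width, n = 5)
head(res)
```

Any function that maps n values to a scalar can be passed as FUN, for example median or sd.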

Also note that this way of doing it won't necessarily work on the 
unmodified dataset, where a number of rows have the same values for 
'Sepal.Length', and the original row won't necessarily have 'order' 
value equal to 1. (Exactly how to break ties when there are more than n 
observations with the same distance to the original row isn't 
very important, though. For example, using the ones with the lowest row 
numbers, or n random ones, would both be OK.)
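
The tie problem described above can be sidestepped by excluding the original row by its index rather than by its position in the output of order(). A minimal sketch, assuming the unmodified 'iris' data; nearest_rows and its random_ties argument are hypothetical names used only for illustration. It relies on order() leaving tied elements in their original order, and on order() accepting a second key to break ties.

```r
# nearest_rows is a hypothetical helper name, used only for illustration.
nearest_rows <- function(key, i, n, random_ties = FALSE) {
  d <- abs(key - key[i])
  d[i] <- Inf  # exclude the original row by index, not by its position in order()
  if (random_ties) {
    # a random secondary key breaks ties in distance at random
    order(d, sample(length(d)))[seq_len(n)]
  } else {
    # order() keeps tied elements in their original order,
    # so ties go to the lowest row numbers
    order(d)[seq_len(n)]
  }
}

pos <- nearest_rows(iris$Sepal.Length, i = 1, n = 5)
mean(iris$Sepal.Width[pos])
```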

-- 
Karl Ove Hufthammer

Karl Ove Hufthammer

2009-Oct-30 09:39 UTC

[R] Applying a function on n nearest neighbours

On Fri, 30 Oct 2009 10:28:49 +0100 Karl Ove Hufthammer <karl at huftis.org> wrote:
> $ (pos=which(order(abs(iris$Sepal.Length-x)) %in% 2:6))

This should of course be:

(pos=order(abs(iris$Sepal.Length-x))[2:6])
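
To illustrate why the two expressions differ, here is a small sketch on a made-up distance vector (the values in d are hypothetical, chosen only so the difference is visible). order(d)[2:4] returns the row numbers holding the 2nd to 4th smallest distances, whereas which(order(d) %in% 2:4) returns the sorted positions at which rows 2 to 4 happen to appear.

```r
# Made-up distances; element 1 plays the role of the original row (distance 0).
d <- c(0, 5, 1, 4, 2, 3, 6)

right <- order(d)[2:4]               # rows with the 2nd to 4th smallest distances
wrong <- which(order(d) %in% 2:4)    # positions in the ordering where rows 2 to 4 sit

d[right]   # the three smallest non-zero distances
d[wrong]   # an unrelated selection of distances
```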

-- 
Karl Ove Hufthammer
