thr3ads.net - R help - [R] Vectorizing closest match [Mar 2002]

If this information is useful, please help other people find it:
Share via:

Frank E Harrell Jr

2002-Mar-28 13:19 UTC

[R] Vectorizing closest match

If anyone has a very fast vectorized method for doing the following I would
appreciate some help.  I want to avoid outer() to limit memory problems for very
large n.

Let

x = real vector of length n
y = real vector of length n
w = real vector of length m, m typically less than n/2 but can be > n
z = real vector of length m

For w[i], i=1,,,m, find the value of x that is closest to w[i].  In the case of
ties, select one (optimally at random or just take the first match).  Let z[i] =
value of y corresponding to the closest x.

If there is a Fortran or C function (builtin or otherwise) that does much of the
work, all the better.

Thanks,

Frank
-- 
Frank E Harrell Jr              Prof. of Biostatistics & Statistics
Div. of Biostatistics & Epidem. Dept. of Health Evaluation Sciences
U. Virginia School of Medicine  hesweb1.med.virginia.edu/biostat
-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at
stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._

Peter Dalgaard BSA

2002-Mar-28 13:50 UTC

head link

[R] Vectorizing closest match

Frank E Harrell Jr <fharrell at virginia.edu> writes:
> If anyone has a very fast vectorized method for doing the following I would
appreciate some help.  I want to avoid outer() to limit memory problems for very
large n.
> 
> Let
> 
> x = real vector of length n
> y = real vector of length n
> w = real vector of length m, m typically less than n/2 but can be > n
> z = real vector of length m
> 
> For w[i], i=1,,,m, find the value of x that is closest to w[i].  In the
case of ties, select one (optimally at random or just take the first match). 
Let z[i] = value of y corresponding to the closest x.
> 
> If there is a Fortran or C function (builtin or otherwise) that does much
of the work, all the better.
Sounds rather close to what approx() does in the stepfunction cases,
except that you want jumps at the midpoints between the x values
rather than at the x themselves.

-- 
   O__  ---- Peter Dalgaard             Blegdamsvej 3  
  c/ /'_ --- Dept. of Biostatistics     2200 Cph. N   
 (*) \(*) -- University of Copenhagen   Denmark      Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)             FAX: (+45) 35327907
-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at
stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._

Thomas Lumley

2002-Mar-28 16:54 UTC

head link

[R] Vectorizing closest match

On Thu, 28 Mar 2002, Frank E Harrell Jr wrote:
> If anyone has a very fast vectorized method for doing the following I
> would appreciate some help.  I want to avoid outer() to limit memory
> problems for very large n.
>
> Let
>
> x = real vector of length n
> y = real vector of length n
> w = real vector of length m, m typically less than n/2 but can be > n
> z = real vector of length m
>
> For w[i], i=1,,,m, find the value of x that is closest to w[i].  In the
> case of ties, select one (optimally at random or just take the first
> match).  Let z[i] = value of y corresponding to the closest x.
>
I believe the following will work in pre1.5.0 using the new findInterval
function (modulo any remaining off-by-one errors that should be easy to
fix)

# sort them
oo<-order(x)
x<-x[oo]
y<-y[oo]
# find the right interval for w
j<-pmax(1,findInterval(w,x))
xlow<-x[j] #or possibly j-1 and j or something
xhi<-x[j+1]
# which end of the interval?
bestj<-ifelse(xhi-w<w-xlow,j+1,j)
# get z
z<-y[j]

For length(x)=10^5 it takes 0.5s with length(w)=length(x)/3 and 2.5s with
length(w)=length(x)*3 on my machine with a current pre1.5.0

	-thomas


-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at
stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._

Mike Lonergan

2002-Mar-28 16:55 UTC

head link

[R] Vectorizing closest match

Frank,

Apologies if I'm being very stupid, but I'd guess that reducing the
overall
complexity would help for large datasets. The following is not vectorised,
but should be linear apart from the order() operations, which are hopefully
O(nlogn):

>closest<-function(w,x,y)  {   ow<-order(w)
   ox<-order(x)
   ws<-w[ow]
   xs<-x[ox]
   ys<-y[ox]
   res<-rep(length(x),length(w))
   i<-1
   j<-1

   while(i<=length(w) && j<length(x))   {
      if ( ws[i]<=(xs[j]+xs[j+1])/2 )  {
         res[i]<-j
         i<-i+1
      } else
         j<-j+1
   }
   reshuffle<-order((1:length(w))[ow])
   z<-ys[res[reshuffle]]
   return(z)
 }


It is all based on sorting the data first so that only m+n comparisons are
needed rather than mn.

I'd be glad to know whether I've messed up/completely misunderstood the
problem (& I've just realised you're more interested in space than
time so
this may all be completely irrelevant.)

Cheers,

Mike.


     > -----Original Message-----
     > From: owner-r-help at stat.math.ethz.ch
     > [mailto:owner-r-help at stat.math.ethz.ch]On Behalf Of Frank E
     > Harrell Jr
     > Sent: 28 March 2002 13:20
     > To: rhelp
     > Subject: [R] Vectorizing closest match
     >
     >
     > If anyone has a very fast vectorized method for doing the
     > following I would appreciate some help.  I want to avoid
     > outer() to limit memory problems for very large n.
     >
     > Let
     >
     > x = real vector of length n
     > y = real vector of length n
     > w = real vector of length m, m typically less than n/2 but can be >
n
     > z = real vector of length m
     >
     > For w[i], i=1,,,m, find the value of x that is closest to
     > w[i].  In the case of ties, select one (optimally at random
     > or just take the first match).  Let z[i] = value of y
     > corresponding to the closest x.
     >
     > If there is a Fortran or C function (builtin or otherwise)
     > that does much of the work, all the better.
     >
     > Thanks,
     >
     > Frank
     > --
     > Frank E Harrell Jr              Prof. of Biostatistics &
Statistics
     > Div. of Biostatistics & Epidem. Dept. of Health Evaluation
Sciences
     > U. Virginia School of Medicine
hesweb1.med.virginia.edu/biostat
-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.
-.-
r-help mailing list -- Read ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at
stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._.
_._

-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at
stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._

Reasonably Related Threads

Search for more seemingly similar threads

R help - Mar 2002 - Vectorizing closest match

[R] Vectorizing closest match

[R] Vectorizing closest match

[R] Vectorizing closest match

[R] Vectorizing closest match

Reasonably Related Threads