Thank you all very much for your time and suggestions. The link to
Stack Overflow was very helpful. Here are some timings in case someone wants
to know. (I noticed that microbenchmark results vary depending on how many
functions one benchmarks at a time; however, the "min" stays about the same.)
# Just to refresh: most of the code is from the Stack Overflow link provided
# by Martin Morgan:
# http://stackoverflow.com/questions/16213029/more-efficient-strategy-for-which-or-match
f0 <- function(v) length(which(v < 0))
f1 <- function(v) sum(v < 0)
f2 <- function(v) which.min(v < 0) - 1L
f3 <- function(x) { # binary search implemented in R
    imin <- 1L
    imax <- length(x)
    while (imax >= imin) {
        imid <- as.integer(imin + (imax - imin) / 2)
        if (x[imid] >= 0)
            imax <- imid - 1L
        else
            imin <- imid + 1L
    }
    imax
}
library(compiler) # provides cmpfun()
f3.c <- cmpfun(f3) # pre-compiled
# binary search in C
library(inline) # provides cfunction()
f4 <- cfunction(c(x = "numeric"), "
    int imin = 0, imax = Rf_length(x) - 1, imid;
    while (imax >= imin) {
        imid = imin + (imax - imin) / 2;
        if (REAL(x)[imid] >= 0)
            imax = imid - 1;
        else
            imin = imid + 1;
    }
    return ScalarInteger(imax + 1);
")
# this one is a separate suggestion by William Dunlap:
f5 <- function(v) {
    tabulate(findInterval(v, c(-Inf, 0, 1, Inf)))[1]
}
> vec <- c(seq(-100, -1, length.out = 1e6), rep(0, 20),
+          seq(1, 100, length.out = 1e6))
# the identity of results was verified
> microbenchmark(f1(vec), f2(vec), f3(vec), f3.c(vec), f4(vec), f5(vec))
Unit: microseconds
      expr       min         lq    median         uq       max neval
   f1(vec) 17054.233 17831.1385 18514.305 19512.4705 54603.435   100
   f2(vec) 23624.353 25026.4265 26034.785 29322.1150 60014.458   100
   f3(vec)    76.902    93.2340   111.834   116.8370   129.888   100
 f3.c(vec)    21.883    30.7530    37.757    54.1250    62.939   100
   f4(vec)     6.575    10.5885    30.389    31.9385    37.610   100
   f5(vec) 35365.088 36767.6175 38317.103 40671.2000 69209.425   100
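
The post says the results were verified to be identical but does not show how. One possible check, sketched here on a smaller vector, could look like the following. The names `small` and `results` are illustrative and not from the original post, and f4 is left out so that the sketch does not depend on the inline package and a C compiler.

```r
# Pure-R variants from the post, repeated so the sketch is self-contained.
f0 <- function(v) length(which(v < 0))
f1 <- function(v) sum(v < 0)
f2 <- function(v) which.min(v < 0) - 1L
f3 <- function(x) { # binary search implemented in R
    imin <- 1L
    imax <- length(x)
    while (imax >= imin) {
        imid <- as.integer(imin + (imax - imin) / 2)
        if (x[imid] >= 0)
            imax <- imid - 1L
        else
            imin <- imid + 1L
    }
    imax
}
f3.c <- compiler::cmpfun(f3) # byte-compiled copy of f3
f5 <- function(v) tabulate(findInterval(v, c(-Inf, 0, 1, Inf)))[1]

# Smaller test vector with the same shape as `vec` (assumed size, for speed).
small <- c(seq(-100, -1, length.out = 1000), rep(0, 20),
           seq(1, 100, length.out = 1000))

# Collect the count of negative elements from every variant.
results <- c(f0 = f0(small), f1 = f1(small), f2 = f2(small),
             f3 = f3(small), f3.c = f3.c(small), f5 = f5(small))

# All variants should agree with the first one.
stopifnot(all(results == results[[1]]))
```

Note that f2, f3 and f3.c rely on the vector being sorted, and f2 additionally needs at least one non-negative element.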
So, I'll try to go with the inline binary search and see if I can precompile
complex conditions.
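
For a two-sided condition such as (v > c1 & v < c2) on a sorted vector, one option might be to run the binary search twice and subtract the counts. The sketch below is a pure-R illustration under that assumption; `count_below` and `count_between` are hypothetical helper names, not functions from the thread or from any package.

```r
# Number of elements of the sorted vector v that are below c.
# strict = TRUE counts v < c; strict = FALSE counts v <= c.
count_below <- function(v, c, strict = TRUE) {
    lo <- 0L          # invariant: the first `lo` elements are below c
    hi <- length(v)   # invariant: elements after position `hi` are not below c
    while (lo < hi) {
        mid <- lo + (hi - lo + 1L) %/% 2L
        below <- if (strict) v[mid] < c else v[mid] <= c
        if (below) lo <- mid else hi <- mid - 1L
    }
    lo
}

# Number of elements with v > c1 & v < c2, assuming v is sorted.
count_between <- function(v, c1, c2) {
    max(0L, count_below(v, c2, strict = TRUE) -
            count_below(v, c1, strict = FALSE))
}

# Small illustrative vector: sorted, with duplicates.
v <- c(-3, -1, -1, 0, 0, 2, 5, 5, 7)
stopifnot(count_below(v, 0) == sum(v < 0))
stopifnot(count_between(v, -1, 5) == sum(v > -1 & v < 5))
```

Each call costs O(log n) comparisons instead of a full pass over the vector, which is the same reason f3 and f4 are so much faster than f1 in the timings above. The same two-search idea could be moved into C with the inline package if the R loop turns out to be the bottleneck.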
Thank you, again, for your help!
Mikhail.
On Friday, April 26, 2013 20:52:27 Suzen, Mehmet wrote:
> Hello Mikhail,
>
> I would suggest using the ff package for fast access to large data
> structures:
>
> http://cran.r-project.org/web/packages/ff/index.html
> http://wsopuppenkiste.wiso.uni-goettingen.de/ff/ff_1.0/inst/doc/ff.pdf
>
> Best
>
> Mehmet
>
> On 26 April 2013 18:12, Mikhail Umorin <mikeumo@gmail.com> wrote:
> > Hello,
> >
> > I am dealing with numeric vectors 10^5 to 10^6 elements long. The values
> > are sorted (with duplicates) in the vector (v). I am obtaining the length
> > of vectors such as (v < c) or (v > c1 & v < c2), where c, c1, c2 are some
> > scalar variables. What is the most efficient way to do this?
> >
> > I am using sum(v < c) since TRUE's are 1's and FALSE's are 0's. This seems
> > to me more efficient than length(which(v < c)), but, please, correct me
> > if I'm wrong. So, is there anything faster than what I already use?
> >
> > I'm running R 2.14.2 on Linux kernel 3.4.34.
> >
> > I appreciate your time,
> >
> > Mikhail
> >
> > [[alternative HTML version deleted]]
> >
> > ______________________________________________
> > R-help@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> > http://www.R-project.org/posting-guide.html and provide commented,
> > minimal, self-contained, reproducible code.