thr3ads.net - R help - [R] Efficient subsetting [May 2003]


R A F

2003-May-16 18:16 UTC

[R] Efficient subsetting

Hi, I'm facing this problem quite a lot, so it seems worthwhile
to check to see what the most efficient solution is.

I've two vectors x (values ordered) and y.  I've ranges
x < x0, x0 <= x < x1, x1 <= x < x2, x2 <= x < x3, ..., x >= xn
and want to construct a subvector yprime of y which consists
of the first/last value of y whose x values are in the range.

For example,

x   y
1   2
1   3
2   3
3   4
4   5
5   6

and let's say the ranges are 1 <= x < 3 and 3 <= x < 5.  I
should produce yprime as c( 2, 4 ) (if I ask for the first value
of y whose x is in the range).  [If there're no x values within
a given range, output an NA.]

Obviously I can do a loop and use which, etc., but it seems
like there should be a better way.

Thanks very much.

A general solution would be nice, but if it helps to make the
algorithm efficient, I'm happy to assume

(a) x values are ordered
(b) the ranges are always evenly spaced:  for example, x in
0 to 10, 10 to 20, 20 to 30, etc.

Jerome Asselin

2003-May-16 19:20 UTC


[R] Efficient subsetting

Here I have a general solution. x need not be ordered and ranges need not
be equally spaced.

x <- c(1, 1, 2, 3, 4, 5)
y <- c(2, 3, 3, 4, 5, 6)
# right = FALSE gives intervals closed on the left: [1,3) and [3,5)
xcut <- cut(x, breaks = c(1, 3, 5), right = FALSE)

# If you want the FIRST value of y whose x is in each range
wh <- !duplicated(xcut) & !is.na(xcut)
y[wh]         #   [1] 2 4

# If you want the LAST value of y whose x is in each range
revxcut <- rev(xcut)
wh <- rev(!duplicated(revxcut) & !is.na(revxcut))
y[wh]         #   [1] 3 5
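
The duplicated() approach above silently skips a range that contains no x values, whereas the question asks for an NA in that case. A minimal sketch of one way to get the NA, assuming tapply() over the factor returned by cut(); the extra breaks 7 and 9 are made up here so that the last range is empty:

```r
x <- c(1, 1, 2, 3, 4, 5)
y <- c(2, 3, 3, 4, 5, 6)

# Hypothetical breaks (not from the thread): [1,3), [3,5), [5,7), [7,9).
# No x value falls in [7,9).
xcut <- cut(x, breaks = c(1, 3, 5, 7, 9), right = FALSE)

# tapply() returns one cell per factor level and fills empty levels with NA
first_y <- tapply(y, xcut, function(v) v[1])
last_y  <- tapply(y, xcut, function(v) v[length(v)])

as.vector(first_y)   # 2 4 6 NA
as.vector(last_y)    # 3 5 6 NA
```

The results come back in range order, one entry per range, which also holds when x is not sorted.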

HTH,
Jerome

-- 

Jerome Asselin (Jérôme), Statistical Analyst
British Columbia Centre for Excellence in HIV/AIDS
St. Paul's Hospital, 608 - 1081 Burrard Street
Vancouver, British Columbia, CANADA V6Z 1Y6
Email: jerome at hivnet.ubc.ca
Phone: 604 806-9112   Fax: 604 806-9044

Gabor Grothendieck

2003-May-18 16:57 UTC


[R] Efficient subsetting

Regarding the problem of finding the indices of the first and last elements
of a vector x that fall in each of the half-open intervals [b[i], b[i+1])
defined by the break points in b:

	x <- c(1,1,2,3,4,5) 
	y <- c(2,3,3,4,5,6) 
	b <- c(1,3,5)

	bseq <- 0:length(b)       # interval numbers
	fi <- findInterval(x,b)   # fi[i] is x[i]'s interval number
	ifirst <- match(bseq,fi)  # indices of first x[i] in each interval 
	ilast <- length(fi) - match(bseq,rev(fi)) + 1    # ditto for last

At this point ifirst and ilast are the indices of the first and last x[i]
in each interval, and y[ifirst] and y[ilast] are the corresponding values of y.
An interval that contains no x[i] gets NA in both.  In the above we used
bseq = 0:3 to get all 4 intervals defined by the break points in b (x < 1,
[1,3), [3,5) and x >= 5), but if you only want the intervals [1,3) and [3,5)
as in the original example then bseq should be set to 1:2.
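
The findInterval/match steps can be collected into one helper. This is only a sketch: the function name first_last_by_interval and its default for bseq (the interior intervals only) are assumptions made for illustration, not part of the thread. The second call uses made-up data with evenly spaced breaks to show the NA for an empty interval.

```r
# Hypothetical helper wrapping the findInterval()/match() steps
first_last_by_interval <- function(x, y, b, bseq = seq_len(length(b) - 1)) {
  fi <- findInterval(x, b)                         # interval number of each x[i]
  ifirst <- match(bseq, fi)                        # first index per interval, NA if empty
  ilast <- length(fi) - match(bseq, rev(fi)) + 1   # last index per interval, NA if empty
  data.frame(interval = bseq, first = y[ifirst], last = y[ilast])
}

# The example from the question: intervals [1,3) and [3,5)
res <- first_last_by_interval(c(1, 1, 2, 3, 4, 5), c(2, 3, 3, 4, 5, 6), c(1, 3, 5))
res$first   # 2 4
res$last    # 3 5

# Made-up data, evenly spaced breaks: nothing falls in [10,20)
res2 <- first_last_by_interval(c(2, 5, 25), c(10, 20, 30), seq(0, 30, by = 10))
res2$first  # 10 NA 30
res2$last   # 20 NA 30
```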

If x is ordered then the first x[i] in each interval is also the minimum
and the last x[i] is the maximum.

If x is unordered then the above still works for finding the first and last
elements in each interval.  However, if what is really wanted is the
minimum and maximum of x in each interval then the following would do it in
the unordered case.  Note that xx is a logical matrix with one row per
interval.  For each row r, ifelse(r,x,NA) keeps x[i] where x[i] falls in
that interval and puts NA elsewhere.  which.min or which.max then returns the
index of the smallest or largest remaining value, or integer(0), i.e. a zero
length vector, if there is none.  Subscripting by [1] just returns the same
value unless it was integer(0), in which case it returns NA, as required.

	xx <- outer(bseq,fi,"==")
	imin <- apply(xx,1,function(r)which.min(ifelse(r,x,NA))[1])
	imax <- apply(xx,1,function(r)which.max(ifelse(r,x,NA))[1])
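
For unordered x, the first/last element of an interval and the element with the smallest/largest x can be different positions. A small check on made-up unordered data (not from the thread), written with which() per interval instead of the outer() matrix; treat it as a sketch rather than the method proposed in this post:

```r
x <- c(4, 1, 2, 3, 1, 5)    # made-up, deliberately unordered
b <- c(1, 3, 5)
bseq <- 1:2                 # intervals [1,3) and [3,5)
fi <- findInterval(x, b)    # 2 1 1 2 1 3

# Positions of the first and last x[i] in each interval
ifirst <- match(bseq, fi)                        # 2 1
ilast <- length(fi) - match(bseq, rev(fi)) + 1   # 5 4

# Positions of the smallest and largest x[i] in each interval;
# an empty interval gives integer(0), which [1] turns into NA
imin <- sapply(bseq, function(k) { i <- which(fi == k); i[which.min(x[i])][1] })
imax <- sapply(bseq, function(k) { i <- which(fi == k); i[which.max(x[i])][1] })
imin   # 2 4
imax   # 3 1
```

Here interval [3,5) holds x[1] = 4 and x[4] = 3, so its first element is at position 1 while its minimum is at position 4.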

