Hi, I'm facing this problem quite a lot, so it seems worthwhile
to check to see what the most efficient solution is.

I've two vectors x (values ordered) and y. I've ranges
    x < x0, x0 <= x < x1, x1 <= x < x2, x2 <= x < x3, x > xn
and want to construct a subvector yprime of y which consists
of the first/last value of y whose x values are in the range.

For example,

    x y
    1 2
    1 3
    2 3
    3 4
    4 5
    5 6

and let's say the ranges are 1 <= x < 3 and 3 <= x < 5. I
should produce yprime as c( 2, 4 ) (if I ask for the first value
of y whose x is in the range). [If there're no x values within
a given range, output an NA.]

Obviously I can do a loop and use which, etc., but it seems
like there should be a better way.

Thanks very much.

A general solution would be nice, but if it helps to make the
algorithm efficient, I'm happy to assume

(a) x values are ordered
(b) the ranges are always evenly spaced: for example, x in
    0 to 10, 10 to 20, 20 to 30, etc.
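For reference, the loop-and-which baseline mentioned in the question might look roughly like the sketch below. The function name first_last_loop and the convention of passing the range boundaries as a single breaks vector are assumptions made for illustration; they do not come from the original post. Ranges are treated as half-open, [lo, hi), and an empty range yields NA.

```r
# Baseline sketch: first/last y per half-open range [breaks[k], breaks[k+1]).
# `first_last_loop` is a hypothetical helper name, not from the thread.
first_last_loop <- function(x, y, breaks) {
  n <- length(breaks) - 1            # number of ranges
  first <- rep(NA_real_, n)          # NA stays in place for empty ranges
  last  <- rep(NA_real_, n)
  for (k in seq_len(n)) {
    idx <- which(x >= breaks[k] & x < breaks[k + 1])
    if (length(idx) > 0) {
      first[k] <- y[idx[1]]
      last[k]  <- y[idx[length(idx)]]
    }
  }
  list(first = first, last = last)
}

x <- c(1, 1, 2, 3, 4, 5)
y <- c(2, 3, 3, 4, 5, 6)
res <- first_last_loop(x, y, breaks = c(1, 3, 5))
res$first   # 2 4
res$last    # 3 5
```

This is only a correctness reference to compare the vectorised answers against; it does one pass over x per range.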
Here I have a general solution. x need not be ordered and ranges need
not be equally spaced.

    x <- c(1,1,2,3,4,5)
    y <- c(2,3,3,4,5,6)
    xcut <- cut(x, breaks=c(1,3,5), right=FALSE)

    # If you want the FIRST value of y whose x are in the range
    wh <- !duplicated(xcut) & !is.na(xcut)
    y[wh]
    # [1] 2 4

    # If you want the LAST value of y whose x are in the range
    revxcut <- rev(xcut)
    wh <- rev(!duplicated(revxcut) & !is.na(revxcut))
    y[wh]
    # [1] 3 5

HTH,
Jerome

On May 16, 2003 11:16 am, R A F wrote:
> Hi, I'm facing this problem quite a lot, so it seems worthwhile
> to check to see what the most efficient solution is.
> [...]

-- 
Jerome Asselin (Jérôme), Statistical Analyst
British Columbia Centre for Excellence in HIV/AIDS
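One thing to watch with the duplicated() approach: a range that contains no x simply drops out of the result rather than producing an NA, and the values come back in the order they appear in x. If one NA-padded value per range is needed, as the question asks, a variant along the following lines may help. It is a sketch using tapply() on the same cut() factor, not part of the reply above, and the wider breaks vector is chosen only to show an empty range.

```r
x <- c(1, 1, 2, 3, 4, 5)
y <- c(2, 3, 3, 4, 5, 6)

# Illustrative breaks: [1,3), [3,5), [5,7), [7,9); the last range is empty
xcut <- cut(x, breaks = c(1, 3, 5, 7, 9), right = FALSE)

# tapply() returns one cell per factor level; empty levels come back as NA
first_y <- tapply(y, xcut, function(v) v[1])
last_y  <- tapply(y, xcut, function(v) v[length(v)])

as.vector(first_y)   # 2 4 6 NA
as.vector(last_y)    # 3 5 6 NA
```

Within each group tapply() keeps the elements in their original order, so v[1] and v[length(v)] are the first and last y by position, whether or not x is sorted.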
Regarding the problem of finding first and last indices of a vector x
in each of the closed-open intervals defined by the break points in b:

    x <- c(1,1,2,3,4,5)
    y <- c(2,3,3,4,5,6)
    b <- c(1,3,5)
    bseq <- 0:length(b)        # interval numbers
    fi <- findInterval(x,b)    # fi[i] is x[i]'s interval number
    ifirst <- match(bseq,fi)   # indices of first x[i] in each interval
    ilast <- length(fi) - match(bseq,rev(fi)) + 1   # ditto for last

At this point ifirst and ilast are the indices of the first x[i] and
last x[i] in each interval, and y[ifirst] and y[ilast] are the
corresponding values of y. An interval that contains no x[i] gets an
NA index and therefore an NA value.

In the above we used bseq = 0:3 to get all 4 intervals defined by the
break points in b, but if you only want the intervals [1,3) and [3,5)
as per the example in the question then bseq should be set to 1:2.

If x is ordered then the first x[i] in each interval is also the
minimum and the last x[i] is the maximum. If x is unordered then the
above still works for finding the first and last elements in each
interval. However, if what is really wanted is the minimum and
maximum in each interval then the following would do it in the
unordered case.

Note that xx is a logical matrix with one row per interval. (r|NA)
replaces each FALSE with an NA, and multiplying by x replaces each
TRUE with the corresponding value of x. which.min or which.max then
finds the position of the minimum or maximum, returning integer(0),
i.e. a zero length vector, if none. Subscripting by [1] just returns
the same value unless it was integer(0), in which case it returns NA,
as required.

    xx <- outer(bseq,fi,"==")
    imin <- apply(xx,1,function(r) which.min((r|NA)*x)[1])
    imax <- apply(xx,1,function(r) which.max((r|NA)*x)[1])
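The findInterval()/match() steps can be wrapped up for reuse roughly as follows. This is a sketch only: the function name first_last_by_interval is made up here, and it keeps just the interior intervals [b[k], b[k+1]), which corresponds to setting bseq to 1:(length(b) - 1). The last lines illustrate assumption (b) from the question: when the ranges are evenly spaced, the interval number can be computed arithmetically instead of searched for.

```r
# Hypothetical wrapper around the findInterval()/match() idea
first_last_by_interval <- function(x, y, b) {
  k  <- seq_len(length(b) - 1)                 # interior intervals only
  fi <- findInterval(x, b)                     # interval number of each x[i]
  ifirst <- match(k, fi)                       # first index per interval, NA if empty
  ilast  <- length(fi) - match(k, rev(fi)) + 1 # last index per interval, NA if empty
  list(first = y[ifirst], last = y[ilast])
}

x <- c(1, 1, 2, 3, 4, 5)
y <- c(2, 3, 3, 4, 5, 6)

res <- first_last_by_interval(x, y, b = c(1, 3, 5))
res$first   # 2 4
res$last    # 3 5

# An empty interval, here [7,9), yields NA
res_na <- first_last_by_interval(x, y, b = c(1, 3, 5, 7, 9))
res_na$first   # 2 4 6 NA

# Evenly spaced ranges starting at lo with a fixed width:
# the interval number follows directly from the value of x.
lo <- 1; width <- 2
fi_even <- as.integer(floor((x - lo) / width)) + 1L
fi_even   # 1 1 1 2 2 3, same as findInterval(x, c(1, 3, 5))
```

The arithmetic version avoids the search that findInterval() performs, but it only applies when every range has the same width.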