Greetings All!
As is often the case on this list, the answer may well be under my nose but I can't see it!

I am looking for a "smart" way to do the following.

Say I have a vector of values, X. I set up "bins" for X, say with breaks at B = c(b1,b2,...,b11) covering the range of X, i.e. bins numbered 1:10. The value x is in bin i if B[i] < x <= B[i+1].

What I seek is a vector, of the same length as X, which for each x in X gives the number of the bin that x is in.

Clearly this can be done in an "unsmart" way by looping through all of X along with something like

  which( (B[1:10] < X[j]) & (X[j] <= B[2:11]) )

However, I feel that this naturally occurring task must have received a smarter solution! The hist() function already does this implicitly, since it has to decide which bin each value in X should be counted in. But it apparently then discards this information, since there is nothing relevant in the return values from hist().

So is there a "smart" function somewhere for this?

The motivation here is that I have multivariate data, (X,Y,Z,...) and I wish to study how it behaves in each different bin for X. So the "bin index", ixB say, derived for X can be applied to select corresponding subsets of the other variables. Rather than doing it the clumsy way each time, e.g. according to

  Y[(B[i] < X) & (X <= B[i+1])]

I would like to have the bin index permanently available -- for example it allows easy logical combinations of bins, such as Y[(ixB==j1) | (ixB==j2)], or Y[(ixB %in% ixB0)].

With thanks,
Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <ted.harding at wlandres.net>
Fax-to-email: +44 (0)870 094 0861
Date: 31-Aug-11                                       Time: 09:00:27
------------------------------ XFMail ------------------------------
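For concreteness, here is a minimal sketch of the loop-based ("unsmart") approach described above, followed by the kind of subsetting a stored bin index makes easy. The data X and Y and the break vector B are hypothetical, chosen only for illustration; values outside (B[1], B[11]] are given NA in this sketch.

```r
# Hypothetical data for illustration only
set.seed(1)
X <- runif(50, 0, 10)             # binning variable, all values in (0, 10)
Y <- rnorm(50)                    # a companion variable
B <- seq(0, 10, length.out = 11)  # 11 breaks -> 10 bins

# Loop over X, using which() to find the i with B[i] < x <= B[i+1]
nbin <- length(B) - 1
ixB <- integer(length(X))
for (j in seq_along(X)) {
  i <- which((B[1:nbin] < X[j]) & (X[j] <= B[2:(nbin + 1)]))
  ixB[j] <- if (length(i) == 1) i else NA_integer_  # NA if outside all bins
}

# With ixB stored, selecting by bin is a plain logical index
Y[(ixB == 2) | (ixB == 3)]  # Y values whose X lies in bin 2 or bin 3
Y[ixB %in% c(2, 3)]         # the same selection via %in%
tapply(Y, ixB, mean)        # per-bin mean of Y
```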
Probably you're looking for the function findInterval().

I hope it helps.

Best,
Dimitris
______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

-- 
Dimitris Rizopoulos
Assistant Professor
Department of Biostatistics
Erasmus University Medical Center

Address: PO Box 2040, 3000 CA Rotterdam, the Netherlands
Tel: +31/(0)10/7043478
Fax: +31/(0)10/7043014
Web: http://www.erasmusmc.nl/biostatistiek/
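A minimal sketch of the findInterval() suggestion, using hypothetical data. One caveat worth checking: by default findInterval() uses intervals closed on the left, B[i] <= x < B[i+1], whereas Ted's convention is B[i] < x <= B[i+1]. The left.open = TRUE argument (added in R 3.3.0, well after this thread) switches to the right-closed convention. The hist() comparison at the end assumes the data do not sit exactly on a break, since hist() nudges the breaks by a tiny tolerance.

```r
# Hypothetical data for illustration only
set.seed(1)
X <- runif(50, 0, 10)
B <- seq(0, 10, length.out = 11)

# Bin index with B[i] < x <= B[i+1]
ixB <- findInterval(X, B, left.open = TRUE)

# Boundary behaviour of the two conventions
findInterval(c(0, 1, 1.5, 10), B)                    # 1 2 2 11
findInterval(c(0, 1, 1.5, 10), B, left.open = TRUE)  # 0 1 2 10

# The per-bin counts that hist() reports can be recovered from ixB
counts <- tabulate(ixB, nbins = length(B) - 1)
all(counts == hist(X, breaks = B, plot = FALSE)$counts)
```

Values below the first break get 0 and values above the last get length(B), so out-of-range points are flagged rather than dropped.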
Hi Ted,
Are you looking for something like this?

  x <- sample(1:10, 20, TRUE)
  x
   [1]  5 10 10  9  1  1  1  7  2  1  2  1  1  1  9  7  8  5  6  8
  binx <- cut(x, breaks = 0:10)
  as.numeric(binx)
   [1]  5 10 10  9  1  1  1  7  2  1  2  1  1  1  9  7  8  5  6  8

As binx is a factor, coercing it to numeric returns the bin number (the level code) for each value. Here, with breaks = 0:10, bin i is (i-1, i], so each integer value lands in its own bin, which is why the bin numbers equal x.

Jim
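A small variation on the cut() approach, again on hypothetical data: labels = FALSE makes cut() return the integer bin codes directly, without building a factor. Since cut() defaults to right = TRUE, its bins are (B[i], B[i+1]], the same convention as Ted's and as findInterval(..., left.open = TRUE) in R 3.3.0 or later.

```r
# Hypothetical data for illustration only
set.seed(1)
X <- runif(50, 0, 10)
Y <- rnorm(50)
B <- seq(0, 10, length.out = 11)

# Integer bin codes straight from cut()
ixB <- cut(X, breaks = B, labels = FALSE)

# Same codes as coercing the factor version
stopifnot(identical(ixB, as.integer(cut(X, breaks = B))))

# Same as findInterval() with right-closed intervals (R >= 3.3.0)
stopifnot(identical(ixB, findInterval(X, B, left.open = TRUE)))

# Out-of-range values become NA (include.lowest = FALSE by default)
cut(c(0, 1, 1.5, 10), breaks = B, labels = FALSE)  # NA 1 2 10

# Per-bin subsets of another variable
split(Y, ixB)
```

Which to prefer is partly taste: cut() marks values outside the breaks as NA, while findInterval() returns 0 or length(B), which can be handy for telling "below range" from "above range".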