Is there a better way to bucket observations into more-or-less evenly
sized buckets than this? It seems like this must be a common operation:
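# 1000 standard-normal draws; bucket will hold a decile index from 1 to 10
dt = data.frame(points=rnorm(1000),bucket=NA)
# decile cut points: 0%, 10%, ..., 100% (11 values)
breaks = quantile(dt$points,seq(0,1,.1))
for (i in 2:length(breaks)) {
  if (i == 2) {
    # first bucket is closed on both ends so the minimum is included
    ind = which(dt$points >= breaks[i-1] & dt$points <= breaks[i])
  } else {
    # later buckets are (lower, upper] so the maximum lands in bucket 10
    ind = which(dt$points > breaks[i-1] & dt$points <= breaks[i])
  }
  dt$bucket[ind] = i-1
}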
thanks!
?cut
?quantile (perhaps, to define the breaks)
Bert Gunter
Genentech Nonclinical Biostatistics
650-467-7374
-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Dan Dube
Sent: Monday, April 06, 2009 12:45 PM
To: r-help at r-project.org
Subject: [R] "bucketing" observations
______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
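A minimal sketch of the ?quantile / ?cut suggestion above, written for illustration rather than taken from the thread; the seed and the sample size are arbitrary choices. It contrasts what cut() returns with its default interval labels against labels = FALSE, and checks that every observation lands in some bucket.

```r
# Arbitrary seed and sample size, chosen only for reproducibility.
set.seed(1)
x <- rnorm(1000)

# Decile cut points: 11 values running from the minimum (0%) to the maximum (100%).
breaks <- quantile(x, probs = seq(0, 1, by = 0.1))

# Default labels: a factor whose levels are interval labels such as "(a,b]".
# include.lowest = TRUE closes the first interval, so the minimum is not NA.
bucket_f <- cut(x, breaks, include.lowest = TRUE)

# labels = FALSE returns plain integer bucket codes 1..10 instead of a factor.
bucket_i <- cut(x, breaks, labels = FALSE, include.lowest = TRUE)

# Every observation is assigned a bucket; table() shows the bucket sizes.
stopifnot(!anyNA(bucket_i))
print(table(bucket_i))
print(levels(bucket_f))
```

With continuous data like this, and quantile()'s default type 7, each of the ten buckets ends up with 100 of the 1000 points.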
try this:

dat <- data.frame(vals = rnorm(1000))
breaks <- quantile(dat$vals, seq(0, 1, .1))
dat$bucket <- cut(dat$vals, breaks, labels = FALSE, include.lowest = TRUE)

I hope it helps.

Best,
Dimitris

--
Dimitris Rizopoulos
Assistant Professor
Department of Biostatistics
Erasmus University Medical Center

Address: PO Box 2040, 3000 CA Rotterdam, the Netherlands
Tel: +31/(0)10/7043478
Fax: +31/(0)10/7043014
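One caveat not raised in the thread: when the data contain many ties, several quantiles can coincide, and cut() then stops with "'breaks' are not unique". The sketch below is illustrative only; the Poisson data and both workarounds are assumptions, not something posted here. Option 1 drops duplicate breaks, which keeps equal values together but gives fewer and unequal buckets. Option 2 swaps in a different technique, bucketing by rank, which gives exactly equal sizes at the cost of splitting tied values across buckets.

```r
# Heavily tied data: Poisson(1) draws are mostly 0s, 1s and 2s.
set.seed(1)
y <- rpois(1000, lambda = 1)

breaks <- quantile(y, probs = seq(0, 1, by = 0.1))
print(anyDuplicated(breaks) > 0)  # TRUE: several deciles coincide

# cut() refuses duplicated breaks, so capture the error message instead of stopping.
res <- tryCatch(cut(y, breaks, labels = FALSE, include.lowest = TRUE),
                error = function(e) conditionMessage(e))
print(res)

# Option 1: drop duplicate breaks; equal values stay together, buckets are unequal.
b1 <- cut(y, unique(breaks), labels = FALSE, include.lowest = TRUE)
print(table(b1))

# Option 2 (a different technique): rank-based buckets of exactly equal size.
# ties.method = "first" breaks ties by position, so equal values may be split.
n_buckets <- 10
b2 <- ceiling(n_buckets * rank(y, ties.method = "first") / length(y))
print(table(b2))
```

Which option fits depends on whether equal bucket sizes or keeping identical values in the same bucket matters more for the analysis.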
Great! The "cut" function was exactly what I needed. Thank you both!

> -----Original Message-----
> From: Dimitris Rizopoulos [mailto:d.rizopoulos at erasmusmc.nl]
> Sent: Monday, April 06, 2009 4:01 PM
> To: Dan Dube
> Cc: r-help at r-project.org
> Subject: Re: [R] "bucketing" observations