Jonathan Greenberg
2012-Dec-06 22:00 UTC
[R] Best way to coerce numerical data to a predetermined histogram bin?
Folks:

Say I have a set of histogram breaks:

breaks=c(1:10,15)

# With bin ids:
bin_ids=1:(length(breaks)-1)

# and some data (note that some of it falls outside the breaks):
data=runif(min=1,max=20,n=100)

***

What is the MOST EFFICIENT way to "classify" data into the histogram bins (return the bin_ids) and, say, return NA if the value falls outside of the bins?

By classify, I mean that if a data value is greater than one break and less than or equal to the next break, it gets assigned that bin's ID (note that length(breaks) = length(bin_ids)+1).

Also note that, as in this example, the bins are not necessarily of equal width.

I can, of course, loop over each element of data and then step through breaks, stopping at the correct bin, but I feel like there is probably a faster (and more elegant) approach to this. Thoughts?

--j
Jeff Newmiller
2012-Dec-06 22:21 UTC
[R] Best way to coerce numerical data to a predetermined histogram bin?
?cut
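As a rough sketch of what the cut() suggestion could look like for the breaks in the question: with labels = FALSE, cut() returns integer bin codes instead of a factor, and its default right = TRUE gives intervals of the form (lower, upper], which matches the "greater than one break, less than or equal to the next" rule. Values outside the breaks come back as NA. The variable names x and bin and the seed are illustrative choices, not part of the original post.

```r
# Breaks from the question: unit-width bins from 1 to 10, then one wide bin (10, 15]
breaks <- c(1:10, 15)

# Sample data; the seed is an arbitrary choice so the run is repeatable
set.seed(42)
x <- runif(n = 100, min = 1, max = 20)

# labels = FALSE -> integer bin codes rather than a factor
# right = TRUE   -> intervals are (lower, upper]; this is the default
# Anything outside (1, 15] is returned as NA
bin <- cut(x, breaks = breaks, labels = FALSE, right = TRUE)

# A few hand-picked edge cases: on the lowest break, inside a bin,
# on an interior break, on the last break, and beyond it
edge <- cut(c(1, 1.5, 2, 10, 15, 15.1), breaks = breaks, labels = FALSE)
print(edge)
```

Note that a value exactly equal to the lowest break (1 here) is treated as outside the bins under this rule; include.lowest = TRUE would pull it into bin 1 if that is preferred.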
Greg Snow
2012-Dec-06 22:36 UTC
[R] Best way to coerce numerical data to a predetermined histogram bin?
?findInterval
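A minimal sketch of the findInterval() route, under a couple of assumptions: the helper name classify is hypothetical, and the left.open argument used here exists only in R 3.3.0 and later (it was not available when this thread was written). findInterval() expects the breaks to be sorted in non-decreasing order and returns 0 for values below the first break and length(breaks) for values above the last, so those two codes are mapped to NA afterwards.

```r
# Breaks from the question
breaks <- c(1:10, 15)

# Hypothetical helper: returns the bin index for each element of x,
# or NA when x falls outside (min(breaks), max(breaks)]
classify <- function(x, breaks) {
  # left.open = TRUE makes the intervals (lower, upper], as the question asks
  # (requires R >= 3.3.0)
  idx <- findInterval(x, breaks, left.open = TRUE)
  # 0 means "at or below the first break"; length(breaks) means "above the last"
  idx[idx %in% c(0L, length(breaks))] <- NA_integer_
  idx
}

# Edge cases: on the lowest break, inside a bin, on an interior break,
# on the last break, and beyond it
res <- classify(c(1, 1.5, 2, 10, 15, 15.1), breaks)
print(res)
```

On an older R without left.open, the default intervals are closed on the left, [lower, upper), so values falling exactly on a break would land one bin higher than the question specifies.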