Ben Fairbank
2008-Jan-30 19:59 UTC
[R] "hist" combines two lowest categories -- is there a workaround?
When preparing a series of histograms I found that hist was combining the two lowest categories or bins, 1 and 2. Specifying breaks, as illustrated below, resulted in the correct histogram:

values <- sample(10,500,replace=TRUE)
hist(values)
hist(values,breaks = 0:10)

Apparently, the number of values strictly less than 1 is shown in the first bin (and since none is less than 1, the value is 0), while the other bins appear to show the number of values less than or equal to the bin's upper bound. Is there a setting that will show the number of values less than or equal to the first bin's upper bound?

And, while on the subject of hist, what commands govern the axis label line that shows the values of x? Is there an option that will cause it to show all values from lowest to highest rather than by jumps of 2 or 5?

With thanks for any suggestions

Version 2.5.0, Windows XP professional

Ben Fairbank
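A likely explanation for the merged bins, offered as a sketch and not as a definitive diagnosis: with the default settings, hist picks the break points itself, and for integers from 1 to 10 those breaks tend to fall on 1, 2, ..., 10. That gives nine bins, the first of which is closed on both sides and so holds both the 1s and the 2s. Placing the breaks halfway between the integers gives one bin per value, and drawing the axis by hand puts a tick at every integer. The seed and the names `h_default`, `half_breaks` and `h_int` are assumptions added for illustration.

```r
# Assumption: a fixed seed, only so the run is reproducible.
set.seed(1)
values <- sample(10, 500, replace = TRUE)

# Default breaks: inspect them without drawing anything.
h_default <- hist(values, plot = FALSE)
h_default$breaks   # break points chosen by hist
h_default$counts   # first count covers the closed interval [1, 2]

# One bin per integer: put the breaks halfway between the integers.
half_breaks <- seq(0.5, 10.5, by = 1)
h_int <- hist(values, breaks = half_breaks, plot = FALSE)
h_int$counts       # one count per value 1..10

# Draw it with a tick at every integer instead of the default spacing.
plot(h_int, axes = FALSE, main = "Histogram of values", xlab = "values")
axis(1, at = 1:10)
axis(2)
```

The arguments `right` and `include.lowest` of hist control which side of each bin is closed; both default to TRUE, which is why only the first bin includes its lower bound.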
Dieter Menne
2008-Jan-31 07:58 UTC
[R] "hist" combines two lowest categories -- is there a workaround?
Ben Fairbank <BEN <at> SSANET.COM> writes:

> When preparing a series of histograms I found that hist was combining
> the two lowest categories or bins, 1 and 2. Specifying breaks, as
> illustrated below, resulted in the correct histogram:
>
> values <- sample(10,500,replace=TRUE)
> hist(values)
> hist(values,breaks = 0:10)
>
> Apparently, the number of values strictly less than 1 is shown in the
> first bin (and since none is less than 1, the value is 0), while the
> other bins appear to show the number of values less than or equal to the
> bin's upper bound. Is there a setting that will show the number of
> values less than or equal to the first bin's upper bound?

For irregular spacing, it is best to do the factoring first, for example with cut, and then use histogram (lattice), which is more flexible than hist. Below is an example I use for age groups:

Dieter

-----------------------
library(lattice)
set.seed(4711)
age = floor(rnorm(100,50,15))
ageg = cut(age %/% 10 *10,c(0,seq(20,70,10),100),include.lowest=TRUE,
           right=FALSE, ordered_result=TRUE)
# default plot
histogram(~ageg)
# if you really need it:
levels(ageg) = c("<20","20-29","30-39","40-49","50-59","60-69","70+")
histogram(~ageg)
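
To see which ages land in which group when cut is called with right=FALSE and include.lowest=TRUE, a small base-R check on boundary values may help. This is only a sketch: the vector `boundary_ages` and the name `age_breaks` are made up for illustration and are not part of the example above.

```r
# Same break points as in the age-group example.
age_breaks <- c(0, seq(20, 70, 10), 100)

# Hypothetical ages sitting on or next to the break points.
boundary_ages <- c(0, 19, 20, 29, 30, 69, 70, 100)

# right = FALSE makes each interval closed on the left, open on the right;
# include.lowest = TRUE then also closes the last interval on the right,
# so an age of exactly 100 is kept instead of becoming NA.
groups <- cut(boundary_ages, age_breaks, right = FALSE,
              include.lowest = TRUE, ordered_result = TRUE)

print(data.frame(age = boundary_ages, group = groups))
```

With these settings an age of 20 falls into the 20 to 30 group, not the group below it, which matches the "20-29" style labels.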