I think I'm missing something in understanding what is going on with hist(...):

http://n2.nabble.com/What-is-going-on-with-Histogram-Plots-td3022645.html

For my example I count 7 unique years; however, on the histogram there are only 6. It looks like the bin to the left of each tick mark on the x-axis represents the number of entries for that year, i.e. the frequency.

It looks like the bin for 1990 is missing. Is there a better way, or a different R histogram command, to see all the age bins and have them aligned directly over the year tick marks on the x-axis?

Thanks again for any insights that can be provided.

On Wed, Jun 03, 2009 at 09:00:11PM -0700, Jason Rupert wrote:
> http://n2.nabble.com/What-is-going-on-with-Histogram-Plots-td3022645.html
>
> For my example I count 7 unique years; however, on the histogram
> there are only 6. It looks like the bin to the left of each tick mark
> on the x-axis represents the number of entries for that year, i.e.
> the frequency.
>
> It looks like the bin for 1990 is missing. Is there a better way, or
> a different R histogram command, to see all the age bins and have
> them aligned directly over the year tick marks on the x-axis?

hist() is most useful for non-integer data. Each bin represents an interval, and for discrete data several distinct values can of course end up in the same bin, just as for floating-point numbers. What you want is a graphical representation of counts for discrete values (dates). Try

    plot(table(HouseYear_array))

or

    barplot(table(HouseYear_array))

instead.

cu
    Philipp

--
Dr. Philipp Pagel
Lehrstuhl für Genomorientierte Bioinformatik
Technische Universität München
Wissenschaftszentrum Weihenstephan
85350 Freising, Germany
http://webclu.bio.wzw.tum.de/~pagel/

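For anyone following along without the original data, here is a minimal sketch of the table()/barplot() suggestion. It assumes a reconstruction of HouseYear_array built from the per-year counts reported elsewhere in this thread (1, 1, 1, 4, 4, 8, 8 for 1990 to 1996); the real vector was never posted in full here, and the axis labels are illustrative choices.

```r
# Assumed reconstruction of the poster's data from the per-year counts
# quoted in this thread; the original HouseYear_array may differ in order.
HouseYear_array <- as.character(rep(1990:1996, times = c(1, 1, 1, 4, 4, 8, 8)))

# table() tallies each distinct value, so every year gets its own entry
year_counts <- table(HouseYear_array)
print(year_counts)

# barplot() draws one bar per table entry, labelled with the year itself
barplot(year_counts, xlab = "Year", ylab = "Frequency")
```

Because table() treats the years as discrete labels rather than points on a continuous axis, no two years can share a bar.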
On 04-Jun-09 04:00:11, Jason Rupert wrote:
> I think I'm missing something in understanding what is going on with
> hist(...):
>
> http://n2.nabble.com/What-is-going-on-with-Histogram-Plots-td3022645.html
>
> For my example I count 7 unique years; however, on the histogram there
> are only 6. It looks like the bin to the left of each tick mark on the
> x-axis represents the number of entries for that year, i.e. the
> frequency.
>
> It looks like the bin for 1990 is missing. Is there a better way, or
> a different R histogram command, to see all the age bins and have
> them aligned directly over the year tick marks on the x-axis?
>
> Thanks again for any insights that can be provided.

It's doing what it's supposed to, which admittedly could be confusing when all your data lie on the exact boundaries between bins.

From ?hist, by default "include.lowest = TRUE, right = TRUE", and:

    If 'right = TRUE' (default), the histogram cells are intervals of
    the form '(a, b]', i.e., they include their right-hand endpoint,
    but not their left one, with the exception of the first cell when
    'include.lowest' is 'TRUE'.

In your data:

    sort(HouseYear_array)
     [1] "1990" "1991" "1992" "1993" "1993" "1993" "1993" "1994" "1994"
    [10] "1994" "1994" "1995" "1995" "1995" "1995" "1995" "1995" "1995"
    [19] "1995" "1996" "1996" "1996" "1996" "1996" "1996" "1996" "1996"

and, with

    H <- hist(as.numeric(HouseYear_array))
    H$breaks
    # [1] 1990 1991 1992 1993 1994 1995 1996

so you get 2 values (1990 and 1991) in the first bin [1990, 1991], 1 in the (1991, 1992] bin, 4 in (1992, 1993], and so on, exactly as observed.

You can get what you're expecting to see by setting the 'breaks' parameter explicitly, and making sure the breakpoints do not coincide with data values (which ensures that there is no confusion about what goes in which bin):

    hist(as.numeric(HouseYear_array), breaks = 0.5 + (1989:1996))

Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at manchester.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 04-Jun-09                                       Time: 11:13:22
------------------------------ XFMail ------------------------------

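To see the two binnings side by side without opening a plot window, the comparison can be sketched with plot = FALSE. The vector x below is an assumed numeric reconstruction of the data from the per-year counts given in this thread, and the names h_default and h_centered are only illustrative.

```r
# Assumed reconstruction of the data from the per-year counts in this thread
x <- rep(1990:1996, times = c(1, 1, 1, 4, 4, 8, 8))

# Default breaks: the years sit exactly on the bin boundaries, and the
# first cell is closed on both sides, so it collects 1990 and 1991.
h_default <- hist(x, plot = FALSE)
print(h_default$breaks)
print(h_default$counts)    # 2 1 4 4 8 8

# Half-year breaks: each year falls strictly inside its own cell,
# and the cell midpoints coincide with the years.
h_centered <- hist(x, breaks = 0.5 + (1989:1996), plot = FALSE)
print(h_centered$mids)     # 1990 1991 ... 1996
print(h_centered$counts)   # 1 1 1 4 4 8 8
```

With the half-year breaks there are 7 cells, one per year, which is the picture the original question was expecting.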
Thank you again to all the R-help folks who responded. I appreciate all the help and insight and will investigate the options suggested.

I guess I am still doing a little head scratching over how the division occurred. It looks like the default hist(...) behavior is doing the following:

    HouseHist <- hist(as.numeric(HouseYear_array))
    HouseHist$counts
    [1] 2 1 4 4 8 8

That would equate to the following grouping of the years:

    [90, 91] (91, 92] (92, 93] (93, 94] (94, 95] (95, 96]

However, the true division is something like the following:

    table(as.numeric(HouseYear_array))

    1990 1991 1992 1993 1994 1995 1996
       1    1    1    4    4    8    8

It seems like the hist behavior could have been:

    (89, 90] (90, 91] (91, 92] (92, 93] (93, 94] (94, 95] (95, 96]

Of course, I haven't had any coffee yet...

This goes with the following example:
http://n2.nabble.com/What-is-going-on-with-Histogram-Plots-td3022645.htm

--- On Thu, 6/4/09, Ted.Harding at manchester.ac.uk <Ted.Harding at manchester.ac.uk> wrote:

> From: Ted.Harding at manchester.ac.uk <Ted.Harding at manchester.ac.uk>
> Subject: RE: [R] Understanding R Hist() Results...
> To: R-help at r-project.org
> Cc: "Jason Rupert" <jasonkrupert at yahoo.com>
> Date: Thursday, June 4, 2009, 5:13 AM
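On the question of why the default grouping does not start at (89, 90]: as far as I understand hist.default, the default breaks are obtained by applying pretty() to the range of the data, with the cell count suggested by Sturges' rule as the target, so the first break is the smallest observed value and no cell is created below it. The sketch below uses an assumed reconstruction of the data from the counts above, and the variable names are only illustrative.

```r
# Assumed reconstruction of the data from the per-year counts in this thread
x <- rep(1990:1996, times = c(1, 1, 1, 4, 4, 8, 8))

# Sturges' rule gives the target number of cells: ceiling(log2(n) + 1)
n_cells <- nclass.Sturges(x)

# pretty() turns that target into round breakpoints spanning range(x),
# so the lowest break is the smallest data value (1990), not 1989.
default_breaks <- pretty(range(x), n = n_cells, min.n = 1)
print(default_breaks)

# Supplying a break below the data reproduces the (89, 90], (90, 91], ...
# grouping proposed above.
h_shifted <- hist(x, breaks = 1989:1996, plot = FALSE)
print(h_shifted$counts)    # 1 1 1 4 4 8 8
```

Note that with breaks = 1989:1996 each bar spans the year before its right-hand tick mark, so the half-year breaks suggested earlier are still the better choice if the bars should sit directly over the years.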