thr3ads.net - R help - [R] Understanding R Hist() Results... [Jun 2009]

Jason Rupert

2009-Jun-04 04:00 UTC

[R] Understanding R Hist() Results...

I think I'm missing something in understanding what is going on with hist(...)

http://n2.nabble.com/What-is-going-on-with-Histogram-Plots-td3022645.html

For my example I count 7 unique years; however, on the histogram there are only 6.
It looks like the bin to the left of the tick mark on the x-axis represents the
number of entries for that year, i.e. Frequency.

I guess it looks like the bin for 1990 is missing.  Is there a better way or a
different histogram R command to use in order to see all the age bins and have
them aligned directly over the year tick marks on the x-axis?

Thanks again for any insights that can be provided.

Philipp Pagel

2009-Jun-04 07:50 UTC

[R] Understanding R Hist() Results...

On Wed, Jun 03, 2009 at 09:00:11PM -0700, Jason Rupert wrote:
>
> http://n2.nabble.com/What-is-going-on-with-Histogram-Plots-td3022645.html
> 
> For my example I count 7 unique years; however, on the histogram
> there are only 6.  It looks like the bin to the left of the tick mark on
> the x-axis represents the number of entries for that year, i.e.
> Frequency.
> 
> I guess it looks like the bin for 1990 is missing.  Is there a
> better way or a different histogram R command to use in order to see
> all the age bins and have them aligned directly over the
> year tick marks on the x-axis?

hist() is most useful for non-integer data. Each bin represents an
interval and for discrete data several values can of course end up in
the same bin - just as for floating point numbers. What you want is a
graphical representation of counts for discrete values (dates) - try

plot(table(HouseYear_array))

  or

barplot(table(HouseYear_array))

instead.
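A minimal runnable sketch of this suggestion. The thread never posts HouseYear_array itself, so the vector below is an assumption: it is rebuilt from the per-year counts that table() reports later in the thread (1990, 1991 and 1992 once each, 1993 and 1994 four times each, 1995 and 1996 eight times each), stored as character strings as in the sort() output.

```r
# Reconstructed data (assumption): per-year counts taken from the table()
# output posted later in the thread; the real vector's order may differ.
HouseYear_array <- as.character(rep(1990:1996, times = c(1, 1, 1, 4, 4, 8, 8)))

# table() counts each distinct year separately, so no two years share a bar.
year_counts <- table(HouseYear_array)
print(year_counts)

# One vertical line per year, labelled with that year on the x-axis.
plot(year_counts, xlab = "Year", ylab = "Frequency")

# Alternatively, one labelled bar per year.
barplot(year_counts, xlab = "Year", ylab = "Frequency")
```

Because table() keys on the distinct values rather than on intervals, the question of which bin a boundary value falls into never arises.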

cu
	Philipp

-- 
Dr. Philipp Pagel
Lehrstuhl für Genomorientierte Bioinformatik
Technische Universität München
Wissenschaftszentrum Weihenstephan
85350 Freising, Germany
http://webclu.bio.wzw.tum.de/~pagel/

(Ted Harding)

2009-Jun-04 10:13 UTC

[R] Understanding R Hist() Results...

On 04-Jun-09 04:00:11, Jason Rupert wrote:
> 
> I think I'm missing something in understanding what is going on with
> hist(...)
> 
> http://n2.nabble.com/What-is-going-on-with-Histogram-Plots-td3022645.html
> 
> For my example I count 7 unique years; however, on the histogram there
> are only 6.  It looks like the bin to the left of the tick mark on the
> x-axis represents the number of entries for that year, i.e. Frequency.
> 
> I guess it looks like the bin for 1990 is missing.  Is there a better
> way or a different histogram R command to use in order to see all the
> age bins and have them aligned directly over the year tick
> marks on the x-axis?
> 
> Thanks again for any insights that can be provided.

It's doing what it's supposed to -- which admittedly could be confusing
when all your data lie on the exact boundaries between bins.

From ?hist, by default "include.lowest = TRUE, right = TRUE", and:
  If 'right = TRUE' (default), the histogram cells are intervals of
  the form '(a, b]', i.e., they include their right-hand endpoint,
  but not their left one, with the exception of the first cell when
  'include.lowest' is 'TRUE'.

In your data:

 sort(HouseYear_array)
 [1] "1990" "1991" "1992" "1993" "1993" "1993" "1993" "1994" "1994"
[10] "1994" "1994" "1995" "1995" "1995" "1995" "1995" "1995" "1995"
[20] "1995" "1996" "1996" "1996" "1996" "1996" "1996" "1996" "1996"

and, with

  H<-hist(as.numeric(HouseYear_array))
  H$breaks
  # [1] 1990 1991 1992 1993 1994 1995 1996

so you get 2 (1990 and 1991) in the [1990,1991] bin, 1 (1992) in the
(1991,1992] bin, 4 (the 1993s) in (1992,1993], and so on, exactly as observed.
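The counts above can be reproduced without drawing anything by passing plot = FALSE, which makes hist() return its breaks and counts. The sketch below assumes a HouseYear_array rebuilt from the sort() output above (one each of 1990 to 1992, four each of 1993 and 1994, eight each of 1995 and 1996). It also shows that switching to left-closed cells with right = FALSE does not help, because the data still sit on the boundaries.

```r
# Reconstructed data (assumption): rebuilt from the sort() output above.
HouseYear_array <- as.character(rep(1990:1996, times = c(1, 1, 1, 4, 4, 8, 8)))
years <- as.numeric(HouseYear_array)

# Default call: Sturges' rule plus pretty() put the breaks on the integers
# themselves, so every observation lies exactly on a cell boundary.
H <- hist(years, plot = FALSE)
print(H$breaks)  # 1990 1991 ... 1996: six cells for seven distinct years
print(H$counts)  # first cell [1990,1991] holds both 1990 and 1991

# Left-closed cells [a,b) only move the problem to the other end: with
# include.lowest = TRUE the last cell [1995,1996] takes both 1995 and 1996.
H_left <- hist(years, right = FALSE, plot = FALSE)
print(H_left$counts)
```

In the second call the two merged years are 1995 and 1996 instead of 1990 and 1991, so a seventh bar still never appears.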

You can get what you're expecting to see by setting the 'breaks'
parameter explicitly, and making sure the breakpoints do not
coincide with data (which ensures that there is no confusion about
what goes in which bin):

  hist(as.numeric(HouseYear_array),breaks=0.5+(1989:1996))
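A sketch of this fix, again assuming a HouseYear_array rebuilt from the sort() output above. With plot = FALSE the returned object shows that every cell is now centred on a single year and that the counts agree with table():

```r
# Reconstructed data (assumption): rebuilt from the sort() output above.
HouseYear_array <- as.character(rep(1990:1996, times = c(1, 1, 1, 4, 4, 8, 8)))
years <- as.numeric(HouseYear_array)

# Breaks at 1989.5, 1990.5, ..., 1996.5: no observation sits on a boundary,
# so each cell contains exactly one year.
half_breaks <- 0.5 + (1989:1996)
H <- hist(years, breaks = half_breaks, plot = FALSE)

print(H$mids)    # cell centres, i.e. the years themselves
print(H$counts)  # same per-year counts as table(years)

# Drawing it: each bar now sits directly over its year's tick mark.
hist(years, breaks = half_breaks, xlab = "Year", main = "Houses per year")
```

Any breaks strictly between consecutive years would give the same counts; the 0.5 offset additionally puts each bar's midpoint exactly on its year.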

Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at manchester.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 04-Jun-09                                       Time: 11:13:22
------------------------------ XFMail ------------------------------

Jason Rupert

2009-Jun-04 11:52 UTC

[R] Understanding R Hist() Results...

Thank you again to all the R-help folks who responded.  I again appreciate all
the help and insight and will investigate the options suggested.

I guess I'm still doing a little head-scratching over how the division occurred:

It looks like the default hist(...) behavior is doing the following:
HouseHist<-hist(as.numeric(HouseYear_array)) 
HouseHist$counts
[1] 2 1 4 4 8 8

That would equate to the following grouping of the years:
[90, 91] (91, 92] (92, 93] (93, 94] (94, 95] (95, 96] 


However, the true division is something like the following:
table(as.numeric(HouseYear_array))
1990 1991 1992 1993 1994 1995 1996 
   1    1    1    4    4    8    8 

Seems like hist behavior could have been:
(89, 90] (90, 91] (91, 92] (92, 93] (93, 94] (94, 95] (95, 96]
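The grouping proposed here can be requested directly by supplying integer breaks that start one year earlier. With hist()'s default right = TRUE, breaks = 1989:1996 give the cells (1989,1990], (1990,1991], ..., (1995,1996], the first one also closed at 1989 by include.lowest, which changes nothing for these data. A sketch using a HouseYear_array rebuilt (as an assumption) from the table() counts just above:

```r
# Reconstructed data (assumption): rebuilt from the table() counts above.
HouseYear_array <- as.character(rep(1990:1996, times = c(1, 1, 1, 4, 4, 8, 8)))
years <- as.numeric(HouseYear_array)

# Right-closed cells (1989,1990], ..., (1995,1996]: one per year, each year
# on the right edge of its own cell, as in the grouping proposed above.
H <- hist(years, breaks = 1989:1996, plot = FALSE)
print(H$counts)  # now matches the per-year table() counts
```

Each bar then spans the year before its tick mark, which is the left-of-the-tick offset noticed in the original question; Ted's half-integer breaks centre the bars on the ticks instead.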

Of course, I haven't had any coffee yet...

This goes with the following example:
http://n2.nabble.com/What-is-going-on-with-Histogram-Plots-td3022645.htm

