Hello,

I have a problem with the histogram function "hist": the area under the histogram is much lower than 1. Could anyone tell me what the problem is? Thanks,

(The total number of observations is 992 (close to 1000), so the probability that 0<Y1<35 is approximately 0.277.)

miao

rm(list=ls())
par(mfrow=c(1, 1))
Y <- cbind(matrix(35*0.5,1,277), matrix(35*1.5, 1, 146), matrix(35*2.5, 1, 99),
           matrix(35*3.5,1,80), matrix(35*4.5, 1, 69), matrix(35*5.5, 1, 63),
           matrix(35*6.5, 1, 52), matrix(35*7.5,1, 53), matrix(35*8.5, 1, 55),
           matrix(35*9.5, 1, 98))
Y1<-as.vector(Y)
par(mar=c(4.5, 4.1, 3.1, 0))
hist(Y1, breaks=seq(0, 350, by=35), ylim=c(0, 0.3), col="grey80", freq=FALSE)
par(mar=c(5.1, 4.1, 4.1, 2.1))
Pascal Oettli
2012-Mar-12 08:03 UTC
[R] Re : A question on histogram - area much less than 1
Hi Miao,

With the option freq=FALSE, the function hist calculates densities, i.e., in your case, the counts divided by the total number of observations and by the width of the interval between breaks (35):

h <- hist(Y1, breaks=seq(0, 350, by=35), freq=FALSE)

If you calculate

h$counts/sum(h$counts)/35

you get h$density.

Regards,
Pascal

----- Original message -----
From: jpm miao <miaojpm at gmail.com>
To: r-help at r-project.org
Sent: Monday, 12 March 2012, 16:42
Subject: [R] A question on histogram - area much less than 1
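The relation Pascal describes can be checked directly. The sketch below is a minimal illustration, assuming the data from the original post (rebuilt here with rep(), which yields the same values); plot = FALSE only suppresses drawing, and the names bin_width and manual_density are illustrative, not from the thread.

```r
# Data from the original post: each value sits at the midpoint of a bin of width 35
Y1 <- rep((0.5 + 0:9) * 35, c(277, 146, 99, 80, 69, 63, 52, 53, 55, 98))

# Compute the histogram object without drawing it
h <- hist(Y1, breaks = seq(0, 350, by = 35), plot = FALSE)

# Width of each bin (all equal to 35 here)
bin_width <- diff(h$breaks)

# density = count / (total number of observations * bin width)
manual_density <- h$counts / sum(h$counts) / bin_width
print(all.equal(manual_density, h$density))

# Area under the histogram: sum over bars of height * width
print(sum(h$density * bin_width))
```

The bar heights are densities, so only height times width (the bar area) is a probability; the areas add up to 1.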
Why do you say that the area is much lower than 1? It is exactly equal to 1. How did you calculate this area?

Your code seems extremely convoluted and confused. You could construct your (rather bizarre) vector Y1 much more simply as

Y1 <- rep((0.5 + 0:9)*35,c(277,146,99,80,69,63,52,53,55,98))

What do you think you are accomplishing via that call to par() given *after* the plotting has been done? Why set ylim to a value that is so incommensurate with the heights of the histogram bars?

cheers,

Rolf Turner

On 12/03/12 20:42, jpm miao wrote:
> (The total number of observation is 992 (close to 1000), so the
> probability that 0<Y1<35 is approximately 0.277)

Why are you approximating? The empirical probability is exactly equal to 277/992 = 0.2792339.
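A small sketch of Rolf's two observations, assuming the cbind() construction from the original post: the rep() one-liner gives an identical vector, and on the density scale the tallest bar is only 277/992/35, roughly 0.008, so ylim = c(0, 0.3) flattens every bar against the x-axis. The names Y1_orig and Y1_rep are illustrative.

```r
# Original construction from the question: a 1 x 992 matrix, then flattened
Y <- cbind(matrix(35*0.5, 1, 277), matrix(35*1.5, 1, 146), matrix(35*2.5, 1, 99),
           matrix(35*3.5, 1, 80),  matrix(35*4.5, 1, 69),  matrix(35*5.5, 1, 63),
           matrix(35*6.5, 1, 52),  matrix(35*7.5, 1, 53),  matrix(35*8.5, 1, 55),
           matrix(35*9.5, 1, 98))
Y1_orig <- as.vector(Y)

# Rolf's one-liner: repeat each bin midpoint by its count
Y1_rep <- rep((0.5 + 0:9) * 35, c(277, 146, 99, 80, 69, 63, 52, 53, 55, 98))
print(identical(Y1_orig, Y1_rep))

# Tallest bar on the density scale (277 / 992 / 35, about 0.008)
h <- hist(Y1_rep, breaks = seq(0, 350, by = 35), plot = FALSE)
print(max(h$density))

# Let hist() choose the y-axis range instead of forcing ylim = c(0, 0.3)
hist(Y1_rep, breaks = seq(0, 350, by = 35), col = "grey80", freq = FALSE)
```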
You expected the sum to be 1 and it is:

> a <- hist(Y1, breaks=seq(0, 350, by=35), col="grey80", freq=FALSE)
> a$density*35
 [1] 0.27923387 0.14717742 0.09979839 0.08064516 0.06955645 0.06350806
 [7] 0.05241935 0.05342742 0.05544355 0.09879032
> sum(a$density*35)
[1] 1

Note that the first density multiplied by 35 is 0.279, exactly what you expected, and the sum of the densities multiplied by the width of each bar (35) is 1. The height of a bar is not the probability; the area of the bar is the probability.

----------------------------------------------
David L Carlson
Associate Professor of Anthropology
Texas A&M University
College Station, TX 77843-4352

> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Rolf Turner
> Sent: Monday, March 12, 2012 3:53 AM
> To: jpm miao
> Cc: r-help at r-project.org
> Subject: Re: [R] A question on histogram - area much less than 1
>
> On 12/03/12 21:44, jpm miao wrote:
> > Hello,
> >
> > Thanks very much for your kind response. Yes, if I multiply by the
> > width "35", the area should be equal to one.
> >
> > How can I plot the probability bars rather than density bars? That
> > is, I would like the height of the first bar to be 0.279, which is the
> > probability that the variable falls between 0 and 35.
>
> If you want probabilities rather than densities you should be using
> barplot() rather than hist():
>
> TBL <- table(Y1)
> barplot(TBL/sum(TBL))
>
> cheers,
>
> Rolf Turner
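Rolf's table() approach works here because Y1 takes only ten distinct values. For data that do not sit exactly at the bin midpoints, one possible sketch first bins the values with cut() (using the same breaks as hist()) and then plots the proportion in each bin. A second option keeps the histogram layout by overwriting the density component of the hist() result with the proportions and redrawing it with plot(h, freq = FALSE); the bar heights then equal probabilities, but the bar areas no longer sum to 1. Variable names such as brks and probs are illustrative.

```r
Y1 <- rep((0.5 + 0:9) * 35, c(277, 146, 99, 80, 69, 63, 52, 53, 55, 98))
brks <- seq(0, 350, by = 35)

# Option 1: bin explicitly, then plot the proportion of observations per bin
bins  <- cut(Y1, breaks = brks, include.lowest = TRUE)
probs <- table(bins) / length(Y1)
barplot(probs, ylab = "Probability", col = "grey80")

# Option 2: keep the histogram look, but make bar heights equal probabilities.
# After this substitution the bar areas no longer sum to 1.
h <- hist(Y1, breaks = brks, plot = FALSE)
h$density <- h$counts / sum(h$counts)
plot(h, freq = FALSE, ylab = "Probability", col = "grey80")
```

With either option the first bar has height 277/992, about 0.279, which is what the original question asked for.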
Apparently Analogous Threads
- Why can't R understand if(num!=NA)?
- How can I access an element of a string?
- Generating a bivariate joint t distribution in R
- Definition of "lag" is opposite in ts and xts objects!
- How to print the frequency table (produced by the command "table" to Excel