thr3ads.net - R help - [R] Histograms, density, and relative frequencies [Jul 2004]

Bret Collier

2004-Jul-07 17:29 UTC

[R] Histograms, density, and relative frequencies

R-users,
         I have been using R for about 1 year, and I have run across a 
couple of graphics problems that I am not quite sure how to address.  I have 
read up on the email threads regarding the differences between density and 
relative frequencies (count/sum(count)) on the R list, and I am hoping that 
someone could provide me with some advice/comments concerning my 
approach.  I will admit that some of the underlying mathematics of the 
density discussion are beyond my current understanding, but I am looking 
into it.

I have a data set (600,000 obs) used to parameterize a probabilistic causal 
model where each obs is a population response for one of 2 classes (either 
regs1 or regs2).  I have been attempting to create 1 marginal probability 
plot with 2 lines (one for each class).  Using my rather rough code, I 
created a plot that seems to adhere to the commonly used (although, from 
what I can understand, wrong) relative frequency histogram approach.

My rough code looks like this:

bk <- c(0, .05, .1, .15, .2, .25,.3, .35, 1)
par(mfrow=c(1, 1))
fawn1 <- hist(MFAWNRESID[regs1], plot=F, breaks=bk)
fawn2 <- hist(MFAWNRESID[regs2], plot=F, breaks=bk)
count1 <- fawn1$counts/sum(fawn1$counts)
count2 <- fawn2$counts/sum(fawn2$counts)
b <- c(0, .05, .1, .15, .2, .25, .3, .35)
plot(count1~b,xaxt="n", xlim=c(0, .5), ylim=c(0, .40),
pch=".", bty="l")
lines(spline(count1~b), lty=c(1), lwd=c(2), col="black")
lines(spline(count2~b), lty=c(2), lwd=c(2), col="black")
axis(side=1, at=c(0, .05, .1, .15, .2,  .25, .3, .35))

Using the above, I get frequency values for regs1 that look like this 
(which is the same as output for my probabilistic model):
 > count1
[1] 1.213378e-01 3.454324e-01 3.365343e-01 1.580839e-01 3.342101e-02
[6] 4.698426e-03 4.488942e-04 4.322685e-05

First, count1[1] is the frequency of occurrence within the range 0-0.05, but 
when plotted it sits at b=0, which does not really represent the range.  Are 
there any suggestions on a technique to approach this?

Next:  Using the above code, the x-axis values end at 0.35, but the axis 
continues (because bk ends at 1)?  While there is the chance of occurrence 
out past .35, it is low and I want to extend the lines to about .35 and 
clip the x-axis.  But, I have been unable to figure out how to clip.  Could 
someone point me in the correct direction?


TIA,

Bret A. Collier
Arkansas Cooperative Fish and Wildlife Research Unit
Department of Biological Sciences University of Arkansas

Adaikalavan Ramasamy

2004-Jul-07 18:36 UTC

[R] Histograms, density, and relative frequencies

On Wed, 2004-07-07 at 18:29, Bret Collier wrote:
> R-users,
>          I have been using R for about 1 year, and I have run across a 
> couple of graphics problems that I am not quite sure how to address.  I have
> read up on the email threads regarding the differences between density and 
> relative frequencies (count/sum(count)) on the R list, and I am hoping that 
> someone could provide me with some advice/comments concerning my 
> approach.  I will admit that some of the underlying mathematics of the 
> density discussion are beyond my current understanding, but I am looking 
> into it.
> 
> I have a data set (600,000 obs) used to parameterize a probabilistic causal
> model where each obs is a population response for one of 2 classes (either 
> regs1 or regs2).  I have been attempting to create 1 marginal probability 
> plot with 2 lines (one for each class).  Using my rather rough code, I 
> created a plot that seems to adhere to the commonly used (although, from 
> what I can understand, wrong) relative frequency histogram approach.
> 
> My rough code looks like this:
> 
> bk <- c(0, .05, .1, .15, .2, .25,.3, .35, 1)
> par(mfrow=c(1, 1))
> fawn1 <- hist(MFAWNRESID[regs1], plot=F, breaks=bk)
> fawn2 <- hist(MFAWNRESID[regs2], plot=F, breaks=bk)
> count1 <- fawn1$counts/sum(fawn1$counts)
> count2 <- fawn2$counts/sum(fawn2$counts)
> b <- c(0, .05, .1, .15, .2, .25, .3, .35)
> plot(count1~b,xaxt="n", xlim=c(0, .5), ylim=c(0, .40),
> pch=".", bty="l")
> lines(spline(count1~b), lty=c(1), lwd=c(2), col="black")
> lines(spline(count2~b), lty=c(2), lwd=c(2), col="black")
> axis(side=1, at=c(0, .05, .1, .15, .2,  .25, .3, .35))

Have you considered density() and plot.density(), by any chance?

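
A minimal sketch of that suggestion, assuming simulated values in place of MFAWNRESID[regs1] and MFAWNRESID[regs2], which are not available here (x1, x2 and the rbeta() parameters are made up purely for illustration). Note that density() returns a kernel density estimate, so the y-axis is density per unit of x rather than a proportion per bin.

```r
# Simulated stand-ins for the two classes (assumed data, not the real MFAWNRESID)
set.seed(1)
x1 <- rbeta(5000, 4, 30)
x2 <- rbeta(5000, 5, 30)

# Kernel density estimates evaluated only over the range of interest
d1 <- density(x1, from = 0, to = 0.35)
d2 <- density(x2, from = 0, to = 0.35)

# One plot, two lines (solid for class 1, dashed for class 2)
plot(d1, main = "", xlab = "MFAWNRESID",
     xlim = c(0, 0.35), ylim = range(0, d1$y, d2$y),
     lty = 1, lwd = 2, bty = "l")
lines(d2, lty = 2, lwd = 2)
legend("topright", legend = c("regs1", "regs2"),
       lty = c(1, 2), lwd = 2, bty = "n")
```

The from and to arguments only restrict where the estimate is evaluated; they do not rescale the curve to integrate to 1 over that range.
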
> Using the above, I get frequency values for regs1 that look like this 
> (which is the same as output for my probabilistic model):
>  > count1
> [1] 1.213378e-01 3.454324e-01 3.365343e-01 1.580839e-01 3.342101e-02
> [6] 4.698426e-03 4.488942e-04 4.322685e-05

I would tend to use the term proportion rather than frequency.

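
To make the distinction between proportion and density concrete, here is a small sketch on simulated data (x is an assumed stand-in for one class; bk is the break vector from the original post). For hist(), the density in each bin is the proportion divided by the bin width, so the two agree only when every bin has width 1.

```r
# Simulated stand-in for one class (assumed data)
set.seed(1)
x  <- rbeta(5000, 4, 30)
bk <- c(0, .05, .1, .15, .2, .25, .3, .35, 1)

h <- hist(x, breaks = bk, plot = FALSE)

prop  <- h$counts / sum(h$counts)   # proportion of observations in each bin
width <- diff(h$breaks)             # bin widths (the last bin is much wider)

sum(prop)                            # proportions sum to 1
all.equal(h$density, prop / width)   # density = proportion / bin width
sum(h$density * width)               # density integrates to 1 over the bins
```
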
> First, count1[1] is the frequency of occurrence within the range 0-0.05, but 
> when plotted it sits at b=0, which does not really represent the range.  Are 
> there any suggestions on a technique to approach this?

You can plot it at the mid-points, like hist() does. fawn1$mids would
give you these values.

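
A sketch of the mid-point approach, again on simulated stand-ins for the two classes (x1 and x2 are assumptions). It draws straight line segments rather than the spline() smoothing used in the original code, and it leaves out the wide last bin (0.35 to 1), whose mid-point of 0.675 lies far outside the range of interest.

```r
# Simulated stand-ins for MFAWNRESID[regs1] and MFAWNRESID[regs2] (assumed data)
set.seed(1)
x1 <- rbeta(5000, 4, 30)
x2 <- rbeta(5000, 5, 30)
bk <- c(0, .05, .1, .15, .2, .25, .3, .35, 1)

fawn1 <- hist(x1, breaks = bk, plot = FALSE)
fawn2 <- hist(x2, breaks = bk, plot = FALSE)

prop1 <- fawn1$counts / sum(fawn1$counts)
prop2 <- fawn2$counts / sum(fawn2$counts)

mids <- fawn1$mids          # 0.025, 0.075, ..., 0.325, 0.675
keep <- mids < 0.35         # drop the wide last bin from the plot

# Each proportion is drawn at the centre of its bin instead of the left edge
plot(mids[keep], prop1[keep], type = "n", xaxt = "n", bty = "l",
     xlim = c(0, 0.35), ylim = c(0, max(prop1, prop2)),
     xlab = "MFAWNRESID", ylab = "Proportion")
lines(mids[keep], prop1[keep], lty = 1, lwd = 2)
lines(mids[keep], prop2[keep], lty = 2, lwd = 2)
axis(side = 1, at = bk[bk <= 0.35])   # tick marks at the bin edges
```
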
> Next:  Using the above code, the x-axis values end at 0.35, but the axis 
> continues (because bk ends at 1)?  While there is the chance of occurrence 
> out past .35, it is low and I want to extend the lines to about .35 and 
> clip the x-axis.  But, I have been unable to figure out how to clip.  Could 
> someone point me in the correct direction?

In your plot() function, set xlim=c(0,0.35). If you mean 'clipping' as
in truncating the distribution, then you probably need to re-adjust your
proportions so that they sum to 1.
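
Both readings of 'clipping' can be sketched as follows, on simulated data (x is an assumed stand-in for one class). The first only restricts the visible axis; the second drops the 0.35 to 1 bin and rescales the remaining proportions.

```r
# Simulated stand-in for one class (assumed data)
set.seed(1)
x  <- rbeta(5000, 4, 30)
bk <- c(0, .05, .1, .15, .2, .25, .3, .35, 1)

h    <- hist(x, breaks = bk, plot = FALSE)
prop <- h$counts / sum(h$counts)

# Keep only bins whose upper edge is at most 0.35
keep <- h$breaks[-1] <= 0.35

# Truncated proportions, rescaled so they sum to 1 again
prop_trunc <- prop[keep] / sum(prop[keep])
sum(prop_trunc)

# xlim alone clips the axis; prop_trunc is what changes the values
plot(h$mids[keep], prop_trunc, type = "b", bty = "l",
     xlim = c(0, 0.35), ylim = c(0, max(prop_trunc)),
     xlab = "MFAWNRESID", ylab = "Proportion (truncated at 0.35)")
```

Whether rescaling is appropriate depends on whether the plot is meant to show the full distribution or only the part below 0.35.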
