R-users,

I have been using R for about 1 year, and I have run across a couple of graphics problems that I am not quite sure how to address. I have read up on the email threads on the R list regarding the differences between density and relative frequencies (count/sum(count)), and I am hoping that someone could provide me with some advice/comments concerning my approach. I will admit that some of the underlying mathematics of the density discussion are beyond my current understanding, but I am looking into it.

I have a data set (600,000 obs) used to parameterize a probabilistic causal model where each obs is a population response for one of 2 classes (either regs1 or regs2). I have been attempting to create 1 marginal probability plot with 2 lines (one for each class). Using my rather rough code, I created a plot that seems to adhere to the commonly used (although, from what I can understand, wrong) relative frequency histogram approach.

My rough code looks like this:

    bk <- c(0, .05, .1, .15, .2, .25,.3, .35, 1)
    par(mfrow=c(1, 1))
    fawn1 <- hist(MFAWNRESID[regs1], plot=F, breaks=bk)
    fawn2 <- hist(MFAWNRESID[regs2], plot=F, breaks=bk)
    count1 <- fawn1$counts/sum(fawn1$counts)
    count2 <- fawn2$counts/sum(fawn2$counts)
    b <- c(0, .05, .1, .15, .2, .25, .3, .35)
    plot(count1~b,xaxt="n", xlim=c(0, .5), ylim=c(0, .40), pch=".", bty="l")
    lines(spline(count1~b), lty=c(1), lwd=c(2), col="black")
    lines(spline(count2~b), lty=c(2), lwd=c(2), col="black")
    axis(side=1, at=c(0, .05, .1, .15, .2, .25, .3, .35))

Using the above, I get frequency values for regs1 that look like this (which is the same as the output for my probabilistic model):

    > count1
    [1] 1.213378e-01 3.454324e-01 3.365343e-01 1.580839e-01 3.342101e-02
    [6] 4.698426e-03 4.488942e-04 4.322685e-05

First, the first value of count1 is the frequency of occurrence within the range 0-0.05, but when plotted it is the value at b=0 and does not really represent the range. Are there any suggestions on a technique to approach this?

Next: using the above code, the x-axis labels end at 0.35, but the axis continues (because bk ends at 1). While there is a chance of occurrence out past .35, it is low, and I want to extend the lines to about .35 and clip the x-axis. But I have been unable to figure out how to clip. Could someone point me in the correct direction?

TIA,
Bret A. Collier
Arkansas Cooperative Fish and Wildlife Research Unit
Department of Biological Sciences
University of Arkansas
Adaikalavan Ramasamy
2004-Jul-07 18:36 UTC
[R] Histograms, density, and relative frequencies
On Wed, 2004-07-07 at 18:29, Bret Collier wrote:
> R-users,
> I have been using R for about 1 year, and I have run across a
> couple of graphics problems that I am not quite sure how to address. I have
> read up on the email threads on the R list regarding the differences
> between density and relative frequencies (count/sum(count)), and I am
> hoping that someone could provide me with some advice/comments concerning
> my approach. I will admit that some of the underlying mathematics of the
> density discussion are beyond my current understanding, but I am looking
> into it.
>
> I have a data set (600,000 obs) used to parameterize a probabilistic causal
> model where each obs is a population response for one of 2 classes (either
> regs1 or regs2). I have been attempting to create 1 marginal probability
> plot with 2 lines (one for each class). Using my rather rough code, I
> created a plot that seems to adhere to the commonly used (although, from
> what I can understand, wrong) relative frequency histogram approach.
>
> My rough code looks like this:
>
> bk <- c(0, .05, .1, .15, .2, .25,.3, .35, 1)
> par(mfrow=c(1, 1))
> fawn1 <- hist(MFAWNRESID[regs1], plot=F, breaks=bk)
> fawn2 <- hist(MFAWNRESID[regs2], plot=F, breaks=bk)
> count1 <- fawn1$counts/sum(fawn1$counts)
> count2 <- fawn2$counts/sum(fawn2$counts)
> b <- c(0, .05, .1, .15, .2, .25, .3, .35)
> plot(count1~b,xaxt="n", xlim=c(0, .5), ylim=c(0, .40), pch=".", bty="l")
> lines(spline(count1~b), lty=c(1), lwd=c(2), col="black")
> lines(spline(count2~b), lty=c(2), lwd=c(2), col="black")
> axis(side=1, at=c(0, .05, .1, .15, .2, .25, .3, .35))

Have you considered density() and plot.density(), by any chance?

> Using the above, I get frequency values for regs1 that look like this
> (which is the same as the output for my probabilistic model):
>
> > count1
> [1] 1.213378e-01 3.454324e-01 3.365343e-01 1.580839e-01 3.342101e-02
> [6] 4.698426e-03 4.488942e-04 4.322685e-05

I would tend to use the term "proportion" rather than "frequency".

> First, the first value of count1 is the frequency of occurrence within the
> range 0-0.05, but when plotted it is the value at b=0 and does not really
> represent the range. Are there any suggestions on a technique to approach
> this?

You can plot the values at the bin mid-points, as hist() does. fawn1$mids would give you these values.

> Next: using the above code, the x-axis labels end at 0.35, but the axis
> continues (because bk ends at 1). While there is a chance of occurrence
> out past .35, it is low, and I want to extend the lines to about .35 and
> clip the x-axis. But I have been unable to figure out how to clip. Could
> someone point me in the correct direction?

In your plot() call, set xlim=c(0, 0.35). If by 'clipping' you mean truncating the distribution, then you probably need to re-adjust your proportions so that they sum to 1 over the bins you keep.
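
The suggestions above (plot at the bin mid-points, restrict xlim, optionally re-adjust the proportions, or use density()) could be put together roughly as follows. This is only a sketch: MFAWNRESID, regs1 and regs2 are not available here, so x1 and x2 are simulated stand-ins, and all names other than bk are made up for illustration.

```r
# Simulated stand-ins for MFAWNRESID[regs1] and MFAWNRESID[regs2]
# (assumption: the real data are not available; values lie in (0, 1)).
set.seed(1)
x1 <- rbeta(10000, 2, 18)
x2 <- rbeta(10000, 3, 17)

bk <- c(0, .05, .1, .15, .2, .25, .3, .35, 1)
h1 <- hist(x1, breaks = bk, plot = FALSE)
h2 <- hist(x2, breaks = bk, plot = FALSE)

# Proportion of observations falling in each bin
p1 <- h1$counts / sum(h1$counts)
p2 <- h2$counts / sum(h2$counts)

# For reference: hist()'s density is the proportion divided by the bin width,
# so the two differ whenever bins are not of width 1.
dens1 <- p1 / diff(bk)   # same values as h1$density

# Keep only the bins that end at or before 0.35 (drops the wide 0.35-1 bin)
keep <- bk[-1] <= 0.35
mids <- h1$mids[keep]    # bin mid-points: 0.025, 0.075, ..., 0.325

# Optional: re-adjust so the kept proportions sum to 1 (truncation)
p1_trunc <- p1[keep] / sum(p1[keep])
p2_trunc <- p2[keep] / sum(p2[keep])

# Plot proportions at the mid-points, with the x-axis limited to 0-0.35
plot(mids, p1[keep], type = "n", xaxt = "n", bty = "l",
     xlim = c(0, 0.35), ylim = c(0, 1.1 * max(p1, p2)),
     xlab = "Response", ylab = "Proportion")
lines(spline(mids, p1[keep]), lty = 1, lwd = 2)
lines(spline(mids, p2[keep]), lty = 2, lwd = 2)
axis(side = 1, at = bk[bk <= 0.35])

# Alternative: kernel density estimates evaluated on 0-0.35 only
d1 <- density(x1, from = 0, to = 0.35)
d2 <- density(x2, from = 0, to = 0.35)
plot(d1, main = "", xlab = "Response", lty = 1, lwd = 2,
     ylim = c(0, max(d1$y, d2$y)))
lines(d2, lty = 2, lwd = 2)
```

Note that the first plot shows proportions per bin, while the second shows a density (area under the curve, not height, corresponds to probability), so the y-axes of the two plots are not directly comparable.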