Hi:
On Sun, Oct 17, 2010 at 8:46 AM, Federico Bonofiglio
<bonoricus@gmail.com>wrote:
> Dear Masters,
>
> I have a question to submit
>
>
> consider the following script
>
> m<-4.95
> obs<-rpois(36,m) # i generate 36 realization from a poisson(m)
>
> hist(obs,freq=F)
> curve(dpois(x,m),add=T,col="red") #i wish to overlay on the
histogram the
> theorical poisson density function
>
A histogram is meant to be a crude density estimate of a continuous random
variable; similarly, curve() is meant to display a continuous function of x.
You have a [discrete] Poisson probability mass function (pmf) with a
corresponding sample of size 36 from it. These should be displayed as
probability mass functions and bar charts, respectively.
I wouldn't be surprised if this were homework, but it's an opportunity
to
teach you a few things about the graphics packages , including lattice and
ggplot2, which will be worth your time if you intend to pursue statistics
beyond the present. Luckily(?), I have a little time today.... :)
Discrete distributions should be plotted as discrete distributions. Bar
charts are good for this task - the separation placed between categories is
meant to denote that the values on the x-axis are discrete. You can
superimpose a theoretical Poisson pmf onto a bar chart rather easily in both
ggplot2 and lattice. I'll leave the base graphics solution to the end.
There are two ways to go:
(1) Compare the relative frequency bar chart of the sample with its
theoretical pmf;
(2) Compare the cumulative relative frequencies with the theoretical
cumulative pmf.
# Approach 1: Bar charts
# Preliminary step: data manipulation
Your goal is to overlay the theoretical pmf onto the empirical distribution,
but the latter has to be a relative frequency distribution in order for the
two to be comparable. The hardest part, in some sense, is to arrange the
data properly so that the observed relative frequencies and the Poisson
probabilities can all be plotted - from a sample of size 36, it is possible
that one or more x values will be missing and cause gaps in the plot. We
need to handle that contingency first before plotting. (I found that out the
hard way while proofing the code.) The below is a little messy, but it's
functional.
m <- 4.95
obs <- rpois(36, 4.95)
# Create a data frame from the frequency table generated from obs
d <- as.data.frame(table(obs))
d$Obs <- as.numeric(as.character(d$obs) ) # convert obs from factor to
numeric
d$rfreq <- with(d, Freq/sum(Freq)) # compute the relative frequencies
# Now create a data frame that will compute Poisson probabilities for x = 0
to max(d$Obs):
x <- seq(0, max(d$Obs))
dpm <- data.frame(x = x, pm = dpois(x, 4.95))
# Merge d and dpm, retaining all the rows of dpm - there may be NAs
# in some places with respect to d
dd <- merge(d, dpm, by.x = 'Obs', by.y = 'x', all.y = TRUE)
dd$obs <- NULL # remove factor obs
dd$Freq <- NULL # remove raw frequencies - no longer needed
# Convert NA relative frequencies to zero
dd$rfreq[is.na(dd$rfreq)] <- 0
dd
str(dd) # everything should be numeric
# Add the graphics packages to the session (make sure they're installed
first)
library(ggplot2)
library(lattice)
# Superimposed relative frequency bar charts with Poisson pmf:
# the first is the lattice version, the second from ggplot2.
# Lattice: the panel function is necessary to superimpose two different
# graphs onto the same graphics surface. Type 'h' plots vertical lines
from
0
# to the y-value for each x; lwd represents line width.
xyplot(pm ~ Obs, data = dd,
panel = function(x, y, ...) {
panel.barchart(x, dd$rfreq, horizontal = FALSE, col
'orange', ...)
panel.xyplot(x, y, type = 'h', col = 'blue',
lwd = 4)
},
xlab = 'x', ylab = '', ylim = c(0, 0.3))
# ggplot2 is a modular approach to graphics. One puts down graphical layers
# with geoms to which other graphical elements can be added. The resulting
# graph is pretty much the same as the xyplot, but the default background is
# different. Since geom_segment() doesn't use the fill 'aesthetic'
(it's
black
# no matter what), we adjust the legend accordingly. Notice how the quoted
# strings in fill = become the labels of the legend. Adjust the ylab() as
needed.
p <- ggplot(dd, aes(x = Obs))
p + geom_bar(aes(y = rfreq, fill = 'observed'), stat =
'identity') +
geom_segment(aes(xend = x, yend = 0, y = pm, fill = 'Poisson(4.95)'),
size = 4) +
scale_fill_manual('Type', values = c('observed' =
'orange',
'Poisson(4.95)' = 'black')) +
xlab('x') + ylab("") + ylim(0, 0.3)
Another way to compare empirical and theoretical distributions is to compare
their corresponding cdf's.
# Approach 2: Empirical cdfs:
# ggplot2's geom_step() will plot a step function from its input
# arguments. In this case, it is convenient to 'melt' the data frame dd
so
that
# the empirical and theoretical cumulative distributions can easily be
plotted
# separately in both ggplot2 and lattice.
# 'Melt' dd- we want the Obs values to remain the
# same for each stacked variable, so it is the 'id' variable:
ddm <- melt(dd, id = 'Obs')
# to be used for the legend...the value of the each argument
# needs to be parameterized because the
# number of rows in dd can vary from one sample to another
ddm$Type <- factor(rep(c('Observed', 'Poisson(4.95)'), each =
nrow(dd)))
# Concatenate cumulative relative frequency of sample and cdf of Poisson,
# add to melted data frame ddm:
dd$crf <- cumsum(dd$rfreq) # cumulative rel freq of sample
dd$cpm <- cumsum(dd$pm) # Poisson cdf
ddm$cdf <- with(dd, c(crf, cpm)) # add to ddm
# Plot of empirical and theoretical cdfs: first one is the ggplot2 version,
# second is the lattice version
# scale_colour_manual() is used to override default colors
r <- ggplot(ddm, aes(x = Obs, y = cdf, groups = Type, colour = Type))
r + geom_step(size = 1) +
scale_colour_manual(values = c('Observed' = 'red',
'Poisson(4.95)' 'blue')) +
xlab('x') + ylab("Cumulative relative
frequency/probability")
# For a different look...
last_plot() + theme_bw() + opts(legend.position = c(0.2, 0.9))
# lattice versions
# Different ways of specifying a legend key, differing in position. First
one
# is in upper left corner inside plot region, other two are outside it.
mykey1 = list(corner = c(0, 1),
text = list(levels(ddm$Type), cex = 0.8),
lines = list(lty = 1, col = c('red', 'blue'), lwd =
2),
title = 'Type',
cex.title = 1.1)
mykey2 <- list(space = 'right',
text = list(levels(ddm$Type), cex = 0.8),
lines = list(lty = 1, col = c('red', 'blue'), lwd =
2),
title = 'Type',
cex.title = 1.1)
mykey3 <- list(space = 'top',
text = list(levels(ddm$Type), cex = 0.8),
lines = list(lty = 1, col = c('red', 'blue'), lwd =
2),
title = 'Type',
cex.title = 1.1,
columns = 2)
xyplot(cdf ~ Obs, data = ddm, groups = Type, type = 's', col =
c('red',
'blue'), lwd = 2,
key = mykey1, xlab = 'x', ylab = 'Cumulative relative
frequency/probability')
# Substitute mykey2 and mykey3 for mykey1 to get different looks wrt the
legend.
I had to do the base graphics version just because, which turned out to be a
pain because you have to engage in trial and error to get the spacing
correct in the Poisson pmf graph (the second bar chart). Here is one way to
do it:
# Base graphics version:
u <- with(dd, barplot(rfreq, col = 'orange'))
# u is a matrix with one column containing thex-coordinates where the bars
are centered
with(dd, barplot(pm, width = 0.4, space = c(1.3, rep(2, length(u) - 1)),
col = 'blue', add = TRUE))
axis(1, at = as.vector(u), labels = c((1:length(u)) - 1)) # insert
x-axis labels
# Generate a legend - the coordinates will likely need to be adjusted from
sample to sample
# (first two arguments)
legend(8.5, 0.23, legend = c('Observed', 'Poisson(4.95)'), lty =
1, lwd = 2,
col = c('orange', 'blue'))
# Note: if you change width, you will have to adjust the spacing
accordingly. That's a fun little exercise.
I'll let you figure out how to do the cumulative distribution plot in base
graphics. I'm tired :)
HTH,
Dennis
errors are returned saing the x vector doesn't contain
integers....>
> really bizarre i can't give explanation (R version 2.11.1, no source
codes)
>
> would u be so kind to suggest me a solution???
>
> thank u
>
> FB student of statistics at milano bicocca
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
[[alternative HTML version deleted]]