On Jan 23, 2014, at 2:27 PM, Ruhil, Anirudh <ruhil at ohio.edu> wrote:
> A student asked: Why does R's summary() command yield the Mean and the
> Median, quartiles, min, and max but was written to exclude the Mode?
>
> I said I had no clue, googled the question without much luck, and am now
> posting it to see if anybody knows why.
>
> Ani
This has been discussed various times over the years. The underlying problem
is that how you estimate the mode depends upon the nature of the data.
That is, if the data are discrete (e.g., a factor), a simple tabulation using
table() can yield the most frequently occurring value, or values if there are
ties. In this case:
set.seed(1)
x <- sample(letters, 500, replace = TRUE)
tab <- table(x)
# Get the first maximum value
tab[which.max(tab)]
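
Note that which.max() returns only the first maximum, so any ties are dropped.
If you want every value tied for the highest count, a minimal sketch along
these lines should work (all_modes is a hypothetical helper name, not a base R
function):

```r
# Hypothetical helper: return all values tied for the highest frequency
all_modes <- function(x) {
  tab <- table(x)
  # Keep the names of every cell whose count equals the maximum count
  names(tab)[tab == max(tab)]
}

# Small example with a two-way tie between "b" and "c"
all_modes(c("a", "b", "b", "c", "c"))
```

The result is a character vector, since it is taken from the names of the
table, so convert it back with as.numeric() or similar if x was numeric.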
If the data are continuous, then strictly speaking the sample mode is not well
defined, since exact values rarely repeat, and you need something along the
lines of a kernel density estimate to locate the peak. In that case:
set.seed(1)
x <- rnorm(500)
# Get the density estimates
dx <- density(x)
# Which value is at the peak
dx$x[which.max(dx$y)]
Visual inspection is also helpful in this case:
plot(dx)
abline(v = dx$x[which.max(dx$y)])
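
Keep in mind that the peak found this way depends on how much smoothing
density() applies, so it is an estimate rather than a fixed property of the
sample. As a rough sketch, the steps above can be wrapped in a small function
that passes extra arguments through to density() (density_mode is a
hypothetical helper name, not a base R function):

```r
# Hypothetical helper: location of the highest point of the kernel density
# estimate; arguments in ... are passed on to density()
density_mode <- function(x, ...) {
  dx <- density(x, ...)
  dx$x[which.max(dx$y)]
}

set.seed(1)
x <- rnorm(500)

# Default bandwidth
density_mode(x)

# Half and double the default bandwidth, via the adjust argument of density()
density_mode(x, adjust = 0.5)
density_mode(x, adjust = 2)
```

If the three values differ noticeably, that is a sign the estimated mode is
sensitive to the bandwidth and should be reported with some caution.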
See ?table, ?density and ?which.max
Regards,
Marc Schwartz