I like the idea of median and friends working on ordered factors. Just a couple
of thoughts on possible implementations.
Adding extra checks and functionality will slow down the function. For a single
evaluation on a given dataset this slowdown will not be noticeable, but inside
a simulation, bootstrap, or other high-iteration technique it could matter.
I would suggest creating a core function that does just the calculations
(median, quantile, IQR), assuming that the data passed in are correct, without
doing any checks or anything fancy. The user-callable function (median et al.)
would then do the checks, dispatch to other functions for anything fancy, and
finally call the core function with the clean data. The common user would not
really notice a difference, but someone programming a high-iteration technique
could clean the data themselves and then call the core function directly,
bypassing the checks and branches.
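
A minimal sketch of this split, assuming ordered factors are handled through
their integer codes. The names median_core() and median_ordered() are
hypothetical (they are not existing R functions), and returning the lower of
the two middle values when n is even is an assumption made only for
illustration:

```r
# Hypothetical core function: no checks at all.
# Assumes 'codes' is a non-empty integer vector without NAs.
# Returns the "low" median (lower middle value when n is even).
median_core <- function(codes) {
  k <- (length(codes) + 1L) %/% 2L
  sort(codes, partial = k)[k]   # partial sort: only position k is guaranteed
}

# Hypothetical user-facing wrapper: does the checks, then calls the core.
median_ordered <- function(x, na.rm = FALSE) {
  if (!is.ordered(x)) stop("need an ordered factor")
  na_result <- factor(NA, levels = levels(x), ordered = TRUE)
  if (na.rm) {
    x <- x[!is.na(x)]
  } else if (any(is.na(x))) {
    return(na_result)
  }
  if (length(x) == 0L) return(na_result)
  code <- median_core(as.integer(x))          # as.integer() gives level codes
  factor(levels(x)[code], levels = levels(x), ordered = TRUE)
}

# Data taken from the example in the quoted message below.
x <- factor(c(1, 0, 1, 0, 0, 2, 0, 1, 2, 0, 0), ordered = TRUE)
median_ordered(x)
```

Someone running a bootstrap could convert once with as.integer(x) and then call
median_core() on the resampled codes, skipping the checks in every iteration.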
Just out of curiosity (from someone who only learned from English (Americanized
at that) and not Italian texts), what would the median of [Low, Low, Medium,
High] be?
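
To make the question concrete: with an even number of observations the two
middle values can differ, and for a factor they cannot be averaged. The sketch
below just extracts both middle values, which correspond to the "low" and
"high" conventions mentioned in the quoted message. The variable names are
illustrative only, and this is not a claim about which answer is right:

```r
x <- factor(c("Low", "Low", "Medium", "High"),
            levels = c("Low", "Medium", "High"), ordered = TRUE)
s <- sort(x)                 # sorts by level order, not alphabetically
n <- length(s)
lo <- s[n %/% 2L]            # lower middle value
hi <- s[n %/% 2L + 1L]       # upper middle value
as.character(lo)
as.character(hi)
```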
--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.snow at imail.org
801.408.8111
> -----Original Message-----
> From: r-devel-bounces at r-project.org [mailto:r-devel-bounces at r-
> project.org] On Behalf Of Simone Giannerini
> Sent: Thursday, March 05, 2009 4:49 PM
> To: R-devel
> Subject: [Rd] quantile(), IQR() and median() for factors
>
> Dear all,
>
> from the help page of quantile:
>
> "x: numeric vectors whose sample quantiles are wanted. Missing
> values are ignored."
>
> from the help page of IQR:
>
> "x: a numeric vector."
>
> As a matter of fact, it seems that neither quantile() nor IQR()
> checks for numeric input.
> See the following:
>
> set.seed(11)
> x <- rbinom(n=11,size=2,prob=.5)
> x <- factor(x,ordered=TRUE)
> x
>  [1] 1 0 1 0 0 2 0 1 2 0 0
> Levels: 0 < 1 < 2
>
> > quantile(x)
>   0%  25%  50%  75% 100%
>    0 <NA>    0 <NA>    2
> Levels: 0 < 1 < 2
> Warning messages:
> 1: In Ops.ordered((1 - h), qs[i]) :
>   '*' is not meaningful for ordered factors
> 2: In Ops.ordered(h, x[hi[i]]) :
>   '*' is not meaningful for ordered factors
>
> > IQR(x)
> [1] 1
>
> whereas median has the check:
>
> > median(x)
> Error in median.default(x) : need numeric data
>
> I would also like to take the opportunity to ask for your comments on
> the following related subject:
>
> In my opinion it would be convenient if median() and the like
> (quantile(), IQR()) were implemented for ordered factors, for which
> they can in fact be well defined. In this way, functions like
> apply(x,FUN=median,...) could be used without further processing on
> data frames that contain both numeric variables and ordered factors.
> To my limited knowledge, English introductory statistics textbooks
> mention only marginally that the median is well defined for ordered
> categorical variables. In the Italian statistics literature, on the
> other hand, this is often discussed in detail, which could mislead
> students and practitioners who might expect median() to work for
> ordered factors.
>
> In this message
>
> https://stat.ethz.ch/pipermail/r-help/2003-November/042684.html
>
> Martin Maechler considers the possibility of doing such a job by
> allowing for extra arguments "low" and "high", as is done for mad().
> I am willing to contribute if requested, and comments are welcome.
>
> Thank you for your attention,
>
> kind regards,
>
> Simone
>
> > R.version
>                _
> platform       i386-pc-mingw32
> arch           i386
> os             mingw32
> system         i386, mingw32
> status
> major          2
> minor          8.1
> year           2008
> month          12
> day            22
> svn rev        47281
> language       R
> version.string R version 2.8.1 (2008-12-22)
>
> LC_COLLATE=Italian_Italy.1252;LC_CTYPE=Italian_Italy.1252;LC_MONETARY=Italian_Italy.1252;LC_NUMERIC=C;LC_TIME=Italian_Italy.1252
>
> --
> ______________________________________________________
>
> Simone Giannerini
> Dipartimento di Scienze Statistiche "Paolo Fortunati"
> Universita' di Bologna
> Via delle belle arti 41 - 40126 Bologna, ITALY
> Tel: +39 051 2098262 Fax: +39 051 232153
> http://www2.stat.unibo.it/giannerini/
>
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel