Antonio, Fabio Di Narzo
2006-Apr-27 10:31 UTC
[Rd] as.factor: changed behaviour for Date class
Dear all,
I have noticed a little change in the behaviour of as.factor from R-2.2.1 to R-2.3.0, and can't find it in the NEWS.

In R-2.3.0:
> times <- 1:5
> class(times) <- "Date"
> as.factor(times)
[1] 1 2 3 4 5
Levels: 1 2 3 4 5

In R-2.2.1:
> as.factor(times)
[1] 1970-01-02 1970-01-03 1970-01-04 1970-01-05 1970-01-06
Levels: 1970-01-02 1970-01-03 1970-01-04 1970-01-05 1970-01-06

Is this the intended behaviour?
Note that the change is reflected in other functions which seem to use as.factor internally, for example 'tapply'.
Consider the following code:

times <- 1:5
class(times) <- "Date"
id <- rep(times, each=2)
vals <- rep(1:2,5)
tapply(vals, id, mean)

Under R-2.2.1 this gives:
1970-01-02 1970-01-03 1970-01-04 1970-01-05 1970-01-06
       1.5        1.5        1.5        1.5        1.5

But under R-2.3.0 the output is:
  1   2   3   4   5
1.5 1.5 1.5 1.5 1.5

Antonio, Fabio Di Narzo.
Prof Brian Ripley
2006-Apr-27 11:05 UTC
[R] [Rd] as.factor: changed behaviour for Date class
The change is not in as.factor: it is in sort (as called by factor), and it is documented in the NEWS file.

Why do you expect as.factor to convert a Date object to character and then to factor? Please do the conversion explicitly.

--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595
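The explicit conversion suggested in this reply could look something like the sketch below. It is only an illustration, not code from the thread: it assumes that as.character() on a Date vector yields the "YYYY-MM-DD" strings shown in the original post, and the names f and res are made up for the example.

```r
# Build the same Date vector as in the original post.
times <- 1:5
class(times) <- "Date"

# Convert to character first, then to factor, so the levels are date
# strings regardless of how factor() itself treats Date objects.
f <- factor(as.character(times))
levels(f)

# The same idea for tapply: pass a character grouping variable instead
# of relying on an implicit Date-to-factor conversion.
id <- rep(times, each = 2)
vals <- rep(1:2, 5)
res <- tapply(vals, as.character(id), mean)
res
```

Because the grouping variable is already character when it reaches factor() or tapply(), the result should not depend on which of the two R versions is in use.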
Have you received a reply to this post? I haven't seen one.

I confirmed what you got with a slightly different example:

> t9 <- 9:10
> class(t9) <- "Date"
> t9
[1] "1970-01-10" "1970-01-11"
# Apparently, 'class(times) <- "Date"' had the anticipated effect

As you indicated, 'as.factor' in R-2.2.1 used character string representations of the dates as levels, while in R-2.3.0 it did not:

> as.factor(t9) # in R-2.3.0
[1] 9 10
Levels: 9 10

> as.factor(t9) # in R-2.2.1
[1] 1970-01-10 1970-01-11
Levels: 1970-01-10 1970-01-11

R-2.2.1 and R-2.3.0 returned the same for the following:

> as.numeric(as.factor(t9))
[1] 1 2

> sessionInfo()
Version 2.3.0 (2006-04-24)
i386-pc-mingw32

attached base packages:
[1] "grid" "methods" "stats" "graphics" "grDevices" "utils"
[7] "datasets" "base"

other attached packages:
 tseries      zoo quadprog  lattice
"0.10-0"  "1.0-6"  "1.4-8" "0.13-8"

Hope this helps.
Spencer Graves
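One way to see which of the two behaviours a given installation has is sketched below. It is an illustration rather than code from the thread; the variable names levels_are_dates and codes are made up, and no particular result is claimed for the first check because it depends on the R version.

```r
# Same two-element Date vector as in the example above.
t9 <- 9:10
class(t9) <- "Date"

f <- as.factor(t9)

# TRUE if the levels are the date strings (the R-2.2.1 behaviour
# described in the thread), FALSE if they are the underlying numbers.
levels_are_dates <- identical(levels(f), as.character(t9))
levels_are_dates

# The integer codes are positions within the sorted levels, so they
# are 1 and 2 in either case, as reported above.
codes <- as.numeric(f)
codes
```

This also shows why as.numeric(as.factor(t9)) cannot distinguish the two versions: the codes index the levels and carry no information about how the levels are labelled.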