Hello,

I have set up a data.frame and one of the columns contains a date of the form (with slashes as separators):

    mm/dd/yyyy

I would like to use formulas on other columns in the data.frame organized by date, for example:

    tapply(var1, sort(date), mean)

However, when I try sort(date) it sorts based on the first two entries in the date field:

    9/1/2001  9/1/2002  9/1/2003  9/2/2001 ...
    5.6       7.5       6.4       7.0      ...

Instead of:

    9/1/2001  9/2/2001  9/3/2001  9/4/2001 ...
    5.6       6.1       7.2       6.8      ...

I would greatly appreciate any help in sorting chronologically. Do I need to create separate columns for month, day, and year, and then use order() and then stipulate the hierarchy for which to sort the output? Or, is there some other more efficient way?

Thanks,

Jeff
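The behaviour described above can be reproduced with a small sketch. The data frame df, its columns date_str and var1, and all values are hypothetical and chosen only for illustration; method = "radix" is used so that the string sort does not depend on the locale.

```r
# Hypothetical data; df, date_str and var1 are illustrative names only
df <- data.frame(
  date_str = c("9/2/2001", "9/1/2002", "9/1/2001", "10/1/2001"),
  var1     = c(6.1, 7.5, 5.6, 6.8),
  stringsAsFactors = FALSE
)

# Sorting the strings compares them character by character, so
# "10/1/2001" comes first and the year is only looked at last
lexical <- sort(df$date_str, method = "radix")

# Parsing the strings to Date first gives a chronological ordering
parsed <- as.Date(df$date_str, format = "%m/%d/%Y")
chronological <- df$date_str[order(parsed)]

print(lexical)
print(chronological)
```

The two printed vectors differ, which is the mismatch between string order and calendar order that the question describes.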
Convert to POSIXct and sort. Note that tapply will coerce to a factor, so you need to create a factor with the levels sorted as you want them: just sorting date will not help. Something like

    udate <- unique(date)
    lev <- udate[sort.list(as.POSIXct(strptime(udate, "%m/%d/%Y")))]
    date <- factor(date, levels=lev)

On Mon, 2 Feb 2004, Jeff Jorgensen wrote:

> I have set up a data.frame and one of the columns contains a date of the
> form (with slashes as separators):
>
> mm/dd/yyyy
>
> I would like to use formulas on other columns in the data.frame organized
> by date, for example:
>
> tapply(var1, sort(date), mean)

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
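As a rough end-to-end sketch of the factor-with-sorted-levels idea above: the vectors dates and var1 and their values are hypothetical, and as.Date with order() stands in for the as.POSIXct(strptime(...)) and sort.list() calls in the post, on the assumption that only the calendar date matters.

```r
# Hypothetical data; dates and var1 are illustrative names and values
dates <- c("9/2/2001", "9/1/2002", "9/1/2001", "9/2/2001", "9/1/2001")
var1  <- c(6.1, 7.5, 5.6, 7.0, 5.0)

# Order the unique date strings chronologically
udate <- unique(dates)
lev   <- udate[order(as.Date(udate, format = "%m/%d/%Y"))]

# Build a factor whose levels follow that chronological order;
# tapply reports its groups in level order
date_f <- factor(dates, levels = lev)
res    <- tapply(var1, date_f, mean)

print(res)
```

The group labels keep the original mm/dd/yyyy spelling, which is the main reason to prefer this over grouping on the parsed dates directly.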
Jeff Jorgensen <jcjorgensen at wisc.edu> writes:

> I would like to use formulas on other columns in the data.frame
> organized by date, for example:
>
> tapply(var1, sort(date), mean)

I don't think that does what I think you think it does!

> However, when I try sort(date) it sorts based on the first two entries
> in the date field:
>
> 9/1/2001 9/1/2002 9/1/2003 9/2/2001 ...
> 5.6      7.5      6.4      7.0      ...

You now know why the ISO standard has yyyy-mm-dd ...

It's a bit awkward, but I think you need something like

    pdate <- as.POSIXct(strptime(date,"%m/%d/%Y"))
    tapply(var1, pdate, mean)

-- 
   O__  ---- Peter Dalgaard             Blegdamsvej 3
  c/ /'_ --- Dept. of Biostatistics     2200 Cph. N
 (*) \(*) -- University of Copenhagen   Denmark      Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)             FAX: (+45) 35327907
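A minimal sketch of grouping on the parsed dates directly, along the lines suggested above. The data are hypothetical, and as.Date is swapped in for as.POSIXct(strptime(...)) on the assumption that no time-of-day component is needed. When tapply turns a Date vector into a factor, the levels are ordered chronologically and labelled in yyyy-mm-dd form.

```r
# Hypothetical data; dates and var1 are illustrative names and values
dates <- c("9/2/2001", "9/1/2002", "9/1/2001", "9/2/2001", "9/1/2001")
var1  <- c(6.1, 7.5, 5.6, 7.0, 5.0)

# Parse the mm/dd/yyyy strings into Date objects
pdate <- as.Date(dates, format = "%m/%d/%Y")

# Group on the parsed dates; the result is in chronological order
# and its names are the ISO-formatted dates
res <- tapply(var1, pdate, mean)

print(res)
```

The trade-off is that the result is labelled with ISO dates rather than the original strings.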
I assume the dates are strings. If they are factors, use as.character(date) in place of date below. str(date) will tell you what you have.

It so happens that chron maps dates in your format to days since an origin (which sort properly), so you could try this:

    require(chron)
    z <- tapply( var1, date, mean )
    z[order(chron(names(z)))]

Note that date() is a function in R, so you might want to choose a different variable name to prevent confusion.

Date: Mon, 02 Feb 2004 16:16:13 -0600
From: Jeff Jorgensen <jcjorgensen at wisc.edu>
To: <r-help at stat.math.ethz.ch>
Subject: [R] sorting by date
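The same "aggregate first, reorder afterwards" idea can be sketched without the chron package. Here base R's as.Date is swapped in for chron() to parse the result names; that substitution, the vector names dates and var1, and the values are assumptions made for illustration.

```r
# Hypothetical data; dates and var1 are illustrative names and values
dates <- c("9/2/2001", "9/1/2002", "10/1/2001", "9/1/2001", "9/2/2001")
var1  <- c(6.1, 7.5, 6.8, 5.6, 7.0)

# Aggregate on the raw strings; the groups come out in string order
z <- tapply(var1, dates, mean)

# Reorder the result by parsing its names as dates
z_sorted <- z[order(as.Date(names(z), format = "%m/%d/%Y"))]

print(z_sorted)
```

This leaves the input data untouched and keeps the original date strings as labels; only the summary is reordered.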