Gavin Rudge
2014-Jan-29 16:16 UTC
[R] generating a rank variable using date in a data.frame: overcoming a date origin error
I've got a simple data.frame with a factor variable called 'case', which identifies one subject, and the date of an event ('obs'), with each row representing an observation. One case can have many (or few) observations over time in the data set.

I've created a crude data.frame by way of a clunky but reproducible example.

My objective is simply to create a variable that captures the rank of each event within its case in date order, 1 being the first up to n being the nth. To this end I've used the 'ave' command as below.

set.seed(66)
d<-(seq(as.Date("2001/01/01"),as.Date("2011/12/31"),"days"))
obs<-(as.Date(sample(d,200,replace=TRUE)))
obs<-as.data.frame(obs)
case<-(case=(sample(LETTERS[1:8],200,replace=TRUE)))
case<-as.data.frame(case)
df<-cbind(case,obs)
df$rank<-ave(df$obs,df$case, FUN=rank)

This throws one of those "Error in as.Date.numeric(value) : 'origin' must be supplied" errors.

I get why this is happening, that I have not explicitly set the date origin when I set up the date variables, but my question is where do I do this? I've tried variations of the above where I've used origin="1900-01-01" in various lines of the above code, but I am still getting the error.

Also, by way of a supplementary question, in my actual application I am bringing in a lot of data from .csv files which contain data originally generated by the data owner in Excel, so does this mean that I always need to set the origin at 1st Jan 1900?

Any help gratefully received,

GavinR
jim holtman
2014-Jan-29 16:53 UTC
[R] generating a rank variable using date in a data.frame: overcoming a date origin error
Use 'xtfrm' so that the ranking is done on the numeric representation of the dates:

set.seed(66)
d<-(seq(as.Date("2001/01/01"),as.Date("2011/12/31"),"days"))
obs<-(as.Date(sample(d,200,replace=TRUE)))
obs<-as.data.frame(obs)
case<-(case=(sample(LETTERS[1:8],200,replace=TRUE)))
case<-as.data.frame(case)
df<-cbind(case,obs)
df$rank<-ave(xtfrm(df$obs),df$case, FUN=rank)

Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.
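A possible reading of why the numeric conversion helps, offered as a sketch rather than a definitive account: ave() writes whatever FUN returns back into a copy of its first argument, so when that argument is a Date vector the numeric ranks have to be coerced back to Date, and that coercion is the step that asks for an origin. The origin used when the dates were created does not seem to be involved. The small data frame and the Excel serial numbers below are made up for illustration and are not from the thread. On the supplementary question, an origin should only matter if the .csv holds Excel serial day numbers rather than date text; for the Windows 1900 date system the origin commonly used in R is "1899-12-30" (not "1900-01-01"), while files using the 1904 date system would need "1904-01-01".

```r
# Hypothetical data: two cases, dates deliberately out of order.
df <- data.frame(
  case = c("A", "B", "A", "B", "A"),
  obs  = as.Date(c("2005-03-01", "2002-07-15", "2001-01-10",
                   "2009-11-30", "2003-06-20"))
)

# Hand ave() plain numbers so the ranks it writes back stay numeric.
# as.numeric() on a Date gives days since 1970-01-01, which preserves
# date order and is enough for ranking within each case.
df$rank <- ave(as.numeric(df$obs), df$case, FUN = rank)
print(df[order(df$case, df$obs), ])

# Assumption: a column that arrived as Excel serial day numbers
# (Windows 1900 date system). These two values are illustrative only.
excel_serial <- c(36892, 40908)
excel_dates  <- as.Date(excel_serial, origin = "1899-12-30")
print(excel_dates)
```

If two observations for the same case fall on the same date, rank() averages the tied positions by default (giving values such as 1.5); passing FUN = function(x) rank(x, ties.method = "first") is one way to force a strict 1..n sequence. If the .csv dates arrive as text instead of serial numbers, as.Date() with a suitable format string would apply and no origin is needed.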