Gavin Rudge
2014-Jan-29 16:16 UTC
[R] generating a rank variable using date in a data.frame: overcoming a date origin error
I've got a simple data.frame of a factor variable called 'case'
which indicates one subject and a date of an event ('obs'), each row
representing an observation. One case can have many (or few) observations over
time in the data set.
I've created a crude data.frame by way of a clunky but reproducible example.
My objective is simply to create a variable that captures a rank of the
occurrence of the events for each case in date order, 1 being the first up to n
being the nth. To this end I've used the 'ave' command as below.
set.seed(66)
d<-(seq(as.Date("2001/01/01"),as.Date("2011/12/31"),"days"))
obs<-(as.Date(sample(d,200,replace=TRUE)))
obs<-as.data.frame(obs)
case<-(case=(sample(LETTERS[1:8],200,replace=TRUE)))
case<-as.data.frame(case)
df<-cbind(case,obs)
df$rank<-ave(df$obs,df$case, FUN=rank)
This throws one of those "Error in as.Date.numeric(value) :
'origin' must be supplied" errors
I get why this is happening, that I have not explicitly set the date origin when
I set up the date variables, but my question is where do I do this? I've
tried variations of the above where I've used an
origin="1900-01-01" in various lines in the above code but I am still
getting the error.
Also by way of a supplementary question, in my actual application I am bringing
in a lot of data from .csv files which contain data originally generated by the
data owner in Excel, so does this mean that I need to always set the origin at
1st Jan 1900?
Any help gratefully received,
GavinR
jim holtman
2014-Jan-29 16:53 UTC
[R] generating a rank variable using date in a data.frame: overcoming a date origin error
Use 'xtfrm' to convert the dates to numeric before ranking:
set.seed(66)
d<-(seq(as.Date("2001/01/01"),as.Date("2011/12/31"),"days"))
obs<-(as.Date(sample(d,200,replace=TRUE)))
obs<-as.data.frame(obs)
case<-(case=(sample(LETTERS[1:8],200,replace=TRUE)))
case<-as.data.frame(case)
df<-cbind(case,obs)
df$rank<-ave(xtfrm(df$obs),df$case, FUN=rank)
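For anyone wondering why this works: a Date is stored as the number of days since 1970-01-01, so there is no origin to set when the variable is created. As far as I can tell, the error arises because ave() writes the numeric output of rank() back into a copy of the Date vector, and that assignment calls as.Date() on plain numbers. Converting the dates to numeric first (xtfrm() above, or as.numeric()) avoids the conversion. Below is a minimal sketch of the same idea with a tidier set-up. The ties.method = "first" argument is my own assumption about how same-day events within one case should be ranked; the thread does not specify it.

```r
set.seed(66)

# All calendar days in the study window
d <- seq(as.Date("2001-01-01"), as.Date("2011-12-31"), by = "day")

# One row per observation: a case label and an event date
df <- data.frame(
  case = factor(sample(LETTERS[1:8], 200, replace = TRUE)),
  obs  = sample(d, 200, replace = TRUE)
)

# Drop the Date class before ranking so ave() returns plain numbers.
# ties.method = "first" (an assumption) gives ranks 1..n with no ties.
df$rank <- ave(as.numeric(df$obs), df$case,
               FUN = function(x) rank(x, ties.method = "first"))

# Inspect one case in date order
head(df[df$case == "A", ][order(df$obs[df$case == "A"]), ])
```

Newer R releases may no longer raise the error for the original call, but the result would then be of class Date rather than a usable rank, so the numeric conversion is still worth keeping.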
Jim Holtman
Data Munger Guru
What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.
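On the supplementary question about Excel: an origin is only needed when the dates arrive as serial numbers. Dates read from a .csv are often text, in which case as.Date() needs a format string instead. The sketch below shows both cases. The sample values and the day/month/year format are assumptions for illustration; the "1899-12-30" origin for Windows Excel is the one given in the examples of ?as.Date, which notes that Excel wrongly treats 1900 as a leap year.

```r
# Case 1: dates stored as text in the .csv (format is an assumption)
text_dates <- as.Date(c("29/01/2014", "03/02/2014"), format = "%d/%m/%Y")

# Case 2: dates stored as Windows Excel serial numbers
serial_dates <- as.Date(c(35981, 41668), origin = "1899-12-30")

print(text_dates)
print(serial_dates)   # first element is 1998-07-05, as in ?as.Date
```

Checking class() of the column right after read.csv() shows which case applies: character (or factor) for text, numeric or integer for serial numbers.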