I have a data set (.csv) with date (e.g. date of birth) information stored as
character vectors that I'm attempting to transform to POSIXct objects using the
package lubridate (1.7.4). The problem that I'm trying to address is that my
two-digit years are invariably (?) parsed to 20xx. For example,
x <- c("45-12-03","01-06-24","64-9-15")
ymd(x)
[1] "2045-12-03" "2001-06-24" "2064-09-15"
These should be parsed as "1945-12-03" "2001-06-24" "1964-09-15".
I've tried to use parse_date_time(). Based on the documentation, it looks to
me as though the argument cutoff_2000 should allow me to address this, but it's
unclear to me how to implement this. As an example, I've tried
parse_date_time(x, cutoff_2000 = 01)
but get the following error message (and similar for other similar attempts,
including cutoff_2000 = 01L)
Error in parse_date_time(x, cutoff_2000 = 1) :
unused argument (cutoff_2000 = 1)
Thanks for your help!
Peter Nelson, PhD
Institute of Marine Sciences
University of California, Santa Cruz
Center for Ocean Health, Long Marine Lab
115 McAllistair Way
Santa Cruz, CA, 95076, USA
707-267-5896
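A note on the error message in the question: as far as I know, cutoff_2000 is an
argument of parse_date_time2() and fast_strptime() in lubridate, not of
parse_date_time(), which would explain the "unused argument" complaint. The
sketch below assumes a lubridate version in which parse_date_time2() accepts
cutoff_2000; the cutoff value of 1 is only a guess taken from the attempt above
and should be chosen to suit the data.

```r
library(lubridate)

x <- c("45-12-03", "01-06-24", "64-9-15")

# Two-digit years <= cutoff_2000 are read as 20xx, larger ones as 19xx.
# With a cutoff of 1, only "00" and "01" land in the 2000s.
dob <- parse_date_time2(x, orders = "ymd", cutoff_2000 = 1L)

# parse_date_time2() returns POSIXct (UTC by default);
# convert to Date if the time part is not needed
as.Date(dob)
```

For dates of birth, a cutoff equal to the last two digits of the current year
may be the more natural choice, since no one in the data can be born later.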
Hi!
For more solutions look at
https://stackoverflow.com/questions/33221603/r-lubridate-returns-unwanted-century-when-given-two-digit-year
The proposed solution:
some_dates <- c("3/18/75", "March 10, 1994",
"10/1/80", "June 15, 1979")
dates <- mdy(some_dates)
future_dates <- year(dates) > year(Sys.Date())
year(dates[future_dates]) <- year(dates[future_dates]) - 100
This should work for your case with one adaptation (change mdy to ymd). At
least it worked on your example.
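Spelled out for the example in the question, that adaptation might look like the
sketch below. It assumes that no date in the data can lie in a future year and
that nobody in the data is more than 100 years old; both are assumptions about
the data, not something lubridate checks.

```r
library(lubridate)

x <- c("45-12-03", "01-06-24", "64-9-15")
dates <- ymd(x)  # two-digit years are first read as 20xx here

# A date of birth cannot lie in a future year, so move such dates back a century
future_dates <- year(dates) > year(Sys.Date())
year(dates[future_dates]) <- year(dates[future_dates]) - 100
dates
```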
Yours,
/Pär
--
Pär Leijonhufvud
par.leijonhufvud at regionjh.se
Sjukhuskemist
+46(0)63-153 376, +46-(0)70-242 7006
Laboratoriemedicin
Östersunds sjukhus
-----Original Message-----
From: R-help <r-help-bounces at r-project.org> On Behalf Of Peter Nelson
via R-help
Sent: 15 April 2020 20:31
To: r-help at r-project.org
Subject: [R] parsing DOB data
______________________________________________
R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Hi Peter,
One way is to process the strings before converting them to dates:
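x2 <- c("45-12-03", "01-06-24", "04-9-15", "1901-03-04")
# Prefix a century to dates whose leading year field has at most two digits.
# Years greater than `changeover` get `previous`, all others get `current`.
add_century <- function(x, changeover = 68, previous = 19, current = 20) {
  # first "-"-separated field of each date
  years <- vapply(strsplit(x, "-"), `[`, character(1), 1)
  shortyears <- which(nchar(years) <= 2)
  century <- rep("", length(x))
  # compare as numbers; comparing the strings themselves would sort them as text
  century[shortyears] <- ifelse(as.numeric(years[shortyears]) > changeover,
                                previous, current)
  paste0(century, x)
}
add_century(x2, 1)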
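Once the century has been prefixed, the strings can be turned into dates with
base R alone. The sketch below is an assumption about the intended next step and
is not part of the function above; the input vector is what I would expect
add_century(x2, 1) to return.

```r
# Strings as they would look after the century has been prefixed
full <- c("1945-12-03", "2001-06-24", "2004-9-15", "1901-03-04")

# %Y is a four-digit year; on input, a one-digit month such as "9" is accepted
dob <- as.Date(full, format = "%Y-%m-%d")

# Any string that failed to parse shows up as NA, so check before going on
stopifnot(!anyNA(dob))
dob
```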
Jim
On Fri, Apr 17, 2020 at 12:34 AM Peter Nelson via R-help
<r-help at r-project.org> wrote:
Hi Peter,
I worked out a neat function to add the century to short dates. It
works fine on its own, but sadly it bombs when used with sapply. Maybe
someone else can point out my mistake:
add_century <- function(x, changeover = 68, previous = 19, current = 20,
                        pos = 1, sep = "-") {
  xsplit <- unlist(strsplit(x, sep))
  # only add century to short dates
  if (nchar(xsplit[pos]) < 3) {
    century <- ifelse(as.numeric(xsplit[pos]) <= changeover, current, previous)
    xsplit[pos] <- paste0(century, xsplit[pos])
  }
  return(paste(xsplit, collapse = sep))
}
# these work
add_century(x3[1],changeover=1,pos=3,sep="/")
add_century(x3[2],changeover=1,pos=3,sep="/")
add_century(x3[3],changeover=1,pos=3,sep="/")
# this doesn't
sapply(x3,add_century,list(changeover=1,pos=3,sep="/"))
Jim
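On the sapply() call: one likely cause is the list() wrapper. sapply() passes
everything after the function straight on to it, so the whole list is matched to
the second argument (changeover), while pos and sep keep their defaults of 1 and
"-". Passing the extra arguments by name, without list(), avoids that. A minimal
sketch follows; x3 was not shown in the post, so the vector below is a made-up
stand-in in day/month/two-digit-year form.

```r
add_century <- function(x, changeover = 68, previous = 19, current = 20,
                        pos = 1, sep = "-") {
  xsplit <- unlist(strsplit(x, sep))
  # only add century to short dates
  if (nchar(xsplit[pos]) < 3) {
    century <- ifelse(as.numeric(xsplit[pos]) <= changeover, current, previous)
    xsplit[pos] <- paste0(century, xsplit[pos])
  }
  paste(xsplit, collapse = sep)
}

# Hypothetical stand-in for x3 (day/month/two-digit year), not from the post
x3 <- c("03/12/45", "24/06/01", "15/9/64")

# Extra arguments go directly after the function, by name, with no list()
fixed <- sapply(x3, add_century, changeover = 1, pos = 3, sep = "/",
                USE.NAMES = FALSE)
fixed
```

Because the function handles one string at a time, sapply() (or vapply() with
character(1)) is needed to apply it to a whole vector.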
On Fri, Apr 17, 2020 at 11:30 AM Jim Lemon <drjimlemon at gmail.com> wrote: