Hi,

I have a data set with various date formats in one column and am not sure how to unify it. Here are a few formats:

02091702/22/170221201703/17/160015-08-239/2/1500170806May-2-201522-March-2014

I tried parse_date_time from the lubridate library but it failed. Thanks so much.

Best,
Farnoosh

[[alternative HTML version deleted]]
Hey,

Are all the dates connected? So no comma or space between them?

Regards,
Christoph

> On 29 Jun 2017, at 2:02 pm, Farnoosh Sheikhi via R-help <r-help at r-project.org> wrote:
>
> Hi,
> I have a data set with various date formats in one column and not sure how to unify it. Here is a few formats:
> 02091702/22/170221201703/17/160015-08-239/2/1500170806May-2-201522-March-2014
> I tried parse_date_time from lubridate library but it failed. Thanks so much.
> Best,
> Farnoosh
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

I doubt your actual file looks like the mess that made it to my email
software (below) because you posted HTML-format email. Read the Posting
Guide, and in particular figure out how to send plain text email.
You might try the "anytime" contributed package, though I suspect it too
will choke on your mess. Otherwise, that will pretty much leave only a
brute-force series of regular expression tests to recognize which date
format patterns you have, and even that may not be able to get them all
right unless you know something that limits the range of possible formats.
Below is an example of how this can be done. There are many tutorials on
the internet that describe regular expressions... they are not unique to
R.
#-----
# Sample data: one date string per row, in several formats
dta <- read.table( text="DtStr
020917
2/22/17
May-2-2015
May-12-15
", header=TRUE, as.is=TRUE )
# Result column, filled in one format at a time
dta$Dt <- as.Date( NA )
# Month abbreviation, day, 4-digit year (e.g. May-2-2015)
idx <- grepl(
  "^(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)-[0-9]+-[0-9]{4}$",
  dta$DtStr, perl=TRUE, ignore.case=TRUE )
dta$Dt[ idx ] <- as.Date( dta$DtStr[ idx ], format="%B-%d-%Y" )
# Month abbreviation, day, 2-digit year (e.g. May-12-15)
idx <- grepl(
  "^(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)-[0-9]+-[0-9]{2}$",
  dta$DtStr, perl=TRUE, ignore.case=TRUE )
dta$Dt[ idx ] <- as.Date( dta$DtStr[ idx ], format="%B-%d-%y" )
# Six digits with no separators, mmddyy (e.g. 020917)
idx <- grepl( "^(0[1-9]|1[0-2])[0-9]{2}[0-9]{2}$", dta$DtStr, perl=TRUE )
dta$Dt[ idx ] <- as.Date( dta$DtStr[ idx ], format="%m%d%y" )
# Slash-separated m/d/yy without a leading zero on the month (e.g. 2/22/17)
idx <- grepl( "^([1-9]|1[0-2])/[0-9]{1,2}/[0-9]{2}$", dta$DtStr, perl=TRUE )
dta$Dt[ idx ] <- as.Date( dta$DtStr[ idx ], format="%m/%d/%y" )
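
The same test-then-convert idea can be written as a loop over a table of pattern/format pairs, which may be easier to extend as new formats turn up. This is only a sketch: the function name parse_mixed_dates and the rules table are made up for illustration, the extra patterns are guesses at formats such as 22-March-2014 and 02212017 from the original post, and month-name parsing assumes an English locale.

```r
# Hypothetical helper (not from the thread): try each pattern/format pair in
# turn and fill in only the entries that are still NA.
parse_mixed_dates <- function(x, rules) {
  out <- rep(as.Date(NA), length(x))
  for (i in seq_len(nrow(rules))) {
    idx <- is.na(out) &
      grepl(rules$pattern[i], x, perl = TRUE, ignore.case = TRUE)
    out[idx] <- as.Date(x[idx], format = rules$format[i])
  }
  out
}

# Month abbreviation optionally followed by the rest of the name (assumption:
# English month names, e.g. "Mar" or "March").
mon <- "(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[a-z]*"

# Assumed rule table; the order matters because the first match wins.
rules <- data.frame(
  pattern = c(
    paste0("^", mon, "-[0-9]{1,2}-[0-9]{4}$"),   # May-2-2015
    paste0("^[0-9]{1,2}-", mon, "-[0-9]{4}$"),   # 22-March-2014
    "^(0[1-9]|1[0-2])[0-9]{2}[0-9]{4}$",         # 02212017 (mmddyyyy)
    "^(0[1-9]|1[0-2])[0-9]{2}[0-9]{2}$",         # 020917 (mmddyy)
    "^(0?[1-9]|1[0-2])/[0-9]{1,2}/[0-9]{2}$"     # 03/17/16, 9/2/15
  ),
  format = c("%B-%d-%Y", "%d-%B-%Y", "%m%d%Y", "%m%d%y", "%m/%d/%y"),
  stringsAsFactors = FALSE
)

x <- c("May-2-2015", "22-March-2014", "02212017", "020917",
       "03/17/16", "9/2/15", "not a date")
parsed <- parse_mixed_dates(x, rules)
```

Anything that matches none of the patterns stays NA, so which(is.na(parsed)) lists the strings that still need a rule.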
---------------------------------------------------------------------------
Jeff Newmiller                        The     .....       .....  Go Live...
DCN:<jdnewmil at dcn.davis.ca.us>     Basics: ##.#.       ##.#.  Live Go...
                                      Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
/Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
---------------------------------------------------------------------------

Hi Christoph,

There is a "," between dates. Many thanks.

Best,
Farnoosh

Thanks Jeff. This is a nice way of solving this problem. What about the cases
with 0015-02-21? Many thanks.

Best,
Farnoosh
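
One possible way to handle strings like 0015-02-21 with the same regular-expression approach is sketched below. It rests on an unconfirmed assumption: that the leading "00" is a mangled century and 0015 really means 2015 (read as written, the year would be 15). The helper name fix_zero_century is made up for illustration, and the second pattern is a guess at the 00170806 form from the original post.

```r
# Hypothetical helper (not from the thread). ASSUMPTION: a leading "00" in the
# year stands for "20"; confirm this against the data source before using it.
fix_zero_century <- function(x) {
  out <- rep(as.Date(NA), length(x))

  # 00yy-mm-dd, e.g. 0015-02-21
  dashed <- grepl("^00[0-9]{2}-[0-9]{2}-[0-9]{2}$", x)
  out[dashed] <- as.Date(sub("^00", "20", x[dashed]), format = "%Y-%m-%d")

  # 00yymmdd with no separators, e.g. 00170806
  packed <- grepl("^00[0-9]{6}$", x)
  out[packed] <- as.Date(sub("^00", "20", x[packed]), format = "%Y%m%d")

  out
}

fixed <- fix_zero_century(c("0015-02-21", "0015-08-23", "00170806", "2/22/17"))
```

Strings that match neither pattern come back as NA, so this step can run alongside the other format tests without overwriting their results.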