Hi,

I have a data set with various date formats in one column and am not sure how to unify them. Here are a few of the formats:

02091702/22/170221201703/17/160015-08-239/2/1500170806May-2-201522-March-2014

I tried parse_date_time from the lubridate library, but it failed.

Thanks so much.

Best,
Farnoosh

[[alternative HTML version deleted]]
Hey,

Are all the dates connected? So there is no comma or space between them?

Regards,
Christoph

> On 29 Jun 2017, at 2:02 pm, Farnoosh Sheikhi via R-help <r-help at r-project.org> wrote:
>
> Hi,
> I have a data set with various date formats in one column and am not sure how to unify them. Here are a few of the formats:
> 02091702/22/170221201703/17/160015-08-239/2/1500170806May-2-201522-March-2014
> I tried parse_date_time from the lubridate library, but it failed.
> Thanks so much.
> Best,
> Farnoosh
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
I doubt your actual file looks like the mess that made it to my email software (below), because you posted HTML-format email. Read the Posting Guide, and in particular figure out how to send plain text email.

You might try the "anytime" contributed package, though I suspect it too will choke on your mess. Otherwise, that pretty much leaves only a brute-force series of regular expression tests to recognize which date format patterns you have, and even that may not get them all right unless you know something that limits the range of possible formats.

Below is an example of how this can be done. There are many tutorials on the internet that describe regular expressions... they are not unique to R.

#-----
dta <- read.table( text=
"DtStr
020917
2/22/17
May-2-2015
May-12-15
", header=TRUE, as.is=TRUE )

# start with all dates missing, then fill in one pattern at a time
dta$Dt <- as.Date( NA )

# month name, day, four-digit year
idx <- grepl( "^(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)-[0-9]+-[0-9]{4}$", dta$DtStr, perl=TRUE, ignore.case = TRUE )
dta$Dt[ idx ] <- as.Date( dta$DtStr[ idx ], format="%B-%d-%Y" )

# month name, day, two-digit year
idx <- grepl( "^(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)-[0-9]+-[0-9]{2}$", dta$DtStr, perl=TRUE, ignore.case = TRUE )
dta$Dt[ idx ] <- as.Date( dta$DtStr[ idx ], format="%B-%d-%y" )

# six digits with no separators, read as mmddyy
idx <- grepl( "^(0[1-9]|1[0-2])[0-9]{2}[0-9]{2}$", dta$DtStr, perl=TRUE )
dta$Dt[ idx ] <- as.Date( dta$DtStr[ idx ], format="%m%d%y" )

# m/d/yy with slashes
idx <- grepl( "^([1-9]|1[0-2])/[0-9]{1,2}/[0-9]{2}$", dta$DtStr, perl=TRUE )
dta$Dt[ idx ] <- as.Date( dta$DtStr[ idx ], format="%m/%d/%y" )

On Wed, 28 Jun 2017, Farnoosh Sheikhi via R-help wrote:

> Hi,
> I have a data set with various date formats in one column and not sure how to unify it.Here is a few formats:
> 02091702/22/170221201703/17/160015-08-239/2/1500170806May-2-201522-March-2014
> I tried parse_date_time from lubridate library but it failed.Thanks so much.
> Best,Farnoosh
>
> [[alternative HTML version deleted]]

---------------------------------------------------------------------------
Jeff Newmiller                        The     .....       .....  Go Live...
DCN:<jdnewmil at dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
                                      Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
/Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
---------------------------------------------------------------------------
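The pattern-by-pattern approach in the message above can also be written as a loop over a table of pattern/format pairs, which makes it easier to add a format later. This is only a minimal sketch: the names `x`, `rules`, `dt`, and `unmatched` are made up for illustration, the sample values are the four from the example, and the month-name formats assume an English locale.

```r
# Sample values from the example above, one date string per element.
x <- c("020917", "2/22/17", "May-2-2015", "May-12-15")

# Each row pairs a regular expression with the strptime format it implies.
# Rules are tried in order; an entry keeps the first format that matches.
month <- "(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)"
rules <- data.frame(
  pattern = c(
    paste0("^", month, "-[0-9]+-[0-9]{4}$"),   # May-2-2015
    paste0("^", month, "-[0-9]+-[0-9]{2}$"),   # May-12-15
    "^(0[1-9]|1[0-2])[0-9]{2}[0-9]{2}$",       # 020917, read as mmddyy
    "^([1-9]|1[0-2])/[0-9]{1,2}/[0-9]{2}$"     # 2/22/17
  ),
  format = c("%b-%d-%Y", "%b-%d-%y", "%m%d%y", "%m/%d/%y"),
  stringsAsFactors = FALSE
)

# Start with all dates missing, then fill in one rule at a time.
dt <- rep(as.Date(NA), length(x))
for (i in seq_len(nrow(rules))) {
  # Only touch entries that are still unparsed and match this pattern.
  idx <- is.na(dt) & grepl(rules$pattern[i], x, ignore.case = TRUE)
  dt[idx] <- as.Date(x[idx], format = rules$format[i])
}

# Anything still NA matched none of the rules and needs a new one.
unmatched <- x[is.na(dt)]
```

Inspecting `unmatched` after each run shows which strings still need a rule. Note that a six-digit string such as 020917 is ambiguous in general (mmddyy versus yymmdd), so the order and strictness of the rules encode a decision about the data that only its owner can make.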
Hi Christoph,

There is a "," between the dates.

Many thanks.

Best,
Farnoosh

On Wednesday, June 28, 2017 9:05 PM, Christoph Puschmann <c.puschmann at student.unsw.edu.au> wrote:

> Hey,
> Are all the dates connected? So there is no comma or space between them?
> Regards,
> Christoph
Thanks Jeff. This is a nice way of solving this problem. What about cases like 0015-02-21?

Many thanks.

Best,
Farnoosh

On Wednesday, June 28, 2017 10:49 PM, Jeff Newmiller <jdnewmil at dcn.davis.ca.us> wrote:

> Otherwise, that will pretty much leave only a brute-force series of regular expression tests to recognize which date format patterns you have, and even that may not be able to get them all right unless you know something that limits the range of possible formats. Below is an example of how this can be done.
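The thread does not say what a value such as 0015-02-21 is meant to represent, so any rule for it rests on a guess. One possible reading, and it is only an assumption, is a year-month-day date whose century was lost, so that 0015 stands for 2015. Under that assumption, one more pattern in the same style could look like the sketch below (the variable names are illustrative, and the sample values are the one from the question, one from the original post, and one non-matching string).

```r
# Sample values: two "00yy-mm-dd" strings and one that should not match.
x <- c("0015-02-21", "0015-08-23", "May-2-2015")

dt <- rep(as.Date(NA), length(x))

# ASSUMPTION: "00yy-mm-dd" is a yy-mm-dd date with a spurious "00" prefix.
idx <- grepl("^00[0-9]{2}-[0-9]{2}-[0-9]{2}$", x)

# Drop the leading "00" and let %y choose the century
# (R maps 00-68 to 20xx and 69-99 to 19xx).
dt[idx] <- as.Date(sub("^00", "", x[idx]), format = "%y-%m-%d")
```

If those values really are years in the first century, or day-month-year in disguise, this rule would silently produce wrong dates, so it is worth checking a few of them against another column or the data's source before relying on it.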