Farrel Buchinsky
2009-Mar-24 00:06 UTC
[R] two different date formats in the same variable
How does one convert to a date format when survey respondents have used two different date formats whilst entering their data? They were clearly told to use mm/dd/yyyy, but humans being humans, some entered mm/dd/yy. There were even validity checks on the forms, but I allowed them to be overridden since the data is more holy than the format.

The data was downloaded as a csv and read.csv was used to read it in. There are several date variables (for example date of birth, date of diagnosis). Some became character vectors and others became factor vectors. Nevertheless I have accomplished most of what I want using lines such as

strptime(init.consent$consent.rec, "%m/%d/%Y")
strptime(x, "%m/%d/%Y")
as.Date(x, "%m/%d/%Y")

But what happens when a few of the entries get messed up because they are actually formatted %m/%d/%y?

Is there a robust date formatter? Alternatively, how would one code (presumably using regular expressions) a transformation or substitution only on the errant entries and thereby turn 06/25/04 into 06/25/2004 and 03/03/59 into 03/03/1959?

sessionInfo()
R version 2.8.1 (2008-12-22)
i386-pc-mingw32

Farrel Buchinsky
Try using 'strsplit' to split your string on the '/' and then create a series of 'if's to determine how you want to output the new string. You will probably need this approach since you may have to check the validity and ranges of the numbers.

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?
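A minimal sketch of the 'strsplit' suggestion might look like the following. The function name expand_year and the pivot rule are assumptions made for illustration, not something given in the thread: a two-digit year greater than pivot is taken to mean 19yy, anything else 20yy. Entries that do not split into three all-numeric pieces come back as NA rather than being guessed at.

```r
# Hypothetical helper (not from the thread): split on "/", expand
# two-digit years, then parse everything as %m/%d/%Y.
# ASSUMPTION: a two-digit year above `pivot` belongs to the 1900s.
expand_year <- function(x, pivot = 9) {
  x <- as.character(x)  # read.csv may have produced factors
  parts <- strsplit(x, "/", fixed = TRUE)
  fixed <- vapply(parts, function(p) {
    # Reject anything that is not three purely numeric fields
    if (length(p) != 3 || !all(grepl("^[0-9]+$", p))) {
      return(NA_character_)
    }
    yr <- p[3]
    if (nchar(yr) == 2) {
      century <- if (as.integer(yr) > pivot) "19" else "20"
      yr <- paste0(century, yr)
    }
    paste(p[1], p[2], yr, sep = "/")
  }, character(1))
  # Impossible dates (e.g. month 13) become NA at this step
  as.Date(fixed, format = "%m/%d/%Y")
}

expand_year(c("06/25/04", "06/25/2004", "03/03/59", "03/03/1959"))
```

The pivot should be chosen per variable: a date of diagnosis and a date of birth may need different cut-offs.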
Gabor Grothendieck
2009-Mar-24 02:40 UTC
[R] two different date formats in the same variable
Try this:

> library(chron)
> x <- c("06/25/04", "06/25/2004", "03/03/59", "03/03/1959")
> chron(x)
[1] 06/25/04 06/25/04 03/03/59 03/03/59
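For the regular-expression route asked about in the original post, a base R sketch could detect the two-digit entries with grepl and parse them with %y instead of %Y. The function name parse_mixed and the "no dates in the future" rule are assumptions for illustration: R's %y maps 00 to 68 onto 20xx, so 03/03/59 would first come out as 2059, and the sketch rolls any such future date back by one century. That rule suits dates of birth or diagnosis but would be wrong for variables that can legitimately lie in the future.

```r
# Hypothetical helper (not from the thread): parse a mix of
# mm/dd/yyyy and mm/dd/yy entries without extra packages.
# ASSUMPTION: no valid date lies after `today`.
parse_mixed <- function(x, today = Sys.Date()) {
  x <- as.character(x)  # handles factor columns from read.csv
  # TRUE only where the year field has exactly two digits
  short <- grepl("^[0-9]{1,2}/[0-9]{1,2}/[0-9]{2}$", x)
  d <- as.Date(x, format = "%m/%d/%Y")
  d[short] <- as.Date(x[short], format = "%m/%d/%y")
  # Two-digit years that landed in the future go back 100 years
  future <- short & !is.na(d) & d > today
  if (any(future)) {
    lt <- as.POSIXlt(d[future])
    lt$year <- lt$year - 100
    d[future] <- as.Date(lt)
  }
  d
}

x <- c("06/25/04", "06/25/2004", "03/03/59", "03/03/1959")
parse_mixed(x)
```

The explicit grepl test matters because the two formats cannot be told apart by trying one and falling back to the other on NA: %Y may accept "04" as the year 4, and %y may read only the first two digits of "2004".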