I am using read.csv to read a CSV file (produced by saving an Excel file as a CSV file). The columns containing dates are being read as factors. Because of this, I can not compute follow-up time, i.e. Followup <- postDate - preDate. I would appreciate any suggestion that would help me read the dates as dates and thus allow me to calculate follow-up time.

Thanks
John

John Sorkin M.D., Ph.D.
Chief, Biostatistics and Informatics
Baltimore VA Medical Center GRECC and
University of Maryland School of Medicine Claude Pepper OAIC

University of Maryland School of Medicine
Division of Gerontology
Baltimore VA Medical Center
10 North Greene Street
GRECC (BT/18/GR)
Baltimore, MD 21201-1524

410-605-7119
-- NOTE NEW EMAIL ADDRESS:
jsorkin at grecc.umaryland.edu
Frank E Harrell Jr
2005-Jul-28 11:55 UTC
[R] CSV file and date. Dates are read as factors!
John Sorkin wrote:
> The columns containing dates are being read as factors. [...] I would
> appreciate any suggestion that would help me read the dates as dates
> and thus allow me to calculate follow-up time.

library(Hmisc)
?csv.get   (see the datevars argument)

Frank

--
Frank E Harrell Jr
Professor and Chair
School of Medicine, Department of Biostatistics
Vanderbilt University
Working with dates is not easy (for me at least). I always manage to get it done, but the code is somewhat messy. I have not tried the Hmisc package as Frank suggested, but here is my code as an alternative:

    w <- unclass((as.Date(as.character(dataMat$fy1_period_end_date), format="%m/%d/%Y") - as.Date(datec[i], format="%m/%d/%Y"))/365)

The difference between the two dates is in days, so after dividing by 365, w is the time between them in (approximate) years. Note that the factor column has to go through as.character() before as.Date(), and that I had to unclass() the result to turn the "difftime" object into a plain number.

I read my files from CSV as well, so I am sure something similar can be made to work for you.

HTH,
Roger
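
The one-liner above can be unpacked into separate steps. This is only a sketch in base R: the column names preDate and postDate come from the original question, but the values are made up for illustration, and the month/day/year format is an assumption about how Excel wrote the file.

```r
# Hypothetical example data: date columns as read.csv() might deliver them
# (factors holding month/day/year strings).
preDate  <- factor(c("12/31/2003", "01/15/2004"))
postDate <- factor(c("12/30/2004", "07/28/2005"))

# Factor -> character -> Date. Calling as.Date() on the character labels
# avoids any dependence on the factor's internal integer codes.
pre  <- as.Date(as.character(preDate),  format = "%m/%d/%Y")
post <- as.Date(as.character(postDate), format = "%m/%d/%Y")

# difftime() with explicit units, then as.numeric() to get plain numbers.
followup_days  <- as.numeric(difftime(post, pre, units = "days"))

# Dividing by 365 (as in the message above) gives approximate years.
followup_years <- followup_days / 365
```

Asking difftime() for units = "days" explicitly means the result does not depend on whichever unit R would pick automatically.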
Jorge de la Vega Gongora
2005-Jul-28 13:57 UTC
[R] CSV file and date. Dates are read as factors!
Use the package chron. Before importing the data into R from the CSV file, convert the dates to numeric format in Excel. Dates are just a sequence of day numbers counted from a starting point. I use the following to work with dates, assuming you have a column in your CSV file with the header "date". (Excel's 1900 date system counts a nonexistent 29 February 1900, so for dates after February 1900 the effective origin is 30 December 1899.)

    library(chron)
    options(chron.origin=c(month=12,day=30,year=1899))
    x <- read.csv("../bla/bla.csv")
    x$date <- chron(x$date,format="y-m-d")

Or, if your CSV file has date labels like "02/03/2005", replace the last line with one whose format matches the labels:

    x$date <- chron(as.character(x$date),format="m/d/y")

Then you can use the date field to do date operations.

Hope this is useful.
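
For comparison, the same serial-number conversion can be sketched in base R without chron. This is a swapped-in alternative, not the code from the message above, and it assumes the file came from Excel's 1900 date system (the Windows default), where the effective origin for dates after February 1900 is 1899-12-30. The serial numbers below are made-up example values.

```r
# Hypothetical Excel serial day numbers, as they would appear in the CSV
# after formatting the date cells as numbers in Excel.
serial <- c(36526, 38561)

# Assumption: Excel 1900 date system, so day 0 is 1899-12-30.
dates <- as.Date(serial, origin = "1899-12-30")

# Differences between Date objects are day counts.
followup <- dates[2] - dates[1]
```

Files saved from a workbook that uses the 1904 date system would need a different origin, so it is worth checking one known date by hand.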
It's really pretty simple.

First, if you supply as.is=TRUE to read.csv() [or read.table()], your dates will be read as character strings, not factors. That saves the step of converting them from factor to character.

Then use as.Date() to convert the date columns to objects of class "Date". You will have to specify the format if your dates are not in the default format (yyyy-mm-dd or yyyy/mm/dd).

    > tmp <- as.Date('2002-5-1')
    > as.Date(Sys.time()) - tmp
    Time difference of 1184 days

If your dates include times, use as.POSIXct() instead of as.Date().

    > tmp <- as.POSIXct('2002-5-1 13:21')
    > Sys.time() - tmp
    Time difference of 1183.746 days

If you don't want to use as.is, perhaps because you have other columns that you *want* to have as factors, then either supply colClasses to read.csv(), or else just use format() to convert the factors to character:

    as.Date(format(your_date_column))

As an aside, you might save yourself some time by using read.xls() from the gdata package.

And of course, there's always the ugly work-around. In Excel, create new columns in which the dates are formatted as numbers, presumably the number of days since whatever Excel uses for its origin. Then, in R, you can simply subtract the numbers. If you have date-time values in Excel, this might be a little trickier.

-Don

--
--------------------------------------
Don MacQueen
Environmental Protection Department
Lawrence Livermore National Laboratory
Livermore, CA, USA
--------------------------------------
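
Putting the as.is and as.Date() steps together for the original question, a minimal sketch might look like the following. The inline CSV text stands in for the real file, the id column and all values are invented for illustration, and the "%m/%d/%Y" format is an assumption about how Excel exported the dates.

```r
# Hypothetical stand-in for the exported CSV file; with a real file,
# pass its path as the first argument instead of using text=.
csv_text <- "id,preDate,postDate
1,01/15/2004,07/28/2005
2,03/01/2004,03/01/2005"

# as.is = TRUE keeps the date columns as character strings, not factors.
dat <- read.csv(text = csv_text, as.is = TRUE)

# Convert to class "Date", stating the format explicitly because it is
# not the default yyyy-mm-dd.
dat$preDate  <- as.Date(dat$preDate,  format = "%m/%d/%Y")
dat$postDate <- as.Date(dat$postDate, format = "%m/%d/%Y")

# Follow-up time as a "difftime" in days.
dat$Followup <- dat$postDate - dat$preDate
```

In R 4.0.0 and later, read.csv() no longer converts strings to factors by default, so as.is = TRUE is harmless there but no longer strictly needed.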