Chriss, Neil
2003-Jun-25 12:43 UTC
[R] Help on using read.table with files containing dates
I am a relatively new user of R and have a question about using the read.table command for reading in .csv files that contain dates. My ultimate goal is to be able to read several different files into data frames and then do a merge along the "Date" column of each data frame. It seems I should be able to specify the name of the column that contains dates and then automatically convert that to dates, all in a single read.table statement. It's such a natural thing to do (and the help for read.table gives me hope, but I have not been able to figure out some of the options, e.g., colNames). Here is a specific example of what I mean and what the problems are. An First I load a sample data file whose first column is a date. This gives the wrong answer as we shall see:> library(date) > sampData <- read.table("sampleData.csv",sep=",",header=TRUE) > sampDataDate Col1 Col2 Col3 1 1/1/2003 1.2 1.4 0.160 2 1/4/2003 1.8 1.2 0.900 3 1/5/2003 0.9 1.1 -0.003> mode(sampData$Date)[1] "numeric">Note that the Date column is coerced incorrectly into being numeric. Right now what I do is force R to read this in as a character using "as.is" and then convert it to a date as follows:> sampData <- read.table("sampleData.csv",sep=",",header=TRUE,as.is=1) > sampDataDate Col1 Col2 Col3 1 1/1/2003 1.2 1.4 0.160 2 1/4/2003 1.8 1.2 0.900 3 1/5/2003 0.9 1.1 -0.003> mode(sampData$Date)[1] "character"> sampData$Date <- as.date(sampData$Date) > sampDataDate Col1 Col2 Col3 1 15706 1.2 1.4 0.160 2 15709 1.8 1.2 0.900 3 15710 0.9 1.1 -0.003>Now the Data column is converted to Julian, which is good b/c if I repeat this procedure with other dataframes I can do a merge on the date columns and line up the dates. But, there are two drawbacks with this approach which I do not know how to solve: 1. it's generally clumsy and takes too many lines. 2. (more importantly) it forces me to specify which column number contains the dates instead of simply stating which column header name contains date. I would rather simply the column header name of the column that contains the date (in this example: Date) so that whichever column contains the date will be automatically converted to dates. So, is there a way to specify that I want the column labeled "Date" to be read as a date class (as in as.date)? Any hints will be much appreciated. Neil Chriss DISCLAIMER: This e-mail message and any attachments are inte...{{dropped}}
Prof Brian Ripley
2003-Jun-25 13:16 UTC
[R] Help on using read.table with files containing dates
On Wed, 25 Jun 2003, Chriss, Neil wrote:> I am a relatively new user of R and have a question about using the > read.table command for reading in .csv files that contain dates.Unfortuantely them appear to contain perversions of dates, not the dates recognised by the ISO standard. I read your dates as the first of Jan, Apr and May 2003, but some illogical people use mm/dd/yy order.> My > ultimate goal is to be able to read several different files into data frames > and then do a merge along the "Date" column of each data frame. It seems I > should be able to specify the name of the column that contains dates and > then automatically convert that to dates, all in a single read.table > statement. It's such a natural thing to do (and the help for read.table > gives me hope, but I have not been able to figure out some of the options, > e.g., colNames). > > Here is a specific example of what I mean and what the problems are. An > > First I load a sample data file whose first column is a date. This gives > the wrong answer as we shall see: > > > library(date) > > sampData <- read.table("sampleData.csv",sep=",",header=TRUE) > > sampData > Date Col1 Col2 Col3 > 1 1/1/2003 1.2 1.4 0.160 > 2 1/4/2003 1.8 1.2 0.900 > 3 1/5/2003 0.9 1.1 -0.003 > > mode(sampData$Date) > [1] "numeric" > > > > Note that the Date column is coerced incorrectly into being numeric.It is not: mode is inappropriate here: you actually have a factor whose mode is numeric. Use class(), not mode().> Right > now what I do is force R to read this in as a character using "as.is" and > then convert it to a date as follows: > > > sampData <- read.table("sampleData.csv",sep=",",header=TRUE,as.is=1) > > sampData > Date Col1 Col2 Col3 > 1 1/1/2003 1.2 1.4 0.160 > 2 1/4/2003 1.8 1.2 0.900 > 3 1/5/2003 0.9 1.1 -0.003 > > mode(sampData$Date) > [1] "character" > > sampData$Date <- as.date(sampData$Date) > > sampData > Date Col1 Col2 Col3 > 1 15706 1.2 1.4 0.160 > 2 15709 1.8 1.2 0.900 > 3 15710 0.9 1.1 -0.003 > > > > Now the Data column is converted to Julian, which is good b/c if I repeat > this procedure with other dataframes I can do a merge on the date columns > and line up the dates. But, there are two drawbacks with this approach > which I do not know how to solve: > > 1. it's generally clumsy and takes too many lines. > > 2. (more importantly) it forces me to specify which column number contains > the dates instead of simply stating which column header name contains date. > I would rather simply the column header name of the column that contains the > date (in this example: Date) so that whichever column contains the date will > be automatically converted to dates. > > So, is there a way to specify that I want the column labeled "Date" to be > read as a date class (as in as.date)?Yes, that's what the argument colClasses is for. But since as.date is not part of R (it is in contributed packages survival and date) you will need to provide an as() method. I suggest you use sampData[["Date"]] <- strptime(as.character(sampData[["Date"]]), "%d/%m/%y") since the conversion to factor can be undone very easily. -- Brian D. Ripley, ripley at stats.ox.ac.uk Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595