My first thought was that it all looked a bit complicated for something that
should be straightforward.
I created a file called t.txt, worked out the way I would have done it, and
then tested to see which was fastest. One little hiccup is that the two
objects are not identical, and I thought they would be. Of course I could have
made a typo somewhere, but then there may be something I have not come across.
Guess it's time to see what identical() really means.
> system.time({
+ file <- read.csv("t.txt",header=F,
+ col.names =c("c_field_1",
+ "n_field_2",
+ "d_field_3",
+ "d_field_4",
+ "n_field_5"),
+ colClasses = c("character",
+ "numeric",
+ "character",
+ "character",
+ "numeric")
+ )
+ file$d_field_3 <- as.POSIXct(strptime(file$d_field_3,format="%m/%d/%Y" ))
+ file$d_field_4 <- as.POSIXct(strptime(file$d_field_4,format="%m/%d/%Y %I:%M:%S %p" ))
+ })
[1] 0.00 0.00 0.02 NA NA
>
>
> read_file <- function(file,nrows=-1) {
+
+ # create temp classes
+ setClass("t_class_",representation("character"))
+ setAs("character", "t_class_", function(from)
+ as.POSIXct(strptime(from,format="%m/%d/%Y")))
+
+ setClass("t_class2_", representation("character"))
+ setAs("character", "t_class2_", function(from)
+ as.POSIXct(strptime(from,format="%m/%d/%Y %I:%M:%S %p")))
+
+ # read the file
+ file <- read.csv(file,
+ header=FALSE,
+ comment.char = "",
+ nrows=nrows,
+ as.is=FALSE,
+ col.names=c("c_field_1",
+ "n_field_2",
+ "d_field_3",
+ "d_field_4",
+ "n_field_5"),
+ colClasses=c("character",
+ "numeric",
+ "t_class_",
+ "t_class2_",
+ "numeric")
+ )
+
+ # remove them now that we are done with them
+ removeClass("t_class_")
+ removeClass("t_class2_")
+
+ return(file)
+
+ }
> system.time(file2 <- read_file("t.txt"))
[1] 0.14 0.00 0.16 NA NA
> identical(file, file2)
[1] FALSE
> file
c_field_1 n_field_2 d_field_3 d_field_4 n_field_5
1 MHK 76.53 2004-05-21 2004-05-04 16:00:00 60
2 MHK 76.53 2004-06-21 2004-05-05 16:00:00 60
3 MHK 76.53 2004-07-21 2004-05-06 16:00:00 65
4 MHK 76.53 2004-08-21 2004-05-07 16:00:00 65
5 MHK 76.53 2004-09-21 2004-05-08 16:00:00 70
> file2
c_field_1 n_field_2 d_field_3 d_field_4 n_field_5
1 MHK 76.53 2004-05-21 2004-05-04 16:00:00 60
2 MHK 76.53 2004-06-21 2004-05-05 16:00:00 60
3 MHK 76.53 2004-07-21 2004-05-06 16:00:00 65
4 MHK 76.53 2004-08-21 2004-05-07 16:00:00 65
5 MHK 76.53 2004-09-21 2004-05-08 16:00:00 70
> str(file)
`data.frame': 5 obs. of 5 variables:
 $ c_field_1: chr "MHK" "MHK" "MHK" "MHK" ...
 $ n_field_2: num 76.5 76.5 76.5 76.5 76.5
 $ d_field_3:`POSIXct', format: chr "2004-05-21" "2004-06-21" "2004-07-21" "2004-08-21" ...
 $ d_field_4:`POSIXct', format: chr "2004-05-04 16:00:00" "2004-05-05 16:00:00" "2004-05-06 16:00:00" "2004-05-07 16:00:00" ...
 $ n_field_5: num 60 60 65 65 70
> str(file2)
`data.frame': 5 obs. of 5 variables:
 $ c_field_1: chr "MHK" "MHK" "MHK" "MHK" ...
 $ n_field_2: num 76.5 76.5 76.5 76.5 76.5
 $ d_field_3:`POSIXct', format: chr "2004-05-21" "2004-06-21" "2004-07-21" "2004-08-21" ...
 $ d_field_4:`POSIXct', format: chr "2004-05-04 16:00:00" "2004-05-05 16:00:00" "2004-05-06 16:00:00" "2004-05-07 16:00:00" ...
 $ n_field_5: num 60 60 65 65 70
>
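
One way to narrow this down, as a rough sketch: identical() also compares
attributes, so two data frames can agree on every value and still come back
FALSE. The example below uses made-up data and a hypothetical helper,
differing_columns() (my name, not a base R function), to find which columns
differ and then look at their attributes. The extra "note" attribute is an
artificial difference added only for illustration; I have not confirmed that
this is what differs between file and file2 above.

```r
# Hypothetical helper (not part of base R): names of the columns of x and y
# that are not identical(). Assumes both data frames have the same columns.
differing_columns <- function(x, y) {
  stopifnot(identical(names(x), names(y)))
  same <- mapply(identical, x, y)
  names(x)[!same]
}

# Made-up data in the same date format as t.txt.
a <- data.frame(n = c(60, 65))
a$d <- as.POSIXct(strptime(c("05/21/2004", "06/21/2004"),
                           format = "%m/%d/%Y", tz = "UTC"))

# Artificial difference: same values, one extra attribute on the date column.
b <- a
attr(b$d, "note") <- "extra attribute"

identical(a, b)           # FALSE, because the attributes differ
differing_columns(a, b)   # "d"
attributes(a$d)           # compare this with the next line
attributes(b$d)
```

Running differing_columns(file, file2) and then attributes() on whatever
columns it reports should show where the two objects part company.
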
> -----Original Message-----
> From: Charles and Kimberly Maner [mailto:ckjmaner at carolina.rr.com]
> Sent: Tuesday, 8 February 2005 12:08 PM
> To: r-help at stat.math.ethz.ch
> Subject: [R] RE: Reading Dates in a csv File
>
>
>
> Hi all. Thanks for all of your help/suggestions. I found an old email in
> the R-help archives, pieced together a couple things and arrived at the
> solution below. As an additional followup, I thought I would go ahead and
> post it should other readers come across this same situation. Here goes..
>
> Raw data:
> MHK,76.53,05/21/2004,5/4/2004 4:00:00 PM,60
> MHK,76.53,06/21/2004,5/5/2004 4:00:00 PM,60
> MHK,76.53,07/21/2004,5/6/2004 4:00:00 PM,65
> MHK,76.53,08/21/2004,5/7/2004 4:00:00 PM,65
> MHK,76.53,09/21/2004,5/8/2004 4:00:00 PM,70
>
> Code:
> read_file <- function(file,nrows=-1) {
>
> # create temp classes
> setClass("t_class_",representation("character"))
> setAs("character", "t_class_", function(from)
> as.POSIXct(strptime(from,format="%m/%d/%Y")))
>
> setClass("t_class2_", representation("character"))
> setAs("character", "t_class2_", function(from)
> as.POSIXct(strptime(from,format="%m/%d/%Y %I:%M:%S %p")))
>
> # read the file
> file <- read.csv(file,
> header=FALSE,
> comment.char = "",
> nrows=nrows,
> as.is=FALSE,
> col.names=c("c_field_1",
> "n_field_2",
> "d_field_3",
> "d_field_4",
> "n_field_5"),
> colClasses=c("character",
> "numeric",
> "t_class_",
> "t_class2_",
> "numeric")
> )
>
> # remove them now that we are done with them
> removeClass("t_class_")
> removeClass("t_class2_")
>
> return(file)
>
> }
>
> If any of you folks know a better way and/or have comments/enhancements to
> this code, feel free to post/email your critique.
>
>
> Thanks,
> Charles
>
>
>
>
> > _____________________________________________
> > From: Charles and Kimberly Maner [mailto:ckjmaner at carolina.rr.com]
> >
> > Sent: Thursday, February 03, 2005 8:35 AM
> > To: 'r-help at stat.math.ethz.ch'
> > Subject: Reading Dates in a csv File
> >
> >
> > Hi all. I'm reading in a flat, comma-delimited flat file using read.csv.
> > It works marvelously for the most part. I am using the colClasses
> > argument to, basically, create numeric, factor and character classes for
> > the columns I'm reading in. However, a couple of the fields in the file
> > are date fields. I'm fully aware that POSIXct can be used as a class,
> > however the field must obey, (I think), the standard/default POSIXct
> > format. Hence the following question: Does anyone have a method they can
> > share to read in a non-standard formatted date to convert to POSIXct? I
> > can read it in then convert it, but that's a two pass approach and not as
> > elegant as a single pass through read.csv. I've read, from the
> > documentation, that "[o]therwise there needs to be an as method (from
> > package methods) for conversion from "character" to the specified formal
> > class" but I do not know and have not figured out how to do that.
> >
> > Any suggestion(s) would be greatly appreciated.
> >
> >
> > Thanks,
> > Charles