thr3ads.net - R help - [R] Reading Dates in a csv File [Feb 2005]

Charles and Kimberly Maner

2005-Feb-03 13:34 UTC

[R] Reading Dates in a csv File

Hi all.  I'm reading in a flat, comma-delimited file using read.csv.
It works marvelously for the most part.  I am using the colClasses argument
to, basically, create numeric, factor and character classes for the columns
I'm reading in.  However, a couple of the fields in the file are date
fields.  I'm fully aware that POSIXct can be used as a class; however, the
field must obey (I think) the standard/default POSIXct format.  Hence the
following question:  Does anyone have a method they can share to read in a
non-standard formatted date and convert it to POSIXct?  I can read it in then
convert it, but that's a two-pass approach and not as elegant as a single
pass through read.csv.  I've read, from the documentation, that "[o]therwise
there needs to be an as method (from package methods) for conversion from
"character" to the specified formal class", but I do not know and have not
figured out how to do that.

Any suggestion(s) would be greatly appreciated.


Thanks,
Charles

Frank E Harrell Jr

2005-Feb-03 15:47 UTC

[R] Reading Dates in a csv File

The csv.get function in the Hmisc package may do most of what you want.

-- 
Frank E Harrell Jr   Professor and Chair           School of Medicine
                      Department of Biostatistics   Vanderbilt University

Charles and Kimberly Maner

2005-Feb-08 04:07 UTC

[R] RE: Reading Dates in a csv File

Hi all.  Thanks for all of your help/suggestions.  I found an old email in
the R-help archives, pieced together a couple things and arrived at the
solution below.  As an additional followup, I thought I would go ahead and
post it should other readers come across this same situation.  Here goes..

Raw data:
MHK,76.53,05/21/2004,5/4/2004 4:00:00 PM,60
MHK,76.53,06/21/2004,5/5/2004 4:00:00 PM,60
MHK,76.53,07/21/2004,5/6/2004 4:00:00 PM,65
MHK,76.53,08/21/2004,5/7/2004 4:00:00 PM,65
MHK,76.53,09/21/2004,5/8/2004 4:00:00 PM,70
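
The two date layouts in these rows can be checked against strptime format strings before they are wired into read.csv. A minimal sketch, assuming a C or English locale for %p; the tz = "UTC" argument is an assumption added here (it is not in the original post) so that the result does not depend on the local time zone:

```r
# Format strings matching the third and fourth fields of the raw data.
d  <- strptime("05/21/2004", format = "%m/%d/%Y", tz = "UTC")
dt <- strptime("5/4/2004 4:00:00 PM", format = "%m/%d/%Y %I:%M:%S %p", tz = "UTC")

# strptime returns POSIXlt; as.POSIXct converts to seconds since the epoch.
d_ct  <- as.POSIXct(d)
dt_ct <- as.POSIXct(dt)

format(d_ct,  "%Y-%m-%d")
format(dt_ct, "%Y-%m-%d %H:%M:%S")
```

If either call returns NA, the format string does not match the field, which is worth ruling out before looking at colClasses.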

Code:
read_file <- function(file,nrows=-1) {

   # create temp classes
   setClass("t_class_",representation("character"))
   setAs("character", "t_class_", function(from)
      as.POSIXct(strptime(from,format="%m/%d/%Y")))
  
   setClass("t_class2_", representation("character"))
   setAs("character", "t_class2_", function(from)
      as.POSIXct(strptime(from,format="%m/%d/%Y %I:%M:%S %p")))

   # read the file
   file <- read.csv(file,
                    header=FALSE,
                    comment.char = "",
                    nrows=nrows,
                    as.is=FALSE,
                    col.names=c("c_field_1",
                                "n_field_2",
                                "d_field_3",
                                "d_field_4",
                                "n_field_5"),
                     colClasses=c("character",
                                  "numeric",
                                  "t_class_",
                                  "t_class2_",
                                  "numeric")
                     )

   # remove them now that we are done with them
   removeClass("t_class_")
   removeClass("t_class2_")

   return(file)

}

If any of you folks know a better way and/or have comments/enhancements to
this code, feel free to post/email your critique.
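
One possible simplification, offered as a sketch and not something posted in the thread: the class named in colClasses only has to exist so that an as() method can be found, so a virtual class with no representation may be enough, defined once and not created and removed on every call. The class name mdy_date, the inline two-row sample, and tz = "UTC" are all assumptions made for illustration.

```r
library(methods)

# Hypothetical class name; a virtual class with no representation is enough
# for colClasses to look up the coercion method registered by setAs().
setClass("mdy_date")
setAs("character", "mdy_date",
      function(from) as.POSIXct(strptime(from, format = "%m/%d/%Y", tz = "UTC")))

# Two rows of the raw data, supplied inline so the sketch is self-contained.
raw <- c("MHK,76.53,05/21/2004,5/4/2004 4:00:00 PM,60",
         "MHK,76.53,06/21/2004,5/5/2004 4:00:00 PM,60")

dat <- read.csv(text = raw, header = FALSE,
                col.names  = c("c_field_1", "n_field_2", "d_field_3",
                               "d_field_4", "n_field_5"),
                colClasses = c("character", "numeric", "mdy_date",
                               "character", "numeric"))

class(dat$d_field_3)
```

The fourth column is left as character here to keep the sketch short; a second class and setAs() call with the "%m/%d/%Y %I:%M:%S %p" format would handle it the same way.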


Thanks,
Charles




Mulholland, Tom

2005-Feb-08 05:46 UTC

[R] RE: Reading Dates in a csv File

My first thought was that it all looked a bit complicated for something that
should be straightforward.

I created a file called t.txt. I worked out the way I would have done it and
then I tested to see which was fastest. One little hiccup is that the two
objects are not identical, and I thought they would be. Of course I could have
made a typo somewhere. But then there may be something I have not come across.
Guess it's time to see what identical really means.

> system.time({
+ file <- read.csv("t.txt",header=F,
+                     col.names =c("c_field_1",
+                                 "n_field_2",
+                                 "d_field_3",
+                                 "d_field_4",
+                                 "n_field_5"),
+                      colClasses = c("character",
+                                   "numeric",
+                                   "character",
+                                   "character",
+                                   "numeric")
+ )
+ file$d_field_3 <- as.POSIXct(strptime(file$d_field_3,format="%m/%d/%Y" ))
+ file$d_field_4 <- as.POSIXct(strptime(file$d_field_4,format="%m/%d/%Y %I:%M:%S %p" ))
+  })
[1] 0.00 0.00 0.02   NA   NA
>
> read_file <- function(file,nrows=-1) {
+
+    # create temp classes
+    setClass("t_class_",representation("character"))
+    setAs("character", "t_class_", function(from)
+ as.POSIXct(strptime(from,format="%m/%d/%Y")))
+ 
+    setClass("t_class2_", representation("character"))
+    setAs("character", "t_class2_", function(from)
+ as.POSIXct(strptime(from,format="%m/%d/%Y %I:%M:%S %p")))
+ 
+    # read the file
+    file <- read.csv(file,
+                     header=FALSE,
+                     comment.char = "",
+                     nrows=nrows,
+                     as.is=FALSE,
+                     col.names=c("c_field_1",
+                                 "n_field_2",
+                                 "d_field_3",
+                                 "d_field_4",
+                                 "n_field_5"),
+                      colClasses=c("character",
+                                   "numeric",
+                                   "t_class_",
+                                   "t_class2_",
+                                   "numeric")
+                      )
+ 
+    # remove them now that we are done with them
+    removeClass("t_class_")
+    removeClass("t_class2_")
+ 
+    return(file)
+ 
+ }
> system.time(file2 <- read_file("t.txt"))
[1] 0.14 0.00 0.16   NA   NA
> identical(file, file2)
[1] FALSE
> file
  c_field_1 n_field_2  d_field_3           d_field_4 n_field_5
1       MHK     76.53 2004-05-21 2004-05-04 16:00:00        60
2       MHK     76.53 2004-06-21 2004-05-05 16:00:00        60
3       MHK     76.53 2004-07-21 2004-05-06 16:00:00        65
4       MHK     76.53 2004-08-21 2004-05-07 16:00:00        65
5       MHK     76.53 2004-09-21 2004-05-08 16:00:00        70
> file2
  c_field_1 n_field_2  d_field_3           d_field_4 n_field_5
1       MHK     76.53 2004-05-21 2004-05-04 16:00:00        60
2       MHK     76.53 2004-06-21 2004-05-05 16:00:00        60
3       MHK     76.53 2004-07-21 2004-05-06 16:00:00        65
4       MHK     76.53 2004-08-21 2004-05-07 16:00:00        65
5       MHK     76.53 2004-09-21 2004-05-08 16:00:00        70
> str(file)
`data.frame':   5 obs. of  5 variables:
 $ c_field_1: chr  "MHK" "MHK" "MHK" "MHK" ...
 $ n_field_2: num  76.5 76.5 76.5 76.5 76.5
 $ d_field_3:`POSIXct', format: chr  "2004-05-21" "2004-06-21" "2004-07-21" "2004-08-21" ...
 $ d_field_4:`POSIXct', format: chr  "2004-05-04 16:00:00" "2004-05-05 16:00:00" "2004-05-06 16:00:00" "2004-05-07 16:00:00" ...
 $ n_field_5: num  60 60 65 65 70
> str(file2)
`data.frame':   5 obs. of  5 variables:
 $ c_field_1: chr  "MHK" "MHK" "MHK" "MHK" ...
 $ n_field_2: num  76.5 76.5 76.5 76.5 76.5
 $ d_field_3:`POSIXct', format: chr  "2004-05-21" "2004-06-21" "2004-07-21" "2004-08-21" ...
 $ d_field_4:`POSIXct', format: chr  "2004-05-04 16:00:00" "2004-05-05 16:00:00" "2004-05-06 16:00:00" "2004-05-07 16:00:00" ...
 $ n_field_5: num  60 60 65 65 70
>
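
The transcript does not show why identical() returned FALSE. One general possibility is that identical() compares attributes as well as values, so two objects that print the same can still differ. The sketch below uses made-up stand-ins (x and y are hypothetical, not the columns from t.txt) to show how such a difference could be located; it does not claim to reproduce the cause in this thread.

```r
# Hypothetical stand-ins: the same instant, stored with and without a
# "tzone" attribute.
x <- as.POSIXct("2004-05-21 16:00:00", tz = "UTC")
y <- structure(as.numeric(x), class = class(x))

identical(x, y)                  # attributes are part of the comparison
as.numeric(x) == as.numeric(y)   # underlying values only

# Listing the attribute names side by side shows where the objects differ.
names(attributes(x))
names(attributes(y))
```

Applying the same attribute comparison column by column to the two data frames would be one way to narrow down which field differs.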
