Date-Time-Stamp input method to correctly interpret user-specific formats: coding is 90% there, based on the example at http://tolstoy.newcastle.edu.au/R/help/05/02/12003.html ... anyone got the last 10% please?

CONTEXT:

Data is received where one of the columns is a datetimestamp. At midnight, the value represented as text in this column consists of just the date part, e.g. "01/09/2009". At other times, the value in the column contains both date and time, e.g. "01/09/2009 00:00:01". The goal is to read it into R as an appropriate data type, where for example date arithmetic can be performed. As far as I can tell, the most appropriate such data type is POSIXct. The trick then is to read in the datetimestamps in the data as this type.

PROBLEM:

POSIXct defaults to a text representation almost but not quite like my received data. The main difference is that the POSIXct date part is in reverse order, e.g. "2009-09-01". It is possible to define a different format where date and time parts look like my data, but when a datetimestamp has only the date part present (as in the case of my midnight data) it is interpreted as NA, i.e. undefined.

SOLUTION (ALMOST):

There is a workaround (based on the example at http://tolstoy.newcastle.edu.au/R/help/05/02/12003.html). It is possible to define a class and then read the data in as this class. For such a class it is possible to define a class method, in terms of a function, for translating a text (character string) representation into a value. In that function, one can use a conditional expression to treat midnight datetimestamps differently from those at other times of day. The example below does that. In order to apply this function over all of the datetimestamp values in the column, it is necessary to use something like R's 'sapply' function.

SNAG:

The function below implements this approach. A datetimestamp with only the date part, including leading zeroes, is always length 10 (characters). It correctly interprets the datetimestamp values, but unfortunately translates them into what appear to be numeric type. I am actually uncertain precisely what is happening, as I am very new to R and have most certainly stretched myself in writing this code. I think perhaps it returns a list and something associated with this aspect makes it "forget" that the data type is POSIXct, or at least how such a type should be displayed as text.

PLEA:

Please, can anyone give any help whatsoever, however tenuous?

CODE, DATA & RESULTS:

Function to read the required data, intended to make the datetime column of the data (example given further below) into POSIXct values:
<<<
spot_frequency_readin <- function(file, nrows=-1) {

  # create temp class
  setClass("t_class2_", representation("character"))
  setAs("character", "t_class2_", function(from) {
    sapply(from, function(x) {
      if (nchar(x)==10) {
        as.POSIXct(strptime(x, format="%d/%m/%Y"))
      } else {
        as.POSIXct(strptime(x, format="%d/%m/%Y %H:%M:%S"))
      }
    })
  })

  # (for format symbols, see "R Reference Card")

  # read the file (TSV)
  file <- read.delim(file, header=TRUE, comment.char = "", nrows=nrows,
                     as.is=FALSE, col.names=c("DATETIME", "FREQ"),
                     colClasses=c("t_class2_", "numeric"))

  # remove it now that we are done with it
  removeClass("t_class2_")

  return(file)
}
>>>
This appears to work as regards processing each row of data correctly, but the values returned look like numeric equivalents of POSIXct, as opposed to the expected character-based (string) equivalents.

Example Data:
<<<
DATETIME	FREQ
01/09/2009	59.036
01/09/2009 00:00:01	58.035
01/09/2009 00:00:02	53.035
01/09/2009 00:00:03	47.033
01/09/2009 00:00:04	52.03
01/09/2009 00:00:05	55.025
>>>

Example Function Call:
<<<
> spot = spot_frequency_readin("mydatafile.txt",4)
>>>

Result of Example Function Call:
<<<
> spot[1]
    DATETIME
1 1251759600
2 1251759601
3 1251759602
4 1251759603
>>>

What I ideally wanted to see (whether or not the time part of the datetimestamp at midnight was displayed):
<<<
> spot[1]
DATETIME
01/09/2009 00:00:00
01/09/2009 00:00:01
01/09/2009 00:00:02
01/09/2009 00:00:03
01/09/2009 00:00:04
>>>

For the function as defined above using 'sapply':
<<<
> spot[,1]
         01/09/2009 01/09/2009 00:00:01 01/09/2009 00:00:02 01/09/2009 00:00:03
         1251759600          1251759601          1251759602          1251759603
>>>
This was unexpected - it seems to have displayed the datetimestamp values both as per my defined character-string representation and as numeric values.

Alternatively, if I replace the 'sapply' by an 'lapply' then I get something closer to what I expect. It is at least what looks like R's default text representation for POSIXct datetimes, even if it is not in my preferred format.
<<<
> spot[,1]

[[1]]
[1] "2009-09-01 BST"

[[2]]
[1] "2009-09-01 00:00:01 BST"

[[3]]
[1] "2009-09-01 00:00:02 BST"

[[4]]
[1] "2009-09-01 00:00:03 BST"
>>>

--
View this message in context: http://www.nabble.com/Date-Time-Stamp-input-method-for-user-specific-formats-tp25757018p25757018.html
Sent from the R help mailing list archive at Nabble.com.
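The NA behaviour described under PROBLEM can be reproduced in a few lines. This is only a minimal sketch: the two sample strings are taken from the example data, while the variable names and the choice of tz = "UTC" (used to keep the result independent of the local time zone) are assumptions made for illustration.

```r
# Two sample values: one date-only (midnight) and one with a time part.
stamps <- c("01/09/2009", "01/09/2009 00:00:01")

# A single format that expects both date and time (hypothetical variable name).
full_fmt <- "%d/%m/%Y %H:%M:%S"

# tz = "UTC" is an assumption, chosen to avoid local daylight-saving effects.
parsed <- as.POSIXct(strptime(stamps, format = full_fmt, tz = "UTC"))

# The date-only value cannot satisfy the time part of the format,
# so it comes back as NA; the full timestamp parses normally.
print(is.na(parsed))
```

The first element is the one that needs special handling, which is what the rest of the thread addresses.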
Don MacQueen
2009-Oct-05 22:18 UTC
[R] Date-Time-Stamp input method for user-specific formats
Off the top of my head, I think you're working too hard at this.

I would read in the timestamp column as a character string. Then, find those where the string length is too short [using nchar()], append "00:00:00" to those [using paste()], and then convert to POSIXct [using as.POSIXct()].

No need to define new classes. Simple and easy to understand.

-Don

--
--------------------------------------
Don MacQueen
Environmental Protection Department
Lawrence Livermore National Laboratory
Livermore, CA, USA
925-423-1062
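The nchar()/paste()/as.POSIXct() suggestion could be sketched roughly as follows. The inline sample text, the variable names and tz = "UTC" are assumptions for illustration; in practice the data would come from the tab-separated file rather than a string.

```r
# Hypothetical inline copy of a few rows of the tab-separated example data.
raw_text <- paste(
  "DATETIME\tFREQ",
  "01/09/2009\t59.036",
  "01/09/2009 00:00:01\t58.035",
  "01/09/2009 00:00:02\t53.035",
  sep = "\n"
)

# Read the timestamp column as plain character, not as a factor or a custom class.
spot <- read.delim(text = raw_text, colClasses = c("character", "numeric"))

# Date-only values are exactly 10 characters long; append midnight to those.
date_only <- nchar(spot$DATETIME) == 10
spot$DATETIME[date_only] <- paste(spot$DATETIME[date_only], "00:00:00")

# Now every value has the same layout, so one format string is enough.
# tz = "UTC" is an assumption; use the time zone the data were recorded in.
spot$DATETIME <- as.POSIXct(spot$DATETIME,
                            format = "%d/%m/%Y %H:%M:%S", tz = "UTC")

# Date arithmetic works on the result, e.g. the gaps between rows.
elapsed <- diff(spot$DATETIME)
print(spot)
print(elapsed)
```

Because the conversion is applied to the whole column at once, no 'sapply' is needed and the POSIXct class is kept.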
David Winsemius
2009-Oct-05 22:24 UTC
[R] Date-Time-Stamp input method for user-specific formats
On Oct 5, 2009, at 5:14 PM, esp wrote:

> For the function as defined above using 'sapply'
>> spot[,1]
>          01/09/2009 01/09/2009 00:00:01 01/09/2009 00:00:02 01/09/2009 00:00:03
>          1251759600          1251759601          1251759602          1251759603
>
> This was unexpected - it seems to have displayed the datetimestamp values
> both as per my defined character-string representation and as numeric
> values.

> as.POSIXct(spot$DATETIME, origin="1970-01-01")
               01/09/2009       01/09/2009 00:00:01       01/09/2009 00:00:02
"2009-09-01 05:00:00 EDT" "2009-09-01 05:00:01 EDT" "2009-09-01 05:00:02 EDT"
      01/09/2009 00:00:03
"2009-09-01 05:00:03 EDT"

If you want to get rid of the somewhat extraneous names:

> unname(as.POSIXct(spot$DATETIME, origin="1970-01-01"))
[1] "2009-09-01 05:00:00 EDT" "2009-09-01 05:00:01 EDT" "2009-09-01 05:00:02 EDT"
[4] "2009-09-01 05:00:03 EDT"

If you want a variable that stays that way:

> spot$D2 <- as.POSIXct(spot$DATETIME, origin="1970-01-01")
> spot
    DATETIME   FREQ                  D2
1 1251777600 59.036 2009-09-01 05:00:00
2 1251777601 58.035 2009-09-01 05:00:01
3 1251777602 53.035 2009-09-01 05:00:02
4 1251777603 47.033 2009-09-01 05:00:03

Or you could overwrite spot$DATETIME.

--
David Winsemius, MD
Heritage Laboratories
West Hartford, CT
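A small sketch of what seems to be going on with 'sapply', and of restoring the class afterwards as suggested above. The helper name parse_one and tz = "UTC" are assumptions for illustration, not part of the original code.

```r
# Hypothetical per-element parser, mirroring the conditional in the original function.
parse_one <- function(s) {
  fmt <- if (nchar(s) == 10) "%d/%m/%Y" else "%d/%m/%Y %H:%M:%S"
  as.POSIXct(strptime(s, format = fmt, tz = "UTC"))
}

x <- c("01/09/2009", "01/09/2009 00:00:01")

# sapply simplifies the list of POSIXct values to a plain vector, which drops
# the class and leaves seconds since 1970-01-01, named by the input strings.
secs <- sapply(x, parse_one)
print(class(secs))
print(names(secs))

# The numbers are still valid; supplying the origin turns them back into POSIXct.
restored <- as.POSIXct(unname(secs), origin = "1970-01-01", tz = "UTC")
print(restored)
```

Passing tz explicitly when restoring avoids depending on the time zone of the machine running the code.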
Gabor Grothendieck
2009-Oct-05 22:50 UTC
[R] Date-Time-Stamp input method for user-specific formats
Try this. First we read a line at a time into L except for the header. Then we use strapply to match on the given pattern. It passes the backreferences (the portions within parentheses in the pattern) to the function (defined via a formula) whose implicit arguments are x, y and z. That function returns two columns which are in the required form so that in the next statement we convert one to chron and the other to numeric. See R News 4/1 for more about dates and times.

library(gsubfn) # strapply
library(chron) # as.chron

Lines <- "DATETIME FREQ
01/09/2009 59.036
01/09/2009 00:00:01 58.035
01/09/2009 00:00:02 53.035
01/09/2009 00:00:03 47.033
01/09/2009 00:00:04 52.03
01/09/2009 00:00:05 55.025"

# Lines is a string, not a file name, so it is read via a text connection
L <- readLines(textConnection(Lines))[-1]

pat <- "(../../....) (..:..:..){0,1} *([0-9.]+)"
s <- strapply(L, pat, ~ c(paste(x, y, "00:00:00"), z), simplify = rbind)

# month/day/year; use "%d/%m/%Y" if the data are day/month/year
fmt <- "%m/%d/%Y %H:%M:%S"
DF <- data.frame(Time = as.chron(s[,1], fmt), Freq = as.numeric(s[,2]))
DF

The final output looks like this:

> DF
                 Time   Freq
1 (01/09/09 00:00:00) 59.036
2 (01/09/09 00:00:01) 58.035
3 (01/09/09 00:00:02) 53.035
4 (01/09/09 00:00:03) 47.033
5 (01/09/09 00:00:04) 52.030
6 (01/09/09 00:00:05) 55.025

If the times are unique you could consider making a zoo object out of it by replacing the DF <- statement with:

library(zoo)
z <- zoo(as.numeric(s[,2]), as.chron(s[,1], fmt))

See the three vignettes in the zoo package.
Thank you all who replied, I will try out these ideas later today.

David Esp
esp wrote:
>
> For the function as defined above using 'sapply'
>> spot[,1]
>          01/09/2009 01/09/2009 00:00:01 01/09/2009 00:00:02 01/09/2009 00:00:03
>          1251759600          1251759601          1251759602          1251759603
>
> This was unexpected - it seems to have displayed the datetimestamp values
> both as per my defined character-string representation and as numeric
> values.
>

One mystery solved (now I appreciate the existence and utility of the 'str' and 'ls.str' functions): the apparent dual date-format and numeric results from my initial algorithm were in fact the associated character-string and numeric parts of a "Named num" object. Hence for example:

> str(spot$DATETIME)
 Named num [1:4] 1.25e+09 1.25e+09 1.25e+09 1.25e+09
 - attr(*, "names")= chr [1:4] "01/09/2009" "01/09/2009 00:00:01" "01/09/2009 00:00:02" "01/09/2009 00:00:03"

> names(spot$DATETIME)
[1] "01/09/2009"          "01/09/2009 00:00:01" "01/09/2009 00:00:02"
[4] "01/09/2009 00:00:03"
Another solution, as a fix to my original algorithm, was found by a colleague (Matthew Roberts). While he makes no great claims for its elegance, it does seem to work.

This fix is based on the use of the 'pmax' function. This function is a variant of the 'max' (maximum) function that returns a vector of element-wise results corresponding to vectors of inputs. Example: max(1:3,4:8) == 8 but pmax(1:3,4:6) == 4 5 6. Thanks to this, it provides appropriate results for all rows of the data.

In the code, there are two possible datetimestamp interpretations, midnight and non-midnight, each implemented by a 'strptime' call. When a midnight datetimestamp is encountered, only the midnight conversion will return a proper (non-NA) value. Thanks to the "na.rm=TRUE" option, the NA result is removed so 'pmax' returns just the proper value. For a non-midnight datetimestamp, both midnight and non-midnight conversions return proper values, but only the non-midnight conversion will give a result greater than midnight, and it is this that is returned by the 'pmax'.

The code is as follows:

spot_frequency_readin <- function(file, nrows=-1) {

  # create temp class
  setClass("t_class2_", representation("character"))
  setAs("character", "t_class2_", function(from) {
    as.POSIXct(pmax(strptime(from, format="%d/%m/%Y"),
                    strptime(from, format="%d/%m/%Y %H:%M:%S"),
                    na.rm=TRUE),
               tz="GMT")
  })

  # (for format symbols, see "R Reference Card")

  # read the file (TSV)
  file <- read.delim(file, header=TRUE, comment.char = "", nrows=nrows,
                     as.is=FALSE, col.names=c("DATETIME", "FREQ"),
                     colClasses=c("t_class2_", "numeric"))

  # remove it now that we are done with it
  removeClass("t_class2_")

  return(file)
}

The result:

> spot
             DATETIME   FREQ
1 2009-09-01 00:00:00 50.036
2 2009-09-01 00:00:01 50.035
3 2009-09-01 00:00:02 50.035
4 2009-09-01 00:00:03 50.033

Confirm the nature of the result:

> str(spot)
'data.frame':	4 obs. of  2 variables:
 $ DATETIME: POSIXct, format: "2009-09-01 00:00:00" "2009-09-01 00:00:01" "2009-09-01 00:00:02" "2009-09-01 00:00:03"
 $ FREQ    : num  50 50 50 50

(Note: 'str' means "Compactly display the internal structure of an R object". I can claim from experience that this and 'ls.str' are things that the novice R user can benefit hugely from knowing about.)
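The 'pmax' idea can also be tried on its own, without the temporary class or a data file. The following is a sketch under stated assumptions: the helper name parse_stamp and tz = "UTC" are hypothetical, and unlike the code above it converts each 'strptime' result to POSIXct before calling 'pmax'.

```r
# Hypothetical helper: parse each value twice and keep the later (non-NA) result.
parse_stamp <- function(x, tz = "UTC") {
  # Succeeds for every value; trailing time text is ignored, giving midnight.
  date_only <- as.POSIXct(strptime(x, format = "%d/%m/%Y", tz = tz))
  # Succeeds only when a time part is present; otherwise NA.
  date_time <- as.POSIXct(strptime(x, format = "%d/%m/%Y %H:%M:%S", tz = tz))
  # Element-wise maximum; na.rm = TRUE falls back to the date-only value.
  pmax(date_only, date_time, na.rm = TRUE)
}

stamps <- c("01/09/2009", "01/09/2009 00:00:01", "01/09/2009 00:00:02")
when <- parse_stamp(stamps)

# The values are stored as POSIXct; format() controls how they are displayed,
# so the day/month/year layout of the input can be reproduced on output.
shown <- format(when, "%d/%m/%Y %H:%M:%S", tz = "UTC")
print(shown)
```

Keeping the stored type (POSIXct) separate from the display format means date arithmetic still works while the printed text can match the layout of the received data.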