Henry
2014-Jun-15 18:39 UTC
[R] reading time series csv file with read.zoo issues, then align time stamps
Goal: get time series data interpolated on to desired time stamps.
I have two or more data sets that have time stamps that vary from 5 mins to 3-5 hours.
I want to get all the data put on common time stamps e.g. "00:05:00" intervals.

I asked Gabor and got some very good code (zoo aggregate, na.spline, na.approx) but I'm having trouble getting the csv file read in and converted to a zoo object so I can try getting these functions going again. Here is what Gabor sent last time.

_____________________start of what Gabor sent ______________________
If you are using zoo then the zoo FAQ discusses grids
http://cran.r-project.org/web/packages/zoo/index.html
and the other 4 vignettes (pdf documents) and reference manual on that page discuss more.

zoo does not supply its own time classes except where classes are elsewhere missing. Its design is completely independent of the time class and it works with any time class that supports certain methods (and that includes all popular ones). See R News 4/1 for more on date and time classes.

Here is some code:

Lines <- "10/11/2011 23:30:01 432.22
10/11/2011 23:31:17 432.32
10/11/2011 23:35:00 432.32
10/11/2011 23:36:18 432.22
10/11/2011 23:37:18 432.72
10/11/2011 23:39:19 432.23
10/11/2011 23:40:02 432.23
10/11/2011 23:45:00 432.23
10/11/2011 23:45:20 429.75
10/11/2011 23:46:20 429.65
10/11/2011 23:50:00 429.65
10/11/2011 23:51:22 429.75
10/11/2011 23:55:01 429.75
10/11/2011 23:56:23 429.55
10/12/2011 0:00:07 429.55
10/12/2011 0:01:24 429.95
10/12/2011 0:05:00 429.95
10/12/2011 0:06:25 429.85
10/12/2011 0:10:00 429.85
10/12/2011 0:11:26 428.85
10/12/2011 0:15:00 428.85
10/12/2011 0:20:03 428.85
10/12/2011 0:21:29 428.75
10/12/2011 0:25:01 428.75
10/12/2011 0:30:01 428.75
10/12/2011 0:31:31 428.75"

library(zoo)
library(chron)

fmt <- "%m/%d/%Y %H:%M:%S"
toChron <- function(d, t) as.chron(paste(d, t), format = fmt)

z <- read.zoo(text = Lines, index = 1:2, FUN = toChron)

# 5 minute aggregates
m5 <- times("00:05:00")
ag5 <- aggregate(z, trunc(time(z), m5), mean)

# 5 minute spline fit
g <- seq(trunc(start(z), m5), end(z), by = m5)
na.spline(z, xout = g)

# 5 minute linear approx
na.approx(z, xout = g)
________________end of what Gabor sent_________________

My csv data looks like this.....when I look at the file with NotePad++ I see the commas.

TimeStamp Sea_Temperature_F
12/31/2011 13:24:00 52
12/31/2011 16:44:06 52
12/31/2011 20:44:06 53
01/01/2012 00:44:06 53
01/01/2012 04:44:06 53
01/01/2012 08:44:07 54
01/01/2012 12:26:00 54
01/01/2012 12:44:07 53
01/01/2012 16:44:07 53
01/01/2012 20:44:06 54
01/02/2012 00:44:09 54
01/02/2012 04:44:06 55
01/02/2012 08:44:07 55
01/02/2012 12:44:06 56
01/02/2012 13:04:00 56
01/02/2012 16:44:07 57
01/02/2012 20:44:07 58
01/03/2012 00:44:07 58
01/03/2012 04:44:06 59
01/03/2012 08:44:06 59
01/03/2012 10:48:00 59
01/03/2012 12:44:06 58
01/03/2012 16:44:06 58
01/03/2012 20:44:07 59
01/04/2012 00:44:06 59
01/04/2012 04:44:07 58
01/04/2012 08:44:07 58
01/04/2012 12:44:07 57
01/04/2012 15:30:00 57
01/04/2012 16:44:07 57
01/04/2012 20:44:06 57
01/05/2012 00:44:06 57

The R code I'm trying to get working is as follows: (I'm trying to follow code provided by Gabor) but I'm too embarrassed to ask him directly again.

fmt <- "%M/%D/%Y %H:%M:%S"
toChron <- function(d, t) as.chron(paste(d, t), format = fmt)
seatemp <- read.zoo ("SampleSeaTempData-2.csv", sep=",", header=TRUE, FUN=toChron)

I get errors:

> fmt <- "%M/%D/%Y %H:%M:%S"
> toChron <- function(d, t) as.chron(paste(d, t), format = fmt)
> seatemp <- read.zoo ("SampleSeaTempData-2.csv", sep=",", header=TRUE, FUN=toChron)
Error in paste(d, t) : argument "t" is missing, with no default
>

If I take the "FUN=toChron" out I get this error. There are 542 rows of data.

> seatemp <- read.zoo ("SampleSeaTempData-2.csv", sep=",", header=TRUE)
Error in read.zoo("SampleSeaTempData-2.csv", sep = ",", header = TRUE) :
  index has 542 bad entries at data rows: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 ...
>

I guess there is too much going on that I don't understand:
- what does the toChron line do? how are "d" and "t" defined?
- why does the Gabor read.zoo line have "index=1:2" ?
- why does the Gabor code have " FUN=toChron" ?

The idea is to get two or more data streams "converted" to exact timestamp csv files with interpolated values and then I guess cbind the data into one data frame so I can plot together.

I've read re. zoo csv file read issues/posts - e.g. getting the seconds (":00") to appear in the csv file to eliminate duplicate row index entries.

Maybe it would be easier/cleaner to read the csv file into a regular R dataframe and then "convert" to a zoo object?

In my analysis and plotting I use POSIXlt for time.

Help appreciated. Thanks.

--
View this message in context: http://r.789695.n4.nabble.com/reading-time-series-csv-file-with-read-zoo-issues-then-align-time-stamps-tp4692157.html
Sent from the R help mailing list archive at Nabble.com.
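
On the data-frame idea raised above, here is a minimal sketch of that route. It rests on several assumptions: that each CSV row holds the full timestamp in one comma-separated field followed by one value, that a POSIXct index is acceptable (the post mentions POSIXlt), and that an hourly grid is fine for a short example (by = "5 min" would give the 5 minute grid). The helper name csv_to_zoo and the "air" series are made up purely for illustration.

```r
library(zoo)

fmt <- "%m/%d/%Y %H:%M:%S"   # lowercase m and d: month and day

# Hypothetical helper: read a two-column CSV (timestamp, value) into a
# regular data frame first, then build a POSIXct-indexed zoo series.
csv_to_zoo <- function(csv_text) {
  df <- read.csv(text = csv_text, stringsAsFactors = FALSE)
  zoo(df[[2]], order.by = as.POSIXct(df[[1]], format = fmt, tz = "UTC"))
}

# First rows of the sea temperature data, with the commas assumed
sea <- csv_to_zoo("TimeStamp,Sea_Temperature_F
12/31/2011 13:24:00,52
12/31/2011 16:44:06,52
12/31/2011 20:44:06,53
01/01/2012 00:44:06,53")

# Second stream with made-up values, only to show the alignment step
air <- csv_to_zoo("TimeStamp,Air_Temperature_F
12/31/2011 13:00:00,48
12/31/2011 18:30:00,50
01/01/2012 01:00:00,47")

# Common hourly grid that lies inside the time range of both series
grid <- seq(as.POSIXct("2011-12-31 14:00:00", tz = "UTC"),
            as.POSIXct("2012-01-01 00:00:00", tz = "UTC"),
            by = "hour")

# Linear interpolation of each series onto the grid, then bind as columns
both <- merge(sea = na.approx(sea, xout = grid),
              air = na.approx(air, xout = grid))
print(both)
```

Because both series end up on the same index, merge() lines them up row by row, and the result can be passed to plot() or converted with as.data.frame() if a plain data frame is preferred.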
Gabor Grothendieck
2014-Jun-15 19:19 UTC
[R] reading time series csv file with read.zoo issues, then align time stamps
index = 1:2 is missing.

--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
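
For reference, a minimal sketch of what the index = 1:2 fix looks like, using a few rows of the sea temperature data as whitespace-separated text, the same way the earlier example did. With index = 1:2, read.zoo hands columns 1 and 2 (date and time) to FUN as two separate arguments, which is where "d" and "t" in toChron come from; without it only one column is passed, hence the 'argument "t" is missing' error. Two things here are assumptions rather than anything stated in the reply: the format string is switched back to the lowercase "%m/%d/%Y" of the earlier code (in strptime-style formats "%M" means minutes), and the header line is left out of the sample text. If the real comma-separated file keeps date and time together in one field, the index would be a single column and FUN would take a single argument instead.

```r
library(zoo)
library(chron)

# A few rows in the shape shown in the post; the layout of the real
# comma-separated file is an assumption.
Lines <- "12/31/2011 13:24:00 52
12/31/2011 16:44:06 52
12/31/2011 20:44:06 53
01/01/2012 00:44:06 53"

# Lowercase %m and %d for month and day, as in the earlier example
fmt <- "%m/%d/%Y %H:%M:%S"

# d receives column 1 (date), t receives column 2 (time);
# they are pasted together and parsed as one chron date-time
toChron <- function(d, t) as.chron(paste(d, t), format = fmt)

# index = 1:2 marks the first two columns as the time index
seatemp <- read.zoo(text = Lines, index = 1:2, FUN = toChron)
print(seatemp)
```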