Henry
2014-Jun-15 18:39 UTC
[R] reading time series csv file with read.zoo issues, then align time stamps
Goal: interpolate time series data onto a set of desired time stamps.
I have two or more data sets whose sampling intervals vary from 5 minutes to
3-5 hours.
I want to put all the data on common time stamps, e.g. at "00:05:00" (5-minute)
intervals.
I asked Gabor and got some very good code (zoo aggregate, na.spline,
na.approx), but I'm having trouble reading the csv file in and converting it
to a zoo object so I can get these functions working again. Here is
what Gabor sent last time.
_____________________start of what Gabor sent ______________________
If you are using zoo then the zoo FAQ discusses grids
http://cran.r-project.org/web/packages/zoo/index.html
and the other 4 vignettes (pdf documents) and reference manual on that
page discuss more.
zoo does not supply its own time classes except where classes are
elsewhere missing. Its design is completely independent of the time
class and it works with any time class that supports certain methods
(and that includes all popular ones). See R News 4/1 for more on date
and time classes.
Here is some code:
Lines <- "10/11/2011 23:30:01 432.22
10/11/2011 23:31:17 432.32
10/11/2011 23:35:00 432.32
10/11/2011 23:36:18 432.22
10/11/2011 23:37:18 432.72
10/11/2011 23:39:19 432.23
10/11/2011 23:40:02 432.23
10/11/2011 23:45:00 432.23
10/11/2011 23:45:20 429.75
10/11/2011 23:46:20 429.65
10/11/2011 23:50:00 429.65
10/11/2011 23:51:22 429.75
10/11/2011 23:55:01 429.75
10/11/2011 23:56:23 429.55
10/12/2011 0:00:07 429.55
10/12/2011 0:01:24 429.95
10/12/2011 0:05:00 429.95
10/12/2011 0:06:25 429.85
10/12/2011 0:10:00 429.85
10/12/2011 0:11:26 428.85
10/12/2011 0:15:00 428.85
10/12/2011 0:20:03 428.85
10/12/2011 0:21:29 428.75
10/12/2011 0:25:01 428.75
10/12/2011 0:30:01 428.75
10/12/2011 0:31:31 428.75"
library(zoo)
library(chron)
fmt <- "%m/%d/%Y %H:%M:%S"
toChron <- function(d, t) as.chron(paste(d, t), format = fmt)
z <- read.zoo(text = Lines, index = 1:2, FUN = toChron)
# 5 minute aggregates
m5 <- times("00:05:00")
ag5 <- aggregate(z, trunc(time(z), m5), mean)
# 5 minute spline fit
g <- seq(trunc(start(z), m5), end(z), by = m5)
na.spline(z, xout = g)
# 5 minute linear approx
na.approx(z, xout = g)
________________end of what Gabor sent_________________
My csv data looks like this. The commas don't show here, but when I open the
file in Notepad++ I see them.
TimeStamp Sea_Temperature_F
12/31/2011 13:24:00 52
12/31/2011 16:44:06 52
12/31/2011 20:44:06 53
01/01/2012 00:44:06 53
01/01/2012 04:44:06 53
01/01/2012 08:44:07 54
01/01/2012 12:26:00 54
01/01/2012 12:44:07 53
01/01/2012 16:44:07 53
01/01/2012 20:44:06 54
01/02/2012 00:44:09 54
01/02/2012 04:44:06 55
01/02/2012 08:44:07 55
01/02/2012 12:44:06 56
01/02/2012 13:04:00 56
01/02/2012 16:44:07 57
01/02/2012 20:44:07 58
01/03/2012 00:44:07 58
01/03/2012 04:44:06 59
01/03/2012 08:44:06 59
01/03/2012 10:48:00 59
01/03/2012 12:44:06 58
01/03/2012 16:44:06 58
01/03/2012 20:44:07 59
01/04/2012 00:44:06 59
01/04/2012 04:44:07 58
01/04/2012 08:44:07 58
01/04/2012 12:44:07 57
01/04/2012 15:30:00 57
01/04/2012 16:44:07 57
01/04/2012 20:44:06 57
01/05/2012 00:44:06 57
The R code I'm trying to get working is as follows (I'm trying to follow
code provided by Gabor, but I'm too embarrassed to ask him directly again):
fmt <- "%M/%D/%Y %H:%M:%S"
toChron <- function(d, t) as.chron(paste(d, t), format = fmt)
seatemp <- read.zoo ("SampleSeaTempData-2.csv", sep=",",
header=TRUE,
FUN=toChron)
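One thing worth checking in the line fmt <- "%M/%D/%Y %H:%M:%S" is the format string itself. In strptime-style codes %m is the month and %M the minutes, and %D is shorthand for %m/%d/%y, so that string cannot match "12/31/2011 13:24:00", whereas the "%m/%d/%Y %H:%M:%S" in Gabor's code can. A quick base-R check, using as.POSIXct only because it takes the same format codes (assuming as.chron reads fmt with the usual strptime codes, as Gabor's example suggests):

```r
# strptime-style codes: %m = month, %M = minutes,
# %D = shorthand for "%m/%d/%y" (2-digit year)
good_fmt <- "%m/%d/%Y %H:%M:%S"  # as in Gabor's code
bad_fmt  <- "%M/%D/%Y %H:%M:%S"  # as in the failing code

x <- "12/31/2011 13:24:00"       # first time stamp of the sample data
ok  <- as.POSIXct(x, format = good_fmt, tz = "GMT")
bad <- as.POSIXct(x, format = bad_fmt,  tz = "GMT")

print(ok)   # [1] "2011-12-31 13:24:00 GMT"
print(bad)  # [1] NA  (a month of "31" cannot be parsed)
```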
I get errors:
> fmt <- "%M/%D/%Y %H:%M:%S"
> toChron <- function(d, t) as.chron(paste(d, t), format = fmt)
> seatemp <- read.zoo ("SampleSeaTempData-2.csv", sep=",", header=TRUE,
> FUN=toChron)
Error in paste(d, t) : argument "t" is missing, with no default
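The error says "t" is missing because read.zoo calls FUN with one argument per index column, and with the default index only the first column is passed. If the file really is comma-separated, so that the date and the time share a single TimeStamp field, that one field already holds the whole date-time and a one-argument converter fits. A minimal sketch under that assumption; the first few sample rows are inlined with text = so it runs without the file, and toChron1 is a hypothetical name:

```r
library(zoo)
library(chron)

# First rows of the sample data, ASSUMING the file has one combined
# "date time" field, then a comma, then the temperature
csv <- "TimeStamp,Sea_Temperature_F
12/31/2011 13:24:00,52
12/31/2011 16:44:06,52
12/31/2011 20:44:06,53
01/01/2012 00:44:06,53"

fmt <- "%m/%d/%Y %H:%M:%S"
# Hypothetical one-argument converter: x is the whole TimeStamp column
toChron1 <- function(x) as.chron(as.character(x), format = fmt)

# For the real file this would be:
# read.zoo("SampleSeaTempData-2.csv", sep = ",", header = TRUE, FUN = toChron1)
seatemp <- read.zoo(text = csv, sep = ",", header = TRUE, FUN = toChron1)
seatemp
```

With whitespace-separated data, as in Gabor's Lines, the two-argument toChron together with index = 1:2 remains the matching choice.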
If I take "FUN=toChron" out, I get the error below. There are 542 rows of data.
> seatemp <- read.zoo ("SampleSeaTempData-2.csv", sep=",", header=TRUE)
Error in read.zoo("SampleSeaTempData-2.csv", sep = ",", header = TRUE) :
index has 542 bad entries at data rows: 1 2 3 4 5 6 7 8 9 10 11 12 13 14
15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39
40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64
65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89
90 91 92 93 94 95 96 97 98 99 100 ...
I guess there is too much going on that I don't understand:
- What does the toChron line do? How are "d" and "t" defined?
- Why does Gabor's read.zoo line have "index=1:2"?
- Why does Gabor's code have "FUN=toChron"?
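On these questions: in Gabor's whitespace-separated Lines the date and the time are two separate columns, so index = 1:2 tells read.zoo that columns 1 and 2 together form the time index. When the index spans two columns, read.zoo calls FUN with one argument per column, which is where toChron's d (the date column) and t (the time column) come from, and FUN = toChron is what turns those strings into chron date-times. The sketch below makes this visible with a hypothetical showArgs that records what it receives and then does the same conversion as toChron; the three rows are taken from Lines:

```r
library(zoo)
library(chron)

Lines <- "10/11/2011 23:30:01 432.22
10/11/2011 23:31:17 432.32
10/11/2011 23:35:00 432.32"

fmt <- "%m/%d/%Y %H:%M:%S"
seen <- NULL  # will hold the arguments that FUN received

# Hypothetical helper: records d and t, then converts exactly like toChron
showArgs <- function(d, t) {
  seen <<- list(d = d, t = t)
  as.chron(paste(d, t), format = fmt)
}

z <- read.zoo(text = Lines, index = 1:2, FUN = showArgs)
seen$d  # column 1: the dates
seen$t  # column 2: the times
z
```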
The idea is to convert two or more data streams into csv files with values
interpolated onto exact, common time stamps, and then, I guess, cbind the data
into one data frame so I can plot the streams together.
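For that end goal, one possible route (a sketch, not the only one) is to interpolate each series onto the same 5-minute grid with na.approx(xout = ...) and then merge them, which is zoo's counterpart of cbind and aligns the series by time. The two short series, their values, and the names s1, s2, and gmt are made up for illustration, and GMT is an assumed time zone:

```r
library(zoo)

gmt <- function(s) as.POSIXct(s, tz = "GMT")

# Two made-up series with irregular, unrelated time stamps
s1 <- zoo(c(52, 53, 54),
          gmt(c("2011-12-31 23:50:00", "2012-01-01 00:30:00",
                "2012-01-01 01:10:00")))
s2 <- zoo(c(10, 20),
          gmt(c("2011-12-31 23:55:00", "2012-01-01 01:05:00")))

# Common 5-minute grid inside the span covered by both series
g <- seq(gmt("2012-01-01 00:00:00"), gmt("2012-01-01 01:00:00"), by = "5 min")

# Linear interpolation of each series onto the grid, then column-wise merge
both <- merge(temp  = na.approx(s1, xout = g),
              other = na.approx(s2, xout = g))
both
```

plot(both) then draws both series against the same time axis, and data.frame(time = time(both), coredata(both)) gives an ordinary data frame if one is still needed.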
I've read zoo posts about csv reading issues, e.g. making sure the seconds
(":00") appear in the csv file to eliminate duplicate row index entries.
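Duplicate index entries arise when two rows share a time stamp, for example when seconds are missing and two readings both end up stored as 12:44. Besides fixing the file, read.zoo's aggregate argument can combine such rows. A sketch with made-up duplicated rows that averages them with mean; the minute-only format "%m/%d/%Y %H:%M", GMT, and the name toPOSIX are assumptions for this example:

```r
library(zoo)

# Made-up rows: the two 12:44 readings share a time stamp
csv <- "TimeStamp,Sea_Temperature_F
01/01/2012 12:44,53
01/01/2012 12:44,55
01/01/2012 16:44,53"

# Hypothetical converter for minute-resolution time stamps
toPOSIX <- function(x) as.POSIXct(as.character(x), format = "%m/%d/%Y %H:%M", tz = "GMT")

# aggregate = mean replaces the rows sharing a time stamp by their average
z <- read.zoo(text = csv, sep = ",", header = TRUE, FUN = toPOSIX, aggregate = mean)
z
```

Whether averaging is the right treatment for duplicated readings depends on the data; any other summary function can be passed instead of mean.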
Maybe it would be easier/cleaner to read the csv file into a regular R
data frame and then convert it to a zoo object?
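Reading into a data frame first and converting afterwards also works. A sketch with read.csv and zoo(x, order.by = ...), assuming the column names from the header shown above and POSIXct time stamps in GMT (the time zone is an assumption; use the one the data were logged in):

```r
library(zoo)

# First rows of the sample data in comma-separated form (assumed layout)
csv <- "TimeStamp,Sea_Temperature_F
12/31/2011 13:24:00,52
12/31/2011 16:44:06,52
12/31/2011 20:44:06,53"

df <- read.csv(text = csv, stringsAsFactors = FALSE)
tt <- as.POSIXct(df$TimeStamp, format = "%m/%d/%Y %H:%M:%S", tz = "GMT")
stopifnot(!anyNA(tt))  # every time stamp parsed with this format

# Build the zoo series: values ordered by their time stamps
seatemp <- zoo(df$Sea_Temperature_F, order.by = tt)
seatemp
```

The explicit NA check makes a wrong format string fail loudly at this step instead of surfacing later as "bad entries".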
In my analysis and plotting I use POSIXlt for time.
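Since POSIXlt is used elsewhere, here is a sketch of Gabor's 5-minute grid for POSIXct, the compact counterpart of POSIXlt (as.POSIXct() converts between them), used as the zoo index. One deliberate difference from Gabor's trunc(start(z), m5): the start is rounded up to the next 5-minute boundary, so every grid point lies inside the data and na.approx has nothing to extrapolate. The three points are the first sample rows; the names step and start5 are illustrative:

```r
library(zoo)

tt <- as.POSIXct(c("2011-12-31 13:24:00", "2011-12-31 16:44:06",
                   "2011-12-31 20:44:06"), tz = "GMT")
z <- zoo(c(52, 52, 53), order.by = tt)

# Round the first time stamp UP to a 5-minute (300 s) boundary
step <- 300
start5 <- as.POSIXct(ceiling(as.numeric(start(z)) / step) * step,
                     origin = "1970-01-01", tz = "GMT")
g <- seq(start5, end(z), by = step)  # numeric 'by' is in seconds

lin <- na.approx(z, xout = g)  # linear interpolation onto the grid
spl <- na.spline(z, xout = g)  # spline interpolation onto the grid
head(lin)
```

For several series, taking the grid from the latest start and the earliest end among them keeps every series inside its own data range.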
Help appreciated. Thanks.
--
View this message in context:
http://r.789695.n4.nabble.com/reading-time-series-csv-file-with-read-zoo-issues-then-align-time-stamps-tp4692157.html
Sent from the R help mailing list archive at Nabble.com.
Gabor Grothendieck
2014-Jun-15 19:19 UTC
[R] reading time series csv file with read.zoo issues, then align time stamps
index = 1:2 is missing.

-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com