amvds at xs4all.nl
2009-Apr-08 08:43 UTC
[R] Convert data frame containing time stamps to time series
I read records using scan:

dat <- data.frame(scan(file = "KDA.csv",
                       what = list(t = "%m/%d/%y %H:%M", f = 0, p = 0, d = 0, o = 0,
                                   s = 0, a = 0, l = 0, c = 0),
                       skip = 2, sep = ",", nmax = np, flush = TRUE,
                       na.strings = c("I/OTimeout", "ArcOff-line")))

which results in:

> dat[1:5,]
             t     f    p  d  o   s    a  l c
1 1/21/09 5:01 16151  8.2 76 30 282 1060 53 7
2 1/21/09 5:02 16256  8.3 76 23 282 1059 54 7
3 1/21/09 5:03 16150  8.4 76 26 282 1059 55 7
4 1/21/09 5:04 16150  9.0 76 25 282 1051 57 6
5 1/21/09 5:05 15543 10.4 76  7 282 1024 58 6

I have been unable to find a way to convert this into a time series. I did read the manuals and came across a way to coerce a data frame to a ts object: as.ts().

The trouble is that I do not know how to keep the timestamps in column t of the data frame above. The t column is not strings. If I do:

plot.ts(dat)

I can see that the first graphics panel is indeed numbers, not text. So I think scan converted the text correctly per the format string I put in.

Much more difficult still: the data files I have contain invalid data, missing values and other irrelevant information. I filter this out using subset, which works brilliantly. However, how can I filter using subset and convert to a time series afterwards? After subsetting there will be 'holes', i.e. missing records. Can a ts object deal with missing records? If so, how? Just point me to a document; I can and will put in the work to figure it out myself.

Thank you!
Alex van der Spek
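About the scan() call above: as far as I know, the what argument of scan() only tells it the type of each field, so t = "%m/%d/%y %H:%M" simply means "read t as character"; the format string itself is never used for parsing. The minimal base-R sketch below assumes the five rows printed above stand in for KDA.csv. It reads t as text, converts it explicitly with as.POSIXct() (the UTC time zone is an assumption), and builds a ts whose time axis is minutes since the first record. Because a ts keeps only a start and a frequency, the first time stamp (t0, a name chosen for illustration) is kept separately to map ts times back to clock times.

```r
# Five sample rows from the dat[1:5, ] printout (stand-in for KDA.csv).
lines <- c("1/21/09 5:01,16151,8.2,76,30,282,1060,53,7",
           "1/21/09 5:02,16256,8.3,76,23,282,1059,54,7",
           "1/21/09 5:03,16150,8.4,76,26,282,1059,55,7",
           "1/21/09 5:04,16150,9.0,76,25,282,1051,57,6",
           "1/21/09 5:05,15543,10.4,76,7,282,1024,58,6")

# In scan(), what = list(t = "", ...) only says "t is character";
# any string in that slot, a format string included, means the same thing.
con <- textConnection(lines)
fields <- scan(con, what = list(t = "", f = 0, p = 0, d = 0, o = 0,
                                s = 0, a = 0, l = 0, c = 0),
               sep = ",", quiet = TRUE)
close(con)
dat <- data.frame(fields, stringsAsFactors = FALSE)

# Parse the time stamps explicitly (tz = "UTC" is an assumption).
dat$t <- as.POSIXct(dat$t, format = "%m/%d/%y %H:%M", tz = "UTC")

# A ts stores only start and frequency, not one stamp per row, so keep
# the first stamp and count time in minutes since it (one row per minute).
t0 <- dat$t[1]
tsdat <- ts(as.matrix(dat[, -1]), start = 0, frequency = 1)

# Map ts time (minutes) back to clock time when needed.
clock <- t0 + 60 * as.numeric(time(tsdat))
print(format(clock, "%H:%M"))
```

The "minutes since the first record" time unit is an arbitrary choice; any start and frequency that describe one row per minute would work the same way.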
Gabor Grothendieck
2009-Apr-08 13:04 UTC
[R] Convert data frame containing time stamps to time series
Try varying the arguments to this to accommodate the precise format of your data. See the three zoo vignettes, ?read.zoo and R News 4/1 for dates and times.

> Lines <- "t,f,p,d,o,s,a,l,c
+ 1/21/09 5:01,16151,8.2,76,30,282,1060,53,7
+ 1/21/09 5:02,16256,8.3,76,23,282,1059,54,7
+ 1/21/09 5:03,16150,8.4,76,26,282,1059,55,7
+ 1/21/09 5:04,16150,9.0,76,25,282,1051,57,6
+ 1/21/09 5:05,15543,10.4,76,7,282,1024,58,6"
> library(zoo)
> library(chron)
> z <- read.zoo(textConnection(Lines), sep = ",", header = TRUE,
+   FUN = as.chron, format = "%m/%d/%y %H:%M")
> z
                        f    p  d  o   s    a  l c
(01/21/09 05:01:00) 16151  8.2 76 30 282 1060 53 7
(01/21/09 05:02:00) 16256  8.3 76 23 282 1059 54 7
(01/21/09 05:03:00) 16150  8.4 76 26 282 1059 55 7
(01/21/09 05:04:00) 16150  9.0 76 25 282 1051 57 6
(01/21/09 05:05:00) 15543 10.4 76  7 282 1024 58 6
amvds at xs4all.nl
2009-Apr-08 15:56 UTC
[R] Convert data frame containing time stamps to time series
Converting dates is getting stranger still. I am coercing a data frame into a ts as follows:

tst1 <- as.POSIXct("1/21/09 5:01", format = "%m/%d/%y %H:%M")
tst2 <- as.POSIXct("1/28/09 3:40", format = "%m/%d/%y %H:%M")
tsdat <- as.ts(dat, start = tst1, end = tst2, frequency = 1)

This generates a ts object. But strangely enough, the first column of that matrix starts at the numeric value 841, counts up to 1139, and then starts at 1 again, only to count up from there. The restart at 1 occurs on the first day, "1/21/09", at 10:00:00.

What is so special about that time? This phenomenon happens several times in the long file, but the restart count is always a different number. This creates a ramp with some bumps.

Can anybody explain this?

Thanks in advance,
Alex van der Spek
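The pattern 841 up to 1139, then 1, looks like factor level codes rather than times. Assuming column t is a factor (data.frame() converted character columns to factors by default before R 4.0.0), ts(), which as.ts() ends up calling, runs data.matrix() on a data frame, and data.matrix() replaces a factor by its integer codes. The levels are sorted as text, so "1/21/09 10:00" sorts before "1/21/09 5:01". The sketch below rebuilds a hypothetical first day with one stamp per minute from 5:01 to 23:59, written in the file's m/d/yy H:MM style: 10:00 to 23:59 take the first 840 codes, so 5:01 gets 841, 9:59 gets 1139 and 10:00 gets 1. It also checks that, as far as I can tell, as.ts() on a data frame ignores extra arguments such as start and frequency.

```r
# Hypothetical first day: one stamp per minute from 5:01 to 23:59, written
# in the file's style ("1/21/09 5:01", hour without a leading zero).
stamps <- seq(as.POSIXct("2009-01-21 05:01", tz = "UTC"),
              as.POSIXct("2009-01-21 23:59", tz = "UTC"), by = "min")
raw <- paste0("1/21/09 ", as.integer(format(stamps, "%H")), ":",
              format(stamps, "%M"))

# Column t as a factor, as data.frame() made it by default before R 4.0.0.
df <- data.frame(t = factor(raw), f = seq_along(raw))

# as.ts() calls ts(), which runs data.matrix() on a data frame and so
# keeps only each factor's integer level code.
tsdat <- as.ts(df, start = 1000, frequency = 1)
codes <- as.numeric(tsdat[, "t"])

# Levels are sorted as text: "1/21/09 10:00" < ... < "1/21/09 23:59"
# < "1/21/09 5:01" < ... < "1/21/09 9:59".
print(codes[raw == "1/21/09 5:01"])   # first record: 841
print(codes[raw == "1/21/09 9:59"])   # last code before the drop: 1139
print(codes[raw == "1/21/09 10:00"])  # the "restart": 1
print(tsp(tsdat))                     # time runs 1..n; start = 1000 is not used
```

Later days would show the same kind of jump, though the exact codes depend on which stamps the file contains (and, for stamps such as 1:00 versus 10:00, possibly on the collation locale). Dropping t, or parsing it with as.POSIXct() and keeping it outside the ts, avoids the codes.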
amvds at xs4all.nl
2009-Apr-09 15:29 UTC
[R] Convert data frame containing time stamps to time series
What is zoo? I cannot find anything about zoo in the documentation. I did try as.ts(); see below.

Thank you,
Alex van der Spek

> have you tried using zoo and then using the function as.ts()
>
> On Wed, Apr 8, 2009 at 11:56 AM, <amvds at xs4all.nl> wrote:
>> Converting dates is getting stranger still. I am coercing a data frame
>> into a ts as follows:
>>
>> tst1<-as.POSIXct("1/21/09 5:01",format="%m/%d/%y %H:%M")
>> tst2<-as.POSIXct("1/28/09 3:40",format="%m/%d/%y %H:%M")
>> tsdat<-as.ts(dat,start=tst1,end=tst2,frequency=1)
>
> --
> Stephen Sefick
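zoo is a contributed package on CRAN (the one loaded by Gabor's read.zoo() example in this thread) for ordered observations, including irregularly spaced ones. For comparison, a base-R ts records only a start time and a frequency, so it cannot represent the holes that subset() leaves; one common workaround is to put the missing minutes back as NA rows before calling ts(). A minimal sketch, assuming one record per minute and using three made-up rows (kept, grid and full are illustrative names, not from the thread):

```r
# Hypothetical rows left after subset(): 5:03 and 5:04 were filtered out.
kept <- data.frame(
  t = as.POSIXct(c("2009-01-21 05:01", "2009-01-21 05:02",
                   "2009-01-21 05:05"),
                 format = "%Y-%m-%d %H:%M", tz = "UTC"),
  f = c(16151, 16256, 15543))

# Complete one-minute grid spanning the kept records.
grid <- data.frame(t = seq(min(kept$t), max(kept$t), by = "min"))

# Left join onto the grid: minutes with no record get f = NA.
full <- merge(grid, kept, by = "t", all.x = TRUE)
print(full)

# The rows are now regularly spaced, so a ts with one observation per
# minute is meaningful; the holes are carried as NA values.
f_ts <- ts(full$f, start = 0, frequency = 1)
print(f_ts)
```

Some functions stop on NA values by default (acf(), for instance, uses na.action = na.fail), so depending on the analysis the NA minutes may need explicit handling.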