thr3ads.net - R help - [R] finding and describing missing data runs in a time series [Feb 2012]


Durant, James T. (ATSDR/DTEM/PRMSB)

2012-Feb-13 00:36 UTC

[R] finding and describing missing data runs in a time series

Hi -

I am trying to find and describe missing data in a time series. For instance,
the openair package includes a data frame called "mydata":
library(openair)
head(mydata)

                 date   ws  wd nox no2 o3 pm10    so2      co pm25
1 1998-01-01 00:00:00 0.60 280 285  39  1   29 4.7225  3.3725   NA
2 1998-01-01 01:00:00 2.16 230  NA  NA NA   37     NA      NA   NA
3 1998-01-01 02:00:00 2.76 190  NA  NA  3   34 6.8300  9.6025   NA
4 1998-01-01 03:00:00 2.16 170 493  52  3   35 7.6625 10.2175   NA
5 1998-01-01 04:00:00 2.40 180 468  78  2   34 8.0700  8.9125   NA
6 1998-01-01 05:00:00 3.00 190 264  42  0   16 5.5050  3.0525   NA


So, for example, for pm25 I would like to be able to detect that there is a run
of NAs starting at 1998-01-01 00:00:00 that lasts for 2887 hourly observations,
then that there is an NA at observation 2910, and so on. The key information I
am looking for is when each run of NAs starts and how long it is. The closest
thing I know of is timePlot in the openair package with statistic="frequency",
but it only gives monthly summary data and does not tell me whether the missing
data are clumped together or dispersed.

VR

Jim


James T. Durant, MSPH CIH
Emergency Response Coordinator
US Agency for Toxic Substances and Disease Registry
Atlanta, GA 30341
770-378-1695






R. Michael Weylandt <michael.weylandt@gmail.com>

2012-Feb-13 02:40 UTC


Not at a computer to test this but perhaps

rle(is.na(x))

might help. 

Michael


(Ted Harding)

2012-Feb-13 08:51 UTC


You might consider an approach based on

  rle(is.na(mydata$pm25))

See ?rle

Example:

  X <- c(1,2,3,NA,NA,NA,4,5,NA,6,7,8,NA,NA,NA,NA,NA)
  X
  # [1]  1  2  3 NA NA NA  4  5 NA  6  7  8 NA NA NA NA NA
  rle(is.na(X))
  # Run Length Encoding
  #   lengths: int [1:6] 3 3 2 1 3 5
  #   values : logi [1:6] FALSE TRUE FALSE TRUE FALSE TRUE

Ted.

-------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at wlandres.net>
Date: 13-Feb-2012  Time: 08:51:19
This message was sent by XFMail
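
The rle() output gives run lengths but not where each run starts, which is
the other half of what the original question asks for. A minimal sketch of
one way to get both, assuming a plain vector as input: take the cumulative
sum of the run lengths to find where each run ends, then step back by the run
length. The helper name na_runs and its index argument are hypothetical, not
something posted in the thread or provided by openair.

```r
# Hypothetical helper (not from the thread): list each run of NAs in x,
# giving its starting position, the matching index value, and its length.
na_runs <- function(x, index = seq_along(x)) {
  r      <- rle(is.na(x))
  ends   <- cumsum(r$lengths)          # last position of every run
  starts <- ends - r$lengths + 1L      # first position of every run
  keep   <- r$values                   # TRUE only for the NA runs
  data.frame(
    start_pos = starts[keep],
    start     = index[starts[keep]],
    length    = r$lengths[keep]
  )
}

X <- c(1, 2, 3, NA, NA, NA, 4, 5, NA, 6, 7, 8, NA, NA, NA, NA, NA)
runs <- na_runs(X)
print(runs)
#   start_pos start length
# 1         4     4      3
# 2         9     9      1
# 3        13    13      5
```

For the openair example, a call such as na_runs(mydata$pm25, mydata$date)
should report the timestamp at which each run begins, assuming the package is
installed and mydata has the columns shown in the original post.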
