thr3ads.net - R help - [R] Aggregating zoo object with NAs in multiple column [Jul 2008]


Abiel Reinhart

2008-Jul-23 23:02 UTC

[R] Aggregating zoo object with NAs in multiple column

I would like to run an aggregation on a zoo object that has multiple series
in it, with one or more series having NA values. The problem is that by
default the aggregate function will produce an NA value for each aggregated
period that contains an NA. For instance, if I run aggregate(x,
as.yearmon(index(x)), mean) on the example object "x", which is printed
below, I will just get a bunch of NAs for January.

This behavior is perfectly logical. The problem is that if I try to use the
na.omit() function, it will throw away the entire line if even one series
has an NA value. For example, in the table below, you can see that running
na.omit() will throw out periods 2001-01-06 through 2001-01-10. But since
each of these lines contains many non-NA readings, we are throwing away real
information that should be used in the calculation of the means for January.
The mean for column B should include every non-NA value for the month, but
since A has an NA value on January 6, the January 6 value for B will be
dropped as well. Same thing for columns C, D, and E.

I suppose one solution would be to break the object into five one-series
objects, run aggregate(na.omit(item), as.yearmon(index(na.omit(item))),
mean) on each of them, then bind them back together, but this is rather
annoying. Is there a better way?

Thanks.

Abiel

                   a          b          c          d         e
2001-01-01 0.5183099 0.62792449 0.90859932 0.56578026 0.3991120
2001-01-02 0.2759420 0.96788392 0.30789409 0.76159986 0.3122280
2001-01-03 0.3263367 0.41224859 0.69756281 0.27406235 0.6902459
2001-01-04 0.3681782 0.41167564 0.02734471 0.39348676 0.8370692
2001-01-05 0.2550825 0.65790206 0.65134885 0.92537263 0.4143775
2001-01-06        NA 0.09076128 0.35209944 0.70821994 0.6659275
2001-01-07 0.4749008         NA 0.73579892 0.67311239 0.2155689
2001-01-08 0.7314498 0.56542607         NA 0.37529408 0.9313593
2001-01-09 0.5560702 0.47944318 0.01946189         NA 0.7055763
2001-01-10 0.4848510 0.12003527 0.31297935 0.41487588        NA
2001-01-11 0.0902985 0.88107285 0.33374604 0.26173483 0.3062338
2001-01-12 0.3664127 0.35366508 0.97760256 0.90784835 0.7399498
2001-01-13 0.6394206 0.05157520 0.38823937 0.92289256 0.6464278
2001-01-14 0.1949957 0.29738760 0.25224214 0.00024017 0.1228440
2001-01-15 0.7723980 0.99391775 0.22869908 0.97916413 0.1066641


Achim Zeileis

2008-Jul-24 00:26 UTC



On Wed, 23 Jul 2008, Abiel Reinhart wrote:
> I would like to run an aggregation on a zoo object that has multiple series
> in it, with one or more series having NA values. The problem is that by
> default the aggregate function will produce an NA value for each aggregated
> period that contains an NA. For instance, if I run aggregate(x,
> as.yearmon(index(x)), mean) on the example object "x", which is printed
> below, I will just get a bunch of NAs for January.
This is not specific to zoo series; the function mean() always behaves
like this. If you want to remove the NAs beforehand, you have to pass the
argument na.rm = TRUE to mean(). The easiest way to do this is
   aggregate(x, as.yearmon, mean, na.rm = TRUE)
Z
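
As a small self-contained illustration of the suggestion above, the sketch below compares the default aggregation with the na.rm = TRUE version. The two-column object is hypothetical (not the poster's data), and the sketch assumes the zoo package is installed. Extra arguments to aggregate() are forwarded to the summary function, so each column's mean skips only that column's own NAs.

```r
library(zoo)  # provides zoo(), as.yearmon(), coredata(), aggregate() for zoo objects

# Hypothetical data: 15 daily observations in January 2001, two series,
# each with one missing value on a different day.
dates <- as.Date("2001-01-01") + 0:14
vals <- cbind(a = as.numeric(1:15), b = as.numeric(101:115))
vals[6, "a"] <- NA  # 2001-01-06 missing in series a
vals[7, "b"] <- NA  # 2001-01-07 missing in series b
x <- zoo(vals, order.by = dates)

# Default: a single NA within the month makes that column's monthly mean NA.
monthly_default <- aggregate(x, as.yearmon, mean)

# na.rm = TRUE is passed through to mean(), so NAs are dropped per column
# and no complete rows are thrown away.
monthly_narm <- aggregate(x, as.yearmon, mean, na.rm = TRUE)

print(monthly_default)
print(monthly_narm)
```

Unlike na.omit(), this keeps every non-NA reading in every column, which is what the original question asks for.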
> This behavior is perfectly logical. The problem is that if I try to use the
> na.omit() function, it will throw away the entire line if even one series
> has an NA value. For example, in the table below, you can see that running
> na.omit() will throw out periods 2001-01-06 through 2001-01-10. But since
> each of these lines contains many non-NA readings, we are throwing away real
> information that should be used in the calculation of the means for January.
> The mean for column B should include every non-NA value for the month, but
> since A has an NA value on January 6, the January 6 value for B will be
> dropped as well. Same thing for columns C, D, and E.
>
> I suppose one solution would be to break the object into five one-series
> objects, run aggregate(na.omit(item), as.yearmon(index(na.omit(item))),
> mean) on each of them, then bind them back together, but this is rather
> annoying. Is there a better way?
>
> Thanks.
>
> Abiel
>
