thr3ads.net - R help - [R] (Newbie) Aggregate for NA values [Feb 2006]

Vivek Satsangi

2006-Feb-24 15:16 UTC

[R] (Newbie) Aggregate for NA values

Folks,

Sorry if this question has been answered before or is obvious (or
worse, statistically "bad"). I don't understand what was said in one
of the search results that seems somewhat related.

I use aggregate to get a quick summary of the data. Part of what I want
from the summary is to see how much influence the NAs might have had if
they were included, and whether excluding them from the means introduces
some sort of bias. So I want the summary statistic for the NAs as well.

Here is a simple example session (edited to remove the typos I made,
comments added later):
> tmp_a <- 1:10
> tmp_b <- rep(1:5,2)
> tmp_c <- rep(1:2,5)
> tmp_d <- c(1,1,1,2,2,2,3,3,3,4)
> tmp_df <- data.frame(tmp_a,tmp_b,tmp_c,tmp_d);
> tmp_df$tmp_c[9:10] <- NA ;
> tmp_df
   tmp_a tmp_b tmp_c tmp_d
1      1     1     1     1
2      2     2     2     1
3      3     3     1     1
4      4     4     2     2
5      5     5     1     2
6      6     1     2     2
7      7     2     1     3
8      8     3     2     3
9      9     4    NA     3
10    10     5    NA     4
> aggregate(tmp_df$tmp_d,by=list(tmp_df$tmp_b,tmp_df$tmp_c),mean);
  Group.1 Group.2 x
1       1       1 1
2       2       1 3
3       3       1 1
4       5       1 2
5       1       2 2
6       2       2 1
7       3       2 3
8       4       2 2
# Only one row for each (tmp_b, tmp_c) combination, NA's getting dropped.
> aggregate(tmp_df$tmp_d,by=list(tmp_df$tmp_c),mean);
  Group.1    x
1       1 1.75
2       2 2.00

What I want from this last aggregate call is a mean of the tmp_d values
that correspond to NA in tmp_c. Similarly, perhaps there is a way to make
the second-to-last call to aggregate return the tmp_d values for the NA
values of tmp_c as well.

How can I achieve this?

--
-- Vivek Satsangi
Student, Rochester, NY USA
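
For reference, the number being asked for can be checked directly by subsetting on is.na(), without going through aggregate at all. This is only a sketch built on the data from the session above; the names na_mean and non_na_mean are introduced here for illustration and are not part of the original post.

```r
# Rebuild the example data from the session above.
tmp_df <- data.frame(
  tmp_a = 1:10,
  tmp_b = rep(1:5, 2),
  tmp_c = rep(1:2, 5),
  tmp_d = c(1, 1, 1, 2, 2, 2, 3, 3, 3, 4)
)
tmp_df$tmp_c[9:10] <- NA

# Mean of tmp_d over the rows where tmp_c is NA (rows 9 and 10).
na_mean <- mean(tmp_df$tmp_d[is.na(tmp_df$tmp_c)])

# Mean of tmp_d over the rows where tmp_c is observed, for comparison.
non_na_mean <- mean(tmp_df$tmp_d[!is.na(tmp_df$tmp_c)])

print(c(na_rows = na_mean, other_rows = non_na_mean))
```

Comparing the two means gives a first impression of whether the rows dropped by aggregate differ from the rows it keeps.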

Adaikalavan Ramasamy

2006-Feb-24 16:05 UTC

[R] (Newbie) Aggregate for NA values

I think it makes perfect sense for R to drop these rows, since 'NA'
carries no information. I do not know if there is an elegant solution,
but I would suggest that you turn these 'NA' values into an informative value.

Here is one possibility:

 df <- data.frame( AA=1:10, BB=rep(1:5,2), CC=rep(1:2,5), DD=rnorm(10) )
 df[ 9:10, "CC" ] <- NA

 df[is.na(df)] <- "lala"   ## change NA's into informative category ##


 aggregate( df$DD, by=list( df$CC ), mean  )
     Group.1          x
   1       1  1.1533763
   2       2  0.6427338
   3    lala -0.2745249

 aggregate( df$DD, by=list( df$BB, df$CC ), mean  )
      Group.1 Group.2           x
   1        1       1  0.47264081
   2        2       1  0.63795211
   3        3       1  1.66756015
   4        5       1  1.83535232
   5        1       2  0.89914287
   6        2       2  1.11102134
   7        3       2  0.22268699
   8        4       2  0.33808394
   9        4    lala -0.60154608
   10       5    lala  0.05249622

Regards, Adai
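
One thing to watch with the recoding above: df[is.na(df)] <- "lala" replaces NA in every column, so a numeric value column that happened to contain NA would be turned into character. A possible variation, sketched here on the data from the original question, recodes only the grouping variable. The label "missing" and the names grp_c, by_c, by_bc and by_tapply are arbitrary choices made for this illustration. The last call shows an alternative that, as far as I know, also works: addNA() makes NA an explicit factor level, which tapply() then treats as its own group.

```r
# Example data from the original question.
tmp_df <- data.frame(
  tmp_b = rep(1:5, 2),
  tmp_c = rep(1:2, 5),
  tmp_d = c(1, 1, 1, 2, 2, 2, 3, 3, 3, 4)
)
tmp_df$tmp_c[9:10] <- NA

# Recode NA in the grouping column only; the data frame itself is untouched.
grp_c <- ifelse(is.na(tmp_df$tmp_c), "missing", as.character(tmp_df$tmp_c))

# One grouping variable: a row for "missing" now appears next to 1 and 2.
by_c <- aggregate(tmp_df$tmp_d, by = list(tmp_c = grp_c), FUN = mean)
print(by_c)

# Two grouping variables: the (tmp_b, "missing") combinations are kept too.
by_bc <- aggregate(tmp_df$tmp_d,
                   by = list(tmp_b = tmp_df$tmp_b, tmp_c = grp_c),
                   FUN = mean)
print(by_bc)

# Alternative without a placeholder label: NA becomes its own factor level.
by_tapply <- tapply(tmp_df$tmp_d, addNA(tmp_df$tmp_c), mean)
print(by_tapply)
```

Keeping the recoded groups in a separate vector means the NA values in tmp_df stay available for any later analysis that needs to treat them as missing.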


