thr3ads.net - R help - [R] How to force aggregate to exclude NA ? [Dec 2008]

Daren Tan

2008-Dec-07 12:06 UTC

[R] How to force aggregate to exclude NA ?

The aggregate function does "almost" all that I need to summarize a
dataset, except that I can't specify exclusion of NAs without a little bit
of hassle.
> set.seed(143)
> m <- data.frame(A=sample(LETTERS[1:5], 20, T), B=sample(LETTERS[1:10], 20, T),
+                 C=sample(c(NA, 1:4), 20, T), D=sample(c(NA,1:4), 20, T))
> m
   A B  C  D
1  E I  1 NA
2  A C NA NA
3  D I NA  3
4  C I  2  4
5  A C  3  2
6  E J  1  2
7  D J  2  2
8  C G  4  1
9  C D NA  3
10 B G  3 NA
11 C B  4  2
12 A B NA NA
13 E A NA  4
14 B B  3  3
15 E I  4  1
16 E J  3  1
17 B J  4  4
18 B J  1  3
19 D D  4  2
20 B B  4  3
> aggregate(m[,-c(1:2)], by=list(m[,1]), sum)
  Group.1  C  D
1       A NA NA
2       B 15 NA
3       C NA 10
4       D NA  7
5       E NA NA
> aggregate(m[,-c(1:2)], by=list(m[,1]), length)
  Group.1 C D
1       A 3 3
2       B 5 5
3       C 4 4
4       D 3 3
5       E 5 5

My own versions of length and sum, defined to exclude NA:
> mylength <- function(x) {  sum(as.logical(x), na.rm=T) }
> mysum <- function(x) {sum(x, na.rm=T)}
> aggregate(m[,-c(1:2)], by=list(m[,1]), mysum)   # <--- this computes correctly.
  Group.1  C  D
1       A  3  2
2       B 15 13
3       C 10 10
4       D  6  7
5       E  9  8
> aggregate(m[,-c(1:2)], by=list(m[,1]), mylength)   # <--- this computes correctly.
  Group.1 C D
1       A 1 1
2       B 5 4
3       C 3 4
4       D 2 3
5       E 4 4
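
A caveat on mylength: it counts values by coercing them to logical, so it is only a count of non-NA entries when none of the values is zero (true for the data here, which holds 1 to 4). The sketch below contrasts it with a direct count of non-missing values. The helper names count_nonzero and count_present are made up for illustration and are not from the post.

```r
# Hypothetical helper names, used only for this illustration.
count_nonzero <- function(x) sum(as.logical(x), na.rm = TRUE)  # same body as mylength
count_present <- function(x) sum(!is.na(x))                    # counts every non-NA value

x <- c(0, 2, NA, 5)
count_nonzero(x)  # 2: the zero is coerced to FALSE and not counted
count_present(x)  # 3: the zero is counted, only the NA is skipped
```

If zeros can occur in the data, the second form is the safer count.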

There are other statistics I need to compute e.g. var, sd, and it is a hassle to
create customized versions to exclude NA. Any alternative approaches ?


 
 

Gabor Grothendieck

2008-Dec-07 12:43 UTC

Try

aggregate(m[, -(1:2)], m[1], sum, na.rm = TRUE)
aggregate(!is.na(m[, -(1:2)]), m[1], sum, na.rm = TRUE)

# or (this uses row names rather than a column for the group):

rowsum(m[, -(1:2)], m[,1], na.rm = TRUE)
rowsum(0+!is.na(m[, -(1:2)]), m[,1], na.rm = TRUE)
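
The calls above work because aggregate forwards any extra arguments after FUN to FUN itself. That should also cover the var and sd cases asked about, since both accept na.rm. A minimal sketch, assuming the same rows as the data frame m printed in the question (typed in literally here so the result does not depend on the random number generator):

```r
# Columns A, C and D of the data frame `m` from the question, entered by hand.
m <- data.frame(
  A = c("E","A","D","C","A","E","D","C","C","B",
        "C","A","E","B","E","E","B","B","D","B"),
  C = c(1,NA,NA,2,3,1,2,4,NA,3,4,NA,NA,3,4,3,4,1,4,4),
  D = c(NA,NA,3,4,2,2,2,1,3,NA,2,NA,4,3,1,1,4,3,2,3)
)

# Arguments after FUN are passed on to FUN, so na.rm = TRUE reaches
# sum(), mean() and sd() alike.
sums  <- aggregate(m[, c("C", "D")], m["A"], sum,  na.rm = TRUE)
means <- aggregate(m[, c("C", "D")], m["A"], mean, na.rm = TRUE)
sds   <- aggregate(m[, c("C", "D")], m["A"], sd,   na.rm = TRUE)

sums  # matches the mysum output shown in the question
sds   # group A has a single non-NA value in C, so its sd is NA
```

This only helps for functions that have an na.rm argument; length, for example, does not.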



Mike Lawrence

2008-Dec-07 13:11 UTC

This should work. I updated a personal modification of aggregate that I made
to facilitate return of multiple values if necessary:
# modified aggregate command, implementing na.rm for all functions and
# allowing for multiple and/or named return values
agg = function(z, Ind, FUN, na.rm = FALSE, ...) {
  if (na.rm) {
    # drop the NA observations from each grouping vector, then from the data
    for (i in 1:length(Ind)) {
      Ind[[i]] = Ind[[i]][!is.na(z)]
    }
    z = z[!is.na(z)]
  }
  FUN.out = by(z, Ind, FUN, ...)
  num.cells = length(FUN.out)
  num.values = length(FUN.out[[1]])
  Ind.levels = list()
  for (i in 1:length(Ind)) {
    Ind.levels[[i]] = levels(factor(Ind[[i]]))
  }
  # one row per combination of grouping levels
  temp = expand.grid(Ind.levels)
  if (is.character(names(Ind))) {
    names(temp) = names(Ind)
  } else {
    names(temp) = paste('Var', 1:length(Ind), sep = '')
  }
  # one output column per value returned by FUN
  for (i in 1:num.values) {
    temp$new = NA
    n = names(FUN.out[[1]])[i]
    names(temp)[length(temp)] = ifelse(!is.null(n), n, ifelse(i == 1, 'x', paste('x', i, sep = '')))
    for (j in 1:num.cells) {
      temp[j, length(temp)] = FUN.out[[j]][i]
    }
  }
  return(temp)
}

# create some data
z=rnorm(100)
A=rep(1:2,each=25,2)
B=rep(1:2,each=50)
Ind=list(A=A,B=B)

aggregate(z,Ind,mean)
agg(z,Ind,mean) #should be identical to aggregate

aggregate(z,Ind,summary) #returns an error
agg(z,Ind,summary) #returns named columns

# Make a function that returns multiple unnamed values
summary2=function(x){
s=summary(x)
names(s)=NULL
return(s)
}
agg(z,Ind,summary2) #returns multiple columns, default names

#demonstrate implementation of na.rm
z[1]=NA
z[100]=NA
agg(z,Ind,sum) #returns NA for some cells
agg(z,Ind,sum,na.rm=T) #removes NAs before calculating sum
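
For comparison, more recent versions of R let aggregate itself accept a FUN that returns several values; the results come back as a matrix column. The sketch below assumes such a version. The summary function stats and the flattening step are illustrative choices, not part of the code above.

```r
set.seed(1)
z <- rnorm(100)
z[c(1, 100)] <- NA
Ind <- list(A = rep(1:2, each = 25, times = 2), B = rep(1:2, each = 50))

# Hypothetical summary function returning three named values per cell.
stats <- function(x) {
  c(mean = mean(x, na.rm = TRUE), sd = sd(x, na.rm = TRUE), n = sum(!is.na(x)))
}

res <- aggregate(z, Ind, stats)
res$x                            # a matrix with columns mean, sd and n
flat <- do.call(data.frame, res) # spread the matrix into ordinary columns
names(flat)
```

The two cells that contain an NA report n = 24; the other two report n = 25.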




-- 
Mike Lawrence
Graduate Student
Department of Psychology
Dalhousie University
www.thatmike.com


hadley wickham

2008-Dec-07 13:45 UTC

> There are other statistics I need to compute e.g. var, sd, and it is a
> hassle to create customized versions to exclude NA. Any alternative approaches ?

How about writing a function to do the customisation for you?

na.rm <- function(f) {
  function(x, ...) f(x[!is.na(x)], ...)
}

aggregate(m[,-c(1:2)], by=list(m[,1]), na.rm(sum))
aggregate(m[,-c(1:2)], by=list(m[,1]), na.rm(length))
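
The same wrapper should carry over to var and sd, and it also covers functions such as length that have no na.rm argument of their own. A minimal sketch, assuming the rows of the data frame m from the question (typed in literally so the result does not depend on the random number generator):

```r
# Wrapper as defined above: returns a version of f that drops NAs first.
na.rm <- function(f) {
  function(x, ...) f(x[!is.na(x)], ...)
}

# Columns A, C and D of the data frame `m` from the question, entered by hand.
m <- data.frame(
  A = c("E","A","D","C","A","E","D","C","C","B",
        "C","A","E","B","E","E","B","B","D","B"),
  C = c(1,NA,NA,2,3,1,2,4,NA,3,4,NA,NA,3,4,3,4,1,4,4),
  D = c(NA,NA,3,4,2,2,2,1,3,NA,2,NA,4,3,1,1,4,3,2,3)
)

counts <- aggregate(m[, c("C", "D")], by = list(m$A), na.rm(length))
vars   <- aggregate(m[, c("C", "D")], by = list(m$A), na.rm(var))

counts  # matches the mylength output shown in the question
vars    # NA where a group has fewer than two non-NA values
```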

Hadley

-- 
http://had.co.nz/

Daren Tan

2008-Dec-07 16:10 UTC

How do I use the na.rm function outside aggregate? I tried

na.rm <- function(f) {
  function(x, ...) f(x[!is.na(x)], ...)
}

> na.rm(sum(c(NA,1,2)))
function(x, ...) f(x[!is.na(x)], ...)

> na.rm(sum, c(NA,1,2))
Error in na.rm(sum, c(NA, 1, 2)) : unused argument(s) (c(NA, 1, 2))

 


hadley wickham

2008-Dec-07 18:00 UTC

On Sun, Dec 7, 2008 at 10:10 AM, Daren Tan <daren76 at hotmail.com> wrote:
> How to use the na.rm function outside aggregate ? I tried
>
> na.rm <- function(f) {
>   function(x, ...) f(x[!is.na(x)], ...)
> }
>
>> na.rm(sum(c(NA,1,2)))
> function(x, ...) f(x[!is.na(x)], ...)
>
>> na.rm(sum, c(NA,1,2))
> Error in na.rm(sum, c(NA, 1, 2)) : unused argument(s) (c(NA, 1, 2))

na.rm(sum)(c(NA, 1, 2))
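
The reason for the double parentheses: na.rm takes a function and returns a new function, so the data goes to the returned function, not to na.rm itself. A short sketch of the two steps, with sum_no_na as an illustrative name:

```r
na.rm <- function(f) {
  function(x, ...) f(x[!is.na(x)], ...)
}

# Step 1: build the NA-dropping version of sum. Step 2: call it on the data.
sum_no_na <- na.rm(sum)
sum_no_na(c(NA, 1, 2))    # 3

# The same thing in one expression.
na.rm(sum)(c(NA, 1, 2))   # 3

# The first attempt passed the value of sum(c(NA, 1, 2)) where a function was
# expected, so what came back was still an (unusable) function, not a number.
is.function(na.rm(sum(c(NA, 1, 2))))  # TRUE
```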

Hadley

-- 
http://had.co.nz/
