thr3ads.net - R devel - [Rd] tapply with weighted.mean [Jan 2005]

Martyn Plummer

2005-Jan-26 16:43 UTC

[Rd] tapply with weighted.mean

We were caught out recently attempting to use tapply to get a table of
weighted means.  This gives the wrong answer (or, more correctly, not
the answer we were expecting), as the following example shows:

R> x <- 1:10 #some data
R> w <- c(1:5,5:1) #weights
R> id <- rep(1:2,rep(5,2)) #id values
R> weighted.mean(x[id==1],w[id==1]) #Weighted mean of x in group 1
[1] 3.666667
R> weighted.mean(x[id==2],w[id==2]) #Weighted mean of x in group 2
[1] 7.333333
R> tapply(x,INDEX=id,FUN=weighted.mean,w=w) #Wrong!
1 2
3 8

The reason for this is that tapply splits its first argument by the
INDEX variable, but does not split any of the arguments supplied via
'...'.  So the result is equivalent to

c(weighted.mean(x[id==1],w), weighted.mean(x[id==2],w))

In each call, R silently recycles the shorter vector (the five x values
for the group) to match the length of the longer one (all ten weights).
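
To make the recycling concrete, here is a minimal sketch that writes the
group 1 arithmetic out by hand and then shows one way to get the table we
expected.  The recycling is spelled out with rep() rather than by calling
weighted.mean with mismatched lengths, since a weighted.mean that checks
lengths would reject that call.  The names x1, recycled and grouped are
only for illustration, and splitting the indices is just one possible
workaround, not a change to tapply itself.

```r
x  <- 1:10                 # some data
w  <- c(1:5, 5:1)          # weights
id <- rep(1:2, each = 5)   # id values (same as rep(1:2, rep(5, 2)))

# What the tapply call effectively computed for group 1: the five x
# values are recycled against all ten weights.
x1 <- x[id == 1]
recycled <- sum(rep(x1, length.out = length(w)) * w) / sum(w)
recycled   # 3, the value tapply reported for group 1

# Splitting the indices instead of the data keeps x and w aligned
# within each group.
grouped <- sapply(split(seq_along(x), id),
                  function(i) weighted.mean(x[i], w[i]))
grouped    # 3.666667 and 7.333333, matching the direct calculations
```

Because the anonymous function receives indices, it can subset any number
of parallel vectors, so nothing needs to be passed through '...'.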

I draw two conclusions from this:

1) weighted.mean(x,w) should include a length check for w.  The
documentation says it should be the same length as x, so this should be
enforced.

2) More importantly, the help page for tapply should explicitly warn the
user that optional arguments supplied to 'FUN' are not split by
'INDEX'.
I really only understood the behaviour of tapply after inspecting the
code. Then it became obvious why this could never work.
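
As a rough illustration of the length check proposed in (1), the sketch
below uses a hypothetical wrapper, checked_weighted_mean, which is not
part of R.  The error message wording is my own assumption.  With such a
check in place, the mistaken tapply call fails loudly instead of
returning a plausible-looking table.

```r
# Hypothetical wrapper (not part of base R): refuse to recycle weights.
checked_weighted_mean <- function(x, w) {
  if (length(w) != length(x)) {
    stop("'x' and 'w' must have the same length")
  }
  sum(x * w) / sum(w)
}

x  <- 1:10                 # some data
w  <- c(1:5, 5:1)          # weights
id <- rep(1:2, each = 5)   # id values

# Lengths agree, so this returns the weighted mean for group 1.
ok <- checked_weighted_mean(x[id == 1], w[id == 1])

# tapply passes the full-length w to every group, so the check fires;
# tryCatch captures the message rather than stopping the script.
bad <- tryCatch(
  tapply(x, INDEX = id, FUN = checked_weighted_mean, w = w),
  error = function(e) conditionMessage(e)
)
```

Here ok holds the group 1 weighted mean and bad holds the error message
raised on the first group.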

I hope I am not being too obtuse.  Any objections before I make these
changes?

Martyn
