Dear R-help,

I've encountered what seems to me a strange problem with "names<-". Suppose I define the function:

    fun <- function(x, f) {
        m <- tapply(x, f, mean)
        ans <- x - m[match(f, unique(f))]
        names(ans) <- names(x)
        ans
    }

which subtracts out the means of `x' grouped by `f' (which is the same as, e.g., resid(lm(x~f)) if `f' is a factor). If `x' does not have names, then I'd expect the output of the function not to have names, as names(x) would be NULL, and assigning NULL to names(ans) should wipe out the names of `ans'. However, I get:

    > x = rnorm(20)
    > f = factor(sample(rep(letters[1:4], 5)))
    > fun(x, f)
              a           b           c           b           c           c           d
    -0.53791639  1.03704065  0.95727411  0.89219177 -0.04218746  0.57976675 -2.15799919
              a           c           d           a           d           b           d
     1.28422452 -0.92881186  0.40526262 -0.13471983 -0.72599709  1.68726680 -0.95420354
              a           c           a           b           b           d
    -2.28013373  1.02522037  0.07728352  0.54321899  0.95742354 -1.68420455

What am I missing?

[BTW, this is using the tip that Thomas Lumley posted about forming the group means. I've wanted to write a `tsweep' function that's sort of the cross of tapply() and sweep().]

Best,
Andy Liaw, PhD
Biometrics Research             PO Box 2000, RY33-300
Merck Research Labs             Rahway, NJ 07065
mailto:andy_liaw at merck.com   732-594-0820
Execute these two commands:

    ans <- fun(x, f)
    attributes(ans)

and you get this:

    $dim
    [1] 20

    $dimnames
    $dimnames[[1]]
     [1] "a" "a" "b" "c" "a" "d" "a" "b" "d" "d" "a" "b" "d" "c" "c" "c" "b" "c" "b"
    [20] "d"

so ans does not have names, it has dimnames. If you try

    dimnames(ans) <- NULL

then its dimnames do get nulled out.

Liaw, Andy <andy_liaw <at> merck.com> writes:
: [...]
: If `x' does not have names, then I'd expect the output of the function
: not to have names, as names(x) would be NULL, and assigning NULL to
: names(ans) should wipe out the names of `ans'.
: [...]
: What am I missing?
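The diagnosis above can be checked on a small reproducible case. This is only a sketch: the seed is an arbitrary choice that does not come from the thread, so the values will differ from those posted, and has_names_attr and has_dimnames are illustrative variable names.

```r
set.seed(1)                      # arbitrary seed, only for reproducibility
x <- rnorm(20)
f <- factor(sample(rep(letters[1:4], 5)))

m <- tapply(x, f, mean)          # with a single factor this is a 1-d array
ans <- x - m[match(f, unique(f))]

dim(ans)                         # ans carries a dim attribute of length 1

# The labels live in "dimnames"; there is no separate "names" attribute.
has_names_attr <- "names" %in% names(attributes(ans))
has_dimnames   <- "dimnames" %in% names(attributes(ans))

dimnames(ans) <- NULL            # dropping the dimnames removes the labels
```

After the last line ans still has its dim attribute, so it remains a 1-d array rather than a plain vector.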
Remember tapply with a single factor in R returns a 1D array. What you are seeing are the dimnames, not the names: look at attributes() on your return value (or even names() or str() on it).

I suspect you intended an as.vector() call in the formation of m.

Brian

On Sun, 9 May 2004, Liaw, Andy wrote:

> [...]
> If `x' does not have names, then I'd expect the output of the function
> not to have names, as names(x) would be NULL, and assigning NULL to
> names(ans) should wipe out the names of `ans'.
> [...]
> What am I missing?

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
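One way to read the as.vector() suggestion is sketched below. The name fun2 is hypothetical, and indexing the means by as.integer(f) is my substitution for the original match(f, unique(f)); it relies on tapply() returning its means in the order of levels(f).

```r
# Hypothetical variant of fun(): strip the array attributes from the means.
fun2 <- function(x, f) {
  f <- as.factor(f)
  m <- as.vector(tapply(x, f, mean))  # plain numeric vector, one mean per level
  ans <- x - m[as.integer(f)]         # factor codes index the level-ordered means
  names(ans) <- names(x)              # NULL if x is unnamed, so nothing is added
  ans
}

set.seed(1)                           # arbitrary seed, only for reproducibility
x <- rnorm(20)
f <- factor(sample(rep(letters[1:4], 5)))

res <- fun2(x, f)                     # unnamed plain vector
res_named <- fun2(setNames(x, paste0("obs", 1:20)), f)  # names of x carried over
```

Because m no longer has dim or dimnames, the result has none to inherit, and the names(ans) <- names(x) line behaves as originally expected.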
"Liaw, Andy" <andy_liaw at merck.com> writes:> [BTW, this is using the tip that Thomas Lumley posted about forming the > group means. I've wanted to write a `tsweep' function that's sort of the > cross of tapply() and sweep().]Also notice that this is unsplit(lapply(split(x, g), scale, scale=FALSE), g) and the generalized sweep might be written along the lines of unsplit(mapply("-",split(x,g),tapply(x,g,mean)),g) Can't vouch for the speed, though. -- O__ ---- Peter Dalgaard Blegdamsvej 3 c/ /'_ --- Dept. of Biostatistics 2200 Cph. N (*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918 ~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk) FAX: (+45) 35327907
Liaw, Andy wrote:

>Suppose I
>define the function:
>
>fun <- function(x, f) {
>    m <- tapply(x, f, mean)
>    ans <- x - m[match(f, unique(f))]
>    names(ans) <- names(x)
>    ans
>}

May I ask what is the purpose of match(f,unique(f)) ?

To remove the group means, I have been using:

    x-tapply(x,f,mean)[f]

for a while (and I am now changing to

    x-tapply(x,f,mean)[as.character(f)]

because of the peculiarities of indexing named vectors with factors).

The use of tapply(x,f,mean)[match(f,unique(f))] assumes a particular order in the result of tapply, no? It seems a bit dangerous to me.

Christophe Pallier
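The ordering concern can be seen on a tiny example where the first value in f is not the first level. This is a sketch with made-up data; by_first_seen and by_level_name are illustrative names.

```r
x <- c(10, 20, 1, 2)
f <- factor(c("b", "b", "a", "a"))   # "b" appears first, but the levels are a, b

m <- tapply(x, f, mean)              # ordered by level: a = 1.5, b = 15

# match(f, unique(f)) numbers the groups by first appearance (b = 1, a = 2),
# so it picks the wrong mean for every observation here.
by_first_seen <- as.vector(m[match(f, unique(f))])   # 1.5 1.5 15 15

# Indexing by level name does not depend on the order of the tapply() result.
by_level_name <- as.vector(m[as.character(f)])       # 15 15 1.5 1.5
```

The two agree only when the levels happen to occur in the data in their sorted order.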
On 10 May 2004 at 10:09, Christophe Pallier wrote:

> To remove the group means, I have be using:
>
> x-tapply(x,f,mean)[f]
>
> for a while, (and I am now changing to
> x-tapply(x,f,mean)[as.character(f)] because of the peculiarities of
> indexing named vectors with factors )

Wouldn't

    sweep(as.array(x), 1, tapply(x,f,mean)[as.character(f)], "-")

be more natural?

Kjetil Halvorsen
On Sun, 9 May 2004, Liaw, Andy wrote:

> [...]
> If `x' does not have names, then I'd expect the output of the function
> not to have names, as names(x) would be NULL, and assigning NULL to
> names(ans) should wipe out the names of `ans'. However, I get:

That's because ans is a 1-d array, not a vector. If you want ans to be a vector you need

    ans <- as.vector(x - m[match(f, unique(f))])
    names(ans) <- names(x)

	-thomas
On Mon, 10 May 2004, Christophe Pallier wrote:

> The use of tapply(x,f,mean)[match(f,unique(f))] assumes a particular
> order in the result of tapply, no? It seems a bit dangerous to me.

My original code for the group means problem used rowsum(,reorder=FALSE) rather than tapply(), and we do know that this produces the same order as unique().

	-thomas
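A sketch of how group means via rowsum(, reorder=FALSE) pair up with match(f, unique(f)). This is my reconstruction on made-up data, not the originally posted code; dividing the group sums by group counts is one assumed way of turning them into means.

```r
x <- c(10, 20, 1, 2)
f <- factor(c("b", "b", "a", "a"))   # "b" appears first, but the levels are a, b

# With reorder = FALSE the rows follow the order of first appearance: b, a.
sums   <- rowsum(x, f, reorder = FALSE)
counts <- rowsum(rep(1, length(x)), f, reorder = FALSE)
m <- as.vector(sums / counts)        # 15 (group b), then 1.5 (group a)

# match(f, unique(f)) numbers groups by first appearance, so it lines up with m.
centred <- x - m[match(f, unique(f))]   # -5 5 -0.5 0.5
```

Here the means are in the same order as unique(f), which is what the match() indexing needs and what tapply() does not guarantee.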