I would like to de-mean the 'vector' column of the following dataframe by factor:

set.seed(5444)
vector <- rnorm(1:10)
factor <- rep(1:2,5)
test.df <- data.frame(factor, vector)

which is:

   factor     vector
1       1 -0.4963935
2       2 -2.0768182
3       1 -1.5822224
4       2  0.8025474
5       1  0.3504199
6       2  0.2358464
7       1 -0.3989443
8       2 -0.3692544
9       1 -0.3174586
10      2  1.4305431

Using the by() command, I get:

> by(test.df$vector, test.df$factor, function(x) {x - mean(x)})
test.df$factor: 1
[1] -0.007473699 -1.093302612  0.839339673  0.089975488  0.171461151
--------------------------------------------------------------------------------------------------
test.df$factor: 2
[1] -2.0813911  0.7979745  0.2312735 -0.3738272  1.4259702

My question is: Is there a way to have this output put back into the dataframe? I.e., to make by(), or some other command, return a vector of length 10 whose values x' correspond to x'_1 = x_1 - mean(x | factor1), x'_2 = x_2 - mean(x | factor2), ...

Thanks in advance for the help, and apologies for the poor notation.

Vassilis

--
View this message in context: http://r.789695.n4.nabble.com/Can-the-by-function-return-a-single-column-tp3089231p3089231.html
Sent from the R help mailing list archive at Nabble.com.
Hello, Vassilis,

maybe

> with( test.df, ave( vector, factor, FUN = function( x) x - mean( x)))

does what you want.

-- Gerrit

On Wed, 15 Dec 2010, Vassilis wrote:

> My question is: Is there a way to have this output put back to the
> dataframe? I.e to make by(), or some other command, return me a vector of
> length 10 whose values x' correspond to x'_1 = x_1 - mean(x | factor1), x'_2
> = x_2 - mean(x | factor2),...
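
To make the suggestion above concrete, here is a minimal sketch of storing the ave() result as a new column. The data are typed in from the table in the original post; the column name `demeaned` is an assumption chosen for illustration and is not part of the thread.

```r
# Data copied from the table in the original post
test.df <- data.frame(
  factor = rep(1:2, 5),
  vector = c(-0.4963935, -2.0768182, -1.5822224, 0.8025474, 0.3504199,
             0.2358464, -0.3989443, -0.3692544, -0.3174586, 1.4305431)
)

# ave() returns a vector of the same length as its input, in the original
# row order, so it can be assigned directly as a column.
# 'demeaned' is a hypothetical column name.
test.df$demeaned <- with(test.df,
                         ave(vector, factor, FUN = function(x) x - mean(x)))

# Sanity check: the group means of the de-meaned values should be
# zero up to floating-point error
tapply(test.df$demeaned, test.df$factor, mean)
```

Because the result keeps the row order of test.df, no re-sorting or merging is needed afterwards, which is the difference from the grouped list that by() prints.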
Hi Gerrit,

This does exactly what I want, thank you very much! Even more, I notice that ave() uses the split/unsplit functions under the hood, which are very useful tools, as they allow one to apply even more complicated functions on a factor-by-factor basis.

best,
Vassilis
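
As an illustration of the split/unsplit idea mentioned in this message, the following is a rough sketch of applying a slightly more involved per-group function by hand. The data are again typed in from the original post; the helper `standardize` and the column name `zscore` are hypothetical names introduced here. As far as I can tell, ave() itself is written with split() and the `split<-` replacement form rather than a literal call to unsplit(), but the mechanics are the same.

```r
# Data copied from the table in the original post
test.df <- data.frame(
  factor = rep(1:2, 5),
  vector = c(-0.4963935, -2.0768182, -1.5822224, 0.8025474, 0.3504199,
             0.2358464, -0.3989443, -0.3692544, -0.3174586, 1.4305431)
)

# Hypothetical per-group function: centre and scale within each group
standardize <- function(x) (x - mean(x)) / sd(x)

# 1. split the column into a list with one element per factor level
groups <- split(test.df$vector, test.df$factor)

# 2. apply the function to each group separately
transformed <- lapply(groups, standardize)

# 3. unsplit() reassembles the pieces in the original row order
test.df$zscore <- unsplit(transformed, test.df$factor)
```

The same result should be obtainable with ave(vector, factor, FUN = standardize); the explicit three-step form is mainly useful when the intermediate per-group list is needed for something else.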