Hi,

I have this data.frame with two variables in it,

> z
  V1 V2
1 10  8
2 NA 18
3  9  7
4  3 NA
5 NA 10
6 11 12
7 13  9
8 12 11

and a vector of means,

> means <- apply(z, 2, function (col) mean(na.omit(col)))
> means
       V1        V2
 9.666667 10.714286

My intention was subtracting means from z, so instinctively I tried

> z-means
          V1         V2
1  0.3333333 -1.6666667
2         NA  7.2857143
3 -0.6666667 -2.6666667
4 -7.7142857         NA
5         NA  0.3333333
6  0.2857143  1.2857143
7  3.3333333 -0.6666667
8  1.2857143  0.2857143

But this is completely wrong. sapply() gives the same result:

> sapply(z, function(row) row - means)
             V1         V2
[1,]  0.3333333 -1.6666667
[2,]         NA  7.2857143
[3,] -0.6666667 -2.6666667
[4,] -7.7142857         NA
[5,]         NA  0.3333333
[6,]  0.2857143  1.2857143
[7,]  3.3333333 -0.6666667
[8,]  1.2857143  0.2857143

So, what is going on here?
The following appears to work

> z-matrix(means,ncol=2)[rep(1, dim(z)[1]),]
          V1         V2
1  0.3333333 -2.7142857
2         NA  7.2857143
3 -0.6666667 -3.7142857
4 -6.6666667         NA
5         NA -0.7142857
6  1.3333333  1.2857143
7  3.3333333 -1.7142857
8  2.3333333  0.2857143

but I think it's rather cumbersome; surely there must be a cleaner way
to do it.

--
Ernest
R works by going down the columns: the shorter vector is recycled along each column in turn. If you make the rows into columns, it then does what you want. You just have to make the columns back into rows to get the original shape of your matrix. So the code in one line is:

t(t(z) - means)

---- Original message ----
>Date: Fri, 28 Jan 2011 01:16:45 +0100
>From: r-help-bounces at r-project.org (on behalf of nfdisco at gmail.com (Ernest Adrogué i Calveras))
>Subject: [R] sapply puzzlement
>To: r-help at r-project.org
>
>______________________________________________
>R-help at r-project.org mailing list
>https://stat.ethz.ch/mailman/listinfo/r-help
>PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>and provide commented, minimal, self-contained, reproducible code.

--------------------------------------
Dario Strbenac
Research Assistant
Cancer Epigenetics
Garvan Institute of Medical Research
Darlinghurst NSW 2010 Australia
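For readers following along, here is a minimal sketch of the recycling behaviour described above. It rebuilds the posted data by hand and uses colMeans(z, na.rm = TRUE) as a stand-in for the apply() call in the question; the names m, recycled, by_hand and centered are only illustrative.

```r
# Rebuild the example data from the original post.
z <- data.frame(V1 = c(10, NA, 9, 3, NA, 11, 13, 12),
                V2 = c(8, 18, 7, NA, 10, 12, 9, 11))
means <- colMeans(z, na.rm = TRUE)   # same values as the apply() call in the post

# On a matrix, a length-2 vector is recycled down each column in turn,
# so the two means alternate row by row instead of lining up with columns.
m <- as.matrix(z)
recycled <- m - means
by_hand  <- m - matrix(rep(means, length.out = length(m)), nrow = nrow(m))
isTRUE(all.equal(recycled, by_hand))

# After transposing, each column holds one row of z (V1 then V2),
# so the recycled means now pair up with the right variables.
centered <- t(t(z) - means)
colMeans(centered, na.rm = TRUE)     # both column means are numerically zero
```

The double transpose copies the data twice, which is harmless at this size but may matter for large matrices.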
sapply(z, function(row) ...) does not actually grab a row at a time out of 'z'. It grabs a column (because 'z' is a data.frame).

You may want:

t(apply(z, 1, function(row) row - means))

or:

t(t(z) - means)

Hope that helps,

-David Johnston

--
View this message in context: http://r.789695.n4.nabble.com/sapply-puzzlement-tp3243520p3243534.html
Sent from the R help mailing list archive at Nabble.com.
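A short sketch that may help confirm the point about sapply(): it checks what the function actually receives, then compares the two suggested one-liners. The data are retyped from the original post, and colMeans(z, na.rm = TRUE) is assumed to be equivalent to the poster's apply() call; by_row and by_transpose are illustrative names.

```r
# Rebuild the example data from the original post.
z <- data.frame(V1 = c(10, NA, 9, 3, NA, 11, 13, 12),
                V2 = c(8, 18, 7, NA, 10, 12, 9, 11))
means <- colMeans(z, na.rm = TRUE)

# A data.frame is a list of columns, so sapply() passes each whole
# column (8 values) to the function, never a single row.
col_lengths <- sapply(z, length)
col_lengths

# apply() with MARGIN = 1 does pass rows. It returns one column per
# row of z, so the result has to be transposed back.
by_row <- t(apply(z, 1, function(row) row - means))

# The double-transpose version should give the same numbers.
by_transpose <- t(t(z) - means)
isTRUE(all.equal(unname(by_row), unname(by_transpose)))
```

unname() is used in the comparison only so that differences in row or column labels do not affect the check.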
On Jan 27, 2011, at 7:16 PM, Ernest Adrogué i Calveras wrote:

> Hi,
>
> I have this data.frame with two variables in it,
>
>> z
>   V1 V2
> 1 10  8
> 2 NA 18
> 3  9  7
> 4  3 NA
> 5 NA 10
> 6 11 12
> 7 13  9
> 8 12 11
>
> and a vector of means,
>
>> means <- apply(z, 2, function (col) mean(na.omit(col)))
>> means
>        V1        V2
>  9.666667 10.714286

Two methods:

A) use sweep (which by default takes the difference)

> sweep(z, 2, means)
          V1         V2
1  0.3333333 -2.7142857
2         NA  7.2857143
3 -0.6666667 -3.7142857
4 -6.6666667         NA
5         NA -0.7142857
6  1.3333333  1.2857143
7  3.3333333 -1.7142857
8  2.3333333  0.2857143

B) use the scale function (whose "whole purpose in life" is to subtract the mean and possibly divide by the standard deviation, which we suppressed in this case with the scale=FALSE argument)

> scale(z, scale=FALSE)
          V1         V2
1  0.3333333 -2.7142857
2         NA  7.2857143
3 -0.6666667 -3.7142857
4 -6.6666667         NA
5         NA -0.7142857
6  1.3333333  1.2857143
7  3.3333333 -1.7142857
8  2.3333333  0.2857143
attr(,"scaled:center")
       V1        V2
 9.666667 10.714286

--
David.

David Winsemius, MD
West Hartford, CT
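As a quick check that the two methods agree, here is a small sketch, again with the data retyped from the original post and colMeans(z, na.rm = TRUE) assumed in place of the poster's apply() call. The names swept and scaled are illustrative.

```r
# Rebuild the example data from the original post.
z <- data.frame(V1 = c(10, NA, 9, 3, NA, 11, 13, 12),
                V2 = c(8, 18, 7, NA, 10, 12, 9, 11))
means <- colMeans(z, na.rm = TRUE)

# A) sweep(): MARGIN = 2 applies the statistics column by column,
#    and FUN defaults to "-". The result is still a data.frame.
swept <- sweep(z, 2, means)

# B) scale(): centers each column on its mean (NAs are ignored when
#    the mean is computed) and returns a matrix.
scaled <- scale(z, scale = FALSE)

# Same numbers either way, once the container type is set aside.
isTRUE(all.equal(as.matrix(swept), scaled, check.attributes = FALSE))

# scale() also records the centers it used.
attr(scaled, "scaled:center")
```

One practical difference: sweep() takes the means as an argument, so it also works with statistics computed elsewhere, while scale() computes the column means itself.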