Dear list, I just made a very simple mistake, but it was hard to spot. And I think that I should warn other people, because it is probably so simple to make... === R code == # Let us create a matrix: (a <- cbind(c(0,1,1), rep(1,3))) # [,1] [,2] # [1,] 0 1 # [2,] 1 1 # [3,] 1 1 # That is a MISTAKE: a/colSums(a) # [,1] [,2] # [1,] 0.0000000 0.3333333 # [2,] 0.3333333 0.5000000 # [3,] 0.5000000 0.3333333 # I just wonder if some R warning should be issued here? # That is what I actually needed (column-wise frequencies): t(t(a)/colSums(a)) # [,1] [,2] # [1,] 0.0 0.3333333 # [2,] 0.5 0.3333333 # [3,] 0.5 0.3333333 === end of R code == With best wishes and regards, Alexey Shipunov
and you might want to check ?prop.table > prop.table(a, 2) [,1] [,2] [1,] 0.0 0.3333333 [2,] 0.5 0.3333333 [3,] 0.5 0.3333333 or even ?sweep (which will be useful for more complex situations) > sweep(a, 2, colSums(a), "/") [,1] [,2] [1,] 0.0 0.3333333 [2,] 0.5 0.3333333 [3,] 0.5 0.3333333 b On Mar 7, 2008, at 5:32 PM, Alexey Shipunov wrote:> Dear list, > > I just made a very simple mistake, but it was hard to spot. And I > think that I should warn other people, because it is probably so > simple to make... > > === R code ==> > # Let us create a matrix: > (a <- cbind(c(0,1,1), rep(1,3))) > > # [,1] [,2] > # [1,] 0 1 > # [2,] 1 1 > # [3,] 1 1 > > # That is a MISTAKE: > a/colSums(a) > > # [,1] [,2] > # [1,] 0.0000000 0.3333333 > # [2,] 0.3333333 0.5000000 > # [3,] 0.5000000 0.3333333 > # I just wonder if some R warning should be issued here? > > # That is what I actually needed (column-wise frequencies): > t(t(a)/colSums(a)) > > # [,1] [,2] > # [1,] 0.0 0.3333333 > # [2,] 0.5 0.3333333 > # [3,] 0.5 0.3333333 > > === end of R code ==> > With best wishes and regards, > Alexey Shipunov > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code.
R is working exactly as documented. If you look at how the matrix is stored (column-wise) and then what you are dividing by, you will see that it is doing what you asked (when recycling values):> as.vector(a)[1] 0 1 1 1 1 1> as.vector(a)/c(2,3)[1] 0.0000000 0.3333333 0.5000000 0.3333333 0.5000000 0.3333333>On Fri, Mar 7, 2008 at 5:32 PM, Alexey Shipunov <dactylorhiza at gmail.com> wrote:> Dear list, > > I just made a very simple mistake, but it was hard to spot. And I > think that I should warn other people, because it is probably so > simple to make... > > === R code ==> > # Let us create a matrix: > (a <- cbind(c(0,1,1), rep(1,3))) > > # [,1] [,2] > # [1,] 0 1 > # [2,] 1 1 > # [3,] 1 1 > > # That is a MISTAKE: > a/colSums(a) > > # [,1] [,2] > # [1,] 0.0000000 0.3333333 > # [2,] 0.3333333 0.5000000 > # [3,] 0.5000000 0.3333333 > # I just wonder if some R warning should be issued here? > > # That is what I actually needed (column-wise frequencies): > t(t(a)/colSums(a)) > > # [,1] [,2] > # [1,] 0.0 0.3333333 > # [2,] 0.5 0.3333333 > # [3,] 0.5 0.3333333 > > === end of R code ==> > With best wishes and regards, > Alexey Shipunov > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. >-- Jim Holtman Cincinnati, OH +1 513 646 9390 What is the problem you are trying to solve?
Alexey Shipunov wrote:> Dear list, > > I just made a very simple mistake, but it was hard to spot. And I > think that I should warn other people, because it is probably so > simple to make... > > === R code ==> > # Let us create a matrix: > (a <- cbind(c(0,1,1), rep(1,3))) > > # [,1] [,2] > # [1,] 0 1 > # [2,] 1 1 > # [3,] 1 1 > > # That is a MISTAKE: > a/colSums(a) > > # [,1] [,2] > # [1,] 0.0000000 0.3333333 > # [2,] 0.3333333 0.5000000 > # [3,] 0.5000000 0.3333333 > # I just wonder if some R warning should be issued here? > > # That is what I actually needed (column-wise frequencies): > t(t(a)/colSums(a)) > > # [,1] [,2] > # [1,] 0.0 0.3333333 > # [2,] 0.5 0.3333333 > # [3,] 0.5 0.3333333 > > === end of R code ==> > With best wishes and regards, > Alexey Shipunovor more simply: > prop.table(a, 2) [,1] [,2] [1,] 0.0 0.3333333 [2,] 0.5 0.3333333 [3,] 0.5 0.3333333 See ?prop.table You need to recognize how matrices are stored in R. They are vectors with a 'dim' attribute. Thus 'a' is essentially: > as.vector(a) [1] 0 1 1 1 1 1 If I take that vector, as say 'x': x <- as.vector(a) > x [1] 0 1 1 1 1 1 dim(x) <- c(3, 2) > x [,1] [,2] [1,] 0 1 [2,] 1 1 [3,] 1 1 When you do the division, since colSums(a) contains two values, they are recycled as required to get the result. Thus colSums(a) effectively becomes: > rep(colSums(a), 3) [1] 2 3 2 3 2 3 so that it is equal in length to 'a'. The result is then: > as.vector(a) / rep(colSums(a), 3) [1] 0.0000000 0.3333333 0.5000000 0.3333333 0.5000000 0.3333333 which is then returned as a matrix with the original dimensions: > matrix(as.vector(a) / rep(colSums(a), 3), 3, 2) [,1] [,2] [1,] 0.0000000 0.3333333 [2,] 0.3333333 0.5000000 [3,] 0.5000000 0.3333333 Another option simply for the sake of example, is: > t(apply(a, 1, function(x) x / colSums(a))) [,1] [,2] [1,] 0.0 0.3333333 [2,] 0.5 0.3333333 [3,] 0.5 0.3333333 prop.table() actually uses sweep(), so you might also want to look at that. So the key to understanding the behavior is to understand how R objects are stored and how recycling rules work. HTH, Marc Schwartz