The following are four ways I coded for the variance of multinomial distribution and their CPU time. It is a bit surprising that the 3rd method beats the 4th one substantially. var.multinomial1 = function(n, pi) + { + p = length(pi); + var = array(0, c(p,p)); + + for(i in 1:p) + for(j in 1:p) + { + if(i==j) var[i,j] = n*pi[i]*(1-pi[1]) + else var[i,j] = - n*pi[i]*pi[j]; + } + }> > var.multinomial2 = function(n, pi)+ { + n*( -pi%*%t(pi) + diag(pi) ) + }> > var.multinomial3 = function(n, pi)+ { + var = -pi%*%t(pi); + var[row(var)==col(var)] = pi*(1-pi); + n*var; + }> > var.multinomial4 = function(n, pi)+ { + var = -pi%*%t(pi); + var[seq(1, by=(p+1), length=p)] = pi*(1-pi); + n*var; + }> > n = 100; > pi = exp( rnorm(10)); > pi = pi/sum(pi); > n.simu = 1000; > > system.time(for(i in 1:n.simu) var = var.multinomial1(n, pi))[1] 4.50 0.04 99.73 NA NA> system.time(for(i in 1:n.simu) var = var.multinomial2(n, pi))[1] 0.33 0.00 10.74 NA NA> system.time(for(i in 1:n.simu) var = var.multinomial3(n, pi))[1] 0.14 0.00 4.17 NA NA> system.time(for(i in 1:n.simu) var = var.multinomial4(n, pi))[1] 0.21 0.00 6.47 NA NA> > >But first, does anyone know a vectorized way of coding (summing the outter product of rows) p = 10; x = rnorm(p*p); dim(x) = c(p.p); var = array(0, c(p,p)); for(i in 1:p) var = var + x[i,]%*% t(x[i,]); ====Jason G. Liao, Ph.D. Division of Biometrics University of Medicine and Dentistry of New Jersey 335 George Street, Suite 2200 New Brunswick, NJ 08903-2688 phone (732) 235-8611, fax (732) 235-9777 http://www.geocities.com/jg_liao __________________________________________________ Faith Hill - Exclusive Performances, Videos & More -.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.- r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html Send "info", "help", or "[un]subscribe" (in the "body", not the subject !) To: r-help-request at stat.math.ethz.ch _._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
t(X)%*%X or better crossprod(X) is the sum of outer products of the rows of X. The (i,j) entry is the sum of X[k,i]*X[k,j] over k, i.e., the (i,j) entry of the outer product of X[k, ] with itself. Reid Huntsinger -----Original Message----- From: Jason Liao [mailto:jg_liao at yahoo.com] Sent: Tuesday, October 15, 2002 12:38 PM To: r-help at stat.math.ethz.ch Subject: [R] case study of efficient R coding [...] But first, does anyone know a vectorized way of coding (summing the outter product of rows) p = 10; x = rnorm(p*p); dim(x) = c(p.p); var = array(0, c(p,p)); for(i in 1:p) var = var + x[i,]%*% t(x[i,]); ====Jason G. Liao, Ph.D. Division of Biometrics University of Medicine and Dentistry of New Jersey 335 George Street, Suite 2200 New Brunswick, NJ 08903-2688 phone (732) 235-8611, fax (732) 235-9777 http://www.geocities.com/jg_liao __________________________________________________ Faith Hill - Exclusive Performances, Videos & More -.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-. -.- r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html Send "info", "help", or "[un]subscribe" (in the "body", not the subject !) To: r-help-request at stat.math.ethz.ch _._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._. _._ ------------------------------------------------------------------------------ Notice: This e-mail message, together with any attachments, contains information of Merck & Co., Inc. (Whitehouse Station, New Jersey, USA) that may be confidential, proprietary copyrighted and/or legally privileged, and is intended solely for the use of the individual or entity named on this message. If you are not the intended recipient, and have received this message in error, please immediately return this by e-mail and then delete it. ============================================================================= -.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.- r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html Send "info", "help", or "[un]subscribe" (in the "body", not the subject !) To: r-help-request at stat.math.ethz.ch _._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
This is an interesting case study. Method 3 and 4 only differ in how the indexes are obtained. Method 3 uses seq, which uses the primitive function c. Method 4 evaluates p^2 logical expression in R. When p is small, method 4 can beat method 3. But as p gets bigger, the time usage by method 3 only slightly increases, while the time usage by method 4 increasely dramatically. test <- function(nsim, p) { z1<- system.time(for (i in 1:nsim) seq(1, by=(p+1), length=p)) mat <- matrix(1, p, p) z2 <- system.time(for (i in 1:nsim) row(mat) == col(mat)) rbind(seq=z1, rowcol=z2) }> test(1000, 200)[,1] [,2] [,3] [,4] [,5] seq 0.21 0 0.21 0 0 rowcol 9.16 0 9.16 0 0> test(1000, 100)[,1] [,2] [,3] [,4] [,5] seq 0.17 0.01 0.19 0 0 rowcol 2.06 0.00 2.05 0 0> test(1000, 10)[,1] [,2] [,3] [,4] [,5] seq 0.16 0.00 0.16 0 0 rowcol 0.04 0.01 0.05 0 0 On Tue, 15 Oct 2002, Jason Liao wrote:> The following are four ways I coded for the variance of multinomial > distribution and their CPU time. It is a bit surprising that the 3rd > method beats the 4th one substantially. > > > var.multinomial1 = function(n, pi) > + { > + p = length(pi); > + var = array(0, c(p,p)); > + > + for(i in 1:p) > + for(j in 1:p) > + { > + if(i==j) var[i,j] = n*pi[i]*(1-pi[1]) > + else var[i,j] = - n*pi[i]*pi[j]; > + } > + } > > > > var.multinomial2 = function(n, pi) > + { > + n*( -pi%*%t(pi) + diag(pi) ) > + } > > > > var.multinomial3 = function(n, pi) > + { > + var = -pi%*%t(pi); > + var[row(var)==col(var)] = pi*(1-pi); > + n*var; > + } > > > > var.multinomial4 = function(n, pi) > + { > + var = -pi%*%t(pi); > + var[seq(1, by=(p+1), length=p)] = pi*(1-pi); > + n*var; > + } > > > > n = 100; > > pi = exp( rnorm(10)); > > pi = pi/sum(pi); > > n.simu = 1000; > > > > system.time(for(i in 1:n.simu) var = var.multinomial1(n, pi)) > [1] 4.50 0.04 99.73 NA NA > > system.time(for(i in 1:n.simu) var = var.multinomial2(n, pi)) > [1] 0.33 0.00 10.74 NA NA > > system.time(for(i in 1:n.simu) var = var.multinomial3(n, pi)) > [1] 0.14 0.00 4.17 NA NA > > system.time(for(i in 1:n.simu) var = var.multinomial4(n, pi)) > [1] 0.21 0.00 6.47 NA NA > > > > > > > But first, does anyone know a vectorized way of coding (summing the > outter product of rows) > > p = 10; > x = rnorm(p*p); > dim(x) = c(p.p); > > var = array(0, c(p,p)); > for(i in 1:p) var = var + x[i,]%*% t(x[i,]); > > ====> Jason G. Liao, Ph.D. > Division of Biometrics > University of Medicine and Dentistry of New Jersey > 335 George Street, Suite 2200 > New Brunswick, NJ 08903-2688 > phone (732) 235-8611, fax (732) 235-9777 > http://www.geocities.com/jg_liao > > __________________________________________________ > > Faith Hill - Exclusive Performances, Videos & More > > -.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.- > r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html > Send "info", "help", or "[un]subscribe" > (in the "body", not the subject !) To: r-help-request at stat.math.ethz.ch > _._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.- r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html Send "info", "help", or "[un]subscribe" (in the "body", not the subject !) To: r-help-request at stat.math.ethz.ch _._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._