The following are four ways I coded for the variance of multinomial
distribution and their CPU time. It is a bit surprising that the 3rd
method beats the 4th one substantially.
var.multinomial1 = function(n, pi)
+ {
+ p = length(pi);
+ var = array(0, c(p,p));
+
+ for(i in 1:p)
+ for(j in 1:p)
+ {
+ if(i==j) var[i,j] = n*pi[i]*(1-pi[1])
+ else var[i,j] = - n*pi[i]*pi[j];
+ }
+ }>
> var.multinomial2 = function(n, pi)
+ {
+ n*( -pi%*%t(pi) + diag(pi) )
+ }>
> var.multinomial3 = function(n, pi)
+ {
+ var = -pi%*%t(pi);
+ var[row(var)==col(var)] = pi*(1-pi);
+ n*var;
+ }>
> var.multinomial4 = function(n, pi)
+ {
+ var = -pi%*%t(pi);
+ var[seq(1, by=(p+1), length=p)] = pi*(1-pi);
+ n*var;
+ }>
> n = 100;
> pi = exp( rnorm(10));
> pi = pi/sum(pi);
> n.simu = 1000;
>
> system.time(for(i in 1:n.simu) var = var.multinomial1(n, pi))
[1] 4.50 0.04 99.73 NA NA> system.time(for(i in 1:n.simu) var = var.multinomial2(n, pi))
[1] 0.33 0.00 10.74 NA NA> system.time(for(i in 1:n.simu) var = var.multinomial3(n, pi))
[1] 0.14 0.00 4.17 NA NA> system.time(for(i in 1:n.simu) var = var.multinomial4(n, pi))
[1] 0.21 0.00 6.47 NA NA>
>
>
But first, does anyone know a vectorized way of coding (summing the
outter product of rows)
p = 10;
x = rnorm(p*p);
dim(x) = c(p.p);
var = array(0, c(p,p));
for(i in 1:p) var = var + x[i,]%*% t(x[i,]);
====Jason G. Liao, Ph.D.
Division of Biometrics
University of Medicine and Dentistry of New Jersey
335 George Street, Suite 2200
New Brunswick, NJ 08903-2688
phone (732) 235-8611, fax (732) 235-9777
http://www.geocities.com/jg_liao
__________________________________________________
Faith Hill - Exclusive Performances, Videos & More
-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !) To: r-help-request at
stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
t(X)%*%X or better crossprod(X) is the sum of outer products of the rows of X. The (i,j) entry is the sum of X[k,i]*X[k,j] over k, i.e., the (i,j) entry of the outer product of X[k, ] with itself. Reid Huntsinger -----Original Message----- From: Jason Liao [mailto:jg_liao at yahoo.com] Sent: Tuesday, October 15, 2002 12:38 PM To: r-help at stat.math.ethz.ch Subject: [R] case study of efficient R coding [...] But first, does anyone know a vectorized way of coding (summing the outter product of rows) p = 10; x = rnorm(p*p); dim(x) = c(p.p); var = array(0, c(p,p)); for(i in 1:p) var = var + x[i,]%*% t(x[i,]); ====Jason G. Liao, Ph.D. Division of Biometrics University of Medicine and Dentistry of New Jersey 335 George Street, Suite 2200 New Brunswick, NJ 08903-2688 phone (732) 235-8611, fax (732) 235-9777 http://www.geocities.com/jg_liao __________________________________________________ Faith Hill - Exclusive Performances, Videos & More -.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-. -.- r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html Send "info", "help", or "[un]subscribe" (in the "body", not the subject !) To: r-help-request at stat.math.ethz.ch _._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._. _._ ------------------------------------------------------------------------------ Notice: This e-mail message, together with any attachments, contains information of Merck & Co., Inc. (Whitehouse Station, New Jersey, USA) that may be confidential, proprietary copyrighted and/or legally privileged, and is intended solely for the use of the individual or entity named on this message. If you are not the intended recipient, and have received this message in error, please immediately return this by e-mail and then delete it. ============================================================================= -.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.- r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html Send "info", "help", or "[un]subscribe" (in the "body", not the subject !) To: r-help-request at stat.math.ethz.ch _._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
This is an interesting case study. Method 3 and 4 only differ in how the
indexes are obtained. Method 3 uses seq, which uses the primitive function
c. Method 4 evaluates p^2 logical expression in R. When p is small, method
4 can beat method 3. But as p gets bigger, the time usage by method 3 only
slightly increases, while the time usage by method 4 increasely
dramatically.
test <- function(nsim, p) {
z1<- system.time(for (i in 1:nsim) seq(1, by=(p+1), length=p))
mat <- matrix(1, p, p)
z2 <- system.time(for (i in 1:nsim) row(mat) == col(mat))
rbind(seq=z1, rowcol=z2)
}
> test(1000, 200)
[,1] [,2] [,3] [,4] [,5]
seq 0.21 0 0.21 0 0
rowcol 9.16 0 9.16 0 0> test(1000, 100)
[,1] [,2] [,3] [,4] [,5]
seq 0.17 0.01 0.19 0 0
rowcol 2.06 0.00 2.05 0 0> test(1000, 10)
[,1] [,2] [,3] [,4] [,5]
seq 0.16 0.00 0.16 0 0
rowcol 0.04 0.01 0.05 0 0
On Tue, 15 Oct 2002, Jason Liao wrote:
> The following are four ways I coded for the variance of multinomial
> distribution and their CPU time. It is a bit surprising that the 3rd
> method beats the 4th one substantially.
>
>
> var.multinomial1 = function(n, pi)
> + {
> + p = length(pi);
> + var = array(0, c(p,p));
> +
> + for(i in 1:p)
> + for(j in 1:p)
> + {
> + if(i==j) var[i,j] = n*pi[i]*(1-pi[1])
> + else var[i,j] = - n*pi[i]*pi[j];
> + }
> + }
> >
> > var.multinomial2 = function(n, pi)
> + {
> + n*( -pi%*%t(pi) + diag(pi) )
> + }
> >
> > var.multinomial3 = function(n, pi)
> + {
> + var = -pi%*%t(pi);
> + var[row(var)==col(var)] = pi*(1-pi);
> + n*var;
> + }
> >
> > var.multinomial4 = function(n, pi)
> + {
> + var = -pi%*%t(pi);
> + var[seq(1, by=(p+1), length=p)] = pi*(1-pi);
> + n*var;
> + }
> >
> > n = 100;
> > pi = exp( rnorm(10));
> > pi = pi/sum(pi);
> > n.simu = 1000;
> >
> > system.time(for(i in 1:n.simu) var = var.multinomial1(n, pi))
> [1] 4.50 0.04 99.73 NA NA
> > system.time(for(i in 1:n.simu) var = var.multinomial2(n, pi))
> [1] 0.33 0.00 10.74 NA NA
> > system.time(for(i in 1:n.simu) var = var.multinomial3(n, pi))
> [1] 0.14 0.00 4.17 NA NA
> > system.time(for(i in 1:n.simu) var = var.multinomial4(n, pi))
> [1] 0.21 0.00 6.47 NA NA
> >
> >
> >
> But first, does anyone know a vectorized way of coding (summing the
> outter product of rows)
>
> p = 10;
> x = rnorm(p*p);
> dim(x) = c(p.p);
>
> var = array(0, c(p,p));
> for(i in 1:p) var = var + x[i,]%*% t(x[i,]);
>
> ====> Jason G. Liao, Ph.D.
> Division of Biometrics
> University of Medicine and Dentistry of New Jersey
> 335 George Street, Suite 2200
> New Brunswick, NJ 08903-2688
> phone (732) 235-8611, fax (732) 235-9777
> http://www.geocities.com/jg_liao
>
> __________________________________________________
>
> Faith Hill - Exclusive Performances, Videos & More
>
>
-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
> r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
> Send "info", "help", or "[un]subscribe"
> (in the "body", not the subject !) To: r-help-request at
stat.math.ethz.ch
>
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !) To: r-help-request at
stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._