thr3ads.net - R help - [R] hard to believe speed difference [Aug 2002]

If this information is useful, please help other people find it:
Share via:

Jason Liao

2002-Aug-06 20:12 UTC

[R] hard to believe speed difference

First, I love R and am grateful to be using this free and extremely
high quality software.

Recently I have been comparing two algorithms and naturally I
programmed in R first. It is so slow that I can almost feel its pain.
So I decided to do a comparison with Java. To draw 500,0000 truncated
normal, Java program takes 2 second and R takes 72 seconds. Not a
computer science major, I find it hard to understand how R can be so
slow.

R code:

normal.truncated <- function(mean, sd, lower, upper)
	 {  
	    repeat
            { 
		deviate = round( rnorm(1, mean, sd) );
                if(deviate<=upper && deviate>=lower) break;
            }
	    temp <- c(deviate+.5, deviate-.5, upper+.5, lower-.5);
	    result <- pnorm(temp, mean, sd);
	    prob <- (result[1] - result[2])/(result[3] - result[4]);

            return(c(deviate, prob));
     }

     print(date());

     n = 500000;
     result = numeric(n);

     for(i in 1:n) result[i] = normal.truncated(0,1, -10, 10)[1]
     print(date());

Java Code:

    package Mcdonald;
    import VisualNumerics.math.*;
    import java.util.*;

    public final class normal_univariate_truncated 
    {
       public double harvest, prob;
       static Random ran_obj = new Random();
       
       public normal_univariate_truncated(double mean, double var,
double lower, double upper)
       { 
          double sd = Math.sqrt(var);
          
          lower -= .5;
          upper += .5;
          double left = (lower - mean)/sd;
          double right = (upper - mean)/sd;
          double prob2 = Statistics.normalCdf(right) -
Statistics.normalCdf(left);
          
          while(true)
          {
              harvest = mean + ran_obj.nextGaussian()*sd;              
              if(harvest>lower && harvest<upper) break;
          }
          
          harvest = Sfun.nearestInteger(harvest);                 
          left = (harvest - .5 - mean)/sd;
          right = (harvest + .5 - mean)/sd;
          double prob1 = Statistics.normalCdf(right) -
Statistics.normalCdf(left);                  
                  
          prob = prob1/prob2;           
       }
          
       
       public static void main(String[] useless)
       {
           System.out.println(new Date());
           int n = 500000;
           double[] result = new double[n];
           for(int i=0; i<n; i++) 
           {
               normal_univariate_truncated obj = new
normal_univariate_truncated(0, 1, -10, 10);
               result[i] = obj.harvest;
           }
           System.out.println(new Date());
       }
           
    }


====Jason G. Liao, Ph.D.
Division of Biometrics
University of Medicine and Dentistry of New Jersey
335 George Street, Suite 2200
New Brunswick, NJ 08903-2688
phone (732) 235-8611, fax (732) 235-9777
http://www.geocities.com/jg_liao

__________________________________________________



-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at
stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._

ripley@stats.ox.ac.uk

2002-Aug-06 20:38 UTC

head link

[R] badly written code may be slow (was hard to believe speed difference)

If you write R like C or Java, it will often be slow.  R is a vector
language, so learn to work with it rather than against it.

For a truncated (standard) normal, try qnorm(runif(500000, a, b)) for
suitable a and b (pnorm(lower) and pnorm(upper)?)

And what are you going to do with 0.5m truncated normals that makes 70s
significant?

BTW, R has system.time() to time things.

On Tue, 6 Aug 2002, Jason Liao wrote:
> First, I love R and am grateful to be using this free and extremely
> high quality software.
>
> Recently I have been comparing two algorithms and naturally I
> programmed in R first. It is so slow that I can almost feel its pain.
> So I decided to do a comparison with Java. To draw 500,0000 truncated
> normal, Java program takes 2 second and R takes 72 seconds. Not a
> computer science major, I find it hard to understand how R can be so
> slow.
It's not the language ....
> R code:
>
> normal.truncated <- function(mean, sd, lower, upper)
> 	 {
> 	    repeat
>             {
> 		deviate = round( rnorm(1, mean, sd) );
>                 if(deviate<=upper && deviate>=lower) break;
>             }
> 	    temp <- c(deviate+.5, deviate-.5, upper+.5, lower-.5);
> 	    result <- pnorm(temp, mean, sd);
> 	    prob <- (result[1] - result[2])/(result[3] - result[4]);
>
>             return(c(deviate, prob));
>      }
>
>      print(date());
>
>      n = 500000;
>      result = numeric(n);
>
>      for(i in 1:n) result[i] = normal.truncated(0,1, -10, 10)[1]
>      print(date());
>
> Java Code:
>
>     package Mcdonald;
>     import VisualNumerics.math.*;
>     import java.util.*;
>
>     public final class normal_univariate_truncated
>     {
>        public double harvest, prob;
>        static Random ran_obj = new Random();
>
>        public normal_univariate_truncated(double mean, double var,
> double lower, double upper)
>        {
>           double sd = Math.sqrt(var);
>
>           lower -= .5;
>           upper += .5;
>           double left = (lower - mean)/sd;
>           double right = (upper - mean)/sd;
>           double prob2 = Statistics.normalCdf(right) -
> Statistics.normalCdf(left);
>
>           while(true)
>           {
>               harvest = mean + ran_obj.nextGaussian()*sd;
>               if(harvest>lower && harvest<upper) break;
>           }
>
>           harvest = Sfun.nearestInteger(harvest);
>           left = (harvest - .5 - mean)/sd;
>           right = (harvest + .5 - mean)/sd;
>           double prob1 = Statistics.normalCdf(right) -
> Statistics.normalCdf(left);
>
>           prob = prob1/prob2;
>        }
>
>
>        public static void main(String[] useless)
>        {
>            System.out.println(new Date());
>            int n = 500000;
>            double[] result = new double[n];
>            for(int i=0; i<n; i++)
>            {
>                normal_univariate_truncated obj = new
> normal_univariate_truncated(0, 1, -10, 10);
>                result[i] = obj.harvest;
>            }
>            System.out.println(new Date());
>        }
>
>     }
>
>
> ====> Jason G. Liao, Ph.D.
> Division of Biometrics
> University of Medicine and Dentistry of New Jersey
> 335 George Street, Suite 2200
> New Brunswick, NJ 08903-2688
> phone (732) 235-8611, fax (732) 235-9777
> http://www.geocities.com/jg_liao
>
> __________________________________________________
>
>
>
>
-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
> r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
> Send "info", "help", or "[un]subscribe"
> (in the "body", not the subject !)  To: r-help-request at
stat.math.ethz.ch
>
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
>
-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272860 (secr)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at
stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._

Thomas Lumley

2002-Aug-06 20:56 UTC

head link

[R] hard to believe speed difference

On Tue, 6 Aug 2002, Jason Liao wrote:
> First, I love R and am grateful to be using this free and extremely
> high quality software.
>
> Recently I have been comparing two algorithms and naturally I
> programmed in R first. It is so slow that I can almost feel its pain.
> So I decided to do a comparison with Java. To draw 500,0000 truncated
> normal, Java program takes 2 second and R takes 72 seconds. Not a
> computer science major, I find it hard to understand how R can be so
> slow.
>
> R code:
>
> normal.truncated <- function(mean, sd, lower, upper)
> 	 {
> 	    repeat
>             {
> 		deviate = round( rnorm(1, mean, sd) );
>                 if(deviate<=upper && deviate>=lower) break;
>             }
> 	    temp <- c(deviate+.5, deviate-.5, upper+.5, lower-.5);
> 	    result <- pnorm(temp, mean, sd);
> 	    prob <- (result[1] - result[2])/(result[3] - result[4]);
>
>             return(c(deviate, prob));
>      }
>
>      print(date());
>
>      n = 500000;
>      result = numeric(n);
>
>      for(i in 1:n) result[i] = normal.truncated(0,1, -10, 10)[1]
>      print(date());
>
Largely because you aren't using vectorised code. Loops in R are slow (to
some extent true with all interpreters).  The way around this is to use
R's ability to work on vectors or matrices instead of looping over their
elements.

Eg
rtruncnorm<-function(n,mean,sd,lower,upper){

    rval<-numeric(n)
    done<-rep(FALSE,n)
    while(m<-sum(!done)){
        rval[!done] <- round(rnorm(m,mean,sd))
        done <- rval>=lower & rval<=upper
    }

   
prob<-(pnorm(rval+0.5,mean,sd)-pnorm(rval-0.5,mean,sd))/(pnorm(upper+0.5,mean,sd)-pnorm(lower-0.5))
    cbind(rval,prob)

}

rtruncnorm2<-function(n,mean,sd,lower,upper){

    uval<-runif(n,pnorm(lower-0.5,mean,sd),pnorm(upper+0.5,mean,sd))
    rval<-round(qnorm(uval,mean,sd))
   
prob<-(pnorm(rval+0.5,mean,sd)-pnorm(rval-0.5,mean,sd))/(pnorm(upper+0.5,mean,sd)-pnorm(lower-0.5))
    cbind(rval,prob)

}

are two improved versions of your code that run more than 20 times
faster on my computer.
The first uses the same rejection sampling algorithm that you use, but
does it for the whole vector at once.

The second generates truncated uniforms (which is trivial) and transforms
them to Normal.

In addition to being a lot faster than your code I think they will handle
the case where mean, sd, lower or upper are also vectors.

Even faster, if you only want whole numbers and always the same
parameters, is to compute the probabilities once and then sample from the
resulting discrete distribution.

rtruncnorm3<-function(n,mean,sd,lower,upper){
	p<-diff(pnorm((lower:(1+upper))-0.5,mean,sd))
	sample(lower:upper,n,replace=TRUE,prob=p)
}
This is about 400 times faster than your version (and the speed advantage
would increase if the truncation points were less extreme), apparently
faster than the Java program.

By the way, there's no point truncating standard Normals to [-10,10]. The
probability of getting anything outside [-10,10] in 500000 samples is
about 10^-17. Even for [-6, 6] it's less than 0.001.


	-thomas

-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at
stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._

Marc Schwartz

2002-Aug-12 13:41 UTC

head link

[R] Attaching marginal summary plots to the main matrix plot

> Dear helplist, I am trying to visualise a datamatrix together with its
> marginal densities.
> Suppose dat <- matrix( rnorm(1000), nrow=25 )
> 
> I can the main visual plot using image, levelplot etc. But how do I
attach> the plots for rows sums and column sums to the top/bottom and right
hand> side of the original plot ?
> 
> The only way I can think of is by using par(mfrow=c(2,2)) command but
the> result is not desirable. Is there are useful command to do this ? If
> succesful I am thinking of affixing the boxplots by row and columns as
well.> I have gone through the lattice help but am still confused. Thank you.
If I correctly understand what you are trying to do, the way to do this
would be to use par("fig"), which adjusts the size of the plot region
within the current device.

par(mfrow) and par(mfcol) result in the device being split up into
equally sized plot regions, which does not achieve what I think you
want.

Below is some code as an example.  I am using barplot() for the row and
column sum plots in the example.  This may or may not be how you want
these plotted.

I will leave the fine tuning of the plot types, sizes, axis ranges,
labels, row/column alignments, etc. for you. This should give you a
conceptual feel for how to go about it.

HTH.

Marc

------------------

#define matrix
dat <- matrix(rnorm(1000), nrow = 25)

# save the current parameters 
op <- par(no.readonly = TRUE) 


# open a new plot window 
plot.new() 


# adjust the plot region size for each area
# the LL and UR corners of the default region are (0,0) and (1,1) 
# par("fig") is (x1, x2, y1, y2) 


# first set the region for the larger plot at the bottom 
# note that the dimensions will overlap a bit in each region
par(fig = c(0.0, 0.75, 0.0, 0.75)) 


# now draw the larger plot 
image(dat)


# do not overwrite the current plot 
par(new = TRUE) 


# now set the second region for the small plot at the top 
par(fig = c(0.0, 0.75, .65, 1.0)) 

# plot the colSums
barplot(colSums(dat))


# do not overwrite the current plot 
par(new = TRUE) 


# now set the third region for the small plot to the right 
par(fig = c(0.65, 1.0, 0.0, 0.75)) 

# plot the rowSums
barplot(rowSums(dat), horiz = TRUE)


# restore the original parameters 
par(op) 



-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at
stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._

Apparently Analagous Threads

Search for more apparently analagous threads

R help - Aug 2002 - hard to believe speed difference

[R] hard to believe speed difference

[R] badly written code may be slow (was hard to believe speed difference)

[R] hard to believe speed difference

[R] Attaching marginal summary plots to the main matrix plot

Apparently Analagous Threads