I'm doing

m <- match(matriz, origen, 0)

where matriz is a 270x900 matrix and origen is an 11675-element vector, and it is taking a very long time.

Is match a function implemented in C? If not, would C code be faster?

Thanks

Agus

Dr. Agustin Lobo
Instituto de Ciencias de la Tierra (CSIC)
Lluis Sole Sabaris s/n
08028 Barcelona SPAIN
tel 34 93409 5410
fax 34 93411 0012
alobo at ija.csic.es

-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
On Tue, 20 Nov 2001, Agustin Lobo wrote:

> I'm doing
>
> m <- match(matriz, origen, 0)
>
> where matriz is a 270x900 matrix and
> origen is an 11675-element vector, and it is taking
> a very long time.
>
> Is match a function
> implemented in C? If not, would C
> code be faster?

Well, typing the function name at the R prompt gives

R> match
function (x, table, nomatch = NA, incomparables = FALSE)
{
    if (!is.logical(incomparables) || incomparables)
        .NotYetUsed("incomparables != FALSE")
    .Internal(match(if (is.factor(x)) as.character(x) else x,
                    if (is.factor(table)) as.character(table) else table,
                    nomatch))
}

showing that it is .Internal and thus in compiled C code. Looking at src/main/unique.c reveals that it is implemented by sticking `table' in a hash table and looking up each element of x, which is a pretty good algorithm for this problem. If the hash function is good, it will take about length(table) + length(x) hash computations, and you won't be able to beat that easily.

I don't even find it that slow:

> matriz <- matrix(rnorm(270*900), ncol=900)
> origen <- rnorm(11675)
> system.time(match(matriz, origen, 0))
[1] 0.27 0.01 0.33 0.00 0.00

or with a lot of matches:

> matriz <- matrix(sample(270*900, 1:20, TRUE), ncol=900)
> origen <- 1:11675
> system.time(match(matriz, origen, 0))
[1] 0.01 0.00 0.01 0.00 0.00

-thomas

Thomas Lumley                   Asst. Professor, Biostatistics
tlumley at u.washington.edu     University of Washington, Seattle
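The build-then-look-up scheme described above can be sketched in C. This is a simplified illustration under stated assumptions, not the code in src/main/unique.c: it handles only int vectors (R's match also covers doubles, strings and so on), it uses open addressing with linear probing and a multiplicative hash, and the names match_int and hash_int are hypothetical. Building the table costs one hash per element of table and the lookup one hash per element of x, which is where the length(table) + length(x) estimate comes from.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper, not R's hash: multiplicative hashing with Knuth's
   constant 2654435769 (about 2^32 divided by the golden ratio). The top
   `bits` bits of the 32-bit product depend on every bit of the key.
   Assumes 1 <= bits <= 32. */
static size_t hash_int(int key, unsigned bits)
{
    uint32_t k = (uint32_t)((uint32_t)key * UINT32_C(2654435769));
    return (size_t)(k >> (32 - bits));
}

/* Hypothetical sketch of match(x, table, nomatch) for int vectors:
   out[i] receives the 1-based position of the first occurrence of x[i]
   in table, or nomatch if x[i] does not occur.
   Returns 0 on success, -1 if memory could not be allocated. */
int match_int(const int *x, size_t nx, const int *table, size_t nt,
              int nomatch, int *out)
{
    /* Smallest power-of-two table with load factor at most 1/2,
       so probing always reaches an empty slot. */
    unsigned bits = 1;
    while (((size_t)1 << bits) < 2 * nt)
        bits++;
    size_t size = (size_t)1 << bits, mask = size - 1;

    /* Each slot holds a 0-based index into table[], or -1 when empty. */
    long *slots = malloc(size * sizeof *slots);
    if (slots == NULL)
        return -1;
    for (size_t s = 0; s < size; s++)
        slots[s] = -1;

    /* Build phase: one hash per element of table. A repeated value finds
       its earlier copy and is skipped, so the first occurrence wins. */
    for (size_t j = 0; j < nt; j++) {
        size_t h = hash_int(table[j], bits);
        while (slots[h] != -1 && table[slots[h]] != table[j])
            h = (h + 1) & mask;               /* linear probing */
        if (slots[h] == -1)
            slots[h] = (long)j;
    }

    /* Lookup phase: one hash per element of x. */
    for (size_t i = 0; i < nx; i++) {
        size_t h = hash_int(x[i], bits);
        while (slots[h] != -1 && table[slots[h]] != x[i])
            h = (h + 1) & mask;
        out[i] = (slots[h] == -1) ? nomatch : (int)(slots[h] + 1);
    }

    free(slots);
    return 0;
}
```

With table = {7, 3, 9, 3, 11}, x = {3, 4, 11, 7, 9} and nomatch = 0, out becomes {2, 0, 5, 1, 3}, the same result as match(c(3, 4, 11, 7, 9), c(7, 3, 9, 3, 11), 0) in R. A naive double loop in C would instead compare every element of x with every element of table, roughly 243000 * 11675 comparisons for the matrix in the question, so writing C alone would not help without a comparable algorithm.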
On Tue, 20 Nov 2001, Agustin Lobo wrote:

> I'm doing
>
> m <- match(matriz, origen, 0)
>
> where matriz is a 270x900 matrix and
> origen is an 11675-element vector, and it is taking
> a very long time.
>
> Is match a function
> implemented in C? If not, would C

All of R is implemented in C or Fortran, ultimately. But you could do

> match
function (x, table, nomatch = NA, incomparables = FALSE)
{
    if (!is.logical(incomparables) || incomparables)
        .NotYetUsed("incomparables != FALSE")
    .Internal(match(if (is.factor(x)) as.character(x) else x,
                    if (is.factor(table)) as.character(table) else table,
                    nomatch))
}

to see that it is a direct call to an internal function, and those are in C.

> code be faster?

The internal C code (do_match in src/main/unique.c) uses hashing, so unless the hashing is doing a poor job on your particular data, it ought to be about as fast as possible. You could have looked at the source code in the same way I did: that's the beauty of an open-source system.

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272860 (secr)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
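The caveat about hashing that does a poor job on particular data can be made concrete with a small C experiment. This is a hypothetical illustration, not R's code, and it says nothing about whether R's own hash functions have such weak spots: count_probes is an invented helper that inserts keys into an open-addressing table using bucket = key mod size and counts the extra slots visited on collisions. Keys that are all multiples of 4096 share their low 12 bits, so with a power-of-two table of 2048 slots every key lands in bucket 0 and the work grows quadratically, while a prime table size of 2039 spreads the same keys out.

```c
#include <stdlib.h>

/* Hypothetical demo helper: inserts keys[0..n-1] into an open-addressing
   table of `size` slots with bucket = key mod size and linear probing, and
   returns the total number of extra probes caused by collisions
   (-1 if n > size or memory could not be allocated).
   Keys are assumed non-negative. A power-of-two size keeps only the low
   bits of each key; a prime size lets every bit influence the bucket. */
long count_probes(const int *keys, size_t n, size_t size)
{
    if (n > size)
        return -1;                        /* table would overflow */
    char *used = calloc(size, 1);
    if (used == NULL)
        return -1;

    long probes = 0;
    for (size_t i = 0; i < n; i++) {
        size_t h = (size_t)((unsigned)keys[i] % size);
        while (used[h]) {                 /* collision: try the next slot */
            h = (h + 1) % size;
            probes++;
        }
        used[h] = 1;
    }

    free(used);
    return probes;
}
```

For keys i * 4096 with i = 0, ..., 999, count_probes(keys, 1000, 2048) returns 499500 (that is, 0 + 1 + ... + 999), while count_probes(keys, 1000, 2039) returns 0: 4096 mod 2039 = 18, and the multiples 18 * i are all distinct modulo the prime 2039 for i < 2039, so every key finds its home bucket empty.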