AURORA GONZALEZ VIDAL
2016-Aug-18 15:38 UTC
[R] optimize the filling of a diagonal matrix (two for loops)
Hello,

I have two for loops that I am trying to optimize. I looked into vectorization and into using functions of the apply family, but I cannot get it to work. I am writing my code with a small data set. With this size there is no problem, but sometimes I will have hundreds of rows, so it is really important to optimize the code. Any suggestion will be very welcome.

library("TSMining")
dataS = data.frame(V1 = sample(c(1,2,3,4), 30, replace = TRUE),
                   V2 = sample(c(1,2,3,4), 30, replace = TRUE),
                   V3 = sample(c(1,2,3,4), 30, replace = TRUE),
                   V4 = sample(c(1,2,3,4), 30, replace = TRUE))
saxM = Func.matrix(5)
colnames(saxM) = 1:5
rownames(saxM) = 1:5
matrixPrepared = matrix(NA, nrow = nrow(dataS), ncol = nrow(dataS))

for(i in 1:(nrow(dataS)-1)){
  for(j in (1+i):nrow(dataS)){
    matrixPrepared[i,j] = Func.dist(as.character(dataS[i,]), as.character(dataS[j,]), saxM, n=60)
  }
}
matrixPrepared

Thank you!

------
Aurora González Vidal
PhD student in Data Analytics for Energy Efficiency

Faculty of Computer Sciences
University of Murcia

@. aurora.gonzalez2 at um.es
T. 868 88 7866
www.um.es/ae
Thomas Mailund
2016-Aug-18 16:50 UTC
[R] optimize the filling of a diagonal matrix (two for loops)
The nested for-loops could very easily be moved to Rcpp, which should speed them up. Using apply functions instead of for-loops will not make it faster; they still have to do the same looping. At least, when I use `outer` to replace the loop I get roughly the same speed for the two versions, although the `outer` solution does iterate over the entire matrix and not just the upper-triangular part.

library(stringdist) # I don't have the TSMining library installed, so I tested with this instead

for_loop_test <- function() {
  matrixPrepared <- matrix(NA, nrow = nrow(dataS), ncol = nrow(dataS))
  for (i in 1:(nrow(dataS)-1)){
    for (j in (1+i):nrow(dataS)){
      matrixPrepared[i, j] <- stringdist(paste0(as.character(dataS[i,]), collapse=""),
                                         paste0(as.character(dataS[j,]), collapse=""))
    }
  }
  matrixPrepared
}

apply_test <- function() {
  get_dist <- function(i, j) {
    if (i <= j) NA
    else stringdist(paste0(as.character(dataS[i,]), collapse=""),
                    paste0(as.character(dataS[j,]), collapse=""))
  }
  get_dist <- Vectorize(get_dist)
  # outer() fills the lower triangle here; t() moves it to the upper triangle
  t(outer(1:nrow(dataS), 1:nrow(dataS), get_dist))
}

library(microbenchmark)
# element-wise comparison that treats two NAs as equal
equivalent <- function(x, y) (is.na(x) & is.na(y)) | (!is.na(x) & !is.na(y) & x == y)
check <- function(values) all(equivalent(values[[1]], values[[2]]))
microbenchmark(for_loop_test(), apply_test(), check = check, times = 5)

Cheers
Thomas
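The observation above, that `outer` and the apply family still loop at the R level, suggests a real speed-up needs the pairwise work done in compiled code. Besides Rcpp, that can mean a distance function that is already vectorized. What follows is a minimal sketch, not the method used in the thread. It assumes a plain edit distance is acceptable and uses base R's `adist()` (Levenshtein) only as a stand-in for `Func.dist`. Whether TSMining offers a vectorized equivalent is not covered here. The names `row_strings`, `full`, `upper_only` and `loop_version` are made up for the illustration.

```r
# Stand-in data with the same shape as the example in the thread.
set.seed(1)
dataS <- data.frame(V1 = sample(1:4, 30, replace = TRUE),
                    V2 = sample(1:4, 30, replace = TRUE),
                    V3 = sample(1:4, 30, replace = TRUE),
                    V4 = sample(1:4, 30, replace = TRUE))

# Build one string per row once, instead of converting rows inside the loops.
row_strings <- do.call(paste0, dataS)

# adist() (base R, package utils) computes all pairwise Levenshtein
# distances in compiled code. ASSUMPTION: it is only a stand-in distance,
# not the SAX-based Func.dist from the original question.
full <- adist(row_strings)
dimnames(full) <- NULL

# Keep only the strict upper triangle, as the original loops do.
upper_only <- full
upper_only[!upper.tri(upper_only)] <- NA

# Reference result: the double loop with the same stand-in distance.
loop_version <- matrix(NA_real_, nrow(dataS), nrow(dataS))
for (i in 1:(nrow(dataS) - 1)) {
  for (j in (i + 1):nrow(dataS)) {
    loop_version[i, j] <- adist(row_strings[i], row_strings[j])
  }
}

# Both approaches should produce the same matrix.
stopifnot(isTRUE(all.equal(upper_only, loop_version)))
```

Even when no vectorized distance is available, building `row_strings` once outside the loops avoids repeating the data frame row indexing and `as.character` conversion for every pair.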