bbslover
2010-Dec-26 12:18 UTC
[R] how to replace my double for loop which is little efficient!
Dear all,

My double for loop is below, but it is not very efficient. I hope someone can give me a "vectorized" version to replace my code. Thanks.

x is a 202 x 263 matrix, that is, 202 samples and 263 independent variables.

    num.compd <- nrow(x)  # number of compounds
    diss.all <- 0
    for (i in 1:num.compd)
      for (j in 1:num.compd)
        if (i != j) {
          S1 <- sum(x[i,] * x[j,])
          S2 <- sum(x[i,]^2)
          S3 <- sum(x[j,]^2)
          sim2 <- S1 / (S2 + S3 - S1)
          diss2 <- 1 - sim2
          diss.all <- diss.all + diss2
        }

It takes a long time to finish this computation! I really need "rapid" code to replace my code.

thanks
kevin
Berend Hasselman
2010-Dec-26 14:13 UTC
[R] how to replace my double for loop which is little efficient!
bbslover wrote:
>
> x: is a matrix 202*263, that is 202 samples, and 263 independent
> variables
>
> num.compd<-nrow(x); # number of compounds
> diss.all<-0
> for( i in 1:num.compd)
> for (j in 1:num.compd)
> if (i!=j) {
> S1<-sum(x[i,]*x[j,])
> S2<-sum(x[i,]^2)
> S3<-sum(x[j,]^2)
> sim2<-S1/(S2+S3-S1)
> diss2<-1-sim2
> diss.all<-diss.all+diss2}
>
> it will cost a long time to finish this computation! i really need "rapid"
> code to replace my code.
>

Alternative 1: the dissimilarity is symmetric in i and j, so the j-loop only needs to start at i+1, and the total is doubled at the end:

    diss2.all <- 0  # accumulator must be initialised before the loop
    for (i in 1:num.compd) {
      for (j in seq(from=i+1, to=num.compd, length.out=max(0, num.compd-i))) {
        S1 <- sum(x[i,] * x[j,])
        S2 <- sum(x[i,]^2)
        S3 <- sum(x[j,]^2)
        sim2 <- S1 / (S2 + S3 - S1)
        diss2 <- 1 - sim2
        diss2.all <- diss2.all + diss2
      }
    }
    diss2.all <- 2 * diss2.all

On my pc this is about twice as fast as your version (with 202 samples and 263 variables).

Alternative 2: none of the sum() calls are necessary. Use some matrix algebra:

    xtx <- x %*% t(x)  # xtx[i,j] is sum(x[i,] * x[j,])
    diss3.all <- 0
    for (i in 1:num.compd) {
      for (j in seq(from=i+1, to=num.compd, length.out=max(0, num.compd-i))) {
        S1 <- xtx[i,j]
        S2 <- xtx[i,i]
        S3 <- xtx[j,j]
        sim2 <- S1 / (S2 + S3 - S1)
        diss2 <- 1 - sim2
        diss3.all <- diss3.all + diss2
      }
    }
    diss3.all <- 2 * diss3.all

This is about four times as fast as alternative 1.

I'm quite sure that more expert R gurus can get some more speed up.

Note: I generated the x matrix with:

    set.seed(1); x <- matrix(runif(202*263), nrow=202)

(Timings on iMac 2.16Ghz and using 64-bit R)

Berend
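The remaining loops in Alternative 2 can also be replaced by whole-matrix operations. The following is only a minimal sketch under the assumptions stated in the message above (rows of x are samples, same test matrix); the name diss.vec.all is made up for illustration and does not come from the thread.

```r
# Sketch only: same test matrix as in the message above.
set.seed(1)
x <- matrix(runif(202 * 263), nrow = 202)

xtx <- x %*% t(x)   # xtx[i, j] = sum(x[i, ] * x[j, ])
d   <- diag(xtx)    # d[i]      = sum(x[i, ]^2)

# S1 / (S2 + S3 - S1) for every pair (i, j) at once
sim  <- xtx / (outer(d, d, "+") - xtx)
diss <- 1 - sim

# Sum over the pairs with i < j and double it, as in the loops above
# (diss.vec.all is a hypothetical name)
diss.vec.all <- 2 * sum(diss[upper.tri(diss)])
```

Taking only the upper triangle mirrors the "start j at i+1" idea, so the diagonal (i == j) never enters the sum.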
Dennis Murphy
2010-Dec-27 06:06 UTC
[R] how to replace my double for loop which is little efficient!
Hi:

On Sun, Dec 26, 2010 at 4:18 AM, bbslover <dluthm@yeah.net> wrote:
>
> Dear all,
>
> My double for loop as follows, but it is little efficient, I hope all
> friends can give me a "vectorized" program to replace my code. thanks
>
> x: is a matrix 202*263, that is 202 samples, and 263 independent
> variables
>
> num.compd<-nrow(x); # number of compounds
> diss.all<-0
> for( i in 1:num.compd)
> for (j in 1:num.compd)
> if (i!=j) {
> S1<-sum(x[i,]*x[j,])

Isn't this just an element of XX' (with the samples in the rows of X)?

> S2<-sum(x[i,]^2)
> S3<-sum(x[j,]^2)

Aren't each of S2 and S3 just elements of diag(XX')?

> sim2<-S1/(S2+S3-S1)
> diss2<-1-sim2
> diss.all<-diss.all+diss2}
>

I tried

    s1 <- tcrossprod(x)  # x %*% t(x); crossprod(x) would pair up columns instead of rows
    s2 <- diag(s1)
    s3 <- outer(s2, s2, '+') - s1
    s1/s3

This yields a symmetric matrix with 1's along the diagonal and quantities between 0 and 1 in the off-diagonal. Something like it could conceivably be used as a similarity matrix. Is that what you're looking for with sim2?

I agree with Berend: it looks like a problem that could be easily solved with some matrix algebra. R can do matrix algebra quite efficiently, y'know...

(BTW, I tried this on a 1000 x 1000 input matrix:

    system.time(myfunc(x))
       user  system elapsed
       0.99    0.02    1.02

I expect it could be improved by an order of magnitude if one actually knew what you were computing... )

HTH,
Dennis

> it will cost a long time to finish this computation! i really need "rapid"
> code to replace my code.
>
> thanks
>
> kevin
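To connect the matrix version back to the original question, the similarity matrix can be wrapped in a function and checked against the double loop on a small input. This is a hedged sketch, not the code timed above: the helper name sim_matrix and the 5 x 3 test matrix are assumptions for illustration. It uses tcrossprod(x), i.e. x %*% t(x), so that the products are taken between rows (samples).

```r
# Hypothetical helper (the name is an assumption, not from the thread).
# Rows of x are samples; returns the n x n similarity matrix.
sim_matrix <- function(x) {
  x  <- as.matrix(x)
  s1 <- tcrossprod(x)               # s1[i, j] = sum(x[i, ] * x[j, ])
  s2 <- diag(s1)                    # s2[i]    = sum(x[i, ]^2)
  s1 / (outer(s2, s2, "+") - s1)
}

# Small made-up input for a check against the original double loop
set.seed(42)
x   <- matrix(runif(5 * 3), nrow = 5)
sim <- sim_matrix(x)

diss.loop <- 0
for (i in seq_len(nrow(x)))
  for (j in seq_len(nrow(x)))
    if (i != j) {
      S1 <- sum(x[i, ] * x[j, ])
      S2 <- sum(x[i, ]^2)
      S3 <- sum(x[j, ]^2)
      diss.loop <- diss.loop + (1 - S1 / (S2 + S3 - S1))
    }

# The diagonal of sim is 1, so it adds nothing to the dissimilarity sum
diss.vec <- sum(1 - sim)
all.equal(diss.loop, diss.vec)
```

If the two totals agree on a small case like this, the same function can be applied to the full 202 x 263 matrix in place of the loop.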
bbslover
2010-Dec-27 06:33 UTC
[R] how to replace my double for loop which is little efficient!
Thanks for your help, it is great. In addition: at the beginning x was a data frame, and my code ran very slowly. After your help I changed x to a matrix, and now it is very quick. I am very grateful for your kind help, and your code is so good!

kevin
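The data frame versus matrix difference described above can be seen on a small example. This is a minimal sketch with made-up data (x.df and its 20 x 5 size are assumptions): indexing one row of a data frame returns another data frame, while indexing one row of a matrix returns a plain numeric vector, which is much cheaper to work with inside a loop.

```r
# Made-up data, for illustration only
set.seed(1)
x.df <- as.data.frame(matrix(runif(20 * 5), nrow = 20))

class(x.df[1, ])      # a one-row data frame, not a numeric vector

x <- as.matrix(x.df)  # convert once, before any loop
class(x[1, ])         # a plain numeric vector

# The arithmetic gives the same value either way; only the cost differs
s.df  <- sum(x.df[1, ] * x.df[2, ])
s.mat <- sum(x[1, ] * x[2, ])
all.equal(s.df, s.mat)
```

Converting with as.matrix() once up front is usually enough; it assumes all columns are numeric, as they are in this thread.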
bbslover
2010-Dec-27 07:10 UTC
[R] how to replace my double for loop which is little efficient!
Thanks for your help. I am sorry, but I do not fully understand your code, so I cannot apply it correctly to my data. My data are in the attachment, and what I want to compute is the equation in the Word document in the attachment. The code from Berend gives the answer I want.

r.789695.n4.nabble.com/file/n3164741/my_data.rar my_data.rar