R help - Dec 2010 - how to replace my double for loop which is little efficient!

If this information is useful, please help other people find it:
Share via:

bbslover

2010-Dec-26 12:18 UTC

[R] how to replace my double for loop which is little efficient!

Dear all,

My double for loop as follows, but it is little efficient, I hope all
friends can give me a "vectorized" program to replace my code. thanks

x: is a matrix  202*263,  that is 202 samples, and 263 independent variables

num.compd<-nrow(x); # number of compounds
diss.all<-0
for( i in 1:num.compd)
   for (j in 1:num.compd)
      if (i!=j) {
        S1<-sum(x[i,]*x[j,])
        S2<-sum(x[i,]^2)
        S3<-sum(x[j,]^2)
        sim2<-S1/(S2+S3-S1)
        diss2<-1-sim2
        diss.all<-diss.all+diss2}

it will cost a long time to finish this computation! i really need
"rapid"
code to replace my code.

thanks

kevin


-- 
View this message in context:
r.789695.n4.nabble.com/how-to-replace-my-double-for-loop-which-is-little-efficient-tp3164222p3164222.html
Sent from the R help mailing list archive at Nabble.com.

Berend Hasselman

2010-Dec-26 14:13 UTC

[R] how to replace my double for loop which is little efficient!

bbslover wrote:> 
> x: is a matrix  202*263,  that is 202 samples, and 263 independent
> variables
> 
> num.compd<-nrow(x); # number of compounds
> diss.all<-0
> for( i in 1:num.compd)
>    for (j in 1:num.compd)
>       if (i!=j) {
>         S1<-sum(x[i,]*x[j,])
>         S2<-sum(x[i,]^2)
>         S3<-sum(x[j,]^2)
>         sim2<-S1/(S2+S3-S1)
>         diss2<-1-sim2
>         diss.all<-diss.all+diss2}
> 
> it will cost a long time to finish this computation! i really need
"rapid"
> code to replace my code.
> 
Alternative 1:  j-loop only needs to start at i+1 so

for( i in 1:num.compd) {
    for (j in seq(from=i+1,to=num.compd,length.out=max(0,num.compd-i))) {
            S1<-sum(x[i,]*x[j,])
            S2<-sum(x[i,]^2)
            S3<-sum(x[j,]^2)
            sim2<-S1/(S2+S3-S1)
            diss2<-1-sim2
            diss2.all<-diss2.all+diss2
    }
}
diss2.all <- 2 * diss2.all

On my pc this is about twice as fast as your version (with 202 samples and
263 variables)

Alternative 2: all sum() are not necessary. Use some matrix algebra:

xtx <- x %*% t(x)
diss3.all <- 0
for( i in 1:num.compd) {
    for (j in seq(from=i+1,to=num.compd,length.out=max(0,num.compd-i))) {
            S1 <- xtx[i,j]
            S2 <- xtx[i,i]
            S3 <- xtx[j,j]
            sim2<-S1/(S2+S3-S1)
            diss2<-1-sim2
            diss3.all<-diss3.all+diss2
    }
}
diss3.all <- 2 * diss3.all

This is about four times as fast as alternative 1.

I'm quite sure that more expert R gurus can get some more speed up.

Note: I generated the x matrix with:
set.seed(1);x<-matrix(runif(202*263),nrow=202)
(Timings on iMac 2.16Ghz and using 64-bit R)

Berend

-- 
View this message in context:
r.789695.n4.nabble.com/how-to-replace-my-double-for-loop-which-is-little-efficient-tp3164222p3164262.html
Sent from the R help mailing list archive at Nabble.com.

Dennis Murphy

2010-Dec-27 06:06 UTC

[R] how to replace my double for loop which is little efficient!

Hi:


On Sun, Dec 26, 2010 at 4:18 AM, bbslover <dluthm@yeah.net> wrote:
>
> Dear all,
>
> My double for loop as follows, but it is little efficient, I hope all
> friends can give me a "vectorized" program to replace my code.
thanks
>
> x: is a matrix  202*263,  that is 202 samples, and 263 independent
> variables
>
> num.compd<-nrow(x); # number of compounds
> diss.all<-0
> for( i in 1:num.compd)
>   for (j in 1:num.compd)
>      if (i!=j) {
>
Isn't this just X'X?
>        S1<-sum(x[i,]*x[j,])
>Aren't each of S2 and S3 just diag(X'X)?
>        S2<-sum(x[i,]^2)
>
       S3<-sum(x[j,]^2)>        sim2<-S1/(S2+S3-S1)
>        diss2<-1-sim2
>        diss.all<-diss.all+diss2}
>
I tried
s1 <- crossprod(x)
s2 <- diag(s1)
s3 <-outer(s2, s2, '+') - s1
s1/s3

This yields a symmetric matrix with 1's along the diagonal and quantities
between 0 and 1 in the off-diagonal. Something like it could conceivably be
used as a similarity matrix. Is that what you're looking for with sim2?

I agree with Berend: it looks like a problem that could be easily solved
with some matrix algebra. R can do matrix algebra quite efficiently,
y'know...

(BTW, I tried this on a 1000 x 1000 input matrix:
system.time(myfunc(x))
   user  system elapsed
   0.99    0.02    1.02

I expect it could be improved by an order of magnitude if one actually knew
what you were computing... )

HTH,
Dennis

it will cost a long time to finish this computation! i really need
"rapid"> code to replace my code.
>
> thanks
>
> kevin
>
>
> --
> View this message in context:
>
r.789695.n4.nabble.com/how-to-replace-my-double-for-loop-which-is-little-efficient-tp3164222p3164222.html
> Sent from the R help mailing list archive at Nabble.com.
>
> ______________________________________________
> R-help@r-project.org mailing list
> stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
	[[alternative HTML version deleted]]

bbslover

2010-Dec-27 06:33 UTC

[R] how to replace my double for loop which is little efficient!

thanks for your help, it is great. In addition, In the beginning, the format
of x is dataframe, and i run my code, it is so slow, after your help, I
change x for matirx, it is so quick. I am very grateful your kind help, and
your code is so good!

kevin
-- 
View this message in context:
r.789695.n4.nabble.com/how-to-replace-my-double-for-loop-which-is-little-efficient-tp3164222p3164732.html
Sent from the R help mailing list archive at Nabble.com.

bbslover

2010-Dec-27 07:10 UTC

[R] how to replace my double for loop which is little efficient!

thanks for your help. I am sorry I do not full understand your code, so i can
not correct using your code to my data. here is the attachment of my data,
and what I want to compute is the equation in the word document of the
attachment:

the code form Berend can get the answer i want to get.

r.789695.n4.nabble.com/file/n3164741/my_data.rar my_data.rar 


-- 
View this message in context:
r.789695.n4.nabble.com/how-to-replace-my-double-for-loop-which-is-little-efficient-tp3164222p3164741.html
Sent from the R help mailing list archive at Nabble.com.

Seemingly Similar Threads

Search for more possibly parallel threads