Dear Jim,
Here is again exactly what I did and with the output of Rprof (with this
reduced dataset and with a simpler function, it is here much faster than
in real life).
Thanks you again for your help!
## CODE ##
mydata1<- structure(list(species = structure(1:8, .Label =
c("alsen","gogor", "loalb", "mafas",
"pacyn", "patro", "poabe",
"thgel"), class = "factor"), fruit = c(0.52, 0.45, 0.43,
0.82, 0.35,
0.9, 0.68, 0), Asfc = c(207.463765, 138.5533755, 70.4391735,
160.9742745, 41.455809, 119.155109, 26.241441, 148.337377), Tfv =
c(47068.1437773483, 43743.8087431582, 40323.5209129239,
23420.9455581495, 29382.6947428651, 50460.2202192311, 21810.1456510625,
41747.6053810881)), .Names = c("species", "fruit",
"Asfc", "Tfv"),
row.names = c(NA, 8L), class = "data.frame")
mydata2<- mydata1[!(mydata1$species %in%
c("thgel","alsen")),]
mydata3<- mydata1[!(mydata1$species %in%
c("thgel","alsen","poabe")),]
mydata_list<- list(mydata1=mydata1, mydata2=mydata2, mydata3=mydata3)
library(WRS)
foo_reg<- function(dat, xvar, yvar, mycol, pos, name.dat){
tsts<- tstsreg(dat[[xvar]], dat[[yvar]])
tsts_inter<- signif(tsts$coef[1], digits=3)
tsts_slope<- signif(tsts$coef[2], digits=3)
abline(tsts$coef, lty=1, col=mycol)
legend(x=pos, legend=c(paste("TSTS ",name.dat,":
Y=",tsts_inter,"+",tsts_slope,"X",sep="")),
lty=1, col=mycol)
}
ind.xvar<- 2
seq.yvar<- 3:4
mypos<- c("topleft", "topright","bottomleft")
par(mfrow=c(2,1))
Rprof()
for (i in seq_along(seq.yvar)){
k<- seq.yvar[i]
plot(mydata1[[k]]~mydata1[[ind.xvar]], type="p",
xlab=names(mydata1)[ind.xvar], ylab=names(mydata1)[k])
for (j in seq_along(mydata_list)){
foo_reg(dat=mydata_list[[j]], xvar=ind.xvar, yvar=k, mycol=j,
pos=mypos[j], name.dat=names(mydata_list)[j])
}
}
Rprof(NULL)
summaryRprof()
$by.self
self.time self.pct total.time total.pct
pt 0.04 18.18 0.04 18.18
plot 0.02 9.09 0.08 36.36
sc 0.02 9.09 0.08 36.36
mean 0.02 9.09 0.04 18.18
| 0.02 9.09 0.02 9.09
axis 0.02 9.09 0.02 9.09
box 0.02 9.09 0.02 9.09
ifelse 0.02 9.09 0.02 9.09
plot.new 0.02 9.09 0.02 9.09
sort 0.02 9.09 0.02 9.09
$by.total
total.time total.pct self.time self.pct
foo_reg 0.14 63.64 0.00 0.00
tstsreg 0.12 54.55 0.00 0.00
plot 0.08 36.36 0.02 9.09
sc 0.08 36.36 0.02 9.09
do.call 0.06 27.27 0.00 0.00
plot.default 0.06 27.27 0.00 0.00
plot.formula 0.06 27.27 0.00 0.00
pt 0.04 18.18 0.04 18.18
mean 0.04 18.18 0.02 9.09
corfun 0.04 18.18 0.00 0.00
median 0.04 18.18 0.00 0.00
median.default 0.04 18.18 0.00 0.00
tsreg 0.04 18.18 0.00 0.00
| 0.02 9.09 0.02 9.09
axis 0.02 9.09 0.02 9.09
box 0.02 9.09 0.02 9.09
ifelse 0.02 9.09 0.02 9.09
plot.new 0.02 9.09 0.02 9.09
sort 0.02 9.09 0.02 9.09
Axis 0.02 9.09 0.00 0.00
Axis.default 0.02 9.09 0.00 0.00
legend 0.02 9.09 0.00 0.00
localAxis 0.02 9.09 0.00 0.00
localBox 0.02 9.09 0.00 0.00
par 0.02 9.09 0.00 0.00
rect 0.02 9.09 0.00 0.00
rect2 0.02 9.09 0.00 0.00
$sample.interval
[1] 0.02
$sampling.time
[1] 0.22
sessionInfo()
R version 2.12.1 (2010-12-16)
Platform: i386-pc-mingw32/i386 (32-bit)
locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] WRS_0.12.1 robustbase_0.6-2 akima_0.5-4 MASS_7.3-11
-------- Message original --------
Sujet: Re: [R] speed up process
Date : Fri, 25 Feb 2011 13:42:58 -0500
De : jim holtman <jholtman at gmail.com>
Pour : ivan.calandra at uni-hamburg.de
What did the output from Rprof look like? How much time is the rest
of the script taking in proportion of foo_reg? That would indicate if
it is worth it to spend the time in trying to improve it. Time to
change the code vs. the number of time that you expect to run it with
its current "slowness"?
On Fri, Feb 25, 2011 at 8:38 AM, Ivan Calandra
<ivan.calandra at uni-hamburg.de> wrote:> Ha... it was way too simple!
> I thought it would be like system.time()... my bad. Thanks for the tip!
>
> As we thought, foo_reg() takes most of the computing time, and I cannot
> improve that.
> Any ideas of how to improve the rest?
>
> Thanks again for your help
> Ivan
>
>
> Le 2/25/2011 14:29, jim holtman a ?crit :
>>
>> You invoke Rprof, run your code and then terminate it:
>>
>>
>> Rprof()
>> ....... code you want to profile
>> Rprof(NULL) # generate output
>> summaryRprof()
>>
>> example:
>>
>>
>>> Rprof()
>>> for (i in 1:1e6) sin(i) + cos(i) + sqrt(i)
>>> Rprof(NULL)
>>> summaryRprof()
>>
>> $by.self
>> self.time self.pct total.time total.pct
>> sin 0.24 30.77 0.24 30.77
>> sqrt 0.22 28.21 0.22 28.21
>> cos 0.16 20.51 0.16 20.51
>> + 0.14 17.95 0.14 17.95
>> : 0.02 2.56 0.02 2.56
>>
>> $by.total
>> total.time total.pct self.time self.pct
>> sin 0.24 30.77 0.24 30.77
>> sqrt 0.22 28.21 0.22 28.21
>> cos 0.16 20.51 0.16 20.51
>> + 0.14 17.95 0.14 17.95
>> : 0.02 2.56 0.02 2.56
>>
>> $sample.interval
>> [1] 0.02
>>
>> $sampling.time
>> [1] 0.78
>>
>>
>> On Fri, Feb 25, 2011 at 6:57 AM, Ivan Calandra
>> <ivan.calandra at uni-hamburg.de> wrote:
>>>
>>> Dear Jim,
>>>
>>> I've tried to use Rprof() as you advised me, but I don't
understand how
>>> it
>>> works.
>>> I've done this:
>>> Rprof(for (i in seq_along(seq.yvar)){
>>> all_my_commands
>>> })
>>> summaryRprof()
>>>
>>> But I got this error:
>>> Error in summaryRprof() : no lines found in ?Rprof.out?
>>>
>>> I couldn't really understand from the help page what I should
do.
>>>
>>> In any case, it's sure that the function tstsreg(), is what
takes the
>>> most
>>> computing time. But I wanted to optimize the rest of the code to
gain as
>>> much speed as possible.
>>>
>>> Ivan
>>>
>>> Le 2/25/2011 12:30, Jim Holtman a ?crit :
>>>>
>>>> use Rprof to find where time is being spent. probably in
'plot' which
>>>> might imply it is not the 'for' loop and therefore
beyond your control.
>>>>
>>>> Sent from my iPad
>>>>
>>>> On Feb 25, 2011, at 6:19, Ivan Calandra<ivan.calandra at
uni-hamburg.de>
>>>> wrote:
>>>>
>>>>> Thanks Nick for your quick answer.
>>>>> It does work (no missed bracket!) but unfortunately
doesn't really
>>>>> speed
>>>>> up anything: with my real data, it takes 82.78 seconds
with the double
>>>>> lapply() instead of 83.59s with the double loop (about 0.8
s).
>>>>>
>>>>> It looks like my double loop was not that bad. Does anyone
know another
>>>>> faster way to do this?
>>>>>
>>>>> Thanks again in advance,
>>>>> Ivan
>>>>>
>>>>> Le 2/25/2011 11:41, Nick Sabbe a ?crit :
>>>>>>
>>>>>> Simply avoiding the for loops by using lapply (I may
have missed a
>>>>>> bracket
>>>>>> here or there cause I did this without opening R)...
>>>>>> Haven't checked the speed up, though.
>>>>>>
>>>>>> lapply(seq.yvar, function(k){
>>>>>> plot(mydata1[[k]]~mydata1[[ind.xvar]],
type="p",
>>>>>> xlab=names(mydata1)[ind.xvar], ylab=names(mydata1)[k])
>>>>>> lapply(seq_along(mydata_list), function(j){
>>>>>> foo_reg(dat=mydata_list[[j]], xvar=ind.xvar,
yvar=k, mycol=j,
>>>>>> pos=mypos[j], name.dat=names(mydata_list)[j])
>>>>>> return(NULL)
>>>>>> })
>>>>>> invisible(NULL)
>>>>>> })
>>>>>>
>>>>>> HTH,
>>>>>>
>>>>>> Nick Sabbe
>>>>>> --
>>>>>> ping: nick.sabbe at ugent.be
>>>>>> link: http://biomath.ugent.be
>>>>>> wink: A1.056, Coupure Links 653, 9000 Gent
>>>>>> ring: 09/264.59.36
>>>>>>
>>>>>> -- Do Not Disapprove
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: r-help-bounces at r-project.org
>>>>>> [mailto:r-help-bounces at r-project.org]
>>>>>> On
>>>>>> Behalf Of Ivan Calandra
>>>>>> Sent: vrijdag 25 februari 2011 11:20
>>>>>> To: r-help
>>>>>> Subject: [R] speed up process
>>>>>>
>>>>>> Dear users,
>>>>>>
>>>>>> I have a double for loop that does exactly what I
want, but is quite
>>>>>> slow. It is not so much with this simplified example,
but IRL it is
>>>>>> slow.
>>>>>> Can anyone help me improve it?
>>>>>>
>>>>>> The data and code for foo_reg() are available at the
end of the email;
>>>>>> I
>>>>>> preferred going directly into the problematic part.
>>>>>> Here is the code (I tried to simplify it but I cannot
do it too much
>>>>>> or
>>>>>> else it wouldn't represent my problem). It might
also look too complex
>>>>>> for what it is intended to do, but my colleagues who
are also supposed
>>>>>> to use it don't know much about R. So I wrote it
so that they don't
>>>>>> have
>>>>>> to modify the critical parts to run the script for
their needs.
>>>>>>
>>>>>> #column indexes for function
>>>>>> ind.xvar<- 2
>>>>>> seq.yvar<- 3:4
>>>>>> #position vector for legend(), stupid positioning but
it doesn't
>>>>>> matter
>>>>>> here
>>>>>> mypos<- c("topleft",
"topright","bottomleft")
>>>>>>
>>>>>> #run the function for columns 3&4 as y (seq.yvar)
with column 2 as x
>>>>>> (ind.xvar) for all 3 datasets (mydata_list)
>>>>>> par(mfrow=c(2,1))
>>>>>> for (i in seq_along(seq.yvar)){
>>>>>> k<- seq.yvar[i]
>>>>>> plot(mydata1[[k]]~mydata1[[ind.xvar]],
type="p",
>>>>>> xlab=names(mydata1)[ind.xvar], ylab=names(mydata1)[k])
>>>>>> for (j in seq_along(mydata_list)){
>>>>>> foo_reg(dat=mydata_list[[j]], xvar=ind.xvar,
yvar=k, mycol=j,
>>>>>> pos=mypos[j], name.dat=names(mydata_list)[j])
>>>>>> }
>>>>>> }
>>>>>>
>>>>>> I tried with lapply() or mapply() but couldn't
manage to pass the
>>>>>> arguments for names() and col= correctly, e.g. for the
2nd loop:
>>>>>> lapply(mydata_list, FUN=function(x){foo_reg(dat=x,
xvar=ind.xvar,
>>>>>> yvar=k, col1=1:3, pos=mypos[1:3],
name.dat=names(x)[1:3])})
>>>>>> mapply(FUN=function(x) {foo_reg(dat=x,
name.dat=names(x)[1:3])},
>>>>>> mydata_list, col1=1:3, pos=mypos,
MoreArgs=list(xvar=ind.xvar,
>>>>>> yvar=k))
>>>>>>
>>>>>> Thanks in advance for any hints.
>>>>>> Ivan
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> #create data (it looks horrible with these datasets
but it doesn't
>>>>>> matter here)
>>>>>> mydata1<- structure(list(species = structure(1:8,
.Label = c("alsen",
>>>>>> "gogor", "loalb",
"mafas", "pacyn", "patro", "poabe",
"thgel"), class
>>>>>> >>>>>> "factor"), fruit =
c(0.52, 0.45, 0.43, 0.82, 0.35, 0.9, 0.68, 0), Asfc
>>>>>> >>>>>> c(207.463765, 138.5533755,
70.4391735, 160.9742745, 41.455809,
>>>>>> 119.155109, 26.241441, 148.337377), Tfv =
c(47068.1437773483,
>>>>>> 43743.8087431582, 40323.5209129239, 23420.9455581495,
>>>>>> 29382.6947428651,
>>>>>> 50460.2202192311, 21810.1456510625,
41747.6053810881)), .Names >>>>>> c("species",
"fruit", "Asfc", "Tfv"), row.names = c(NA, 8L),
class >>>>>> "data.frame")
>>>>>>
>>>>>> mydata2<- mydata1[!(mydata1$species %in%
c("thgel","alsen")),]
>>>>>> mydata3<- mydata1[!(mydata1$species %in%
c("thgel","alsen","poabe")),]
>>>>>> mydata_list<- list(mydata1=mydata1,
mydata2=mydata2, mydata3=mydata3)
>>>>>>
>>>>>> #function for regression
>>>>>> library(WRS)
>>>>>> foo_reg<- function(dat, xvar, yvar, mycol, pos,
name.dat){
>>>>>> tsts<- tstsreg(dat[[xvar]], dat[[yvar]])
>>>>>> tsts_inter<- signif(tsts$coef[1], digits=3)
>>>>>> tsts_slope<- signif(tsts$coef[2], digits=3)
>>>>>> abline(tsts$coef, lty=1, col=mycol)
>>>>>> legend(x=pos, legend=c(paste("TSTS
",name.dat,":
>>>>>>
Y=",tsts_inter,"+",tsts_slope,"X",sep="")),
lty=1, col=mycol)
>>>>>> }
>>>>>>
>>>>> --
>>>>> Ivan CALANDRA
>>>>> PhD Student
>>>>> University of Hamburg
>>>>> Biozentrum Grindel und Zoologisches Museum
>>>>> Abt. S?ugetiere
>>>>> Martin-Luther-King-Platz 3
>>>>> D-20146 Hamburg, GERMANY
>>>>> +49(0)40 42838 6231
>>>>> ivan.calandra at uni-hamburg.de
>>>>>
>>>>> **********
>>>>> http://www.for771.uni-bonn.de
>>>>> http://webapp5.rrz.uni-hamburg.de/mammals/eng/1525_8_1.php
>>>>>
>>>>> ______________________________________________
>>>>> R-help at r-project.org mailing list
>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>> PLEASE do read the posting guide
>>>>> http://www.R-project.org/posting-guide.html
>>>>> and provide commented, minimal, self-contained,
reproducible code.
>>>
>>> --
>>> Ivan CALANDRA
>>> PhD Student
>>> University of Hamburg
>>> Biozentrum Grindel und Zoologisches Museum
>>> Abt. S?ugetiere
>>> Martin-Luther-King-Platz 3
>>> D-20146 Hamburg, GERMANY
>>> +49(0)40 42838 6231
>>> ivan.calandra at uni-hamburg.de
>>>
>>> **********
>>> http://www.for771.uni-bonn.de
>>> http://webapp5.rrz.uni-hamburg.de/mammals/eng/1525_8_1.php
>>>
>>>
>>
>>
>
> --
> Ivan CALANDRA
> PhD Student
> University of Hamburg
> Biozentrum Grindel und Zoologisches Museum
> Abt. S?ugetiere
> Martin-Luther-King-Platz 3
> D-20146 Hamburg, GERMANY
> +49(0)40 42838 6231
> ivan.calandra at uni-hamburg.de
>
> **********
> http://www.for771.uni-bonn.de
> http://webapp5.rrz.uni-hamburg.de/mammals/eng/1525_8_1.php
>
>
--
Jim Holtman
Data Munger Guru
What is the problem that you are trying to solve?