First thing to do is to use Rprof to determine where the time is being
spent and then you can pinpoint what section of code is taking the
time. A quick look says to do all your subsetting at once. You might
look into using 'split' to create the subsets and then access the
subsets with "[[...]]". So use Rprof and see if most of the time is
being spent the the data.frame subsetting functions.
On 8/1/07, S?bastien <pomchip at free.fr> wrote:> Dear R-users,
>
> I have written the following code to generate some trellis plots. It
> works perfectly fine except that it is quite slow when it is apply to my
> typical datasets (over several thousands of lines). I believe the
> problem comes from the loops I am using to subset my data.frame. I read
> in the archives that the tapply function is often more efficient than a
> loop in R. Unfortunately , it seems that I am not enough familiar with
> the "philosophy" of this function to implement it in my code.
Would you
> have some suggestions to speed up the whole thing?
>
> Thanks in advance
>
> Sebastien
>
> PS: the rationale behind these loops is to split the trellis plots on
> different pages, all plots on a page (or a group of pages) having a
> given combination of values for the PLOT, DVID, PER and GRP parameters.
>
> ###############################################
>
> library(lattice)
> rm(list=ls(all=TRUE))
>
> # Generate a dummy dataset with
> # - 20 individuals (ID)
> # - individuals 1 to 10 belong to group (GRP) 1, 11 to 20 belong to group 2
> # - measurements (DV) done at 10 time points (TIME) per individuals on 2
> occassions (OCC)
> # - modelisation of the DV versus TIME relationships with 4 different
> models (MODEL)
> # - predicted values (Y)
> # - the PLOT column serves as a flag to plot together the models (A and
> B) and (C and D)
>
> PLOT<-rep(1:2,each=40,times=20)
> ID<-rep(1:20,each=80)
> OCC<-rep(1:2,each=10,times=80)
> GRP<-as.numeric(rep(gl(2,80),times=10))
>
MODEL<-as.vector(rep(gl(4,20,label=c("A","B","C","D")),times=20))
> TIME<-rep.int(1:10,160)
> DV<-OCC*(1:10)*rep(rnorm(20,50,10),each=80)+rep(rnorm(20,10,1),each=80)
> Y<-jitter(DV)
>
> mydata<-data.frame(PLOT,ID,OCC,GRP,MODEL,TIME,DV,Y)
>
> mydata$DVID<-rep.int(1,1600) #in a real dataset, DVID could have
> typically 2 to 3 levels
>
> #############
> # Plotting routine
> #############
>
> myPath<-"C:/" #TO BE MODIFIED
>
> nTrellisCol<-2 #number of columns per
> Trellis plot
> nTrellisRow<-3 #number of lines per
> Trellis plot
>
> nDVID<-nlevels(factor(mydata$DVID)) #number of
> DVID=observations types
> nidPlot<-nlevels(factor(mydata$PLOT)) #number of items in the
> PLOT column
> nPer<-nlevels(factor(mydata$OCC)) #number of occassions
> (OCC, PER, etc...)
> nGRP<-nlevels(factor(mydata$GRP)) #number of groups
>
> pdf(file=paste(myPath,"test.pdf",sep=""))
>
> trellis.par.set(par.main.text=list(cex=1))
> trellis.par.set(par.ylab.text=list(font=2))
> trellis.par.set(par.xlab.text=list(font=2))
>
> for (i in 1:nidPlot) {
> #loop on PLOT id
> # i=1
> idata<-subset(mydata,mydata$PLOT==i)
>
> for (j in 1:nDVID) {
> #loop on DVID
> # j=1
> ijdata<-subset(idata,idata$DVID==j)
>
> for (k in 1:nPer) {
> #loop on Period
> # k=1
>
> ijkdata<-subset(ijdata,ijdata$OCC==k)
>
> for (l in 1:nGRP) { #loop on Group
> # l=1
>
> subdata<-subset(ijkdata,ijkdata$GRP==l)
>
> nModel<-nlevels(factor(subdata$MODEL))
> #number of models to be plotted in this loop
> mylegend<-c("Raw data",levels(factor(subdata$MODEL)))
> subID<-nlevels(factor(subdata$ID)) #number of
> ID in the new dataset
>
> myplot<-xyplot(Y ~ TIME | ID, #creates
plot
> data = subdata, type = "l",
> groups = MODEL,
> observed = subdata$DV,
> as.table=TRUE,
> panel = function(x, y, ..., subscripts, observed) {
> panel.points(x, pch=3,col=1,observed[subscripts])
> panel.xyplot(x, y, ...,
> col=2:nlevels(subdata$MODEL),subscripts = subscripts)},
>
> strip=function (which.panel,...){
> col<-rep("Black",subID)
> llines(c(0,1,1,0,0),c(0,0,1,1,0),col.line=1)
> ltext(rep(0.5,subID),rep(0.5,subID),
> paste("Subject
>
",levels(factor(subdata$ID))[which.panel],sep=""),cex=trellis.par.get("axis.text")[2])},
>
> key=list(space="bottom",
> lines = list(pch = as.integer(c(3,rep("",nModel))),
type
> = c("p", gl(1,nModel,label="l")),
> col >
1:(nModel+1),"cex"=trellis.par.get("axis.text")[2]),
> text=list(mylegend,
> "cex"=trellis.par.get("axis.text")[2])),
>
> xlab="Time (hr)",
> ylab="Concentration (ng/mL)",
> layout=c(nTrellisCol,nTrellisRow),
>
> main=paste(paste(paste("Plot ",i,sep=""),
> paste(paste(", DVID
",j,sep=""),
> paste(paste(", Occasion
",k,sep=""),
> paste(", Group
> ",l,sep="")))),sep=""))
>
>
>
trellis.par.set(par.xlab.text=list(cex=trellis.par.get("axis.text")[2]))
>
trellis.par.set(par.ylab.text=list(cex=trellis.par.get("axis.text")[2]))
>
>
>
print(myplot,panel.width=list(x=(0.75/nTrellisCol),units="npc"),panel.height=list(x=(0.50/nTrellisRow),units="npc"))
>
> }}}}
> dev.off()
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
--
Jim Holtman
Cincinnati, OH
+1 513 646 9390
What is the problem you are trying to solve?