thr3ads.net - R help - [R] Problem to remove loops in a routine [Aug 2007]

If this information is useful, please help other people find it:
Share via:

Sébastien

2007-Aug-01 21:57 UTC

[R] Problem to remove loops in a routine

Dear R-users,

I have written the following code to generate some trellis plots. It 
works perfectly fine except that it is quite slow when it is apply to my 
typical datasets (over several thousands of lines). I believe the 
problem comes from the loops I am using to subset my data.frame. I read 
in the archives that the tapply function is often more efficient than a 
loop in R. Unfortunately , it seems that I am not enough familiar with 
the "philosophy" of this function to implement it in my code. Would
you
have some suggestions to speed up the whole thing?

Thanks in advance

Sebastien

PS: the rationale behind these loops is to split the trellis plots on 
different pages, all plots on a page (or a group of pages) having a 
given combination of values for the PLOT, DVID, PER and GRP parameters.

###############################################

library(lattice)
rm(list=ls(all=TRUE))

# Generate a dummy dataset with
# - 20 individuals (ID)
# - individuals 1 to 10 belong to group (GRP) 1, 11 to 20 belong to group 2
# - measurements (DV) done at 10 time points (TIME) per individuals on 2 
occassions (OCC)
# - modelisation of the DV versus TIME relationships with 4 different 
models (MODEL)
# - predicted values (Y)
# - the PLOT column serves as a flag to plot together the models (A and 
B) and (C and D)

PLOT<-rep(1:2,each=40,times=20)
ID<-rep(1:20,each=80)
OCC<-rep(1:2,each=10,times=80)
GRP<-as.numeric(rep(gl(2,80),times=10))
MODEL<-as.vector(rep(gl(4,20,label=c("A","B","C","D")),times=20))
TIME<-rep.int(1:10,160)
DV<-OCC*(1:10)*rep(rnorm(20,50,10),each=80)+rep(rnorm(20,10,1),each=80)
Y<-jitter(DV)

mydata<-data.frame(PLOT,ID,OCC,GRP,MODEL,TIME,DV,Y)

mydata$DVID<-rep.int(1,1600)   #in a real dataset, DVID could have 
typically 2 to 3 levels

#############
# Plotting routine
#############

myPath<-"C:/"                    #TO BE MODIFIED

nTrellisCol<-2                                    #number of columns per 
Trellis plot
nTrellisRow<-3                                    #number of lines per 
Trellis plot

nDVID<-nlevels(factor(mydata$DVID))                #number of 
DVID=observations types
nidPlot<-nlevels(factor(mydata$PLOT))            #number of items in the 
PLOT column
nPer<-nlevels(factor(mydata$OCC))                  #number of occassions 
(OCC, PER, etc...)
nGRP<-nlevels(factor(mydata$GRP))         #number of groups

pdf(file=paste(myPath,"test.pdf",sep=""))         

trellis.par.set(par.main.text=list(cex=1))
trellis.par.set(par.ylab.text=list(font=2))
trellis.par.set(par.xlab.text=list(font=2))

for (i in 1:nidPlot) {                                               
#loop on PLOT id
#    i=1
    idata<-subset(mydata,mydata$PLOT==i)   
   
for (j in 1:nDVID) {                                                 
#loop on DVID
#    j=1       
    ijdata<-subset(idata,idata$DVID==j)
   
        for (k in 1:nPer) {                                             
#loop on Period
#    k=1

    ijkdata<-subset(ijdata,ijdata$OCC==k)
   
         for (l in 1:nGRP) {                           #loop on Group
# l=1

  subdata<-subset(ijkdata,ijkdata$GRP==l)
 
  nModel<-nlevels(factor(subdata$MODEL))                           
#number of models to be plotted in this loop
    mylegend<-c("Raw data",levels(factor(subdata$MODEL)))
    subID<-nlevels(factor(subdata$ID))                       #number of 
ID in the new dataset

  myplot<-xyplot(Y ~ TIME | ID,                               #creates plot
            data = subdata, type = "l",
            groups = MODEL,
            observed = subdata$DV,
            as.table=TRUE,
            panel = function(x, y, ..., subscripts, observed) {
                panel.points(x, pch=3,col=1,observed[subscripts])
                panel.xyplot(x, y, ..., 
col=2:nlevels(subdata$MODEL),subscripts = subscripts)},
       
            strip=function (which.panel,...){
                col<-rep("Black",subID)
                llines(c(0,1,1,0,0),c(0,0,1,1,0),col.line=1)
                ltext(rep(0.5,subID),rep(0.5,subID),
                paste("Subject 
",levels(factor(subdata$ID))[which.panel],sep=""),cex=trellis.par.get("axis.text")[2])},
       
            key=list(space="bottom",
               lines = list(pch = as.integer(c(3,rep("",nModel))),
type
= c("p", gl(1,nModel,label="l")),
                             col = 
1:(nModel+1),"cex"=trellis.par.get("axis.text")[2]),
                     text=list(mylegend, 
"cex"=trellis.par.get("axis.text")[2])),
       
            xlab="Time (hr)",
            ylab="Concentration (ng/mL)",
            layout=c(nTrellisCol,nTrellisRow),

            main=paste(paste(paste("Plot ",i,sep=""),
                             paste(paste(", DVID
",j,sep=""),
                                   paste(paste(", Occasion
",k,sep=""),
                                         paste(", Group 
",l,sep="")))),sep=""))
                                        

 
trellis.par.set(par.xlab.text=list(cex=trellis.par.get("axis.text")[2]))
   
trellis.par.set(par.ylab.text=list(cex=trellis.par.get("axis.text")[2]))
               
    
print(myplot,panel.width=list(x=(0.75/nTrellisCol),units="npc"),panel.height=list(x=(0.50/nTrellisRow),units="npc"))
       
        }}}}
dev.off()

jim holtman

2007-Aug-01 22:57 UTC

head link

[R] Problem to remove loops in a routine

First thing to do is to use Rprof to determine where the time is being
spent and then you can pinpoint what section of code is taking the
time.  A quick look says to do all your subsetting at once.  You might
look into using 'split' to create the subsets and then access the
subsets with "[[...]]".  So use Rprof and see if most of the time is
being spent the the data.frame subsetting functions.

On 8/1/07, S?bastien <pomchip at free.fr> wrote:> Dear R-users,
>
> I have written the following code to generate some trellis plots. It
> works perfectly fine except that it is quite slow when it is apply to my
> typical datasets (over several thousands of lines). I believe the
> problem comes from the loops I am using to subset my data.frame. I read
> in the archives that the tapply function is often more efficient than a
> loop in R. Unfortunately , it seems that I am not enough familiar with
> the "philosophy" of this function to implement it in my code.
Would you
> have some suggestions to speed up the whole thing?
>
> Thanks in advance
>
> Sebastien
>
> PS: the rationale behind these loops is to split the trellis plots on
> different pages, all plots on a page (or a group of pages) having a
> given combination of values for the PLOT, DVID, PER and GRP parameters.
>
> ###############################################
>
> library(lattice)
> rm(list=ls(all=TRUE))
>
> # Generate a dummy dataset with
> # - 20 individuals (ID)
> # - individuals 1 to 10 belong to group (GRP) 1, 11 to 20 belong to group 2
> # - measurements (DV) done at 10 time points (TIME) per individuals on 2
> occassions (OCC)
> # - modelisation of the DV versus TIME relationships with 4 different
> models (MODEL)
> # - predicted values (Y)
> # - the PLOT column serves as a flag to plot together the models (A and
> B) and (C and D)
>
> PLOT<-rep(1:2,each=40,times=20)
> ID<-rep(1:20,each=80)
> OCC<-rep(1:2,each=10,times=80)
> GRP<-as.numeric(rep(gl(2,80),times=10))
>
MODEL<-as.vector(rep(gl(4,20,label=c("A","B","C","D")),times=20))
> TIME<-rep.int(1:10,160)
> DV<-OCC*(1:10)*rep(rnorm(20,50,10),each=80)+rep(rnorm(20,10,1),each=80)
> Y<-jitter(DV)
>
> mydata<-data.frame(PLOT,ID,OCC,GRP,MODEL,TIME,DV,Y)
>
> mydata$DVID<-rep.int(1,1600)   #in a real dataset, DVID could have
> typically 2 to 3 levels
>
> #############
> # Plotting routine
> #############
>
> myPath<-"C:/"                    #TO BE MODIFIED
>
> nTrellisCol<-2                                    #number of columns per
> Trellis plot
> nTrellisRow<-3                                    #number of lines per
> Trellis plot
>
> nDVID<-nlevels(factor(mydata$DVID))                #number of
> DVID=observations types
> nidPlot<-nlevels(factor(mydata$PLOT))            #number of items in the
> PLOT column
> nPer<-nlevels(factor(mydata$OCC))                  #number of occassions
> (OCC, PER, etc...)
> nGRP<-nlevels(factor(mydata$GRP))         #number of groups
>
> pdf(file=paste(myPath,"test.pdf",sep=""))
>
> trellis.par.set(par.main.text=list(cex=1))
> trellis.par.set(par.ylab.text=list(font=2))
> trellis.par.set(par.xlab.text=list(font=2))
>
> for (i in 1:nidPlot) {
> #loop on PLOT id
> #    i=1
>    idata<-subset(mydata,mydata$PLOT==i)
>
> for (j in 1:nDVID) {
> #loop on DVID
> #    j=1
>    ijdata<-subset(idata,idata$DVID==j)
>
>        for (k in 1:nPer) {
> #loop on Period
> #    k=1
>
>    ijkdata<-subset(ijdata,ijdata$OCC==k)
>
>         for (l in 1:nGRP) {                           #loop on Group
> # l=1
>
>  subdata<-subset(ijkdata,ijkdata$GRP==l)
>
>  nModel<-nlevels(factor(subdata$MODEL))
> #number of models to be plotted in this loop
>    mylegend<-c("Raw data",levels(factor(subdata$MODEL)))
>    subID<-nlevels(factor(subdata$ID))                       #number of
> ID in the new dataset
>
>  myplot<-xyplot(Y ~ TIME | ID,                               #creates
plot
>            data = subdata, type = "l",
>            groups = MODEL,
>            observed = subdata$DV,
>            as.table=TRUE,
>            panel = function(x, y, ..., subscripts, observed) {
>                panel.points(x, pch=3,col=1,observed[subscripts])
>                panel.xyplot(x, y, ...,
> col=2:nlevels(subdata$MODEL),subscripts = subscripts)},
>
>            strip=function (which.panel,...){
>                col<-rep("Black",subID)
>                llines(c(0,1,1,0,0),c(0,0,1,1,0),col.line=1)
>                ltext(rep(0.5,subID),rep(0.5,subID),
>                paste("Subject
>
",levels(factor(subdata$ID))[which.panel],sep=""),cex=trellis.par.get("axis.text")[2])},
>
>            key=list(space="bottom",
>               lines = list(pch = as.integer(c(3,rep("",nModel))),
type
> = c("p", gl(1,nModel,label="l")),
>                             col >
1:(nModel+1),"cex"=trellis.par.get("axis.text")[2]),
>                     text=list(mylegend,
> "cex"=trellis.par.get("axis.text")[2])),
>
>            xlab="Time (hr)",
>            ylab="Concentration (ng/mL)",
>            layout=c(nTrellisCol,nTrellisRow),
>
>            main=paste(paste(paste("Plot ",i,sep=""),
>                             paste(paste(", DVID
",j,sep=""),
>                                   paste(paste(", Occasion
",k,sep=""),
>                                         paste(", Group
> ",l,sep="")))),sep=""))
>
>
> 
trellis.par.set(par.xlab.text=list(cex=trellis.par.get("axis.text")[2]))
>   
trellis.par.set(par.ylab.text=list(cex=trellis.par.get("axis.text")[2]))
>
>
>
print(myplot,panel.width=list(x=(0.75/nTrellisCol),units="npc"),panel.height=list(x=(0.50/nTrellisRow),units="npc"))
>
>        }}}}
> dev.off()
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?

Apparently Analagous Threads

Search for more reasonably related threads

R help - Aug 2007 - Problem to remove loops in a routine

[R] Problem to remove loops in a routine

[R] Problem to remove loops in a routine

Apparently Analagous Threads