Clark Allan wrote:
> hi all
>
> i know that one should try and limit the amount of looping in R
> programs. i have supplied some code below. i am interested in seeing how
> the code cold be rewritten if we dont use the loops.
It is not always a good thing to remove loops (without having looked at
each details of the code below).
One case seems to be described below, where you probably already get
memory problems and hence cannot (at least not without memory penalty)
"vectorize" any more.
"Compromise" is the keyword.
Best,
Uwe Ligges
>
> a brief overview of what is done in the code.
> =============================================>
=============================================>
=============================================>
> 1. the input file contains 120*500*61 cells. 120*500 rows and 61
> columns.
>
> 2. we need to import the cells in 500 at a time and perform the same
> operations on each sub group
>
> 3. the file contais numeric values. there are quite a lot of missing
> values. this has been coded as NA in the text file (the file that is
> imported)
>
> 4. for each variable we check for outliers. this is done by setting all
> values that are greater than 3 standard deviations (sd) from the mean of
> a variable to be equal to the 3 sd value.
>
> 5. the data set has one response variable , the first column, and 60
> explanatory variables.
>
> 6. we regress each of the explanatory variables against the response and
> record the slope of the explanatory variable. (i.e. simple linear
> regression is performed)
>
> 7. nsize = 500 since we import 500 rows at a time
>
> 8. nruns = how many groups you want to run the analysis on
>
> =============================================>
=============================================>
=============================================>
>
> TRY<-function(nsize=500,filename="C:/A.txt",nvar=61,nruns=1)
> {
>
> #the matrix with the payoff weights
> fit.reg<-matrix(nrow=nruns,ncol=nvar-1)
>
> for (ii in 1:nruns)
> {
> skip=1+(ii-1)*nsize
>
> #import the data in batches of "nsize*nvar"
> #save as a matrix and then delete "dscan" to save memory space
>
>
dscan<-scan(file=filename,sep="\t",skip=skip,nlines=nsize,fill=T,quiet=T)
> dm<-matrix(dscan,nrow=nsize,byrow=T)
> rm(dscan)
>
> #this calculates which of the columns have entries in the columns
> #that are not NA
> #only perform regressions on those with more than 2 data points
> #obviously the number of points has to be much larger than 2
> #col.points = the number of points in the column that are not NA
>
> col.points<-apply(dm,2,function(x)
> sum(match(x,rep(NA,nsize),nomatch=0)))
> col.points
>
> #adjust for outliers
> dm.new<-dm
> mean.dm.new<-apply(dm.new,2,function(x) mean(x,na.rm=T))
> sd.dm.new<-apply(dm.new,2,function(x) sd(x,na.rm=T))
> top.dm.new<-mean.dm.new+3*sd.dm.new
> bottom.dm.new<-mean.dm.new-3*sd.dm.new
>
> for (i in 1:nvar)
> {
> dm.new[,i][dm.new[,i]>top.dm.new[i]]<-top.dm.new[i]
> dm.new[,i][dm.new[,i]<bottom.dm.new[i]]<-bottom.dm.new[i]
> }
>
> #standardize the variables
> #we dont have to change the variable names here but i did!
> means.dm.new<-apply(dm.new,2,function(x) mean(x,na.rm=T))
> std.dm.new<-apply(dm.new,2,function(x) sd(x,na.rm=T))
>
>
dm.new<-sweep(sweep(dm.new,2,means.dm.new,"-"),2,std.dm.new,"/")
>
> for (j in 2:nvar)
> {
> 'WE DO NOT PERFORM THE REGRESSION IF ALL VALUES IN THE COLUMN ARE
"NA"
> if (col.points[j]!=nsize)
> {
> #fit the regression equations
> fit.reg[ii,j-1]<-summary(lm(dm.new[,1]~dm.new[,j]))$coef[2,1]
> }
> else fit.reg[ii,j-1]<-"L"
> }
> }
>
>
dm.names<-scan(file=filename,sep="\t",skip=0,nlines=1,fill=T,quiet=T,what="charachter")
> dm.names<-matrix(dm.names,nrow=1,ncol=nvar,byrow=T)
> colnames(fit.reg)<-dm.names[-1]
>
> output<-c("$fit.reg")
>
> list(fit.reg=fit.reg,output=output)
>
> }
>
> a=TRY(nsize=500,filename="C:/A.txt",nvar=61,nruns=1)
>
>
> =============================================>
=============================================>
=============================================>
>
>
>
> thanking you in advance
> /
> allan
>
>
> ------------------------------------------------------------------------
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide!
http://www.R-project.org/posting-guide.html