thr3ads.net - R help - [R] should I use apply, sapply, etc instead of a "for loop"? [Jun 2007]

Thomas Pujol

2007-Jun-20 18:58 UTC

[R] should I use apply, sapply, etc instead of a "for loop"?

I have been trying to learn the various "apply" functions but am still
learning their appropriate use.  I appreciate any help the R community can offer
me.  Sorry for the length of this post.

Background:

I have data on my hard drive organized in the following manner:

The data pertains to many different "samples" of data. (e.g. sample
001, sample 002, sample 003, etc.)

Each "sample" contains many different "data frames" for a
large number of different data-items.
(e.g. sat score, median income of zip-code, gender, GPA, etc)

The data frames and files are each named with the data-item name as the
"prefix" of the name and the "sample number" as the suffix
of the name.
e.g. sat.001, income.001, sat.002, income.002
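
To make the naming convention concrete, here is a small sketch of how such names could be generated. It assumes the sample numbers are always zero-padded to three digits, as in the examples above; the variable names are only illustrative.

```r
# Assumption: sample numbers are zero-padded to three digits (001, 002, ...).
sample_ids   <- sprintf("%03d", 1:3)

# Data-item name as prefix, sample number as suffix.
sat_names    <- paste("sat", sample_ids, sep = ".")
income_names <- paste("income", sample_ids, sep = ".")
```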

Each data frame has approximately 5,000 rows, 1 for each "person".

Note: The files are somewhat large, and most of my analysis will be completed
within each "sample".  (Thus, I think that I should probably keep the
files stored as separate files, and not combine them into a larger list or data
frame. I also do not think I want to load all the files for multiple samples at
once, as this may take up too much memory.)  Also, I have simplified the
description of the files; many contain multiple columns of data.


###############
I have written a "for" loop that does the following:

a. For each "sample period" I load two files.
b. I perform a function on the data contained in these two files.
c. I take the results and save them as a new file.
d. I proceed to the next sample.

Is there a "better" (i.e. more elegant and/or efficient) way to do
this, perhaps with one of the "apply" functions? (e.g. apply, sapply,
lapply, tapply?)

#e.g. my simplified code

#this creates example data:
sat.001=c(500,400,750)
sat.002=c(245,455,767)

income.001=c(5020,4200,7250)
income.002=c(2425,4525,7627)

filenames=c('sat.001', 'sat.002', 'income.001', 'income.002')
sapply(filenames,function(x) { save( list=x , file = paste(x ,'.r', sep ='')  ) })
rm(sat.001,sat.002,income.001,income.002,filenames)
ls() #
##############
#my for loop

divide = function(x,y) {x/y} 
#creates a custom function


#inputs to my loop:
samplenames=c('001','002')
x.name='sat'
y.name='income'
fun='divide'

for (i in 1:length(samplenames) ) {

x.name.suf = paste(x.name,samplenames[i],sep='.')
#name of x file on hard drive

y.name.suf = paste(y.name,samplenames[i],sep='.')
#name of y file on hard drive

x=get(load(file = paste(x.name.suf ,'r', sep ='.')  , envir = .GlobalEnv) )
#loads and gets the x file

y=get(load(file = paste(y.name.suf ,'r', sep ='.')  , envir = .GlobalEnv) )
#loads and gets the y file

temp=get(fun)(x,y) 
#applies custom function specified in arguments above
# to data  contained in x and y files

save( list='temp' , file = paste(fun,x.name,y.name,samplenames[i],sep='.') )
#save the results in files with name that specifies 
#name of function, name of x, name of y, and sample number
#files will be used for later analysis

rm(list=paste(x.name.suf , sep ='.'))
rm(list=paste(y.name.suf , sep ='.'))
rm(x.name.suf,y.name.suf,x,y,temp)
}

rm(divide,samplenames,x.name,y.name,fun,i)
ls()
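
For comparison, here is a minimal sketch of how the loop above might look with the per-sample work moved into a function and driven by sapply. The helpers process_sample() and load_one(), the dir argument, and the scratch directory are hypothetical and not part of the code above. The sketch assumes each '.r' file holds exactly one object. It loads each file into a private environment instead of .GlobalEnv, so no rm() clean-up is needed.

```r
# Sketch only: process_sample() and load_one() are hypothetical helpers.
process_sample <- function(sample, x.name, y.name, fun, dir = ".") {
  f <- get(fun, mode = "function")  # look the function up by name, as the loop does

  # Load one file into a private environment and return the object it holds.
  load_one <- function(prefix) {
    e <- new.env()
    obj <- load(file.path(dir, paste(prefix, sample, "r", sep = ".")), envir = e)
    get(obj[1], envir = e)
  }

  temp <- f(load_one(x.name), load_one(y.name))
  out <- file.path(dir, paste(fun, x.name, y.name, sample, sep = "."))
  save(temp, file = out)
  out  # return the path of the result file
}

# Recreate the example files in a scratch directory so the sketch runs on its own.
scratch <- tempfile("samples")
dir.create(scratch)
example <- list(sat.001 = c(500, 400, 750), sat.002 = c(245, 455, 767),
                income.001 = c(5020, 4200, 7250), income.002 = c(2425, 4525, 7627))
for (nm in names(example)) {
  save(list = nm, file = file.path(scratch, paste(nm, "r", sep = ".")),
       envir = list2env(example))
}

divide <- function(x, y) x / y
samplenames <- c("001", "002")
outfiles <- sapply(samplenames, process_sample,
                   x.name = "sat", y.name = "income", fun = "divide", dir = scratch)
```

Since most of the time here is likely spent reading and writing files, a version like this is mainly tidier rather than faster than the for loop.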


 