On Mon, 2005-10-10 at 17:04 -0700, Christina Yau wrote:
> Hi,
> I'm sort of a newbie to using R to deal with array data. I'm trying to
> create a simple filtering function, which outputs only the rows of a
> data frame that satisfy a specific criterion. I've set up an
> iterative loop to apply the condition to each row. I can create a new
> matrix and use rbind to fill it in row by row in the loop, before
> writing the whole matrix to file. But it seems really inefficient,
> especially considering my very large dataset. In fact, I'm worried it
> will cause memory problems if I run the function on the full data set.
> Each row is from a data frame and is associated with a row name and a
> column name. I'm wondering if there's a way to write each row that
> satisfies the condition to file within the iterative loop directly, while
> keeping the data structure. I've read the help on the 'cat' function,
> but I'm still not entirely sure how to use it in my situation, or if it
> is the correct function to use. Any advice will be greatly
> appreciated.
If you can do it with the full dataset, you are probably better off
using subset() to select the rows that meet your criteria and then use
write.table() to write the resultant smaller data frame to a file.
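
As a rough sketch of that first approach: the data frame 'expr', its column names, and the criterion 'array1 > 0' below are all made up for illustration, so substitute your own data and condition.

```r
# Hypothetical example data: rows are genes, columns are arrays.
expr <- data.frame(array1 = c(-0.6, 0.2, -0.8, 1.6, 0.3),
                   array2 = c(1.1, -0.4, 0.7, 0.5, -1.2),
                   array3 = c(0.9, 0.1, -0.3, 2.0, 0.4),
                   row.names = paste0("gene", 1:5))

# Keep only the rows that meet the (assumed) criterion.
kept <- subset(expr, array1 > 0)

# Write the smaller data frame in one call, keeping row and column names.
# col.names = NA adds a blank header field above the row-name column.
out_file <- tempfile(fileext = ".txt")
write.table(kept, file = out_file, sep = "\t", quote = FALSE, col.names = NA)
```

The file can be read back with read.table(out_file, header = TRUE, sep = "\t", row.names = 1).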
Alternatively, if you do need to do this within the loop, you can use
write.table() with the 'append' argument set to TRUE, so that each new
row from the data frame that meets your criteria gets added to the
existing file, rather than overwriting it. This will be a little slower,
since each time write.table() is called, it opens the file, writes the
line and closes the file, so there is some file I/O overhead.
You don't need to create a new matrix in the loop; just pass the
resultant single row of your subsetting operation to write.table().
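
A minimal sketch of the loop version follows, again with a made-up data frame 'expr' and a made-up criterion. Writing the header line once with cat() before the loop is my own assumption about how you would want the file laid out.

```r
# Hypothetical example data: rows are genes, columns are arrays.
expr <- data.frame(array1 = c(-0.6, 0.2, -0.8, 1.6, 0.3),
                   array2 = c(1.1, -0.4, 0.7, 0.5, -1.2),
                   array3 = c(0.9, 0.1, -0.3, 2.0, 0.4),
                   row.names = paste0("gene", 1:5))

out_file <- tempfile(fileext = ".txt")

# Write the header once: a blank field for the row names, then the column names.
cat(paste(c("", colnames(expr)), collapse = "\t"), "\n",
    sep = "", file = out_file)

for (i in seq_len(nrow(expr))) {
  # drop = FALSE keeps a one-row data frame, so the row name is preserved.
  row_i <- expr[i, , drop = FALSE]
  if (row_i$array1 > 0) {  # assumed criterion
    # append = TRUE adds the row to the file instead of overwriting it.
    write.table(row_i, file = out_file, append = TRUE, sep = "\t",
                quote = FALSE, col.names = FALSE)
  }
}
```

Each call still opens and closes the file, so for a very large data set the subset() route is likely to be noticeably faster.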
See ?subset and ?write.table for more information.
HTH,
Marc Schwartz