Running R 1.6.1
Linux Slackware 8.1
233 MHz AMD-K6, 96 MB RAM
Using read.fwf, I tried to open a fixed-width file of about 4 MB residing
in the working directory, using the command below:
dat <- read.fwf("sc01aai.dat", widths=fields$length)
where fields$length is a vector of column widths, 28 to be exact. The data
are a mix of character, text, and factor variables.
R started processing and continued doing so for more than an hour and a half
before I returned and stopped it. It was obviously still working, CPU,
memory, and swap space all ablaze.
Question: Did I miss something here in issuing the command?
Alas, I tried various options with no success. Then I looked at the code
and tried to implement it another way. I don't have the code I used where I
am right now, but the method was as follows:
1. Read in the data with readLines.
2. Created beginning and end positions from the widths.
3. Set up an empty data frame with dims = number of rows by the length of
the widths vector.
4. Used sapply over 1:nrows to substring each row of the input data by the
two vectors, beginning and end, created from the widths, and used <<- to
assign the substring output to the nth row of the empty data frame (a rough
reconstruction is sketched below).
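Since I don't have the script in front of me, here is a rough reconstruction
from memory (untested as written; the function name readfwf2 and the helper
objects begin/end are just made up for illustration):

readfwf2 <- function(file, widths) {
    lines <- readLines(file)
    # beginning and end positions of each field, from the widths
    end <- cumsum(widths)
    begin <- end - widths + 1
    nrows <- length(lines)
    # empty data frame: nrows by length(widths)
    out <- as.data.frame(matrix(NA, nrow = nrows, ncol = length(widths)))
    # substring each line and superassign the pieces into row i of out
    sapply(1:nrows, function(i)
        out[i, ] <<- substring(lines[i], begin, end))
    out
}
dat <- readfwf2("sc01aai.dat", widths = fields$length)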
This approach worked well and took only 5-10 minutes, a success for my
immediate endeavor. But this leads to my other questions.
Questions:
Are there any problems with the approach that I describe above?
Why is there such a difference between read.fwf (which, as I read it, cuts up
the file, stores it in a format that read.table can handle, and then reads it
with read.table) and what I have described, other than what seems like an
extra step?
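To make sure I'm asking about the right thing, the pattern I understand
read.fwf to follow is roughly the one below (a heavily simplified sketch of
that split-then-read.table idea, not the actual source; names are made up):

fwf_via_table <- function(file, widths, sep = "\t") {
    lines <- readLines(file)
    end <- cumsum(widths)
    begin <- end - widths + 1
    # write the cut-up fields to a temporary file read.table can handle ...
    tmp <- tempfile()
    on.exit(unlink(tmp))
    writeLines(sapply(lines, function(ln)
        paste(substring(ln, begin, end), collapse = sep)), tmp)
    # ... then let read.table do the actual parsing and type conversion
    read.table(tmp, sep = sep, header = FALSE)
}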
I am not confident about my use of global assignment, simply out of
unfamiliarity. I think it is OK given that I have defined the object within
the function prior to this assignment; this way it doesn't escape the
function environment, correct?
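My understanding of the scoping is along these lines (toy example, not my
actual code, and assuming no other object called out is lying around):

f <- function() {
    out <- rep(NA, 3)
    # <<- walks up to f's environment, where out is defined, and modifies it there
    sapply(1:3, function(i) out[i] <<- i^2)
    out
}
f()                # c(1, 4, 9)
exists("out")      # FALSE -- nothing escaped into the workspace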
Finally, if this really is faster, and not just my misuse of read.fwf, is it
generalizable, and why not replace read.fwf with this approach?
Brett