I am using readLines to read a fairly large ASCII file. readLines reads a fixed number of lines, then other R code processes the data, then readLines reads the same number of lines again, then other R code processes the data, and so on.

Sort of like:

conn <- file('filename', 'r')
for (chunk in 1:100000) {
    Lines <- readLines(conn, n = 25)
    # process "Lines"
}

The code is working, but I notice that it slows down greatly as time progresses. It took 2 seconds to read my first chunk of data, 4 seconds to read the next chunk, 10 after that. The quasi-exponential trend has slowed, thank goodness, but after about a hundred reads, the read time for the next chunk is over a minute. Let me stress that the number of lines read in each chunk of data is absolutely fixed.

The only processing I am doing at this point is to parse the new data and rbind the results to an existing data frame. Processing of new data in no way depends on earlier data.

So, my question is: why is the reading taking longer as time goes on? Is there a way to fix this? Is there a better method than readLines?

Thanks.
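P.S. For concreteness, the processing step is essentially the following (simplified and untested; read.table(text = ...) is just a stand-in for my actual parsing code):

conn <- file('filename', 'r')
results <- data.frame()
for (chunk in 1:100000) {
    Lines <- readLines(conn, n = 25)
    newRows <- read.table(text = Lines)   # stand-in for my real parser
    results <- rbind(results, newRows)    # append to the running data frame
}
close(conn)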
More on my previous question ... I have put in timing statements to try to get a better idea of where the problem is, like so:

conn <- file('filename', 'r')
for (chunk in 1:100000) {
    print(paste('begin read at', date()))
    Lines <- readLines(conn, n = 25)
    print(paste('begin processing at', date()))
    # process "Lines"
    print(paste('end loop at', date()))
}

Every time I go through the loop, all the date() calls return *exactly* the same time! It *looks like* it runs through each iteration very quickly and then takes longer and longer to simply start the next iteration. I don't believe this; I think R must be doing some kind of latency trick or something.

But, anyway, the point is that I was assuming the problem was in the I/O, and now I don't know whether it's the I/O or the processing. Either way, I don't understand it and would really appreciate some wisdom from you guys.

Thanks.
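P.S. If it would help, I can replace the date() calls with something that has sub-second resolution and force each line of output to appear immediately. A rough, untested sketch using Sys.time() and flush.console():

conn <- file('filename', 'r')
for (chunk in 1:100000) {
    t0 <- Sys.time()
    Lines <- readLines(conn, n = 25)
    t1 <- Sys.time()
    # process "Lines"
    t2 <- Sys.time()
    cat('chunk', chunk,
        'read:', round(as.numeric(t1 - t0, units = 'secs'), 3), 'sec,',
        'process:', round(as.numeric(t2 - t1, units = 'secs'), 3), 'sec\n')
    flush.console()   # show the timing line right away, in case the console buffers output
}
close(conn)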
On 03.10.2011 19:19, Cable, Sam B Civ USAF AFMC AFRL/RVBXI wrote:
> I am using readLines to read a fairly large ASCII file. readLines reads
> a fixed number of lines, then other R code processes the data, then
> readLines reads the same number of lines again, and so on.
> [...]
> The only processing I am doing at this point is to parse the new data
> and rbind the results to an existing data frame. Processing of new data
> in no way depends on earlier data.

And that may be the interesting point. Have you tried to allocate the whole data.frame up front and assign into it later? It is probably not readLines() slowing you down. A minute seems to be quite a lot for reasonably sized data. How many columns are we talking about?

Uwe Ligges
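P.S. A rough, untested sketch of what I mean by allocating the whole data.frame up front and assigning into it (the two numeric columns, the fixed 25 rows per chunk, and read.table() as the parser are only placeholders for your real structure):

rowsPerChunk <- 25
nChunks <- 100000
n <- rowsPerChunk * nChunks

# allocate the full data.frame once, then fill rows in place
result <- data.frame(x = numeric(n), y = numeric(n))

conn <- file('filename', 'r')
for (chunk in 1:nChunks) {
    Lines <- readLines(conn, n = rowsPerChunk)
    newRows <- read.table(text = Lines)   # your parsing goes here
    rows <- ((chunk - 1) * rowsPerChunk + 1):(chunk * rowsPerChunk)
    result[rows, ] <- newRows
}
close(conn)

Filling plain pre-allocated vectors, or collecting the per-chunk results in a list and calling do.call(rbind, ...) once at the end, should be faster still than indexing into the data.frame.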