On Fri, 21 Jun 2002 james.holtman at convergys.com wrote:
> I was trying to read in a file and delete lines that did not have the
> correct number of fields on them. I was reading the file as one character
> vector per line using 'scan' with sep='\n'. I was then using 'count.fields'
> with 'textConnection' to the object I just read in.
>
> I thought at first the system was locked up, but further testing showed
> that the 'textConnection' was a very slow way to read in data to
> 'count.fields' as compared to 'count.fields' just reading the file.
>
> Is this a characteristic of using 'textConnection' on large objects?
Yes, input from `textConnection' like this will be slow. It's a
character-at-a-time process, allowing for pushbacks. For large amounts of
scratch data, use a scratch file (via file()) instead.
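
A minimal sketch of the scratch-file approach, assuming the lines are already held in a character vector. The vector `x` below is made-up stand-in data (not the poster's iostat file), and the expected field count of 3 is likewise only for illustration:

```r
# Stand-in for lines read with scan(..., what = "", sep = "\n");
# the third record is deliberately one field short (illustrative data only).
x <- c("a b c", "d e f", "g h")

# Write the lines to a scratch file through a file() connection,
# rather than wrapping the vector in textConnection().
tmp <- tempfile()
con <- file(tmp, "w")
writeLines(x, con)
close(con)

# count.fields() then reads the scratch file directly.
n <- count.fields(tmp)
unlink(tmp)  # remove the scratch file when done

# Keep only the records with the expected number of fields.
good <- x[n == 3]
```

The same pattern should apply to the full vector: the file is written once and count.fields reads it the way it reads the original data file.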
>
> =============================================================>
> > unix.time(x.1 <- scan('iostat.zigzag.020620', what='', sep='\n'))
> Read 117163 items
> [1] 4.00 0.07 4.08 NA NA
> > str(x.1)
> chr [1:117163] "000035 atf233 0.0 0.8 0.0 5.9 0.0 0.0 9.3 0 0 " ...
> #
> # count.fields just reading the file directly; this appears to work fine (<4 seconds)
> #
> > unix.time(x.2 <- count.fields('iostat.zigzag.020620'))
> [1] 3.35 0.04 3.39 NA NA
> > str(x.2)
> int [1:117163] 11 11 11 11 11 11 11 11 11 11 ...
> > sum(x.2 != 11) # determine number of 'bad' records
> [1] 3
> #
> # processing times get longer with larger objects
> #
> > unix.time(x.3 <- count.fields(textConnection(x.1[1:3000])))
> [1] 0.94 0.00 0.94 NA NA
> > unix.time(x.3 <- count.fields(textConnection(x.1[1:7000])))
> [1] 13.61 0.02 13.64 NA NA
> > unix.time(x.3 <- count.fields(textConnection(x.1[1:10000])))
> [1] 31.61 0.00 31.75 NA NA
> >
>
>
> platform "i386-pc-mingw32"
> arch "i386"
> os "mingw32"
> system "i386, mingw32"
> status ""
> major "1"
> minor "5.1"
> year "2002"
> month "06"
> day "17"
> language "R"
>
> -.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
> r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
> Send "info", "help", or "[un]subscribe"
> (in the "body", not the subject !) To: r-help-request at stat.math.ethz.ch
> _._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
>
--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272860 (secr)
Oxford OX1 3TG, UK Fax: +44 1865 272595