On Thu, 4 Sep 2008, Dmitriy Skvortsov wrote:
> Hi all, I have large compressed, tab-delimited text files, and I am
> trying to write an efficient function to read them. I am using
> gzfile() and readLines():
>
> zz <- gzfile("exampl.txt.gz", "r")  # compressed file
> system.time(temp1 <- readLines(zz))
> close(zz)
>
> which works fast and creates a vector of strings. The problem is
> parsing the result: if I use strsplit, it takes longer than manually
> decompressing the file, reading it with scan, and deleting it.
>
> Can anybody recommend an efficient way of parsing a large vector of
> ~200,000 entries?
'parse'? What is wrong with using read.delim (reading 'tab delimited
files' is its job)? It (and scan) work with gzfile connections, so there
is no need to decompress manually.
See the 'R Data Import/Export Manual' for how to use read.delim
efficiently.
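For instance, an untested sketch (the file name is taken from the post
above; the column classes below are purely illustrative and must be
adjusted to match the real file):

## read.delim opens and closes the gzfile connection itself
dat <- read.delim(gzfile("exampl.txt.gz"))

## Per the manual, declaring the column classes and an upper bound on
## the row count up front lets read.delim avoid a second pass over the
## data (six columns assumed here, for illustration only)
dat <- read.delim(gzfile("exampl.txt.gz"),
                  colClasses = c("character", rep("numeric", 5)),
                  nrows = 200000)

## scan likewise reads straight from the compressed connection
fields <- scan(gzfile("exampl.txt.gz"), what = "", sep = "\t")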
> Dmitriy
>
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
PLEASE do.
--
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel: +44 1865 272861 (self)
1 South Parks Road,                    +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax: +44 1865 272595