If I open a tgz archive with gzfile and then parse it using readLines, I miss the initial line of each member of the archive, and also the name of the file, although the archive is otherwise complete (but useless!).

Is there any way within R both to extract the list of files in a tgz archive and to extract any one of these files?

Clearly I can use zcat and tar on Linux, but I need this to work within the R environment on Windows!

Thanks
John James
On Tue, 14 Nov 2006, John James wrote:

> If I open a tgz archive with gzfile and then parse it using readLines I miss
> the initial line of each member of the archive - and also the name of the
> file although the archive otherwise complete (but useless!).

You can use a gzfile connection to read the underlying .tar file, but that is not a text file and you will need to pick its structure apart yourself via readBin and readChar.

> Is there any way within R to extract both the list of files in a tgz archive
> and to extract any one of these files?

> Clearly I can use zcat and tar on Linux, but I need this to work within the
> R environment on Windows!

You could use tar on Windows: it is in the R tools set.

--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595
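To make the readBin suggestion concrete, here is a minimal sketch of walking the tar structure through a gzfile connection. It is not code from this thread: the function names (read_tgz, tar_field, octal_to_num) are made up for illustration, and it assumes a plain ustar-style archive in which every member name fits in the 100-byte name field and every size is stored as octal text (no GNU long-name or pax extended headers). Each member is preceded by a 512-byte binary header, which is probably what readLines was mangling together with the first line of each file.

```r
# Read a NUL-terminated ASCII field from a 512-byte tar header block.
tar_field <- function(block, from, to) {
  bytes <- block[from:to]
  nul <- which(bytes == as.raw(0))
  if (length(nul) > 0) bytes <- bytes[seq_len(nul[1] - 1L)]
  trimws(rawToChar(bytes))
}

# Convert an octal string such as "00000000027" to a number (a double,
# so sizes beyond the integer range do not overflow).
octal_to_num <- function(x) {
  digits <- as.integer(strsplit(x, "")[[1]])
  sum(digits * 8^(rev(seq_along(digits)) - 1))
}

# Hypothetical helper: with member = NULL, return a data frame listing the
# archive; otherwise return the raw bytes of the named member.
read_tgz <- function(path, member = NULL) {
  con <- gzfile(path, open = "rb")
  on.exit(close(con))
  names <- character(0)
  sizes <- numeric(0)
  repeat {
    header <- readBin(con, what = "raw", n = 512L)
    # A short read or an all-zero block marks the end of the archive.
    if (length(header) < 512L || all(header == as.raw(0))) break
    name <- tar_field(header, 1, 100)                  # bytes 0-99: name
    size <- octal_to_num(tar_field(header, 125, 136))  # bytes 124-135: size
    # Member data is padded up to a multiple of 512 bytes. Read it rather
    # than seek() past it, since the stream is compressed.
    padded <- ceiling(size / 512) * 512
    data <- readBin(con, what = "raw", n = padded)
    if (!is.null(member) && identical(name, member)) {
      return(data[seq_len(size)])
    }
    names <- c(names, name)
    sizes <- c(sizes, size)
  }
  if (!is.null(member)) stop("member not found: ", member)
  data.frame(name = names, size = sizes, stringsAsFactors = FALSE)
}
```

With these definitions, read_tgz("archive.tgz") would list the members, and readLines(textConnection(rawToChar(read_tgz("archive.tgz", "some/file.txt")))) would give the lines of one text member; both file names here are placeholders. Versions of R released after this thread also provide utils::untar(), whose list = TRUE and files = arguments cover the same two tasks without hand-parsing.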
If you know how to use the unix/linux tools already, then you may want to look at cygwin (http://www.cygwin.com/), it allows those of us trapped in a windows world to still lead productive lives with the unix/linux tools.

--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.snow at intermountainmail.org
(801) 408-8111

-----Original Message-----
From: r-help-bounces at stat.math.ethz.ch [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of John James
Sent: Tuesday, November 14, 2006 5:07 AM
To: r-help at stat.math.ethz.ch
Subject: [R] gzfile with multiple entries in the archive