Dear All,

I'm trying to read a text data file that contains several records separated by a blank line. Each record starts with a row containing its ID and the number of rows in the record (two columns), then the data table itself, e.g.

123 5
89.1791 1.1024
90.5735 1.1024
92.5666 1.1024
95.0725 1.1024
101.2070 1.1024

321 3
60.1601 1.1024
64.8023 1.1024
70.0593 2.1502

...

I thought I could simply use something like this:

con <- file("test2.txt")
repeat {
  e <- read.table(con, nrows = 1)
  if (length(e) == 2) {
    d <- read.table(con, nrows = e[1, 2])
    # process data frame d
  } else break
}

The problem is that read.table closes the connection object. I assumed that it would not close the connection, and instead continue where it last stopped.

Since the data is nearly a simple table, I thought read.table could work rather than using scan directly. Any suggestions for reading this file efficiently are welcome (the file can contain several thousand records, and each record can contain several thousand rows).

thanks a lot for your help,
kind regards,

	Arne
On Wed, Mar 08, 2006 at 12:32:28PM +0100, Arne.Muller at sanofi-aventis.com wrote:

> I'm trying to read a text data file that contains several records
> separated by a blank line. Each record starts with a row containing
> its ID and the number of rows in the record (two columns), then the
> data table itself, e.g.
>
> 123 5
> 89.1791 1.1024
> 90.5735 1.1024
> 92.5666 1.1024
> 95.0725 1.1024
> 101.2070 1.1024
>
> 321 3
> 60.1601 1.1024
> 64.8023 1.1024
> 70.0593 2.1502

That sounds like a job for awk. I think it will be much easier to transform the data into a flat table using awk, python or perl and then just read the table with R.

cu
	Philipp

--
Dr. Philipp Pagel                               Tel. +49-8161-71 2131
Dept. of Genome Oriented Bioinformatics         Fax. +49-8161-71 2186
Technical University of Munich
Science Center Weihenstephan
85350 Freising, Germany

and

Institute for Bioinformatics / MIPS             Tel. +49-89-3187 3675
GSF - National Research Center                  Fax. +49-89-3187 3585
for Environment and Health
Ingolstädter Landstrasse 1
85764 Neuherberg, Germany

http://mips.gsf.de/staff/pagel
Well, the data is generated by a perl script, and I could just configure the perl script so that there is one file per data table, but I thought it would probably be much more efficient to have all records in a single file rather than reading thousands of small files ... .

kind regards,

	Arne

-----Original Message-----
From: r-help-bounces at stat.math.ethz.ch
[mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Philipp Pagel
Sent: Wednesday, March 08, 2006 12:44
To: r-help at stat.math.ethz.ch
Subject: Re: [R] data import problem

> That sounds like a job for awk. I think it will be much easier to
> transform the data into a flat table using awk, python or perl and
> then just read the table with R.
On Wed, Mar 08, 2006 at 12:49:29PM +0100, Arne.Muller at sanofi-aventis.com wrote:

> Well, the data is generated by a perl script, and I could just
> configure the perl script so that there is one file per data table,
> but I thought it would probably be much more efficient to have all
> records in a single file rather than reading thousands of small
> files ... .

I guess I would make it a single file and put the IDs in their own column:

ID   x         y
123  89.1791   1.1024
123  90.5735   1.1024
123  92.5666   1.1024
123  95.0725   1.1024
123  101.2070  1.1024
321  60.1601   1.1024
321  64.8023   1.1024
321  70.0593   2.1502
...

cu
	Philipp
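[Editorial note: with the IDs in their own column as Philipp suggests, one read.table() call plus split() recovers one data frame per record. A minimal sketch; the file name "flat.txt" and the column names ID, x, y are assumptions matching the example layout above.]

```r
## Read the flattened table once, then split it into per-record data frames.
## "flat.txt" and the header names are hypothetical, following the example.
d <- read.table("flat.txt", header = TRUE)  # columns: ID, x, y
recs <- split(d, d$ID)                      # named list: one data frame per ID
## recs[["123"]] is then the 5-row table for record 123
```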
Check out:

http://finzi.psych.upenn.edu/R/Rhelp02a/archive/52957.html

for a similar problem.

On 3/8/06, Arne.Muller at sanofi-aventis.com <Arne.Muller at sanofi-aventis.com> wrote:
> I'm trying to read a text data file that contains several records
> separated by a blank line. Each record starts with a row containing
> its ID and the number of rows in the record (two columns), then the
> data table itself. [...]
>
> I thought I could simply use something like this:
>
> con <- file("test2.txt")
> repeat {
>   e <- read.table(con, nrows = 1)
>   if (length(e) == 2) {
>     d <- read.table(con, nrows = e[1, 2])
>     # process data frame d
>   } else break
> }
>
> The problem is that read.table closes the connection object. [...]
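[Editorial note: a whole-file alternative (not taken from the linked post) is to read everything with readLines() and walk the records using the row counts in the header lines. A minimal sketch, assuming the exact layout shown in the original message: two whitespace-separated numeric fields per line, blank lines between records.]

```r
## Read all lines once, drop blank separators, then slice records by the
## row count carried in each header line (ID, nrows).
lines <- readLines("test2.txt")
lines <- lines[nzchar(trimws(lines))]               # drop blank lines
m <- do.call(rbind,
             lapply(strsplit(trimws(lines), "[ \t]+"), as.numeric))

recs <- list()
i <- 1
while (i <= nrow(m)) {
  id <- m[i, 1]
  n  <- m[i, 2]                                     # rows in this record
  recs[[as.character(id)]] <- m[(i + 1):(i + n), , drop = FALSE]
  i <- i + n + 1                                    # jump past this record
}
```

This trades memory for simplicity: the whole file is held at once, which may matter with several thousand records of several thousand rows each.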
On Wed, 8 Mar 2006, Arne.Muller at sanofi-aventis.com wrote:

> I thought I could simply use something like this:
>
> con <- file("test2.txt")
> repeat {
>   e <- read.table(con, nrows = 1)
>   if (length(e) == 2) {
>     d <- read.table(con, nrows = e[1, 2])
>     # process data frame d
>   } else break
> }
>
> The problem is that read.table closes the connection object. I assumed
> that it would not close the connection, and instead continue where it
> last stopped.

I think the problem is just that you didn't open the connection before passing it to read.table. ?file says "By default the connection is not opened" and ?read.table says

     Alternatively, 'file' can be a 'connection', which will be opened
     if necessary, and if so closed at the end of the function call.

	-thomas
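[Editorial note: a minimal sketch of Thomas's fix: open the connection explicitly, so read.table() leaves it open between calls and each call continues where the last one stopped. File name and structure follow the thread's example; the tryCatch() guard for end-of-file is an addition, since read.table() signals an error when no lines remain.]

```r
## Open the connection ourselves; read.table() then will NOT close it.
con <- file("test2.txt", open = "r")
repeat {
  ## Read the one-line record header: ID and number of data rows.
  e <- tryCatch(read.table(con, nrows = 1),
                error = function(err) NULL)   # NULL at end of file
  if (is.null(e) || length(e) != 2) break
  d <- read.table(con, nrows = e[1, 2])       # the record's data table
  ## process data frame d here, keyed by the ID in e[1, 1]
}
close(con)
```

Blank separator lines are skipped automatically, since blank.lines.skip defaults to TRUE in read.table().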