Dear All,

I'm trying to read a text data file that contains several records separated by a blank line. Each record starts with a row containing its ID and the number of rows in the record (two columns), then the data table itself, e.g.

123 5
89.1791 1.1024
90.5735 1.1024
92.5666 1.1024
95.0725 1.1024
101.2070 1.1024

321 3
60.1601 1.1024
64.8023 1.1024
70.0593 2.1502

...

I thought I could simply use something like this:

con <- file("test2.txt")
repeat {
  e <- read.table(con, nrows = 1)
  if (length(e) == 2) {
    d <- read.table(con, nrows = e[1, 2])
    # process data frame d
  } else break
}

The problem is that read.table closes the connection object. I assumed that it would not close the connection, and instead continue where it last stopped.

Since the data is nearly a simple table, I thought read.table could work rather than using scan directly. Any suggestions for reading this file efficiently are welcome (the file can contain several thousand records, and each record can contain several thousand rows).

thanks a lot for your help,
kind regards,

	Arne
On Wed, Mar 08, 2006 at 12:32:28PM +0100, Arne.Muller at sanofi-aventis.com wrote:

> I'm trying to read a text data file that contains several records
> separated by a blank line. Each record starts with a row containing
> its ID and the number of rows in the record (two columns), then the
> data table itself, e.g.
>
> 123 5
> 89.1791 1.1024
> 90.5735 1.1024
> 92.5666 1.1024
> 95.0725 1.1024
> 101.2070 1.1024
>
> 321 3
> 60.1601 1.1024
> 64.8023 1.1024
> 70.0593 2.1502

That sounds like a job for awk. I think it will be much easier to transform the data into a flat table using awk, python or perl and then just read the table with R.

cu
	Philipp

--
Dr. Philipp Pagel                               Tel. +49-8161-71 2131
Dept. of Genome Oriented Bioinformatics         Fax. +49-8161-71 2186
Technical University of Munich
Science Center Weihenstephan
85350 Freising, Germany

and

Institute for Bioinformatics / MIPS             Tel. +49-89-3187 3675
GSF - National Research Center                  Fax. +49-89-3187 3585
for Environment and Health
Ingolstädter Landstrasse 1
85764 Neuherberg, Germany

http://mips.gsf.de/staff/pagel
Well, the data is generated by a perl script, and I could just configure the perl script so that there is one file per data table, but I thought it would probably be much more efficient to have all records in a single file rather than reading thousands of small files ... .

kind regards,

	Arne

-----Original Message-----
From: r-help-bounces at stat.math.ethz.ch
[mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Philipp Pagel
Sent: Wednesday, March 08, 2006 12:44
To: r-help at stat.math.ethz.ch
Subject: Re: [R] data import problem

> That sounds like a job for awk. I think it will be much easier to
> transform the data into a flat table using awk, python or perl and
> then just read the table with R.
On Wed, Mar 08, 2006 at 12:49:29PM +0100, Arne.Muller at sanofi-aventis.com wrote:

> Well, the data is generated by a perl script, and I could just
> configure the perl script so that there is one file per data table,
> but I thought it would probably be much more efficient to have all
> records in a single file rather than reading thousands of small
> files ... .

I guess I would make it a single file and put the IDs in their own column:

ID   x         y
123  89.1791   1.1024
123  90.5735   1.1024
123  92.5666   1.1024
123  95.0725   1.1024
123  101.2070  1.1024
321  60.1601   1.1024
321  64.8023   1.1024
321  70.0593   2.1502
...

cu
	Philipp
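[Editorial note: with the IDs in their own column as Philipp suggests, one read.table() call plus split() recovers one data frame per record. A minimal sketch; the file name "flat.txt" and the column names ID, x, y are assumptions matching the example layout above.]

```r
## Read the flattened table once, then split it into per-record data frames.
## "flat.txt" and the header names are hypothetical, following the example.
d <- read.table("flat.txt", header = TRUE)  # columns: ID, x, y
recs <- split(d, d$ID)                      # named list: one data frame per ID
## recs[["123"]] is then the 5-row table for record 123
```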
Check out:

http://finzi.psych.upenn.edu/R/Rhelp02a/archive/52957.html

for a similar problem.

On 3/8/06, Arne.Muller at sanofi-aventis.com <Arne.Muller at sanofi-aventis.com> wrote:
> I'm trying to read a text data file that contains several records
> separated by a blank line. Each record starts with a row containing
> its ID and the number of rows in the record (two columns), then the
> data table itself. [...]
>
> I thought I could simply use something like this:
>
> con <- file("test2.txt")
> repeat {
>   e <- read.table(con, nrows = 1)
>   if (length(e) == 2) {
>     d <- read.table(con, nrows = e[1, 2])
>     # process data frame d
>   } else break
> }
>
> The problem is that read.table closes the connection object. [...]
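[Editorial note: a whole-file alternative (not taken from the linked post) is to read everything with readLines() and walk the records using the row counts in the header lines. A minimal sketch, assuming the exact layout shown in the original message: two whitespace-separated numeric fields per line, blank lines between records.]

```r
## Read all lines once, drop blank separators, then slice records by the
## row count carried in each header line (ID, nrows).
lines <- readLines("test2.txt")
lines <- lines[nzchar(trimws(lines))]               # drop blank lines
m <- do.call(rbind,
             lapply(strsplit(trimws(lines), "[ \t]+"), as.numeric))

recs <- list()
i <- 1
while (i <= nrow(m)) {
  id <- m[i, 1]
  n  <- m[i, 2]                                     # rows in this record
  recs[[as.character(id)]] <- m[(i + 1):(i + n), , drop = FALSE]
  i <- i + n + 1                                    # jump past this record
}
```

This trades memory for simplicity: the whole file is held at once, which may matter with several thousand records of several thousand rows each.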
On Wed, 8 Mar 2006, Arne.Muller at sanofi-aventis.com wrote:

> I thought I could simply use something like this:
>
> con <- file("test2.txt")
> repeat {
>   e <- read.table(con, nrows = 1)
>   if (length(e) == 2) {
>     d <- read.table(con, nrows = e[1, 2])
>     # process data frame d
>   } else break
> }
>
> The problem is that read.table closes the connection object. I assumed
> that it would not close the connection, and instead continue where it
> last stopped.

I think the problem is just that you didn't open the connection before passing it to read.table. ?file says "By default the connection is not opened" and ?read.table says

     Alternatively, 'file' can be a 'connection', which will be opened
     if necessary, and if so closed at the end of the function call.

	-thomas
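[Editorial note: a minimal sketch of Thomas's fix: open the connection explicitly, so read.table() leaves it open between calls and each call continues where the last one stopped. File name and structure follow the thread's example; the tryCatch() guard for end-of-file is an addition, since read.table() signals an error when no lines remain.]

```r
## Open the connection ourselves; read.table() then will NOT close it.
con <- file("test2.txt", open = "r")
repeat {
  ## Read the one-line record header: ID and number of data rows.
  e <- tryCatch(read.table(con, nrows = 1),
                error = function(err) NULL)   # NULL at end of file
  if (is.null(e) || length(e) != 2) break
  d <- read.table(con, nrows = e[1, 2])       # the record's data table
  ## process data frame d here, keyed by the ID in e[1, 1]
}
close(con)
```

Blank separator lines are skipped automatically, since blank.lines.skip defaults to TRUE in read.table().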