Henrik Andersson
2004-Oct-25 13:56 UTC
[R] Reading sections of data files based on pattern matching
I am about to write general functions to read the output of simulation
models.
These models generate output files with different sections which I want
to analyze, plot, etc.
Since this will be used by many people at the department I wanted to make
sure that I do this in the best way.
For instance I want to read snippets of data from a text file that looks
like this.
-------------------------------
Lots of stuff
...
@@Start Values@@
Column1 Column2 Column3 ...
Row1 1 2 3 ...
...
@@End Values@@
More stuff
...
@@Start OtherValues@@
Column1 Column2 Column3 ...
Row1 1 2 3 ...
...
@@End OtherValues@@
I looked in the help files and found grep, which operates on character
strings. Do I have to do it like this then?
1. Read file with readLines("foo.txt")
2. grep this object for the start and end of each section ->startline &
stopline
3. Read the file again with
read.table("foo.txt",skip=startline,nrows=stopline-startline)
Or is there a more beautiful way?
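A minimal sketch of the three steps above, assuming a small example file written to a temporary path. The file contents and the names `path` and `values` are illustrative, not taken from real simulation output. Note that `nrows` is set to `stopline - startline - 2` rather than `stopline - startline`, because the header line and the end marker are not data rows.

```r
# Hypothetical example file mirroring the layout shown in the post.
path <- tempfile(fileext = ".txt")
writeLines(c(
  "Lots of stuff",
  "@@Start Values@@",
  "Column1 Column2 Column3",
  "Row1 1 2 3",
  "Row2 4 5 6",
  "@@End Values@@",
  "More stuff"
), path)

# Step 1: read the whole file, one character string per line.
lines <- readLines(path)

# Step 2: find the marker lines (fixed = TRUE matches the text literally).
startline <- grep("@@Start Values@@", lines, fixed = TRUE)
stopline  <- grep("@@End Values@@", lines, fixed = TRUE)

# Step 3: read the file again, skipping everything up to and including the
# start marker. Between the markers there are stopline - startline - 1
# lines, one of which is the header, so the data rows number one fewer.
values <- read.table(path, skip = startline, header = TRUE,
                     nrows = stopline - startline - 2)
print(values)
```

Because the header has one field fewer than the data lines, `read.table` uses the first column (`Row1`, `Row2`) as row names.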
Cheers,
---------------------------------------------
Henrik Andersson
Netherlands Institute of Ecology -
Centre for Estuarine and Marine Ecology
P.O. Box 140
4400 AC Yerseke
Phone: +31 113 577473
h.andersson at nioo.knaw.nl
http://www.nioo.knaw.nl/ppages/handersson
Gabor Grothendieck
2004-Oct-25 15:06 UTC
[R] Reading sections of data files based on pattern matching
Henrik Andersson <h.andersson <at> nioo.knaw.nl> writes:
:
: I am about to write general functions to read the output of simulations
: models.
:
: These model generate output files with different sections which I want
: to analyze plot etc.
:
: Since this will be used many people at the department I wanted to make
: sure that will do this in the best way.
:
: For instance I want to read a snippets of data from a text that look
: like this.
: -------------------------------
: Lots of stuff
: ...
: @@Start Values@@
: Column1 Column2 Column3 ...
: Row1 1 2 3 ...
: ...
: @@End Values@@
:
: More stuff
: ...
: @@Start OtherValues@@
: Column1 Column2 Column3 ...
: Row1 1 2 3 ...
: ...
: @@End OtherValues@@
:
:
: I looked in the help files and found grep which operates on character
: strings, do I have to like this then?
:
: 1. Read file with readLines("foo.txt")
: 2. grep this object for the start and end of each section ->startline &
: stopline
: 3. Read the file again with
: read.table("foo.txt",skip=startline,nrows=stoplin-startline)
:
: Or is there a more beautiful way?
You could adapt the following to your situation (i.e. multiple sections
rather than just one):
https://www.stat.math.ethz.ch/pipermail/r-help/2003-November/040184.html
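Also regarding your example, one potential gotcha to be aware of is
that skip= skips lines of the file but nrows= counts rows of the data
frame, so they are slightly different concepts. The header line, for
instance, is a line of the file but not a row of the data frame.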
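One way to sidestep the skip=/nrows= bookkeeping for multiple sections is to read the file once and parse each section from memory. The following is only a sketch under stated assumptions: the function name `read_sections` is hypothetical, marker lines are assumed to match `@@Start <name>@@` and `@@End <name>@@` exactly (no trailing whitespace), and every section is assumed to start with a header line. It is not the code from the linked post.

```r
# Hypothetical helper: returns a named list with one data frame per section.
read_sections <- function(path) {
  lines <- readLines(path)
  # Find every start marker and pull the section name out of it.
  start_idx  <- grep("^@@Start .*@@$", lines)
  sect_names <- sub("^@@Start (.*)@@$", "\\1", lines[start_idx])
  tables <- lapply(seq_along(start_idx), function(i) {
    # First matching end marker after this section's start marker.
    end_marker <- paste0("@@End ", sect_names[i], "@@")
    end_idx <- which(lines == end_marker &
                     seq_along(lines) > start_idx[i])[1]
    if (is.na(end_idx)) stop("no end marker found for section ", sect_names[i])
    # Lines strictly between the two markers: header plus data rows.
    body <- lines[seq(start_idx[i] + 1,
                      length.out = end_idx - start_idx[i] - 1)]
    read.table(text = body, header = TRUE)
  })
  names(tables) <- sect_names
  tables
}

# Usage with an illustrative file containing two sections.
path <- tempfile(fileext = ".txt")
writeLines(c(
  "Lots of stuff",
  "@@Start Values@@",
  "Column1 Column2 Column3",
  "Row1 1 2 3",
  "@@End Values@@",
  "More stuff",
  "@@Start OtherValues@@",
  "Column1 Column2 Column3",
  "Row1 7 8 9",
  "Row2 10 11 12",
  "@@End OtherValues@@"
), path)
sections <- read_sections(path)
print(names(sections))
print(sections$OtherValues)
```

Since the section body is handed to `read.table` through its `text` argument, the file is read only once and no line counting relative to the file is needed.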
Duncan Murdoch
2004-Oct-25 17:44 UTC
[R] Reading sections of data files based on pattern matching
On Mon, 25 Oct 2004 15:56:37 +0200, Henrik Andersson <h.andersson at nioo.knaw.nl> wrote:
>I am about to write general functions to read the output of simulations
>models.
>
>These model generate output files with different sections which I want
>to analyze plot etc.
>
>Since this will be used many people at the department I wanted to make
>sure that will do this in the best way.
>
>For instance I want to read a snippets of data from a text that look
>like this.
>-------------------------------
>Lots of stuff
>...
>@@Start Values@@
> Column1 Column2 Column3 ...
>Row1 1 2 3 ...
>...
>@@End Values@@
>
>More stuff
>...
>@@Start OtherValues@@
> Column1 Column2 Column3 ...
>Row1 1 2 3 ...
>...
>@@End OtherValues@@
>
>
>I looked in the help files and found grep which operates on character
>strings, do I have to like this then?
>
>1. Read file with readLines("foo.txt")
>2. grep this object for the start and end of each section ->startline &
>stopline
>3. Read the file again with
>read.table("foo.txt",skip=startline,nrows=stoplin-startline)
>
>Or is there a more beautiful way?

I would avoid mixing multiple tables in the same file. I think you'll
run into fewer problems if you put each table into a separate file, and
generate an index file to list all the tables. Each of the files in
your scheme would then become a subdirectory in my scheme.

If the multiplicity of files is a problem, you could use zip or winzip
to put them all into a zip file; R can extract a file from one of those
using zip.file.extract.

Duncan Murdoch
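The one-table-per-file scheme with an index file could look roughly like the sketch below. The directory layout and the file names (`index.txt`, `Values.txt`, `OtherValues.txt`) are assumptions made for illustration, not part of the suggestion itself. For the zip variant, `zip.file.extract` is the function of that era; in later R versions, `unz()` and `unzip()` cover the same ground as far as I can tell, and the commented line shows how `unz()` would be used if the tables were stored in an archive.

```r
# Hypothetical layout: one subdirectory per model run, one file per table,
# plus an index file listing the table files.
run_dir <- tempfile("run")
dir.create(run_dir)

values <- data.frame(Column1 = c(1, 4), Column2 = c(2, 5),
                     row.names = c("Row1", "Row2"))
other  <- data.frame(Column1 = 7, Column2 = 8, row.names = "Row1")

write.table(values, file.path(run_dir, "Values.txt"), quote = FALSE)
write.table(other,  file.path(run_dir, "OtherValues.txt"), quote = FALSE)
writeLines(c("Values.txt", "OtherValues.txt"),
           file.path(run_dir, "index.txt"))

# Reading side: consult the index, then read each listed table.
index  <- readLines(file.path(run_dir, "index.txt"))
tables <- lapply(file.path(run_dir, index), read.table, header = TRUE)
names(tables) <- sub("\\.txt$", "", index)
print(tables$Values)

# If the tables were stored in a zip archive instead (archive name assumed):
# read.table(unz("run.zip", "Values.txt"), header = TRUE)
```

Each file then holds exactly one table, so no marker lines or line arithmetic are needed when reading.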