thr3ads.net - R help - [R] Gobbling up a repeating, irregular list of data [Nov 2016]

Morway, Eric

2016-Nov-11 04:26 UTC

[R] Gobbling up a repeating, irregular list of data

What would be the sophisticated R method for reading the data shown below
into a list?  The data is output from a numerical model.  Pasting the
second block of example R commands (at the end of the message) results in a
failure ("Error in scan...line 2 did not have 6 elements").  I no doubt
could cobble together some script for reading line-by-line using for loops,
and then appending vectors with values from each line, but this strikes me
as bad form.

One final note, the lines with 6 values contain important values that
should somehow remain associated with the data appearing in columns 5 & 6
(the continuous data).  The first value, which is always 1, can be
discarded, but the second value on these lines contains the time step number
("1.00E+00", "2.00E+00", etc.), and the 3rd and 4th values contain a depth
and thickness, respectively. Columns 5 & 6 are a depth and water content
pairing and should be associated with the time steps.

Thanks, Eric

Start of example output data (an R script that tries to read this data follows below):

  1    1.00E+00  1.24E+03  7.79E+00  1.925E-01  1.88E-01
                                     3.850E-01  1.88E-01
                                     5.775E-01  1.88E-01
                                     7.700E-01  1.88E-01
                                     9.626E-01  1.88E-01
                                     1.155E+00  1.88E-01
                                     1.347E+00  1.88E-01
  1    2.00E+00  1.26E+03  7.80E+00  1.925E-01  2.80E-01
                                     1.732E+00  2.80E-01
                                     1.925E+00  2.80E-01
                                     2.310E+00  2.93E-01
                                     2.502E+00  2.22E-01
                                     2.695E+00  1.88E-01
                                     2.887E+00  1.88E-01
  1    3.00E+00  1.28E+03  7.70E+00  1.925E-01  1.03E-01
                                     3.850E-01  1.30E-01
                                     5.775E-01  1.48E-01
                                     7.701E-01  1.61E-01
                                     9.626E-01  1.72E-01
                                     1.155E+00  1.86E-01
                                     1.347E+00  1.93E-01
  1    4.00E+00  1.29E+03  7.60E+00  1.901E-01  1.80E-01
                                     3.803E-01  1.80E-01
                                     5.705E-01  1.38E-01
                                     7.607E-01  1.32E-01
                                     2.282E+00  1.86E-01
                                     2.472E+00  1.98E-01
                                     2.662E+00  2.00E-01

Same data as above, but read.table fails with the scan error quoted above.

dat <- read.table(textConnection("  1    1.00E+00  1.24E+03  7.79E+00  1.925E-01  1.88E-01
                                     3.850E-01  1.88E-01
                                     5.775E-01  1.88E-01
                                     7.700E-01  1.88E-01
                                     9.626E-01  1.88E-01
                                     1.155E+00  1.88E-01
                                     1.347E+00  1.88E-01
  1    2.00E+00  1.26E+03  7.80E+00  1.925E-01  2.80E-01
                                     1.732E+00  2.80E-01
                                     1.925E+00  2.80E-01
                                     2.310E+00  2.93E-01
                                     2.502E+00  2.22E-01
                                     2.695E+00  1.88E-01
                                     2.887E+00  1.88E-01
  1    3.00E+00  1.28E+03  7.70E+00  1.925E-01  1.03E-01
                                     3.850E-01  1.30E-01
                                     5.775E-01  1.48E-01
                                     7.701E-01  1.61E-01
                                     9.626E-01  1.72E-01
                                     1.155E+00  1.86E-01
                                     1.347E+00  1.93E-01
  1    4.00E+00  1.29E+03  7.60E+00  1.901E-01  1.80E-01
                                     3.803E-01  1.80E-01
                                     5.705E-01  1.38E-01
                                     7.607E-01  1.32E-01
                                     2.282E+00  1.86E-01
                                     2.472E+00  1.98E-01
                                     2.662E+00  2.00E-01"),header=FALSE)


Peter Langfelder

2016-Nov-11 04:53 UTC

It's not clear whether your numbers are tab or space-separated, I will
assume space-separated. My lowtech (and not R) solution would be to
dump the output into a text file (call it data.in), then run a sed
command to first remove the two initial spaces from each line, then
replace the remaining initial spaces with 4 (if I count correctly) tabs, then
replace all contiguous blocks of spaces by tabs, something like

sed 's/^  //' data.in | sed 's/^  */\t\t\t\t/' | sed 's/  */\t/g' > data.txt

This should produce a regular 6-column table that should be readable
using standard read.delim or read.table. You will then have to figure out
how to deal with the empty cells in R.
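
For readers who would rather stay inside R, the same three substitutions can be mirrored with sub() and gsub(). This is only a sketch on a few lines of the example data; it assumes, as above, that the separators are spaces and that continuation lines are indented by 37 spaces. The object names (raw, x, tab) are made up for illustration.

```r
# A few lines of the example data; continuation lines are indented by
# 37 spaces (an assumption based on the layout shown in the post).
pad <- strrep(" ", 37)
raw <- c(
  "  1    1.00E+00  1.24E+03  7.79E+00  1.925E-01  1.88E-01",
  paste0(pad, "3.850E-01  1.88E-01"),
  paste0(pad, "5.775E-01  1.88E-01"),
  "  1    2.00E+00  1.26E+03  7.80E+00  1.925E-01  2.80E-01",
  paste0(pad, "1.732E+00  2.80E-01"))

x <- sub("^  ", "", raw)        # drop the two leading spaces on every line
x <- sub("^ +", "\t\t\t\t", x)  # remaining leading spaces -> four empty fields
x <- gsub(" +", "\t", x)        # every other run of spaces -> a single tab

# Now a regular 6-column, tab-separated table; empty cells are read as NA.
tab <- read.delim(text = x, header = FALSE)
print(tab)
```

The continuation rows come back with NA in the first four columns, which is the "empty cells" problem mentioned above.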

Peter


MacQueen, Don

2016-Nov-11 15:58 UTC

Like Peter, I too will assume that all the white space consists of space
characters, not tabs.

In that case, I would probably start with read.fwf().
I would expect that to get me a data frame with lots of NA in the first
four columns. Then (also like Peter says) you'll have to figure out how to
fill the empty cells.
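
A minimal sketch of the read.fwf() idea, using two blocks of the example data written to a temporary file. The field widths (3, 12, 10, 10, 11, 10) are counted from the layout in the original post and are an assumption about the real model output; the column names are invented for illustration.

```r
# Continuation lines are padded with 37 spaces so that their two values
# sit in the same character columns as fields 5 and 6 of a full line.
pad <- strrep(" ", 37)
lines <- c(
  "  1    1.00E+00  1.24E+03  7.79E+00  1.925E-01  1.88E-01",
  paste0(pad, "3.850E-01  1.88E-01"),
  paste0(pad, "5.775E-01  1.88E-01"),
  "  1    2.00E+00  1.26E+03  7.80E+00  1.925E-01  2.80E-01",
  paste0(pad, "1.732E+00  2.80E-01"))
tmp <- tempfile(fileext = ".txt")
writeLines(lines, tmp)

# Widths counted from the example (assumed, not taken from any format spec).
fw <- read.fwf(tmp, widths = c(3, 12, 10, 10, 11, 10),
               col.names = c("flag", "step", "depth", "thick", "z", "wc"))
unlink(tmp)

# Blank fixed-width fields show up as NA in the first four columns.
print(fw)
```

Unlike the whitespace-splitting approaches, the depth and water content values already land in columns 5 and 6 here, so only the fill-down step remains.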

By the way, I wouldn't worry too much about using "bad form." If it works,
would be reasonably easy for someone else looking at your code to understand
(or for you to understand 5 years from now), and runs fast enough,
that's good enough. But I do appreciate the satisfaction of doing
something "the R way."


Here's another way:

dat <- scan(textConnection("  1    1.00E+00  1.24E+03  7.79E+00  1.925E-01  1.88E-01
                                     3.850E-01  1.88E-01
                                     5.775E-01  1.88E-01
                                     7.700E-01  1.88E-01
                                     9.626E-01  1.88E-01
                                     1.155E+00  1.88E-01
                                     1.347E+00  1.88E-01
  1    2.00E+00  1.26E+03  7.80E+00  1.925E-01  2.80E-01
                                     1.732E+00  2.80E-01
                                     1.925E+00  2.80E-01
                                     2.310E+00  2.93E-01
                                     2.502E+00  2.22E-01
                                     2.695E+00  1.88E-01
                                     2.887E+00  1.88E-01
  1    3.00E+00  1.28E+03  7.70E+00  1.925E-01  1.03E-01
                                     3.850E-01  1.30E-01
                                     5.775E-01  1.48E-01
                                     7.701E-01  1.61E-01
                                     9.626E-01  1.72E-01
                                     1.155E+00  1.86E-01
                                     1.347E+00  1.93E-01
  1    4.00E+00  1.29E+03  7.60E+00  1.901E-01  1.80E-01
                                     3.803E-01  1.80E-01
                                     5.705E-01  1.38E-01
                                     7.607E-01  1.32E-01
                                     2.282E+00  1.86E-01
                                     2.472E+00  1.98E-01
                                     2.662E+00  2.00E-01"),
  what=list(0,0,0,0,0,0),fill=TRUE   # six numeric fields; short lines are padded with NA
  )
datf <- do.call(cbind, dat)          # bind the six vectors into a numeric matrix

Then in datf you just have to move the first 2 columns over to be the last
two, in rows where there are missing values, and then fill in the missing
values in the first four columns from the non-missing values above them.
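
The two steps just described (shift the pair, then fill down) might look like this in base R. This is only a sketch on the first two time steps of the example. It assumes the first line is always a full 6-value line; the column names and the final split() into one data frame per time step are my guess at what "a list" should look like.

```r
lines <- c(
  "  1    1.00E+00  1.24E+03  7.79E+00  1.925E-01  1.88E-01",
  "                                     3.850E-01  1.88E-01",
  "                                     5.775E-01  1.88E-01",
  "  1    2.00E+00  1.26E+03  7.80E+00  1.925E-01  2.80E-01",
  "                                     1.732E+00  2.80E-01")

dat  <- scan(text = lines, what = list(0, 0, 0, 0, 0, 0),
             fill = TRUE, quiet = TRUE)
datf <- do.call(cbind, dat)

# Short lines were read into columns 1-2, leaving columns 3-6 as NA.
short <- is.na(datf[, 3])
datf[short, 5:6] <- datf[short, 1:2]   # move the pair to columns 5 and 6

# Fill columns 2-4 from the most recent full line above each row.
hdr <- which(!short)                   # row numbers of the full lines
datf[, 2:4] <- datf[hdr[cumsum(!short)], 2:4]

# Drop the constant first column and attach illustrative names.
out <- data.frame(step = datf[, 2], depth = datf[, 3], thick = datf[, 4],
                  z = datf[, 5], wc = datf[, 6])

# One data frame of (z, wc) pairs per time step.
profiles <- split(out[c("z", "wc")], out$step)
str(profiles)
```

Indexing the full-line rows with cumsum(!short) avoids an explicit loop: each row picks up the values of the last full line at or above it.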



-Don

-- 
Don MacQueen

Lawrence Livermore National Laboratory
7000 East Ave., L-627
Livermore, CA 94550
925-423-1062





