What would be the sophisticated R method for reading the data shown below into a list? The data is output from a numerical model. Pasting the second block of example R commands (at the end of the message) results in a failure ("Error in scan...line 2 did not have 6 elements"). I no doubt could cobble together a script that reads the file line by line in a for loop and appends the values from each line to vectors, but this strikes me as bad form.

One final note: the lines with 6 values contain important values that should somehow remain associated with the data appearing in columns 5 & 6 (the continuous data). The first value, which is always 1, can be discarded, but the second value on these lines contains the time step number ("1.00E+00", "2.00E+00", etc.), and the 3rd and 4th values contain a depth and a thickness, respectively. Columns 5 & 6 are a depth and water content pairing and should be associated with the time steps.

Thanks,
Eric

Start of example output data (an R command that tries to read this data follows below):

1 1.00E+00 1.24E+03 7.79E+00 1.925E-01 1.88E-01
3.850E-01 1.88E-01
5.775E-01 1.88E-01
7.700E-01 1.88E-01
9.626E-01 1.88E-01
1.155E+00 1.88E-01
1.347E+00 1.88E-01
1 2.00E+00 1.26E+03 7.80E+00 1.925E-01 2.80E-01
1.732E+00 2.80E-01
1.925E+00 2.80E-01
2.310E+00 2.93E-01
2.502E+00 2.22E-01
2.695E+00 1.88E-01
2.887E+00 1.88E-01
1 3.00E+00 1.28E+03 7.70E+00 1.925E-01 1.03E-01
3.850E-01 1.30E-01
5.775E-01 1.48E-01
7.701E-01 1.61E-01
9.626E-01 1.72E-01
1.155E+00 1.86E-01
1.347E+00 1.93E-01
1 4.00E+00 1.29E+03 7.60E+00 1.901E-01 1.80E-01
3.803E-01 1.80E-01
5.705E-01 1.38E-01
7.607E-01 1.32E-01
2.282E+00 1.86E-01
2.472E+00 1.98E-01
2.662E+00 2.00E-01

Same data as above, but the read.table() call below fails (the error is raised by scan(), which read.table() uses internally):
dat <- read.table(textConnection(" 1 1.00E+00 1.24E+03 7.79E+00 1.925E-01 1.88E-01
3.850E-01 1.88E-01
5.775E-01 1.88E-01
7.700E-01 1.88E-01
9.626E-01 1.88E-01
1.155E+00 1.88E-01
1.347E+00 1.88E-01
1 2.00E+00 1.26E+03 7.80E+00 1.925E-01 2.80E-01
1.732E+00 2.80E-01
1.925E+00 2.80E-01
2.310E+00 2.93E-01
2.502E+00 2.22E-01
2.695E+00 1.88E-01
2.887E+00 1.88E-01
1 3.00E+00 1.28E+03 7.70E+00 1.925E-01 1.03E-01
3.850E-01 1.30E-01
5.775E-01 1.48E-01
7.701E-01 1.61E-01
9.626E-01 1.72E-01
1.155E+00 1.86E-01
1.347E+00 1.93E-01
1 4.00E+00 1.29E+03 7.60E+00 1.901E-01 1.80E-01
3.803E-01 1.80E-01
5.705E-01 1.38E-01
7.607E-01 1.32E-01
2.282E+00 1.86E-01
2.472E+00 1.98E-01
2.662E+00 2.00E-01"),header=FALSE)
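The layout Eric describes (one 6-value header line per time step, followed by depth/water-content pairs) can be split on the header lines without an explicit loop. What follows is a minimal base-R sketch rather than a tested solution for the real file. It assumes every line has either 6 or 2 whitespace-separated fields, and it uses the first two time steps from the post as an inline excerpt in a hypothetical vector `model_out` (for the real file, readLines() would supply the same kind of character vector). The element and column names (`time_step`, `depth_hdr`, `thickness`, `depth`, `water_content`) are my own choices.

```r
# Hypothetical excerpt: the first two time steps from the post, one element
# per line (readLines() on the real output file gives the same shape).
model_out <- c(
  "1 1.00E+00 1.24E+03 7.79E+00 1.925E-01 1.88E-01",
  "3.850E-01 1.88E-01",
  "5.775E-01 1.88E-01",
  "7.700E-01 1.88E-01",
  "9.626E-01 1.88E-01",
  "1.155E+00 1.88E-01",
  "1.347E+00 1.88E-01",
  "1 2.00E+00 1.26E+03 7.80E+00 1.925E-01 2.80E-01",
  "1.732E+00 2.80E-01",
  "1.925E+00 2.80E-01",
  "2.310E+00 2.93E-01",
  "2.502E+00 2.22E-01",
  "2.695E+00 1.88E-01",
  "2.887E+00 1.88E-01"
)

# Split each line on whitespace (trimws() drops any leading indentation).
fields <- strsplit(trimws(model_out), "[[:space:]]+")
nf <- lengths(fields)
stopifnot(all(nf %in% c(2L, 6L)))  # assumption: only 6- or 2-value lines

# Every header line starts a new time step, so a running count of header
# lines labels each line with the block it belongs to.
block <- cumsum(nf == 6L)

# Turn one block (a list of character vectors) into one list element.
# The names below are illustrative, not taken from the model's documentation.
parse_block <- function(f) {
  hdr <- as.numeric(f[[1]])
  # Columns 5-6 of the header line are the first depth/water-content pair.
  pairs <- c(list(hdr[5:6]), lapply(f[-1], as.numeric))
  prof <- do.call(rbind, pairs)
  list(time_step = hdr[2],  # hdr[1] is always 1 and is dropped
       depth_hdr = hdr[3],
       thickness = hdr[4],
       profile   = data.frame(depth = prof[, 1], water_content = prof[, 2]))
}

steps <- lapply(split(fields, block), parse_block)
names(steps) <- vapply(steps, function(s) format(s$time_step), character(1))
```

For example, `steps[["2"]]$profile` is then the 7-row data frame of depth/water-content pairs for time step 2, and `steps[["2"]]$thickness` is 7.8.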
Peter Langfelder
2016-Nov-11 04:53 UTC
[R] Gobbling up a repeating, irregular list of data
It's not clear whether your numbers are tab- or space-separated; I will assume space-separated. My low-tech (and non-R) solution would be to dump the output into a text file (call it data.in), then run sed to first remove two initial spaces from each line, then replace the remaining initial spaces with 4 (if I count correctly) tabs, and finally replace all contiguous blocks of spaces with tabs, something like

sed 's/^  //' data.in | sed 's/^  */\t\t\t\t/' | sed 's/  */\t/g' > data.txt

This should produce a regular 6-column table that is readable with the standard read.delim or read.table. You will then have to figure out how to deal with the empty cells in R.

Peter

On Thu, Nov 10, 2016 at 8:26 PM, Morway, Eric <emorway at usgs.gov> wrote:
______________________________________________
R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
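Peter's sed pipeline runs outside R. As a rough illustration, the same padding idea can be carried out in base R by prefixing each 2-value line with four `NA` fields (instead of four tabs) and then filling the empty cells from the header line above. This swaps sed for R string handling and is not Peter's actual command. It again assumes only 6- or 2-field lines, and it uses a hypothetical excerpt vector `model_out` and column names of my own choosing.

```r
# Hypothetical excerpt (first two time steps from the post), one element per line.
model_out <- c(
  "1 1.00E+00 1.24E+03 7.79E+00 1.925E-01 1.88E-01",
  "3.850E-01 1.88E-01",
  "5.775E-01 1.88E-01",
  "7.700E-01 1.88E-01",
  "9.626E-01 1.88E-01",
  "1.155E+00 1.88E-01",
  "1.347E+00 1.88E-01",
  "1 2.00E+00 1.26E+03 7.80E+00 1.925E-01 2.80E-01",
  "1.732E+00 2.80E-01",
  "1.925E+00 2.80E-01",
  "2.310E+00 2.93E-01",
  "2.502E+00 2.22E-01",
  "2.695E+00 1.88E-01",
  "2.887E+00 1.88E-01"
)

# Same idea as the sed pipeline, done with R strings instead: prefix each
# 2-value line with four "NA" fields so that every line has 6 columns.
lines <- trimws(model_out)
nf <- lengths(strsplit(lines, "[[:space:]]+"))
lines[nf == 2L] <- paste("NA NA NA NA", lines[nf == 2L])

# Illustrative column names; "NA" is read.table()'s default missing-value string.
dat <- read.table(text = paste(lines, collapse = "\n"), header = FALSE,
                  col.names = c("one", "time_step", "depth_hdr", "thickness",
                                "depth", "water_content"))

# Deal with the empty cells: copy each header value down onto the rows below it.
# Assumes the first line is a header line.
hdr_id <- cumsum(!is.na(dat$time_step))
for (col in c("time_step", "depth_hdr", "thickness")) {
  dat[[col]] <- dat[[col]][!is.na(dat[[col]])][hdr_id]
}
dat$one <- NULL  # the first value is always 1
```

`split(dat, dat$time_step)` would then turn the flat table into a list with one data frame per time step.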
Like Peter, I too will assume that all the white space consists of space characters, not tabs. In that case, I would probably start with read.fwf(). I would expect that to get me a data frame with lots of NA in the first four columns. Then (as Peter also says) you'll have to figure out how to fill the empty cells.

By the way, I wouldn't worry too much about using "bad form." If it works, is reasonably easy for someone else looking at your code to understand (or for you to understand 5 years from now), and runs fast enough, that's good enough. But I do appreciate the satisfaction of doing something "the R way."

Here's another way:

dat <- scan(textConnection(" 1 1.00E+00 1.24E+03 7.79E+00 1.925E-01 1.88E-01
3.850E-01 1.88E-01
5.775E-01 1.88E-01
7.700E-01 1.88E-01
9.626E-01 1.88E-01
1.155E+00 1.88E-01
1.347E+00 1.88E-01
1 2.00E+00 1.26E+03 7.80E+00 1.925E-01 2.80E-01
1.732E+00 2.80E-01
1.925E+00 2.80E-01
2.310E+00 2.93E-01
2.502E+00 2.22E-01
2.695E+00 1.88E-01
2.887E+00 1.88E-01
1 3.00E+00 1.28E+03 7.70E+00 1.925E-01 1.03E-01
3.850E-01 1.30E-01
5.775E-01 1.48E-01
7.701E-01 1.61E-01
9.626E-01 1.72E-01
1.155E+00 1.86E-01
1.347E+00 1.93E-01
1 4.00E+00 1.29E+03 7.60E+00 1.901E-01 1.80E-01
3.803E-01 1.80E-01
5.705E-01 1.38E-01
7.607E-01 1.32E-01
2.282E+00 1.86E-01
2.472E+00 1.98E-01
2.662E+00 2.00E-01"), what=list(0,0,0,0,0,0), fill=TRUE)
datf <- do.call(cbind, dat)

Then in datf you just have to move the first 2 columns over to be the last two in the rows where there are missing values, and then fill in the missing values in the first four columns from the non-missing values above them.

-Don

--
Don MacQueen
Lawrence Livermore National Laboratory
7000 East Ave., L-627
Livermore, CA 94550
925-423-1062
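Don's last step (move the two values of the short rows into columns 5 and 6, then fill columns 1 to 4 from the header row above) can be written without loops. This is a sketch under stated assumptions: it uses the same two-time-step excerpt in a hypothetical vector `model_out`, column names of my own choosing, and treats a row as a continuation row whenever column 3 is NA. It passes the lines through scan()'s `text` argument instead of an explicit textConnection().

```r
# Hypothetical excerpt (first two time steps from the post), one element per line.
model_out <- c(
  "1 1.00E+00 1.24E+03 7.79E+00 1.925E-01 1.88E-01",
  "3.850E-01 1.88E-01",
  "5.775E-01 1.88E-01",
  "7.700E-01 1.88E-01",
  "9.626E-01 1.88E-01",
  "1.155E+00 1.88E-01",
  "1.347E+00 1.88E-01",
  "1 2.00E+00 1.26E+03 7.80E+00 1.925E-01 2.80E-01",
  "1.732E+00 2.80E-01",
  "1.925E+00 2.80E-01",
  "2.310E+00 2.93E-01",
  "2.502E+00 2.22E-01",
  "2.695E+00 1.88E-01",
  "2.887E+00 1.88E-01"
)

# Don's scan() call: fill = TRUE pads short records with NA, so each
# 2-value line lands in columns 1-2 with NA in columns 3-6.
dat <- scan(text = model_out, what = list(0, 0, 0, 0, 0, 0),
            fill = TRUE, quiet = TRUE)
datf <- do.call(cbind, dat)

# Rows with NA in column 3 are the depth/water-content continuation lines.
short <- is.na(datf[, 3])

# Move their two values from columns 1-2 over to columns 5-6 ...
datf[short, 5:6] <- datf[short, 1:2]

# ... then fill columns 1-4 of every row from the header row above it
# (assumes the first row is a header row).
hdr_id <- cumsum(!short)
datf[, 1:4] <- datf[!short, 1:4, drop = FALSE][hdr_id, ]

# Illustrative column names; drop the always-1 first column.
colnames(datf) <- c("one", "time_step", "depth_hdr", "thickness",
                    "depth", "water_content")
res <- as.data.frame(datf[, -1])

# Optional: one data frame of (depth, water_content) pairs per time step.
profiles <- split(res[, c("depth", "water_content")], res$time_step)
```

The final split() gives the list-per-time-step structure asked about in the original post, while `res` keeps every pair next to its time step, depth and thickness.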