useRs-
The output generated from a groundwater model post-processor contains
millions of lines of text. Using the custom R function shown below, I can
quickly gather values from this file.
As you can see in the textConnection provided below (which is only a small
snippet from the file), the output is repetitive but does have some header
lines I hope to make use of to narrow the collected output. The header
lines I'm speaking of are:
1) " Flow Budget for Zone 1 at Time Step 1 of Stress Period 2"
2) " Flow Budget for Zone 2 at Time Step 1 of Stress Period 2"
3) " Flow Budget for Zone 3 at Time Step 1 of Stress Period 2"
4) " Flow Budget for Zone 1 at Time Step 1 of Stress Period 3"
...
and so on for 111 different "zones" as well as 575 distinct "stress
periods". In the custom function that follows, currently named "g", I can
collect all values of "Recharge". If instead I want to restrict the
collected "Recharge" values to "Zone 2" for all 575 stress periods, is
there a way to first look for the header "Flow Budget for Zone 2", collect
only the next two values of Recharge, and then skip down to the next header
containing "Zone 2", collect 2 more values of "Recharge", and on like this
to the end? 'Peeling' out targeted flow budget terms will facilitate
generation of budget-specific plots through time.
The "edm" variable at the end of the R code that follows currently looks
like this:
edm
# [1] 1.28980e+05 0.00000e+00 *2.74161e-01* 0.00000e+00 8.10840e+04 0.00000e+00
# [7] 1.28980e+05 0.00000e+00 *2.74165e-01* 0.00000e+00 8.10840e+04 0.00000e+00
but with the proposed revision, which only collects Recharge values from
Zone 2, it would look like:
edm
# [1] *2.74161e-01* 0.00000e+00 *2.74165e-01* 0.00000e+00
txt_con<-textConnection(" mark_zone
Flow Budget for Zone 1 at Time Step 1 of Stress Period 2
-------------------------------------------------------------
Budget Term Flow (L**3/T)
-----------------------------
IN:
---
STORAGE = 0.37855E-02
CONSTANT HEAD = 0.0000
RECHARGE = 0.12898E+06
STREAM LEAKAGE = 0.0000
LAKE SEEPAGE = 0.0000
UZF ET = 0.0000
GW ET = 0.0000
UZF INFILTR. = 0.0000
SFR-DIV. INFLTR. = 0.0000
UZF RECHARGE = 0.0000
SURFACE LEAKAGE = 0.0000
Zone 16 to 1 = 0.0000
Zone 31 to 1 = 0.0000
Zone 40 to 1 = 0.0000
Zone 91 to 1 = 0.0000
Total IN = 0.12898E+06
OUT:
----
STORAGE = 0.58275E-04
CONSTANT HEAD = 0.0000
RECHARGE = 0.0000
STREAM LEAKAGE = 0.0000
LAKE SEEPAGE = 0.0000
UZF ET = 0.0000
GW ET = 0.0000
UZF INFILTR. = 0.0000
SFR-DIV. INFLTR. = 0.0000
UZF RECHARGE = 0.0000
SURFACE LEAKAGE = 0.0000
Zone 1 to 16 = 399.88
Zone 1 to 31 = 85204.
Zone 1 to 40 = 12404.
Zone 1 to 91 = 30968.
Total OUT = 0.12898E+06
IN - OUT = 0.14138E-03
Percent Discrepancy = 0.00
1
mark_zone
Flow Budget for Zone 2 at Time Step 1 of Stress Period 2
-------------------------------------------------------------
Budget Term Flow (L**3/T)
-----------------------------
IN:
---
STORAGE = 0.18833E-05
CONSTANT HEAD = 0.0000
RECHARGE = 0.274161E+06
STREAM LEAKAGE = 0.0000
LAKE SEEPAGE = 0.0000
UZF ET = 0.0000
GW ET = 0.0000
UZF INFILTR. = 0.0000
SFR-DIV. INFLTR. = 0.0000
UZF RECHARGE = 0.0000
SURFACE LEAKAGE = 0.0000
Zone 15 to 2 = 0.0000
Zone 31 to 2 = 0.0000
Zone 91 to 2 = 13134.
Total IN = 0.28729E+06
OUT:
----
STORAGE = 0.10823E-04
CONSTANT HEAD = 0.0000
RECHARGE = 0.0000
STREAM LEAKAGE = 0.0000
LAKE SEEPAGE = 0.0000
UZF ET = 0.0000
GW ET = 0.0000
UZF INFILTR. = 0.0000
SFR-DIV. INFLTR. = 0.0000
UZF RECHARGE = 0.0000
SURFACE LEAKAGE = 0.0000
Zone 2 to 15 = 6812.7
Zone 2 to 31 = 0.20820E+06
Zone 2 to 91 = 72274.
Total OUT = 0.28729E+06
IN - OUT = 0.58504E-02
Percent Discrepancy = 0.00
1
mark_zone
Flow Budget for Zone 3 at Time Step 1 of Stress Period 2
-------------------------------------------------------------
Budget Term Flow (L**3/T)
-----------------------------
IN:
---
STORAGE = 0.84894E-04
CONSTANT HEAD = 0.0000
RECHARGE = 81084.
STREAM LEAKAGE = 0.0000
LAKE SEEPAGE = 0.0000
UZF ET = 0.0000
GW ET = 0.0000
UZF INFILTR. = 0.0000
SFR-DIV. INFLTR. = 0.0000
UZF RECHARGE = 0.0000
SURFACE LEAKAGE = 0.0000
Zone 31 to 3 = 0.0000
Zone 91 to 3 = 1234.9
Total IN = 82319.
OUT:
----
STORAGE = 0.0000
CONSTANT HEAD = 0.0000
RECHARGE = 0.0000
STREAM LEAKAGE = 0.0000
LAKE SEEPAGE = 0.0000
UZF ET = 0.0000
GW ET = 0.0000
UZF INFILTR. = 0.0000
SFR-DIV. INFLTR. = 0.0000
UZF RECHARGE = 0.0000
SURFACE LEAKAGE = 0.0000
Zone 3 to 31 = 53937.
Zone 3 to 91 = 28382.
Total OUT = 82319.
IN - OUT = 0.81732E-03
Percent Discrepancy = 0.00
1
mark_zone
Flow Budget for Zone 1 at Time Step 1 of Stress Period 3
-------------------------------------------------------------
Budget Term Flow (L**3/T)
-----------------------------
IN:
---
STORAGE = 0.15770E-04
CONSTANT HEAD = 0.0000
RECHARGE = 0.12898E+06
STREAM LEAKAGE = 0.0000
LAKE SEEPAGE = 0.0000
UZF ET = 0.0000
GW ET = 0.0000
UZF INFILTR. = 0.0000
SFR-DIV. INFLTR. = 0.0000
UZF RECHARGE = 0.0000
SURFACE LEAKAGE = 0.0000
Zone 16 to 1 = 0.0000
Zone 31 to 1 = 0.0000
Zone 40 to 1 = 0.0000
Zone 91 to 1 = 0.0000
Total IN = 0.12898E+06
OUT:
----
STORAGE = 0.38262E-02
CONSTANT HEAD = 0.0000
RECHARGE = 0.0000
STREAM LEAKAGE = 0.0000
LAKE SEEPAGE = 0.0000
UZF ET = 0.0000
GW ET = 0.0000
UZF INFILTR. = 0.0000
SFR-DIV. INFLTR. = 0.0000
UZF RECHARGE = 0.0000
SURFACE LEAKAGE = 0.0000
Zone 1 to 16 = 399.88
Zone 1 to 31 = 85214.
Zone 1 to 40 = 12405.
Zone 1 to 91 = 30958.
Total OUT = 0.12898E+06
IN - OUT = 0.88928E-03
Percent Discrepancy = 0.00
1
mark_zone
Flow Budget for Zone 2 at Time Step 1 of Stress Period 3
-------------------------------------------------------------
Budget Term Flow (L**3/T)
-----------------------------
IN:
---
STORAGE = 0.0000
CONSTANT HEAD = 0.0000
RECHARGE = 0.274165E+06
STREAM LEAKAGE = 0.0000
LAKE SEEPAGE = 0.0000
UZF ET = 0.0000
GW ET = 0.0000
UZF INFILTR. = 0.0000
SFR-DIV. INFLTR. = 0.0000
UZF RECHARGE = 0.0000
SURFACE LEAKAGE = 0.0000
Zone 15 to 2 = 0.0000
Zone 31 to 2 = 0.0000
Zone 91 to 2 = 13215.
Total IN = 0.28737E+06
OUT:
----
STORAGE = 0.27267E-02
CONSTANT HEAD = 0.0000
RECHARGE = 0.0000
STREAM LEAKAGE = 0.0000
LAKE SEEPAGE = 0.0000
UZF ET = 0.0000
GW ET = 0.0000
UZF INFILTR. = 0.0000
SFR-DIV. INFLTR. = 0.0000
UZF RECHARGE = 0.0000
SURFACE LEAKAGE = 0.0000
Zone 2 to 15 = 6813.6
Zone 2 to 31 = 0.20827E+06
Zone 2 to 91 = 72291.
Total OUT = 0.28737E+06
IN - OUT = 0.69125E-03
Percent Discrepancy = 0.00
1
mark_zone
Flow Budget for Zone 3 at Time Step 1 of Stress Period 3
-------------------------------------------------------------
Budget Term Flow (L**3/T)
-----------------------------
IN:
---
STORAGE = 0.0000
CONSTANT HEAD = 0.0000
RECHARGE = 81084.
STREAM LEAKAGE = 0.0000
LAKE SEEPAGE = 0.0000
UZF ET = 0.0000
GW ET = 0.0000
UZF INFILTR. = 0.0000
SFR-DIV. INFLTR. = 0.0000
UZF RECHARGE = 0.0000
SURFACE LEAKAGE = 0.0000
Zone 31 to 3 = 0.0000
Zone 91 to 3 = 1262.7
Total IN = 82346.
OUT:
----
STORAGE = 0.18113E-03
CONSTANT HEAD = 0.0000
RECHARGE = 0.0000
STREAM LEAKAGE = 0.0000
LAKE SEEPAGE = 0.0000
UZF ET = 0.0000
GW ET = 0.0000
UZF INFILTR. = 0.0000
SFR-DIV. INFLTR. = 0.0000
UZF RECHARGE = 0.0000
SURFACE LEAKAGE = 0.0000
Zone 3 to 31 = 53843.
Zone 3 to 91 = 28503.
Total OUT = 82346.
IN - OUT = -0.14018E-02
Percent Discrepancy = 0.00
")
g<-function(txt_con, string, from, to, ...) {
L <- readLines(txt_con)
matched <- grep(string, L, value = TRUE, ...)
as.numeric(substring(matched, from, to))
}
#Now, strip out values
edm<-g(txt_con, " RECHARGE =", 37, 50)
[[alternative HTML version deleted]]
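One way to restrict the collected values to a single zone, sketched here under stated assumptions rather than taken from the thread: label every line with the index of the most recent "Flow Budget for Zone" header above it, then keep only the RECHARGE lines whose label belongs to a Zone 2 header. The function name `zone_values` and its arguments are hypothetical. The pattern ends in " at " so that zone 2 does not also match zones 20, 21, and so on, which matters if zone numbers are not padded to a fixed width. The value is taken as everything after "=" rather than from fixed columns; a fixed column range that ends one character too early would drop the last digit of a longer number's exponent, which may be why the Zone 2 values above print as 2.74161e-01.

```r
# Minimal sketch (hypothetical helper, not from the original post):
# collect one budget term, but only from blocks belonging to one zone.
zone_values <- function(lines, zone, term = "RECHARGE") {
  # Every budget block starts with a "Flow Budget for Zone ..." header
  is_hdr <- grepl("Flow Budget for Zone", lines, fixed = TRUE)
  # cumsum() labels each line with the index of the latest header above it
  block <- cumsum(is_hdr)
  # Headers of the requested zone; " +" tolerates padding, " at " blocks "Zone 20"
  want <- grep(sprintf("Flow Budget for Zone +%d at ", zone), lines)
  # Term lines such as " RECHARGE = 0.12898E+06"; "^ *" excludes "UZF RECHARGE"
  is_term <- grepl(sprintf("^ *%s *=", term), lines)
  keep <- is_term & block %in% block[want]
  # Parse everything after "=" instead of relying on fixed column positions
  as.numeric(sub("^[^=]*=", "", lines[keep]))
}

# Small stand-in for readLines(txt_con); the real blocks are longer
demo <- c(
  " Flow Budget for Zone 1 at Time Step 1 of Stress Period 2",
  " IN:",
  " RECHARGE = 0.12898E+06",
  " UZF RECHARGE = 0.0000",
  " OUT:",
  " RECHARGE = 0.0000",
  " Flow Budget for Zone 2 at Time Step 1 of Stress Period 2",
  " IN:",
  " RECHARGE = 0.274161E+06",
  " UZF RECHARGE = 0.0000",
  " OUT:",
  " RECHARGE = 0.0000",
  " Flow Budget for Zone 2 at Time Step 1 of Stress Period 3",
  " IN:",
  " RECHARGE = 0.274165E+06",
  " OUT:",
  " RECHARGE = 0.0000"
)
edm_z2 <- zone_values(demo, 2)
print(edm_z2)
```

Because the block boundaries come from the headers themselves, this does not depend on every block having the same number of lines.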
I am not sure what happened, but it may be that you accidentally sent your
message in HTML; the text connection data seems unusable. It is much better
to use ?dput to supply sample data. Have a look at
http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example
John Kane
Kingston ON Canada
> -----Original Message-----
> From: emorway at usgs.gov
> Sent: Wed, 21 Aug 2013 06:50:07 -0700
> To: r-help at r-project.org
> Subject: [R] Narrowing values collected from .txt file
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
A relatively concise, commented, working solution to the problem originally
motivating this thread was found (below). I suspect the approach I've taken
has a major inefficiency through the use of the "scan" statement appearing
inside the function "g". The way the code works right now, it has to re-open
and read the file 'length(matched) times' rather than sequentially reading
through to the next pertinent section of the txt file. Does anyone have a
more efficient approach in mind so I don't have to wait 1/2 hour to get the
results? (The only adjustment to the code that follows is to point "txt" to
wherever the attached file is placed)
# The file that the code works on is attached as: MCR_Budgets.zip (76MB uncompressed)
# where is the file? (original dat file is ~147MB, only half of this file is attached)
txt<-"c:/temp/MCR_Budgets.txt"
# Demarcation header to narrow list of retrieved 'Recharge' values
hdr_str<-"Flow Budget for Zone 2"
# string to identify lines with desired values
srch_str<-" RECHARGE ="
# retrieves desired values
g<-function(txt_con, hdr_str, srch_str, from, to, ...) {
  L <- readLines(txt_con)
  # matched contains the line #s w/ hdr_str
  matched <- grep(hdr_str, L, value = FALSE, ...)
  # initialize output vector
  fetched_list<-numeric()
  # for each instance of hdr_str, loop (seq_along is safe when nothing matches)
  for(i in seq_along(matched)){
    # retrieve a section of text following each hdr_str; suspect this is
    # highly inefficient, since scan() re-reads the file from the top each time
    snippet<-scan(txt_con, what=character(), skip=matched[i]-1, n=42, sep='\n')
    # get the two lines containing 'srch_str' within the short section of
    # retrieved text
    fetched <- grep(srch_str, snippet, value=TRUE)
    # append output vector for plotting time series
    fetched_list <- c(fetched_list, as.numeric(substring(fetched, from, to)))
    # monitor
    print(i)
  }
  # return desired values
  as.numeric(fetched_list)
}
# The results of system.time reflect the fact the function was run on the full
# 147 MB file, only half of which is attached.
system.time(
  rech_z2<-g(txt,hdr_str,srch_str,37,51)
)
#    user  system elapsed
# 1740.48   36.08 1825.77
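Since readLines() has already pulled the whole file into "L", the repeated scan() calls can probably be replaced by indexing "L" directly, so the file is read only once. The following is a minimal sketch of that idea, not a tested replacement on the real 147 MB file. The name `g_fast`, the `n_lines` argument, and the small temporary demo file are hypothetical; the window of 42 lines mirrors the `n=42` above but counts raw lines (scan() skips blank lines by default, so the two windows can differ if the file contains blank lines).

```r
# Hypothetical single-read variant of g(): one readLines(), then pure indexing.
g_fast <- function(txt, hdr_str, srch_str, from, to, n_lines = 42) {
  L <- readLines(txt)                      # the only time the file is read
  hdr <- grep(hdr_str, L, fixed = TRUE)    # line numbers of matching headers
  # Line numbers of each header plus the n_lines - 1 lines that follow it
  idx <- outer(0:(n_lines - 1), hdr, "+")
  idx <- idx[idx <= length(L)]             # drop positions past end of file
  hits <- grep(srch_str, L[idx], fixed = TRUE, value = TRUE)
  as.numeric(substring(hits, from, to))
}

# Demo on a small fixed-width file; with this layout values sit in columns 24-37
fmt <- function(term, value) sprintf("%20s = %14s", term, value)
blk <- function(zone, period, rech_in) c(
  sprintf(" Flow Budget for Zone %d at Time Step 1 of Stress Period %d",
          zone, period),
  "IN:",  fmt("RECHARGE", rech_in),  fmt("UZF RECHARGE", "0.0000"),
  "OUT:", fmt("RECHARGE", "0.0000"), fmt("UZF RECHARGE", "0.0000"))
tmp <- tempfile(fileext = ".txt")
writeLines(c(blk(1, 2, "0.12898E+06"), blk(2, 2, "0.274161E+06"),
             blk(1, 3, "0.12898E+06"), blk(2, 3, "0.274165E+06")), tmp)

# Five leading spaces keep "UZF RECHARGE" from matching
srch <- paste0(strrep(" ", 5), "RECHARGE =")
rech_z2 <- g_fast(tmp, "Flow Budget for Zone 2 at", srch, 24, 37, n_lines = 7)
print(rech_z2)
unlink(tmp)
```

Growing "fetched_list" with c() inside a loop also copies the vector on every iteration; collecting all matching lines first and converting them in one as.numeric() call avoids that as well.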
---------- Forwarded message ----------
From: jim holtman <jholtman at gmail.com>
Date: Thu, Aug 29, 2013 at 8:43 AM
Subject: Re: [R] Narrowing values collected from .txt file
To: "Morway, Eric" <emorway at usgs.gov>
FYI, I duped your data to a 100MB file and it took less than 10 seconds to
process.
Jim Holtman
Data Munger Guru
What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.
On Wed, Aug 28, 2013 at 7:45 PM, Morway, Eric <emorway at usgs.gov> wrote:
> It looks as though the attachment to my last post didn't make the cut (or
> at least it's not appearing on the Nabble forum), for one reason or
> another. I'm reattaching a smaller version so folks can run the code
> (won't work without a text file to operate on). So, while the attached
> file is only a small sample of the larger file and will therefore run
> quickly, it would still be helpful if someone knows a more efficient
> approach to the code in the previous post.
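For the budget-specific plots through time that motivated the thread, it may help to keep the zone and stress period alongside each value instead of returning a bare vector. The sketch below is an illustration under assumptions, not the code referred to in the message above (which is not shown in this thread). The name `budget_table` is hypothetical, and the IN/OUT label is inferred purely from order within a block (first occurrence is IN, second is OUT), which matches the sample output but is an assumption about the full file.

```r
# Hypothetical helper: one row per term line, tagged with zone and stress period.
budget_table <- function(lines, term = "RECHARGE") {
  hdr_re <- paste0("Flow Budget for Zone +([0-9]+) at Time Step +([0-9]+)",
                   " of Stress Period +([0-9]+)")
  hdr <- grep(hdr_re, lines)
  if (length(hdr) == 0) stop("no budget headers found")
  # Pull zone, time step and stress period out of each header line
  parts <- regmatches(lines[hdr], regexec(hdr_re, lines[hdr]))
  meta <- do.call(rbind, lapply(parts, function(p) as.integer(p[2:4])))
  # Term lines and the header block each one falls under
  term_at <- grep(sprintf("^ *%s *=", term), lines)
  blk <- findInterval(term_at, hdr)
  ok <- blk > 0                        # ignore term lines before the first header
  term_at <- term_at[ok]
  blk <- blk[ok]
  # Position within the block: 1 = IN, 2 = OUT (assumed ordering)
  nth <- ave(seq_along(blk), blk, FUN = seq_along)
  data.frame(
    zone          = meta[blk, 1],
    stress_period = meta[blk, 3],
    direction     = c("IN", "OUT")[nth],
    value         = as.numeric(sub("^[^=]*=", "", lines[term_at]))
  )
}

demo <- c(
  " Flow Budget for Zone 1 at Time Step 1 of Stress Period 2",
  " RECHARGE = 0.12898E+06", " RECHARGE = 0.0000",
  " Flow Budget for Zone 2 at Time Step 1 of Stress Period 2",
  " RECHARGE = 0.274161E+06", " RECHARGE = 0.0000",
  " Flow Budget for Zone 1 at Time Step 1 of Stress Period 3",
  " RECHARGE = 0.12898E+06", " RECHARGE = 0.0000",
  " Flow Budget for Zone 2 at Time Step 1 of Stress Period 3",
  " RECHARGE = 0.274165E+06", " RECHARGE = 0.0000"
)
tab <- budget_table(demo)
z2_in <- tab[tab$zone == 2 & tab$direction == "IN", ]
print(z2_in)
```

With the table in hand, a time series for one zone is a plain subset, for example plot(z2_in$stress_period, z2_in$value, type = "l").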