useRs-

The output generated by a groundwater model post-processor contains millions of lines of text. Using the custom R function shown below, I can quickly gather values from this file.

As you can see in the textConnection provided below (only a small snippet of the file), the output is repetitive but does have some header lines I hope to use to narrow the collected output. The header lines I'm speaking of are:
1) " Flow Budget for Zone 1 at Time Step 1 of Stress Period 2"
2) " Flow Budget for Zone 2 at Time Step 1 of Stress Period 2"
3) " Flow Budget for Zone 3 at Time Step 1 of Stress Period 2"
4) " Flow Budget for Zone 1 at Time Step 1 of Stress Period 3"
...

and so on for 111 different "zones" and 575 distinct "stress periods". With the custom function that follows, currently named "g", I can collect all values of "Recharge". If instead I want to restrict the collected "Recharge" values to "Zone 2" for all 575 stress periods, is there a way to first look for the header "Flow Budget for Zone 2", collect only the next two values of Recharge, then skip down to the next header containing "Zone 2", collect two more values of "Recharge", and so on to the end? 'Peeling' out targeted flow budget terms will facilitate generation of budget-specific plots through time.
The "edm" variable at the end of the R code that follows currently looks like this:

edm
# [1] 1.28980e+05 0.00000e+00 *2.74161e+05* 0.00000e+00 8.10840e+04 0.00000e+00
# [7] 1.28980e+05 0.00000e+00 *2.74165e+05* 0.00000e+00 8.10840e+04 0.00000e+00

but with the proposed revision, which only collects Recharge values from Zone 2, it would look like:

edm
# [1] *2.74161e+05* 0.00000e+00 *2.74165e+05* 0.00000e+00


txt_con<-textConnection(" mark_zone


Flow Budget for Zone 1 at Time Step 1 of Stress Period 2
-------------------------------------------------------------

Budget Term Flow (L**3/T)
-----------------------------

IN:
---
STORAGE = 0.37855E-02
CONSTANT HEAD = 0.0000
RECHARGE = 0.12898E+06
STREAM LEAKAGE = 0.0000
LAKE SEEPAGE = 0.0000
UZF ET = 0.0000
GW ET = 0.0000
UZF INFILTR. = 0.0000
SFR-DIV. INFLTR. = 0.0000
UZF RECHARGE = 0.0000
SURFACE LEAKAGE = 0.0000
Zone 16 to 1 = 0.0000
Zone 31 to 1 = 0.0000
Zone 40 to 1 = 0.0000
Zone 91 to 1 = 0.0000

Total IN = 0.12898E+06

OUT:
----
STORAGE = 0.58275E-04
CONSTANT HEAD = 0.0000
RECHARGE = 0.0000
STREAM LEAKAGE = 0.0000
LAKE SEEPAGE = 0.0000
UZF ET = 0.0000
GW ET = 0.0000
UZF INFILTR. = 0.0000
SFR-DIV. INFLTR. = 0.0000
UZF RECHARGE = 0.0000
SURFACE LEAKAGE = 0.0000
Zone 1 to 16 = 399.88
Zone 1 to 31 = 85204.
Zone 1 to 40 = 12404.
Zone 1 to 91 = 30968.

Total OUT = 0.12898E+06

IN - OUT = 0.14138E-03

Percent Discrepancy = 0.00
1
mark_zone


Flow Budget for Zone 2 at Time Step 1 of Stress Period 2
-------------------------------------------------------------

Budget Term Flow (L**3/T)
-----------------------------

IN:
---
STORAGE = 0.18833E-05
CONSTANT HEAD = 0.0000
RECHARGE = 0.274161E+06
STREAM LEAKAGE = 0.0000
LAKE SEEPAGE = 0.0000
UZF ET = 0.0000
GW ET = 0.0000
UZF INFILTR. = 0.0000
SFR-DIV. INFLTR. = 0.0000
UZF RECHARGE = 0.0000
SURFACE LEAKAGE = 0.0000
Zone 15 to 2 = 0.0000
Zone 31 to 2 = 0.0000
Zone 91 to 2 = 13134.

Total IN = 0.28729E+06

OUT:
----
STORAGE = 0.10823E-04
CONSTANT HEAD = 0.0000
RECHARGE = 0.0000
STREAM LEAKAGE = 0.0000
LAKE SEEPAGE = 0.0000
UZF ET = 0.0000
GW ET = 0.0000
UZF INFILTR. = 0.0000
SFR-DIV. INFLTR. = 0.0000
UZF RECHARGE = 0.0000
SURFACE LEAKAGE = 0.0000
Zone 2 to 15 = 6812.7
Zone 2 to 31 = 0.20820E+06
Zone 2 to 91 = 72274.

Total OUT = 0.28729E+06

IN - OUT = 0.58504E-02

Percent Discrepancy = 0.00
1
mark_zone


Flow Budget for Zone 3 at Time Step 1 of Stress Period 2
-------------------------------------------------------------

Budget Term Flow (L**3/T)
-----------------------------

IN:
---
STORAGE = 0.84894E-04
CONSTANT HEAD = 0.0000
RECHARGE = 81084.
STREAM LEAKAGE = 0.0000
LAKE SEEPAGE = 0.0000
UZF ET = 0.0000
GW ET = 0.0000
UZF INFILTR. = 0.0000
SFR-DIV. INFLTR. = 0.0000
UZF RECHARGE = 0.0000
SURFACE LEAKAGE = 0.0000
Zone 31 to 3 = 0.0000
Zone 91 to 3 = 1234.9

Total IN = 82319.

OUT:
----
STORAGE = 0.0000
CONSTANT HEAD = 0.0000
RECHARGE = 0.0000
STREAM LEAKAGE = 0.0000
LAKE SEEPAGE = 0.0000
UZF ET = 0.0000
GW ET = 0.0000
UZF INFILTR. = 0.0000
SFR-DIV. INFLTR. = 0.0000
UZF RECHARGE = 0.0000
SURFACE LEAKAGE = 0.0000
Zone 3 to 31 = 53937.
Zone 3 to 91 = 28382.

Total OUT = 82319.

IN - OUT = 0.81732E-03

Percent Discrepancy = 0.00
1
mark_zone


Flow Budget for Zone 1 at Time Step 1 of Stress Period 3
-------------------------------------------------------------

Budget Term Flow (L**3/T)
-----------------------------

IN:
---
STORAGE = 0.15770E-04
CONSTANT HEAD = 0.0000
RECHARGE = 0.12898E+06
STREAM LEAKAGE = 0.0000
LAKE SEEPAGE = 0.0000
UZF ET = 0.0000
GW ET = 0.0000
UZF INFILTR. = 0.0000
SFR-DIV. INFLTR. = 0.0000
UZF RECHARGE = 0.0000
SURFACE LEAKAGE = 0.0000
Zone 16 to 1 = 0.0000
Zone 31 to 1 = 0.0000
Zone 40 to 1 = 0.0000
Zone 91 to 1 = 0.0000

Total IN = 0.12898E+06

OUT:
----
STORAGE = 0.38262E-02
CONSTANT HEAD = 0.0000
RECHARGE = 0.0000
STREAM LEAKAGE = 0.0000
LAKE SEEPAGE = 0.0000
UZF ET = 0.0000
GW ET = 0.0000
UZF INFILTR. = 0.0000
SFR-DIV. INFLTR. = 0.0000
UZF RECHARGE = 0.0000
SURFACE LEAKAGE = 0.0000
Zone 1 to 16 = 399.88
Zone 1 to 31 = 85214.
Zone 1 to 40 = 12405.
Zone 1 to 91 = 30958.

Total OUT = 0.12898E+06

IN - OUT = 0.88928E-03

Percent Discrepancy = 0.00
1
mark_zone


Flow Budget for Zone 2 at Time Step 1 of Stress Period 3
-------------------------------------------------------------

Budget Term Flow (L**3/T)
-----------------------------

IN:
---
STORAGE = 0.0000
CONSTANT HEAD = 0.0000
RECHARGE = 0.274165E+06
STREAM LEAKAGE = 0.0000
LAKE SEEPAGE = 0.0000
UZF ET = 0.0000
GW ET = 0.0000
UZF INFILTR. = 0.0000
SFR-DIV. INFLTR. = 0.0000
UZF RECHARGE = 0.0000
SURFACE LEAKAGE = 0.0000
Zone 15 to 2 = 0.0000
Zone 31 to 2 = 0.0000
Zone 91 to 2 = 13215.

Total IN = 0.28737E+06

OUT:
----
STORAGE = 0.27267E-02
CONSTANT HEAD = 0.0000
RECHARGE = 0.0000
STREAM LEAKAGE = 0.0000
LAKE SEEPAGE = 0.0000
UZF ET = 0.0000
GW ET = 0.0000
UZF INFILTR. = 0.0000
SFR-DIV. INFLTR. = 0.0000
UZF RECHARGE = 0.0000
SURFACE LEAKAGE = 0.0000
Zone 2 to 15 = 6813.6
Zone 2 to 31 = 0.20827E+06
Zone 2 to 91 = 72291.

Total OUT = 0.28737E+06

IN - OUT = 0.69125E-03

Percent Discrepancy = 0.00
1
mark_zone


Flow Budget for Zone 3 at Time Step 1 of Stress Period 3
-------------------------------------------------------------

Budget Term Flow (L**3/T)
-----------------------------

IN:
---
STORAGE = 0.0000
CONSTANT HEAD = 0.0000
RECHARGE = 81084.
STREAM LEAKAGE = 0.0000
LAKE SEEPAGE = 0.0000
UZF ET = 0.0000
GW ET = 0.0000
UZF INFILTR. = 0.0000
SFR-DIV. INFLTR. = 0.0000
UZF RECHARGE = 0.0000
SURFACE LEAKAGE = 0.0000
Zone 31 to 3 = 0.0000
Zone 91 to 3 = 1262.7

Total IN = 82346.

OUT:
----
STORAGE = 0.18113E-03
CONSTANT HEAD = 0.0000
RECHARGE = 0.0000
STREAM LEAKAGE = 0.0000
LAKE SEEPAGE = 0.0000
UZF ET = 0.0000
GW ET = 0.0000
UZF INFILTR. = 0.0000
SFR-DIV. INFLTR. = 0.0000
UZF RECHARGE = 0.0000
SURFACE LEAKAGE = 0.0000
Zone 3 to 31 = 53843.
Zone 3 to 91 = 28503.

Total OUT = 82346.

IN - OUT = -0.14018E-02

Percent Discrepancy = 0.00
")


g<-function(txt_con, string, from, to, ...) {
  # read all lines, keep those matching 'string', and convert the characters
  # in columns 'from' to 'to' (the value field of the report) to numbers
  L <- readLines(txt_con)
  matched <- grep(string, L, value = TRUE, ...)
  as.numeric(substring(matched, from, to))
}

#Now, strip out values
edm<-g(txt_con, " RECHARGE =", 37, 50)
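One way to do this, sketched under the assumption that the whole report fits in memory as a single character vector: read the file once, find the line number of every "Flow Budget for Zone N at" header, and use findInterval() to label each line with the zone of the nearest header above it. get_zone_term() and sample_lines are hypothetical names, and sample_lines is a handful of lines cut from the snippet above with assumed indentation. The sketch departs from g() in three ways. The header pattern ends in " at", so asking for Zone 2 does not also pick up Zones 20-29. The term is anchored to the start of the line, because in the archived copy of this post runs of spaces are collapsed and " RECHARGE =" would then also match "UZF RECHARGE =". And the value is read as the text after "=" rather than from columns 37-50, since the fixed-column layout of the snippet did not survive.

```r
# get_zone_term(): hypothetical helper. Pass it the character vector returned
# by readLines() so the report is read from disk only once.
get_zone_term <- function(lines, zone, term = "RECHARGE") {
  hdr_pat  <- "Flow Budget for Zone ([0-9]+) at"
  hdr      <- grep(hdr_pat, lines)                      # header line numbers
  hdr_zone <- as.integer(sub(paste0(".*", hdr_pat, ".*"), "\\1", lines[hdr]))
  # findInterval(): for each line, the number of headers at or above it,
  # i.e. the index of the block it belongs to (0 = before the first header)
  owner     <- findInterval(seq_along(lines), hdr)
  line_zone <- c(NA_integer_, hdr_zone)[owner + 1L]
  # 'term' must be the first token on the line (it is used as a regex),
  # so "UZF RECHARGE =" is skipped while "RECHARGE =" is kept
  is_term <- grepl(paste0("^\\s*", term, "\\s*="), lines)
  keep    <- is_term & !is.na(line_zone) & line_zone == zone
  as.numeric(sub(".*=", "", lines[keep]))              # text after "=" as a number
}

# A few lines cut from the posted snippet (indentation is an assumption)
sample_lines <- c(
  " mark_zone",
  "Flow Budget for Zone 1 at Time Step 1 of Stress Period 2",
  "  RECHARGE = 0.12898E+06", "  UZF RECHARGE = 0.0000", "  RECHARGE = 0.0000",
  " mark_zone",
  "Flow Budget for Zone 2 at Time Step 1 of Stress Period 2",
  "  RECHARGE = 0.274161E+06", "  UZF RECHARGE = 0.0000", "  RECHARGE = 0.0000",
  " mark_zone",
  "Flow Budget for Zone 3 at Time Step 1 of Stress Period 2",
  "  RECHARGE = 81084.", "  UZF RECHARGE = 0.0000", "  RECHARGE = 0.0000",
  " mark_zone",
  "Flow Budget for Zone 2 at Time Step 1 of Stress Period 3",
  "  RECHARGE = 0.274165E+06", "  UZF RECHARGE = 0.0000", "  RECHARGE = 0.0000"
)

edm <- get_zone_term(sample_lines, zone = 2)
print(edm)   # the Zone 2 values: 274161, 0, 274165, 0

# With the full report (path as used later in this thread):
# edm <- get_zone_term(readLines("c:/temp/MCR_Budgets.txt"), zone = 2)
```

Because findInterval() and grepl() work on the whole vector at once, there is no loop over headers. Other budget terms can be requested through term, keeping in mind it is treated as a regular expression (so the "." in a term such as "UZF INFILTR." matches any character).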
I am not sure what happened, but it may be that you accidentally sent your message in HTML: the text connection data seems unusable. It is much better to use ?dput to supply sample data. Have a look at http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example

John Kane
Kingston ON Canada

> -----Original Message-----
> From: emorway at usgs.gov
> Sent: Wed, 21 Aug 2013 06:50:07 -0700
> To: r-help at r-project.org
> Subject: [R] Narrowing values collected from .txt file
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
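A sketch of the dput() route suggested above: dput() prints an R object as R source, so a pasted sample keeps every character, including the leading blanks that fixed-column extraction such as substring(matched, 37, 50) relies on. The three lines below come from the posted snippet, but their indentation is an assumption; the commented readLines() call shows the same step on the real file (n = 60 is an arbitrary choice, and the path is the one used later in this thread).

```r
# On the real file, capture the first lines as a character vector:
# snippet <- readLines("c:/temp/MCR_Budgets.txt", n = 60)

# Self-contained stand-in (indentation of the third line is assumed)
snippet <- c(" mark_zone",
             "Flow Budget for Zone 1 at Time Step 1 of Stress Period 2",
             "                RECHARGE =   0.12898E+06")

dput(snippet)   # prints c(" mark_zone", ...), ready to paste into a message

# The reader rebuilds the identical vector from the pasted text
pasted   <- paste(capture.output(dput(snippet)), collapse = "\n")
restored <- eval(parse(text = pasted))
identical(restored, snippet)   # TRUE: the leading blanks survive the round trip
```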
A relatively concise, commented, working solution to the problem originally motivating this thread was found (below). I suspect the approach I've taken has a major inefficiency: the "scan" statement inside the function "g". As the code works right now, it has to re-open and read the file length(matched) times rather than reading sequentially through to the next pertinent section of the txt file. Does anyone have a more efficient approach in mind, so I don't have to wait half an hour for the results? (The only adjustment needed to run the code that follows is to point "txt" to wherever the attached file is placed.)

#The file that the code works on is attached as: MCR_Budgets.zip (76MB uncompressed)

# where is the file? (original dat file is ~147MB, only half of this file is attached)
txt<-"c:/temp/MCR_Budgets.txt"

# Demarcation header to narrow list of retrieved 'Recharge' values
# (as written, this pattern also matches the headers of Zones 20-29, if present)
hdr_str<-"Flow Budget for Zone 2"

# string to identify lines with desired values
srch_str<-" RECHARGE ="

# retrieves desired values
g<-function(txt_con, hdr_str, srch_str, from, to, ...) {

  L <- readLines(txt_con)

  #matched contains the line #s w/ hdr_str
  matched <- grep(hdr_str, L, value = FALSE, ...)

  #initialize output list
  fetched_list<-numeric()

  #for each instance of hdr_str, loop
  for(i in 1:(length(matched))){

    #retrieve a section of text following each hdr_str, suspect this is highly inefficient!!!
    snippet<-scan(txt_con, what=character(), skip=matched[i]-1, n=42, sep='\n')

    #get the two lines containing 'srch_str' within the short section of retrieved text
    fetched <- grep(srch_str, snippet, value=TRUE)

    #append output vector for plotting time series
    fetched_list <- c(fetched_list, as.numeric(substring(fetched, from, to)))

    #monitor
    print(i)
  }

  #return desired values
  as.numeric(fetched_list)
}

#The results of system.time reflect the fact the function was run on the full 147 MB file,
# only half of which is attached.
system.time(
  rech_z2<-g(txt,hdr_str,srch_str,37,51)
)
#    user  system elapsed
# 1740.48   36.08 1825.77
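A minimal sketch of the smallest change to g(), assuming the file still fits in memory (readLines() already loads all of it into L). Each scan(txt_con, skip = matched[i]-1, ...) call reopens the file and reads from the top down to the current header, so the total amount read grows roughly quadratically with the number of headers; slicing L instead touches each block once. g2(), fw(), and demo are hypothetical names, and the demo lines are taken from the posted snippet, laid out so the value starts in column 37. Two details differ from the original. scan() skips blank lines by default (blank.lines.skip = TRUE), so n = 42 there counted only non-blank lines, whereas block_len here counts every line and must be small enough not to run into the next zone's block (the demo's blocks are four lines long, hence block_len = 4L). The demo also ends the header pattern in " at", so Zone 2 does not match Zones 20-29, and anchors srch_str, so "UZF RECHARGE =" is skipped.

```r
# g2(): hypothetical variant of g(). The file is read once by readLines();
# each block is then sliced out of the in-memory vector L instead of being
# re-read from disk by scan(skip = ...).
g2 <- function(txt_file, hdr_str, srch_str, from, to, block_len = 42L, ...) {
  L <- readLines(txt_file)               # single pass over the file
  matched <- grep(hdr_str, L, ...)       # line numbers of the header lines
  vals <- lapply(matched, function(start) {
    # block_len lines from the header on; unlike scan(), blank lines count here
    snippet <- L[start:min(start + block_len - 1L, length(L))]
    fetched <- grep(srch_str, snippet, value = TRUE)
    as.numeric(substring(fetched, from, to))
  })
  as.numeric(unlist(vals))               # one vector in file order, no print(i)
}

# Demo on a tiny fixed-width file: term right-aligned in 35 columns,
# a blank in column 36, and the value starting in column 37
fw <- function(term, value) sprintf("%35s %s", term, value)
demo <- c(
  "Flow Budget for Zone 2 at Time Step 1 of Stress Period 2",
  fw("RECHARGE =", "0.274161E+06"), fw("UZF RECHARGE =", "0.0000"), fw("RECHARGE =", "0.0000"),
  "Flow Budget for Zone 3 at Time Step 1 of Stress Period 2",
  fw("RECHARGE =", "81084."), fw("UZF RECHARGE =", "0.0000"), fw("RECHARGE =", "0.0000"),
  "Flow Budget for Zone 2 at Time Step 1 of Stress Period 3",
  fw("RECHARGE =", "0.274165E+06"), fw("UZF RECHARGE =", "0.0000"), fw("RECHARGE =", "0.0000")
)
tmp <- tempfile(fileext = ".txt")
writeLines(demo, tmp)
rech_z2 <- g2(tmp, "Flow Budget for Zone 2 at", "^\\s*RECHARGE =", 37, 51, block_len = 4L)
unlink(tmp)
print(rech_z2)   # 274161, 0, 274165, 0
```

lapply() plus a single unlist() also avoids regrowing fetched_list on every iteration; the print(i) progress output is dropped because console printing inside a long loop adds its own overhead.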
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.

---------- Forwarded message ----------
From: jim holtman <jholtman at gmail.com>
Date: Thu, Aug 29, 2013 at 8:43 AM
Subject: Re: [R] Narrowing values collected from .txt file
To: "Morway, Eric" <emorway at usgs.gov>


FYI, I duped your data to a 100MB file and it took less than 10 seconds to process.

On Wed, Aug 28, 2013 at 7:45 PM, Morway, Eric <emorway at usgs.gov> wrote:
> It looks as though the attachment to my last post didn't make the cut (or
> at least it's not appearing on the Nabble forum), for one reason or
> another. I'm reattaching a smaller version so folks can run the code
> (it won't work without a text file to operate on). So, while the attached
> file is only a small sample of the larger file and will therefore run
> quickly, it would still be helpful if someone knows a more efficient
> approach to the code in the previous post.
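If the full report ever outgrows memory, the "read sequentially to the next pertinent section" idea from the follow-up above can be sketched with an open connection: readLines(con, n = chunk) returns the next chunk of lines on each call, and the zone of the last header seen is carried across chunk boundaries. stream_zone_term() is a hypothetical name and chunk = 100000L an arbitrary default; Jim's code does not appear in this thread, so this is not a reconstruction of what he ran. As before, the demo lines come from the posted snippet with assumed indentation.

```r
# stream_zone_term(): hypothetical chunked reader. Only 'chunk' lines are in
# memory at a time; the current zone is carried from one chunk to the next.
stream_zone_term <- function(path, zone, term = "RECHARGE", chunk = 100000L) {
  con <- file(path, open = "r")
  on.exit(close(con))                    # close the connection however we exit
  hdr_pat <- "Flow Budget for Zone ([0-9]+) at"
  val_pat <- paste0("^\\s*", term, "\\s*=")   # term must be the first token
  current_zone <- NA_integer_            # zone of the last header seen so far
  out <- list()
  repeat {
    lines <- readLines(con, n = chunk)   # next lines; the file stays open
    if (length(lines) == 0L) break
    is_hdr   <- grepl(hdr_pat, lines)
    hdr_zone <- as.integer(sub(paste0(".*", hdr_pat, ".*"), "\\1", lines[is_hdr]))
    # cumsum() == 0 means "still inside the block begun in an earlier chunk"
    line_zone <- c(current_zone, hdr_zone)[cumsum(is_hdr) + 1L]
    keep <- grepl(val_pat, lines) & !is.na(line_zone) & line_zone == zone
    out[[length(out) + 1L]] <- as.numeric(sub(".*=", "", lines[keep]))
    if (length(hdr_zone) > 0L) current_zone <- hdr_zone[length(hdr_zone)]
  }
  as.numeric(unlist(out))
}

demo <- c(
  " mark_zone",
  "Flow Budget for Zone 1 at Time Step 1 of Stress Period 2",
  "  RECHARGE = 0.12898E+06", "  UZF RECHARGE = 0.0000", "  RECHARGE = 0.0000",
  " mark_zone",
  "Flow Budget for Zone 2 at Time Step 1 of Stress Period 2",
  "  RECHARGE = 0.274161E+06", "  UZF RECHARGE = 0.0000", "  RECHARGE = 0.0000",
  " mark_zone",
  "Flow Budget for Zone 3 at Time Step 1 of Stress Period 2",
  "  RECHARGE = 81084.", "  UZF RECHARGE = 0.0000", "  RECHARGE = 0.0000",
  " mark_zone",
  "Flow Budget for Zone 2 at Time Step 1 of Stress Period 3",
  "  RECHARGE = 0.274165E+06", "  UZF RECHARGE = 0.0000", "  RECHARGE = 0.0000"
)
tmp <- tempfile(fileext = ".txt")
writeLines(demo, tmp)
# tiny chunks force blocks to straddle chunk boundaries
rech_small <- stream_zone_term(tmp, zone = 2, chunk = 3L)
rech_big   <- stream_zone_term(tmp, zone = 2)
unlink(tmp)
print(rech_small)   # 274161, 0, 274165, 0
```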