thr3ads.net - R help - [R] extracting information from txt file [Oct 2012]


chuck.01

2012-Oct-31 16:46 UTC

[R] extracting information from txt file

Hello,

Here is a link to some data:
http://www.epa.gov/emap/html/data/surfwatr/data/mastreams/9396/wchem/chmval.txt

I am trying to read this in, and want to use:
chmval <- read.table("http://www.epa.gov/emap/html/data/surfwatr/data/mastreams/9396/wchem/chmval.txt",
                     sep=",", skip= 84, header=T)

The 84 (the number of lines to skip) needs to be derived from the 5th line of the
txt file:

# Header Records:  85

so I need that number minus 1 for input into the read.table statement above.

I've tried grep, but that didn't work
(for this I downloaded the txt file and manually removed that hash mark!):

grep("Header Records:", read.table("chmval.txt", header=T))
Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  : 
  line 1 did not have 5 elements

Any ideas?
Can I just extract the 5th line?




--
View this message in context:
http://r.789695.n4.nabble.com/extracting-information-from-txt-file-tp4648033.html
Sent from the R help mailing list archive at Nabble.com.

Rui Barradas

2012-Oct-31 17:54 UTC


[R] extracting information from txt file

Hello,

Use readLines instead.

?readLines  # see argument 'n'
readLines("http://www.epa.gov/emap/html/data/surfwatr/data/mastreams/9396/wchem/chmval.txt",
n = 5)[5]


Hope this helps,

Rui Barradas
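Building on the readLines suggestion: the grep tried in the original question can work once it is applied to the raw lines from readLines rather than to a parsed table, and it avoids hard-coding line 5. A minimal sketch, assuming the count always sits on a line containing "Header Records:"; the miniature file (demo_lines, written to a temporary file) is made up from values quoted in this thread so the example runs offline.

```r
# Hypothetical miniature of chmval.txt (made-up content; the real file has
# 85 header records). The last header record is the column-name line.
demo_lines <- c(
  "Dataset:  EMAP Stream Chemistry Data",
  "File Name:  chmval",
  "Date Created:  02/22/99",
  "# Variables:  2",
  "# Header Records:  9",
  "# Data Records:  3",
  "variable description line 1",
  "variable description line 2",
  "ALKCALC,ANC",
  "106.05,115",
  ".,207.2",
  "73.51,82.2"
)
src <- tempfile(fileext = ".txt")
writeLines(demo_lines, src)

# readLines does no parsing, so the leading '#' is kept as ordinary text.
first_lines <- readLines(src, n = 10)

# Locate the header-count line by its content instead of its position.
hdr_line <- grep("Header Records:", first_lines, fixed = TRUE, value = TRUE)

# Keep what follows the colon, trim the blanks, convert to an integer.
n_header <- as.integer(trimws(sub(".*:", "", hdr_line)))

# Leave the column-name line (the last header record) for header = TRUE.
skip_n <- n_header - 1
```

With the real file, skip_n would then be passed as skip = skip_n to read.table or read.csv.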

Taimur Sajid

2012-Oct-31 17:56 UTC


[R] extracting information from txt file

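This worked for the example you provided. It assumes the header count is the only
numeric value on the 5th line.

	epa_extract <- function(address){
		# The 5th line reads "# Header Records:  85"; keep only its digits
		doc <- readLines(address, n = 5)[5]
		head_count <- as.numeric(gsub("\\D", "", doc))
		
		# The last header record holds the column names, so skip one line
		# fewer than the count and let header = TRUE read that line
		read.table(address, sep = ",", header = TRUE, skip = head_count - 1)
		}
		
	foo <- epa_extract("http://www.epa.gov/emap/html/data/surfwatr/data/mastreams/9396/wchem/chmval.txt")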


Taimur Sajid
Research & Development Analyst
Primatics Financial
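A variant of the epa_extract idea that fetches the file only once: readLines pulls every line, the count is taken from the 5th line as before, and read.csv parses the remaining lines through its text argument. A sketch, assuming the same layout (count on line 5, column names on the last header record); epa_extract_once and the miniature file are hypothetical, and na.strings = "." is borrowed from later in this thread.

```r
epa_extract_once <- function(address) {
  # Read the whole file a single time (one download when address is a URL).
  all_lines <- readLines(address)
  # Line 5 looks like "# Header Records:  85"; strip everything but digits.
  head_count <- as.integer(gsub("\\D", "", all_lines[5]))
  # Keep the last header record (the column names) and everything after it.
  body <- all_lines[seq(head_count, length(all_lines))]
  read.csv(text = body, na.strings = ".", stringsAsFactors = FALSE)
}

# Hypothetical miniature of chmval.txt so the sketch runs offline.
demo_lines <- c(
  "Dataset:  EMAP Stream Chemistry Data",
  "File Name:  chmval",
  "Date Created:  02/22/99",
  "# Variables:  2",
  "# Header Records:  9",
  "# Data Records:  3",
  "variable description line 1",
  "variable description line 2",
  "ALKCALC,ANC",
  "106.05,115",
  ".,207.2",
  "73.51,82.2"
)
src <- tempfile(fileext = ".txt")
writeLines(demo_lines, src)

chm <- epa_extract_once(src)
str(chm)
```

Selecting the kept lines with seq() rather than a negative index avoids the empty result that x[-seq_len(0)] would give if the count were ever 1.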


jim holtman

2012-Oct-31 18:10 UTC


[R] extracting information from txt file

This worked fine for me:
> x <- read.csv("http://www.epa.gov/emap/html/data/surfwatr/data/mastreams/9396/wchem/chmval.txt",
                skip=84, as.is = TRUE)
> str(x)
'data.frame':   711 obs. of  75 variables:
 $ ALDI    : chr  "." "." "." "." ...
 $ ALDS    : chr  "." "S" "S" "S" ...
 $ ALDSF   : chr  " " " " " " " " ...
 $ ALKCALC : chr  "106.05" "210.7" "73.51" "432.63" ...
 $ ALOR    : chr  "." "S" "S" "S" ...
 $ ALORF   : chr  " " " " " " " " ...
 $ ALTD    : chr  "54" "36" "47" "12" ...
 $ ALTDF   : chr  " " " " " " " " ...
 $ ANC     : chr  "115" "207.2" "82.2" "435.2" ...




-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.

David Winsemius

2012-Oct-31 18:11 UTC


[R] extracting information from txt file

On Oct 31, 2012, at 9:46 AM, chuck.01 wrote:
> so, I need that # (-1) for input into the read.table statement above
That "# (-1)" is fairly cryptic to my reading, but it appears you are
seeing the behavior of the "#" character, which by default starts a comment
and terminates input on that line. Changing the comment character in the
call to read.table (comment.char = "") will allow input from that line.

?read.table
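A small illustration of the comment behaviour described above, using only the 5th line quoted in the question and read.table's text argument (so nothing is downloaded). With the default comment.char = "#" the whole line is a comment and nothing is left to read; with comment.char = "" the line splits at the colon. A sketch; line5, res_default and res_blank are illustrative names.

```r
line5 <- "# Header Records:  85"   # the 5th line quoted in the question

# Default comment.char = "#": the line is all comment, so read.table stops
# with "no lines available in input". try() captures that error.
res_default <- try(read.table(text = line5, sep = ":"), silent = TRUE)
inherits(res_default, "try-error")

# comment.char = "" switches comment handling off; the line now splits at
# ':' into the label and the blank-padded count.
res_blank <- read.table(text = line5, sep = ":", comment.char = "",
                        stringsAsFactors = FALSE)
res_blank$V1
as.integer(trimws(as.character(res_blank$V2)))
```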

You will need to read only the first 5 or 6 lines, then execute a separate
read.table that skips those lines as well as the variable list that forms a
secondary header.
> headfrm <- read.table( file=url(
    "http://www.epa.gov/emap/html/data/surfwatr/data/mastreams/9396/wchem/chmval.txt"),
    nrows=6, sep=":", comment.char="")
> headfrm
                V1                                       V2
1          Dataset               EMAP Stream Chemistry Data
2        File Name                                   chmval
3     Date Created                                 02/22/99
4      # Variables                                       75
5 # Header Records                                       85
6   # Data Records                                      711
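Putting the two steps together: take the header-record count from a headfrm-style table, then skip all but the last header record so that header = TRUE picks up the variable names. A sketch run against a made-up miniature of the file (the real one has 85 header records). The as.character() call is there because, before R 4.0.0, read.table turned text columns into factors by default, and as.numeric() on a factor returns its level codes rather than the printed values.

```r
# Hypothetical miniature of chmval.txt (made-up content).
demo_lines <- c(
  "Dataset:  EMAP Stream Chemistry Data",
  "File Name:  chmval",
  "Date Created:  02/22/99",
  "# Variables:  2",
  "# Header Records:  9",
  "# Data Records:  3",
  "variable description line 1",
  "variable description line 2",
  "ALKCALC,ANC",
  "106.05,115",
  ".,207.2",
  "73.51,82.2"
)
src <- tempfile(fileext = ".txt")
writeLines(demo_lines, src)

# Step 1: the first six "Name:  value" lines, with '#' no longer a comment.
headfrm <- read.table(src, nrows = 6, sep = ":", comment.char = "")

# Step 2: the count sits in V2 on the "Header Records" row.
hdr_row <- grepl("Header Records", headfrm$V1, fixed = TRUE)
n_header <- as.integer(trimws(as.character(headfrm$V2[hdr_row])))

# Step 3: skip every header record except the last (the variable names).
chm <- read.csv(src, skip = n_header - 1, as.is = TRUE)
str(chm)
```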


David Winsemius, MD
Alameda, CA, USA

jim holtman

2012-Oct-31 18:14 UTC


[R] extracting information from txt file

Using na.strings = '.' works better; the "." entries become NA, so the
numeric columns come in as numbers:
> x <- read.csv("http://www.epa.gov/emap/html/data/surfwatr/data/mastreams/9396/wchem/chmval.txt",
                skip=84, as.is = TRUE, na.strings = '.')
> str(x)
'data.frame':   711 obs. of  75 variables:
 $ ALDI    : int  NA NA NA NA NA NA NA NA NA NA ...
 $ ALDS    : chr  NA "S" "S" "S" ...
 $ ALDSF   : chr  " " " " " " " " ...
 $ ALKCALC : num  106 210.7 73.5 432.6 38.7 ...
 $ ALOR    : chr  NA "S" "S" "S" ...
 $ ALORF   : chr  " " " " " " " " ...
 $ ALTD    : int  54 36 47 12 19 10 12 5 8 6 ...
 $ ALTDF   : chr  " " " " " " " " ...
 $ ANC     : num  115 207.2 82.2 435.2 37.4 ...
 $ ANCF    : chr  " " " " " " " " ...
 $ ANDEF   : num  82.5 52.3 31.8 21.9 12.2 ...
 $ ANSUM   : num  771 728 328 892 251 ...
 $ CA      : num  303 529 182 392 124 ...
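A self-contained illustration of why na.strings = "." helps, using the first few ALKCALC and ALTD values from the str() output above; the "." in the last row is invented so that both behaviours show up. Without na.strings the dot is ordinary text and, with as.is = TRUE, forces its column to character; with it, the dot becomes NA and the column is read as numbers.

```r
csv_text <- c("ALKCALC,ALTD",
              "106.05,54",
              "210.7,36",
              "73.51,47",
              ".,12")          # this '.' is made up for the illustration

x_raw <- read.csv(text = csv_text, as.is = TRUE)
x_na  <- read.csv(text = csv_text, as.is = TRUE, na.strings = ".")

class(x_raw$ALKCALC)   # "character": the "." is kept as text
class(x_na$ALKCALC)    # "numeric": the "." is now NA
```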




-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.
