Hello,

Here is a link to some data:
http://www.epa.gov/emap/html/data/surfwatr/data/mastreams/9396/wchem/chmval.txt

I am trying to read this in, and want to use:

    chmval <- read.table("http://www.epa.gov/emap/html/data/surfwatr/data/mastreams/9396/wchem/chmval.txt",
                         sep = ",", skip = 84, header = TRUE)

The number 84 (for 84 lines skipped) needs to be derived from the 5th line of the txt file:

    # Header Records: 85

So I need that number, minus 1, as input to the read.table statement above.

I've tried grep, but that didn't work (for this I downloaded the txt file and manually removed that hash mark!):

    grep("Header Records:", read.table("chmval.txt", header = TRUE))
    Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  :
      line 1 did not have 5 elements

Any ideas? Can I just extract the 5th line?

--
View this message in context: http://r.789695.n4.nabble.com/extracting-information-from-txt-file-tp4648033.html
Sent from the R help mailing list archive at Nabble.com.
Hello,

Use readLines instead.

    ?readLines  # see argument 'n'
    readLines("http://www.epa.gov/emap/html/data/surfwatr/data/mastreams/9396/wchem/chmval.txt", n = 5)[5]

Hope this helps,

Rui Barradas

On 31-10-2012 16:46, chuck.01 wrote:
> Can I just extract the 5th line?
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
This worked for the example you provided. It assumes the header count is the only numeric value on the 5th line.

    epa_extract <- function(address) {
      # Line 5 looks like "# Header Records: 85"
      doc <- readLines(address, n = 5)[5]
      # Drop every non-digit character, leaving only the count
      head_count <- as.numeric(gsub("\\D", "", doc))
      # The column-name row is the last of the header records (the original
      # post skips 84 lines for a count of 85), so skip one fewer.
      read.table(address, sep = ",", header = TRUE, skip = head_count - 1)
    }

    foo <- epa_extract("http://www.epa.gov/emap/html/data/surfwatr/data/mastreams/9396/wchem/chmval.txt")

Taimur Sajid
Research & Development Analyst
Primatics Financial
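The approach above takes the count from line 5 by position. As a minimal sketch of a variant, the line can instead be located by its "Header Records:" label, which is closer to what the grep attempt in the original post was after. The function name `read_with_header_count`, the `max_header` argument, and the toy file are all hypothetical; the sketch assumes, as in the original post, that the column-name row is the last header record.

```r
# Hypothetical helper: find the "Header Records:" line by its label
# instead of assuming it is always line 5.
read_with_header_count <- function(path, max_header = 20) {
  top <- readLines(path, n = max_header)
  hit <- grep("Header Records:", top, fixed = TRUE, value = TRUE)
  if (length(hit) != 1L) stop("expected exactly one 'Header Records:' line")
  # Keep only what follows the label and convert it to an integer
  n_header <- as.integer(sub(".*Header Records:[[:space:]]*", "", hit))
  # Assumption: the column-name row is the last header record, so skip one fewer
  read.table(path, sep = ",", header = TRUE, skip = n_header - 1L,
             comment.char = "")
}

# Toy stand-in for the EPA file (made-up content, same layout idea)
sample_lines <- c(
  "Dataset: toy example",
  "File Name: toy",
  "# Variables: 2",
  "# Header Records: 5",
  "A,B",
  "1,2",
  "3,4"
)
tmp <- tempfile(fileext = ".txt")
writeLines(sample_lines, tmp)
toy <- read_with_header_count(tmp)
```

Because `readLines` returns plain character strings, `grep` works on them directly, whereas `read.table` has to parse every line into the same number of fields and fails on the free-form header.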
This worked fine for me:

> x <- read.csv("http://www.epa.gov/emap/html/data/surfwatr/data/mastreams/9396/wchem/chmval.txt", skip=84, as.is = TRUE)
> str(x)
'data.frame':   711 obs. of  75 variables:
 $ ALDI   : chr  "." "." "." "." ...
 $ ALDS   : chr  "." "S" "S" "S" ...
 $ ALDSF  : chr  " " " " " " " " ...
 $ ALKCALC: chr  "106.05" "210.7" "73.51" "432.63" ...
 $ ALOR   : chr  "." "S" "S" "S" ...
 $ ALORF  : chr  " " " " " " " " ...
 $ ALTD   : chr  "54" "36" "47" "12" ...
 $ ALTDF  : chr  " " " " " " " " ...
 $ ANC    : chr  "115" "207.2" "82.2" "435.2" ...

On Wed, Oct 31, 2012 at 12:46 PM, chuck.01 <CharlieTheBrown77 at gmail.com> wrote:
> I am trying to read this in, and want to use:
> chmval <-
> read.table("http://www.epa.gov/emap/html/data/surfwatr/data/mastreams/9396/wchem/chmval.txt",
> sep=",", skip= 84, header=T)

--
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.
On Oct 31, 2012, at 9:46 AM, chuck.01 wrote:

> the # 84, for 84 lines skipped needs to be derived from the 5th line of the
> txt file
> # Header Records: 85
>
> so, I need that # (-1) for input into the read.table statement above

That "# (-1)" is fairly cryptic to my reading, but it appears you are seeing the effect of the "#" character, which read.table treats by default as the start of a comment, so input stops there. Changing the comment character in the call to read.table will allow input from that line.

?read.table

You will need to read only the first 5 or 6 lines first, then execute a separate read.table that skips those lines as well as the variable list that forms a secondary header.

> headfrm <- read.table(file = url("http://www.epa.gov/emap/html/data/surfwatr/data/mastreams/9396/wchem/chmval.txt"), nrows = 6, sep = ":", comment.char = "")
> headfrm
                V1                          V2
1          Dataset  EMAP Stream Chemistry Data
2        File Name                      chmval
3     Date Created                    02/22/99
4      # Variables                          75
5 # Header Records                          85
6   # Data Records                         711

David Winsemius, MD
Alameda, CA, USA
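The two-step read described above might be completed along these lines. This is only a sketch on a made-up toy file written to a temporary path; the toy contents and the names `n_header`, `n_data`, and `dat` are assumptions, as is the premise (taken from the original post) that the column-name row is the last header record.

```r
# Toy stand-in for the EPA file: six "key: value" lines, then the column names
sample_lines <- c(
  "Dataset: toy example",
  "File Name: toy",
  "Date Created: 02/22/99",
  "# Variables: 2",
  "# Header Records: 7",
  "# Data Records: 2",
  "A,B",
  "1,2",
  "3,4"
)
tmp <- tempfile(fileext = ".txt")
writeLines(sample_lines, tmp)

# Step 1: read the key/value block; comment.char = "" keeps the "#" lines
headfrm <- read.table(tmp, nrows = 6, sep = ":", comment.char = "",
                      strip.white = TRUE, stringsAsFactors = FALSE)

# Step 2: look up the counts by label and use them for the real read
n_header <- as.integer(headfrm$V2[headfrm$V1 == "# Header Records"])
n_data   <- as.integer(headfrm$V2[headfrm$V1 == "# Data Records"])

# Assumption: the column-name row is the last header record, so skip one fewer
dat <- read.csv(tmp, skip = n_header - 1L)

# Sanity check: the row count should match the "# Data Records" entry
stopifnot(nrow(dat) == n_data)
```

Reading the whole key/value block rather than a single line also gives access to the "# Data Records" entry, which makes for a cheap check that the skip value was right.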
Using na.strings works better:

> x <- read.csv("http://www.epa.gov/emap/html/data/surfwatr/data/mastreams/9396/wchem/chmval.txt", skip=84, as.is = TRUE, na.strings = '.')
> str(x)
'data.frame':   711 obs. of  75 variables:
 $ ALDI   : int  NA NA NA NA NA NA NA NA NA NA ...
 $ ALDS   : chr  NA "S" "S" "S" ...
 $ ALDSF  : chr  " " " " " " " " ...
 $ ALKCALC: num  106 210.7 73.5 432.6 38.7 ...
 $ ALOR   : chr  NA "S" "S" "S" ...
 $ ALORF  : chr  " " " " " " " " ...
 $ ALTD   : int  54 36 47 12 19 10 12 5 8 6 ...
 $ ALTDF  : chr  " " " " " " " " ...
 $ ANC    : num  115 207.2 82.2 435.2 37.4 ...
 $ ANCF   : chr  " " " " " " " " ...
 $ ANDEF  : num  82.5 52.3 31.8 21.9 12.2 ...
 $ ANSUM  : num  771 728 328 892 251 ...
 $ CA     : num  303 529 182 392 124 ...

On Wed, Oct 31, 2012 at 12:46 PM, chuck.01 <CharlieTheBrown77 at gmail.com> wrote:
> I am trying to read this in, and want to use:
> chmval <-
> read.table("http://www.epa.gov/emap/html/data/surfwatr/data/mastreams/9396/wchem/chmval.txt",
> sep=",", skip= 84, header=T)

--
Jim Holtman
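The effect of na.strings can be seen on a few made-up rows. In this small sketch the column names are borrowed from the output above, but the values and the object names `csv_text`, `raw`, and `clean` are invented for illustration.

```r
# Three made-up rows in which "." marks a missing value
csv_text <- "ALDS,ALTD,ANC\n.,54,115\nS,36,207.2\nS,.,82.2"

# Without na.strings, a single "." forces the whole ALTD column to character
raw <- read.csv(text = csv_text, as.is = TRUE)

# With na.strings = ".", the "." entries become NA and ALTD is read as integer
clean <- read.csv(text = csv_text, as.is = TRUE, na.strings = ".")

# Compare the column classes of the two reads
sapply(raw, class)
sapply(clean, class)
```

Character columns such as ALDS stay character either way; only the "." entries change, from a literal string to NA.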