Hi all,

I am a new graduate student who is also new to R. I am ok with the basics, but the problem I am having right now seems beyond what I can do, so I am looking for advice. I am trying to pull data from flat ASCII files, but they do not have a "nice" structure, so a simple "read.table" doesn't work. An example first half of a data file is below:

----------------------------------------------------------------------------------------------
19 c:/data/WF-100/2008/20080911/trk/20080911.013115.007.17.txt
10 s name of program that wrote this file trkplt name of program that wrote this file
10 GORDON machine that generated this file machine that generated this file
10 3.7 version of program
10 3.6 version of this data file
10 5.81 version of Universal Library
10 20081121.145730 when this file was written
10 Windows_XP operating system used operating system used
*
* radar characteristics
11 WF-100
11 20000000 A/D rate, samples/second
11 7.5 bin width, m
11 800 nominal PRF, Hz
11 0.25 nominal pulse width, microsec
11 0 tuning, volts
11 3.19779 nominal wave length, cm
-----------------------------------------------------------------------------------------------
..the file goes on from there...

How would I go about getting this data into some kind of useful format? This is one of about 1000 files I will need to go through. I would ideally like to get these into a format with each data file as a row, with columns for the various values and the description text removed (version of program, file version, tuning volts, etc...).

I'm not looking for a cut and paste answer, but perhaps some direction on where I should start. I have only done basic .csv, table, and line inputs up until now.

Thanks for any advice

--
View this message in context: http://www.nabble.com/Trouble-pulling-data-from-a-messy-ACII-file...-tp21059239p21059239.html
Sent from the R help mailing list archive at Nabble.com.
I usually use Unix tools (sed, awk) to process really messy data beforehand, but if you want a pure R solution it is usually possible to kludge something together with scan() working line by line.

# read a line
# if it contains stuff you aren't interested in, go on to the next line
# if it contains one kind of interesting stuff, do X
# if it contains another kind of interesting stuff, do Y

and so on.

I've done this when it was easier than alternative processing (though slower), and found that it worked best for me to read the entire line in as a string, then split it apart later and convert to numeric if appropriate.

Sarah

--
Sarah Goslee
http://www.functionaldiversity.org
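The read-skip-split-convert loop described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the sample lines are a hypothetical excerpt in the poster's layout, "*" lines are treated as uninteresting, and every other line is assumed to be "code value description". The names sample_text and values are made up for the example.

```r
# Hypothetical sample: a few lines in the layout shown in the question
sample_text <- c(
  "10 3.7 version of program",
  "*",
  "* radar characteristics",
  "11 WF-100",
  "11 7.5 bin width, m",
  "11 0 tuning, volts"
)

# Read every line in whole as a character string (sep = "\n")
lines <- scan(text = sample_text, what = "", sep = "\n", quiet = TRUE)

values <- list()
for (line in lines) {
  line <- trimws(line)

  # Not interesting: marker and section lines start with "*"
  if (grepl("^\\*", line)) next

  # Split into record code, value, and (optional) description
  parts <- strsplit(line, " +")[[1]]
  value <- parts[2]
  label <- if (length(parts) > 2) paste(parts[-(1:2)], collapse = " ") else NA

  # Convert to numeric only when the value parses as a number
  num <- suppressWarnings(as.numeric(value))
  values[[length(values) + 1]] <- list(
    label = label,
    value = if (is.na(num)) value else num
  )
}

str(values)
```

For a real file, scan(file = path, what = "", sep = "\n", quiet = TRUE) would take the place of the text = argument.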
It would be helpful if you could show what the output would be for the example given. Exactly what are the 'values' and what would be the 'headings'? As mentioned before, you can use readLines and then parse the data you want. Something like Perl might be easier, but it is hard to tell from the mail.

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?
On Wed, 17 Dec 2008, Titan8883 wrote:

> I am a new graduate student who is also new to R. I am ok with the basics,
> but the problem I am having right now seems beyond what I can do..so I am
> looking for advice.

Advice? OK. Here goes.

I would suggest you pull one of the data files into a character vector using readLines().

From there you can try out different methods of finding the data elements in the file that you want to extract. If it is guaranteed that 'nominal pulse width' ALWAYS shows up on the same line in every file, you can use the line numbers to figure out where to look for data elements. If not, you will probably want to get familiar with grep() and regular expressions; see ?regex and use RSiteSearch("regexpr") and the like to turn up the many useful discussions of them on this list.

From there sub(), gsub(), strsplit(), and friends will help you. They may take a good deal of fiddling to get them to digest your data.

If parts of your file can be read using read.csv() or scan() or something, you can use a textConnection() to pass some lines that readLines() has stored for you to read.csv().

Once you get so that one data file can be processed, rolling up your code as a function should not be too hard. Put the function in a loop using

    res <- list()
    for (ifile in your.file.list) res[[ifile]] <- your.function(ifile)

or

    res <- sapply(your.file.list, your.function)

or

    res <- lapply(your.file.list, your.function)

and you are ready to chomp away at your files.

HTH,

Chuck

Charles C. Berry                            (858) 534-2098
                                            Dept of Family/Preventive Medicine
E mailto:cberry at tajo.ucsd.edu            UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901
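As a rough sketch of the readLines() / grep() / sub() route rolled up as a function, something like the following might be a starting point. The two temporary files, the helper names make_file and get_value, and the output column names are all assumptions invented for the illustration. It also assumes each description of interest appears exactly once per file.

```r
# Write two small hypothetical files so the sketch runs on its own
make_file <- function(pulse, tuning) {
  path <- tempfile(fileext = ".txt")
  writeLines(c(
    "10 3.7 version of program",
    "* radar characteristics",
    sprintf("11 %s nominal pulse width, microsec", pulse),
    sprintf("11 %s tuning, volts", tuning)
  ), path)
  path
}
your.file.list <- c(make_file(0.25, 0), make_file(0.5, 1.5))

# Find the line whose description contains `label`, then pull its value
get_value <- function(lines, label) {
  hit <- grep(label, lines, fixed = TRUE, value = TRUE)[1]
  # Skip the leading record code and keep the next field
  as.numeric(sub("^[^ ]+ +([^ ]+).*$", "\\1", hit))
}

# Process one file into a one-row data frame
your.function <- function(path) {
  lines <- readLines(path)
  data.frame(
    file            = basename(path),
    program_version = get_value(lines, "version of program"),
    pulse_width_us  = get_value(lines, "nominal pulse width"),
    tuning_volts    = get_value(lines, "tuning, volts")
  )
}

# One row per file, stacked into a single data frame
res <- lapply(your.file.list, your.function)
result <- do.call(rbind, res)
print(result)
```

Matching on the description text rather than the line number means the function does not depend on the lines always being in the same position.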
Gabor Grothendieck
2008-Dec-18 04:00 UTC
[R] Trouble pulling data from a messy ACII file...
It's not clear what the result would be, but you may be able to use read.table. Try this:

    Lines <- "19 c:/data/WF-100/2008/20080911/trk/20080911.013115.007.17.txt
    10 s name of program that wrote this file trkplt name of program that wrote this file
    10 GORDON machine that generated this file machine that generated this file
    10 3.7 version of program
    10 3.6 version of this data file
    10 5.81 version of Universal Library
    10 20081121.145730 when this file was written
    10 Windows_XP operating system used operating system used
    *
    * radar characteristics
    11 WF-100
    11 20000000 A/D rate, samples/second
    11 7.5 bin width, m
    11 800 nominal PRF, Hz
    11 0.25 nominal pulse width, microsec
    11 0 tuning, volts
    11 3.19779 nominal wave length, cm"

    # fill = TRUE pads short rows; stringsAsFactors = FALSE keeps V2 as text
    # so that as.numeric() converts the values rather than factor codes
    DF <- read.table(textConnection(Lines), fill = TRUE, stringsAsFactors = FALSE)

    # non-numeric V2 entries become NA (with a warning) and na.omit drops them
    DF2 <- with(DF, na.omit(data.frame(V1, V2 = as.numeric(V2),
        V3 = do.call(paste, DF[-(1:2)]))))

You may need to remove the na.omit if you really do need those rows and make other changes, but that at least gives the idea.
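To get from a long table shaped like DF2 to the "one row per file" layout asked for in the question, one possible sketch is below. The DF2 here is a small hand-made stand-in, not the actual output of the code above, and turning the descriptions into column names with make.names() is just one assumed convention.

```r
# Hypothetical long-format table in the shape of DF2 (code, value, description)
DF2 <- data.frame(
  V1 = c("10", "10", "11", "11"),
  V2 = c(3.7, 3.6, 7.5, 800),
  V3 = c("version of program", "version of this data file",
         "bin width, m", "nominal PRF, Hz"),
  stringsAsFactors = FALSE
)

# Descriptions become syntactic column names, values become a single row
wide <- as.data.frame(setNames(as.list(DF2$V2), make.names(trimws(DF2$V3))))
print(wide)
```

Rows built this way for each file could then be stacked with rbind(), provided every file yields the same set of descriptions.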
Thanks to all who replied! This is great! I am going to dig into this over the holidays. I will post what I end up with... Thanks again, James Planey Graduate Research Assistant Department of Animal Biology University of Illinois @ Urbana-Champaign