On Fri, 11 Apr 2008, Zev Ross wrote:
> Hi All,
>
> Can anyone direct me to a read function in R that will allow me to only
> read in rows of a text file that begin with a particular value such as
> the data below. I would read the entire file in and then limit, but the
> files were constructed such that the first two letters determine how
> many variables are in the row (different letters mean different numbers
> of columns and different column names/types).
>
> I can do this in SAS, but I'd prefer to use R. The approximate SAS code
> is below with the key piece of code being "if rectype='RD'" then do.
>
> Thoughts?
If your data are in 'tmp.dat':
> txt <- readLines( "tmp.dat" )
> con <- textConnection( grep( "^RD", txt, value=TRUE ) )
> dat <- read.csv( con, sep='|', header=FALSE)
> close(con)
> summary( dat[ , 1:3 ] )
  V1     V2          V3
 RD:6   I:6   Min.   :1
              1st Qu.:1
              Median :1
              Mean   :1
              3rd Qu.:1
              Max.   :1
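Since the question mentions that each record type has its own set of columns, the same idea can be extended to read every type into its own data frame. The following is only a sketch: it writes a few of the sample records from the question to a temporary file (the file name is arbitrary), and it reads every column as character, which is an assumption made here so that leading zeros such as "01" are kept. Convert columns afterwards as needed.

```r
# Write a few sample records (copied from the question) to a temporary file.
tmp <- tempfile(fileext = ".dat")
writeLines(c(
  "RD|I|01|073|0023|68103|5|7|017|810|20070103|00:00|0.6||3|||||||||||||",
  "RA|I|01|073|0023|A334|5|7|017|810|20070118|00:00|3.7||3|||||||||||||",
  "RD|I|01|073|0023|68103|5|7|017|810|20070121|00:00|6.9||3|||||||||||||",
  "RC|I|01|073|0023|Quer|5|7|017|810|20070124|00:00|1.8||3|||||||||||||"
), tmp)

txt <- readLines(tmp)

# Group the lines by their two-character record type.
by_type <- split(txt, substr(txt, 1, 2))

# Read each group on its own, so each record type may have a different
# number of columns. All columns are read as character (an assumption
# of this sketch) to preserve codes such as "01" and "073".
tables <- lapply(by_type, function(lines) {
  read.table(text = lines, sep = "|", header = FALSE,
             colClasses = "character", quote = "", comment.char = "")
})

names(tables)    # one data frame per record type
tables$RD$V13    # the measurement column of the RD records, still character
```

Each element of 'tables' can then be given its own column names and types.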
Alternatively, if you have 'grep' in your system and in the path:
> con2 <- pipe( 'grep "^RD" tmp.dat' )
> dat2 <- read.csv( con2, sep='|', header=FALSE)
>
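If 'grep' is not available on the system and the file is too large to hold comfortably in memory with readLines(), the filtering can also be done in chunks from within R. This is a minimal sketch, not a tested production routine: 'read_matching' and its 'chunk_size' argument are hypothetical names introduced here, not part of base R, and the function will stop with an error from read.table() if no line matches.

```r
# Hypothetical helper (not part of base R): keep only the lines that match
# 'pattern', reading 'chunk_size' lines at a time from the file.
read_matching <- function(path, pattern, chunk_size = 1000L, ...) {
  con <- file(path, open = "r")
  on.exit(close(con))
  keep <- character(0)
  repeat {
    chunk <- readLines(con, n = chunk_size)
    if (length(chunk) == 0L) break          # end of file reached
    keep <- c(keep, grep(pattern, chunk, value = TRUE))
  }
  # Parse only the retained lines; extra arguments go to read.table().
  read.table(text = keep, sep = "|", header = FALSE,
             quote = "", comment.char = "", ...)
}

# Usage with a small temporary file built from the sample records.
tmp <- tempfile(fileext = ".dat")
writeLines(c(
  "RD|I|01|073|0023|68103|5|7|017|810|20070103|00:00|0.6||3|||||||||||||",
  "RD|I|01|073|0023|68103|5|7|017|810|20070106|00:00|9.5||3|||||||||||||",
  "RA|I|01|073|0023|A334|5|7|017|810|20070118|00:00|3.7||3|||||||||||||",
  "RD|I|01|073|0023|68103|5|7|017|810|20070121|00:00|6.9||3|||||||||||||"
), tmp)

dat3 <- read_matching(tmp, "^RD", chunk_size = 2L)
nrow(dat3)
```

The small 'chunk_size' is used here only to exercise the loop; a larger value would be more sensible for real files.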
See
?connection
?textConnection
?grep
HTH,
Chuck
>
> Zev
>
>
> RD|I|01|073|0023|68103|5|7|017|810|20070103|00:00|0.6||3|||||||||||||
> RD|I|01|073|0023|68103|5|7|017|810|20070106|00:00|9.5||3|||||||||||||
> RD|I|01|073|0023|68103|5|7|017|810|20070109|00:00|2.5||3|||||||||||||
> RD|I|01|073|0023|68103|5|7|017|810|20070112|00:00|13.7||3|||||||||||||
> RD|I|01|073|0023|68103|5|7|017|810|20070115|00:00|7.3||3|||||||||||||
> RA|I|01|073|0023|A334|5|7|017|810|20070118|00:00|3.7||3|||||||||||||
> RD|I|01|073|0023|68103|5|7|017|810|20070121|00:00|6.9||3|||||||||||||
> RC|I|01|073|0023|Quer|5|7|017|810|20070124|00:00|1.8||3|||||||||||||
>
>
> infile 'C:\junk\RD_501_88101_2006-0.txt'
> dlm='|' firstobs=3 missover;
> rectype $2. @;
> if rectype = 'RD' then do;
>
> --
> Zev Ross
> ZevRoss Spatial Analysis
> 303 Fairmount Ave
> Ithaca, NY 14850
> 607-277-0004 (phone)
> 866-877-3690 (fax, toll-free)
> zev at zevross.com
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
Charles C. Berry                            (858) 534-2098
                                            Dept of Family/Preventive Medicine
E mailto:cberry at tajo.ucsd.edu            UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901