Hi,

I have an upcoming project that will involve a large text file. I want to

 1. read the file into R one line at a time
 2. do some string manipulations on the line
 3. write the line to another text file.

I can handle the last two parts. scan and read.table seem to read the whole file in at once. Since this is a very large file (several hundred thousand lines), this is not practical. Hence the idea of reading one line at a time. The question is, can R read one line at a time? If so, how? Any suggestions are appreciated.

Thanks,

Walt

________________________

Walter R. Paczkowski, Ph.D.
Data Analytics Corp.
44 Hamilton Lane
Plainsboro, NJ 08536
________________________
(V) 609-936-8999
(F) 609-936-3733
walt at dataanalyticscorp.com
www.dataanalyticscorp.com
Walt,

Something like:

    con <- file("your-large-file.txt", "rt")
    readLines(con, 1) # Read one line

-Matt

On Sun, 2010-08-15 at 10:58 -0400, Data Analytics Corp. wrote:
> The question is, can R read one line at a time? If so, how?

-- 
Matthew S. Shotwell
Graduate Student
Division of Biostatistics and Epidemiology
Medical University of South Carolina
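A minimal sketch of how the readLines suggestion above might extend to a whole file, not a definitive implementation. It relies on readLines returning a zero-length character vector once the connection is exhausted. The temporary demo file and the names infile, n_read and lines_seen are assumptions made for illustration only.

```r
# Hypothetical demo file; substitute the path to your own large file.
infile <- tempfile(fileext = ".txt")
writeLines(c("first", "second", "third"), infile)

con <- file(infile, "rt")          # open once so the read position is kept
n_read <- 0L
lines_seen <- character(0)

# readLines(con, n = 1) returns character(0) at end of file,
# so the loop stops when nothing is left to read.
while (length(line <- readLines(con, n = 1)) > 0) {
  n_read <- n_read + 1L
  lines_seen <- c(lines_seen, line)  # kept only so the demo can be checked
}
close(con)
```

The connection has to be opened before the loop. If readLines is given a file name instead, it reopens the file on every call and returns the first line each time.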
On Aug 15, 2010, at 10:58 AM, Data Analytics Corp. wrote:
> I have an upcoming project that will involve a large text file. I want to
>
> 1. read the file into R one line at a time

?readLines

> 2. do some string manipulations on the line
> 3. write the line to another text file.

David Winsemius, MD
West Hartford, CT
In read.table, can't you specify the number of lines to be read in (the nrows argument)? You could create a loop and read them one by one.

Dimitri

On Sun, Aug 15, 2010 at 10:58 AM, Data Analytics Corp. <walt at dataanalyticscorp.com> wrote:
> Scan and read.table seem to read the whole file in at once.

-- 
Dimitri Liakhovitski
Ninah Consulting
www.ninah.com
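One way the read.table idea above could look, as a sketch under stated assumptions. It uses the skip and nrows arguments of read.table to pull one row per call. The demo file, its whitespace-separated two-column layout, and the names infile, n_rows, rows and second are all hypothetical. Passing a file name means each call rescans the file from the start, so this is likely to be slow for several hundred thousand lines.

```r
# Hypothetical demo file with two whitespace-separated columns.
infile <- tempfile(fileext = ".txt")
writeLines(c("a 1", "b 2", "c 3"), infile)

n_rows <- 3L  # assumed known here; for a real file you may need to count first
rows <- vector("list", n_rows)

for (i in seq_len(n_rows)) {
  # skip = i - 1 jumps past the rows already handled; nrows = 1 reads one row.
  rows[[i]] <- read.table(infile, skip = i - 1L, nrows = 1L,
                          stringsAsFactors = FALSE)
}

second <- rows[[2]]  # one-row data frame with default columns V1 and V2
```

read.table also parses each row into typed columns, which is extra work if the goal is plain string manipulation on whole lines.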
For efficiency of processing, look at reading in several hundred or several thousand lines at a time. Reading and writing one line at a time will probably spend most of its time in the system calls that do the I/O, and will take a long time. So do something like this:

    con <- file('yourInputFile', 'r')
    outfile <- file('yourOutputFile', 'w')
    while (length(input <- readLines(con, n=1000)) > 0){
        output <- character(length(input))
        for (i in seq_along(input)){
            # ......your one line at a time processing
            output[i] <- input[i]  # placeholder: copies the line unchanged
        }
        writeLines(output, con=outfile)
    }
    close(con)
    close(outfile)

On Sun, Aug 15, 2010 at 7:58 AM, Data Analytics Corp. <walt at dataanalyticscorp.com> wrote:
> Hence the idea of reading one line at a time. The question is, can R read
> one line at a time? If so, how?

-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?
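The chunked approach above can be wrapped in a small function. This is a sketch only: process_file, chunk_size and transform are hypothetical names, and toupper stands in for whatever string manipulation is actually needed. It assumes transform is vectorized and returns one output string per input line.

```r
# Read `infile` in chunks, apply `transform` to each chunk of lines,
# and append the result to `outfile`. Returns the number of lines processed.
process_file <- function(infile, outfile, chunk_size = 1000L,
                         transform = toupper) {
  con_in <- file(infile, "r")
  on.exit(close(con_in), add = TRUE)
  con_out <- file(outfile, "w")
  on.exit(close(con_out), add = TRUE)

  total <- 0L
  # readLines returns character(0) once the input is exhausted.
  while (length(chunk <- readLines(con_in, n = chunk_size)) > 0) {
    writeLines(transform(chunk), con_out)
    total <- total + length(chunk)
  }
  invisible(total)
}

# Usage on a hypothetical 2500-line demo file.
infile <- tempfile(fileext = ".txt")
outfile <- tempfile(fileext = ".txt")
writeLines(sprintf("line %d", 1:2500), infile)
n <- process_file(infile, outfile)
```

Applying a vectorized function to the whole chunk avoids the inner for loop. Only chunk_size lines are held in memory at once, so the setting trades memory against the number of I/O calls.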
On Sun, Aug 15, 2010 at 10:58:51AM -0400, Data Analytics Corp. wrote:
> I have an upcoming project that will involve a large text file. I want to
>
> 1. read the file into R one line at a time
> 2. do some string manipulations on the line
> 3. write the line to another text file.

You already got some good advice about how to solve this in R. I would just like to add that many people, including myself, prefer to do all text file scrubbing, and especially string manipulation, in scripting languages like Python or Perl, followed by statistical analysis in R.

cu
Philipp

-- 
Dr. Philipp Pagel
Lehrstuhl für Genomorientierte Bioinformatik
Technische Universität München
Wissenschaftszentrum Weihenstephan
Maximus-von-Imhof-Forum 3
85354 Freising, Germany
http://webclu.bio.wzw.tum.de/~pagel/