Walter R. Paczkowski
2007-Mar-08 02:04 UTC
[R] reading a text file with a stray carriage return
Hi,

I'm hoping someone has a suggestion for handling a simple problem. A client gave me a comma separated value file (call it x.csv) that has an id and name and address for about 25,000 people (25,000 records). I used read.table to read it, but then discovered that there are stray carriage returns on several records. This plays havoc with read.table since it starts a new input line when it sees the carriage return. In short, the read is all wrong.

I thought I could write a simple function to parse a line and write it back out, character by character. If a carriage return is found, it would simply be ignored on the writing back out part. But how do I identify a carriage return? What is the code or symbol? Is there any easier way to rid the file of carriage returns in the middle of the input lines?

Any help is appreciated.

Walt Paczkowski

_________________________________
Walter R. Paczkowski, Ph.D.
Data Analytics Corp.
44 Hamilton Lane
Plainsboro, NJ 08536
(V) 609-936-8999
(F) 609-936-3733
How do you define a carriage return in the middle of a line if a carriage return is also used to delimit a line?

One of the things you can do is to use 'count.fields' to determine the number of fields in each line. For those lines that are not the right length, you could combine them together with a 'paste' command when you write them out.

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?
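A minimal sketch of the 'count.fields' plus 'paste' idea suggested above, not a tested solution for the actual file. The sample data, the variable names, and the assumption of three fields per complete record are all hypothetical. It also assumes no field is quoted and that a stray break never falls inside the last field of a record; in that case the line already has the full field count and this heuristic cannot tell the leftover fragment from a new record. For reference, in an R string a carriage return is written "\r" and a line feed "\n".

```r
# Hypothetical sample: 3 fields per record; record 2 is split by a stray line break.
tmp <- tempfile(fileext = ".csv")
writeLines(c("1,Ann Lee,12 Oak St",
             "2,Bob",
             " Ray,34 Elm Ave",
             "3,Cy Dow,56 Fir Rd"), tmp)

expected <- 3  # assumed number of fields in a complete record

lines <- readLines(tmp)
# Number of comma-separated fields on each physical line.
# quote = "" and comment.char = "" keep apostrophes and '#' in names or
# addresses from being treated specially.
n <- count.fields(tmp, sep = ",", quote = "", comment.char = "",
                  blank.lines.skip = FALSE)
stopifnot(length(n) == length(lines))  # counts must align with lines

fixed <- character(0)
buf <- NULL   # record being assembled
have <- 0     # fields accumulated in buf so far
for (i in seq_along(lines)) {
  if (is.null(buf)) {
    buf <- lines[i]
    have <- n[i]
  } else {
    # Joining across a break inside a field merges two partial fields into one.
    buf <- paste(buf, lines[i], sep = "")
    have <- have + n[i] - 1
  }
  if (have >= expected) {
    fixed <- c(fixed, buf)
    buf <- NULL
  }
}
if (!is.null(buf)) fixed <- c(fixed, buf)  # keep a trailing incomplete record

# Read the repaired lines as usual.
dat <- read.table(text = fixed, sep = ",", quote = "", comment.char = "",
                  header = FALSE, stringsAsFactors = FALSE)
```

On real data it is probably worth running table(n) first, to see how many lines fall short of the expected field count before merging anything.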