David Epstein
2009-Feb-05 09:01 UTC
[R] eliminating control characters from formatted data files
I have a few hundred files of formatted data. Unfortunately, most of them end with a spurious CONTROL-Z. I want to rewrite the files without the spurious character. Here's what I've come up with so far, but my code is unsafe because it assumes, without justification, that the last row of df contains a control character (and some NAs to fill up the record).

options(warn=-1)  # turn off irritating warning from read.table()
df <- read.table(file=filename)
df.new <- df[-nrow(df), ]  # drop the last row (1:nrow(df)-1 parses as (1:nrow(df))-1)
write.table(df.new, file=filename.new, quote=FALSE)

Before defining df.new, I want to check that the last line really does contain a control character. I've tried various methods, but none of them work.

I have been wondering if I should use a function (scan?) that reads in the file line by line and checks each line for control characters, but I don't know how to do this either.

Thanks for any help
David
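A minimal sketch of the line-by-line check asked about above, offered as one possible approach rather than a definitive answer. The helper name `has_ctrl_z_last_line` and the temporary demo file are hypothetical and do not come from the thread; the sketch assumes the CONTROL-Z (byte 26, written "\032" in R strings) appears somewhere on the final line of the file.

```r
# Hypothetical helper (not from the thread): TRUE if the last line of the
# file contains a CONTROL-Z character.
has_ctrl_z_last_line <- function(path) {
  # warn = FALSE silences the "incomplete final line" warning that a
  # trailing CONTROL-Z without a newline would otherwise trigger
  lines <- readLines(path, warn = FALSE)
  if (length(lines) == 0) return(FALSE)
  # "\032" is the octal escape for byte 26, i.e. CONTROL-Z
  grepl("\032", lines[length(lines)], fixed = TRUE)
}

# Self-contained demonstration on temporary files (assumed data layout)
dirty <- tempfile()
clean <- tempfile()
writeBin(c(charToRaw("1 2 3\n4 5 6\n"), as.raw(26)), dirty)
writeBin(charToRaw("1 2 3\n4 5 6\n"), clean)

dirty_flag <- has_ctrl_z_last_line(dirty)
clean_flag <- has_ctrl_z_last_line(clean)
```

With a check like this, the last row of df would be dropped only when the function returns TRUE, instead of unconditionally.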
jim holtman
2009-Feb-05 13:52 UTC
[R] eliminating control characters from formatted data files
Here is one way of doing it. You can read the file in as "raw", then either replace or delete the control character, and write the file back out:

> # read in as 'raw' and delete the control-Z from the string
> x <- readBin('/tempyy.txt', 'raw', n=100000)
> x
 [1] 54 68 69 73 20 69 73 20 61 20 74 65 73 74 20 1a 2e 0d 0a 4d 4f 52 45 20 4f 46 20 54 48 45 20 44 41 54 45 4d
[37] 1a 1a 0d 0a 74 68 69 73 20 69 73 20 73 6f 6d 65 20 64 61 74 61 0d 0a 6c 61 73 74 20 6c 69 6e 65 0d 0a
> rawToChar(x)
[1] "This is a test \032.\r\nMORE OF THE DATEM\032\032\r\nthis is some data\r\nlast line\r\n"
> # delete ^Z
> x <- x[x != as.raw(26)]
> x
 [1] 54 68 69 73 20 69 73 20 61 20 74 65 73 74 20 2e 0d 0a 4d 4f 52 45 20 4f 46 20 54 48 45 20 44 41 54 45 4d 0d
[37] 0a 74 68 69 73 20 69 73 20 73 6f 6d 65 20 64 61 74 61 0d 0a 6c 61 73 74 20 6c 69 6e 65 0d 0a
> rawToChar(x)
[1] "This is a test .\r\nMORE OF THE DATEM\r\nthis is some data\r\nlast line\r\n"
> # can now write out 'x'

--
Jim Holtman
Cincinnati, OH
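The raw-bytes approach in Jim Holtman's reply stops at "can now write out 'x'". A hedged sketch of that last step follows, using `writeBin()`. The helper name `strip_trailing_ctrl_z`, the ".new" suffix, and the "data" directory in the commented loop are assumptions for illustration, not part of the thread. Unlike the transcript, which deletes every byte 26 in the file, this variant removes CONTROL-Z only when it is the final byte (or run of final bytes), on the assumption that the spurious characters sit at the very end of each file.

```r
# Hypothetical helper (not from the thread): copy infile to outfile,
# dropping any CONTROL-Z bytes (0x1a) that end the file.
strip_trailing_ctrl_z <- function(infile, outfile) {
  # Read the whole file as raw bytes; using the file size avoids guessing n
  x <- readBin(infile, "raw", n = file.info(infile)$size)
  ctrl_z <- as.raw(26)
  # Walk back from the end while the last remaining byte is CONTROL-Z
  n <- length(x)
  while (n > 0 && x[n] == ctrl_z) n <- n - 1
  removed <- length(x) - n
  # Write the kept bytes unchanged, so line endings are preserved
  writeBin(x[seq_len(n)], outfile)
  invisible(removed)  # 0 means the file had no trailing CONTROL-Z
}

# Self-contained demonstration on a temporary file (assumed data layout)
src <- tempfile()
dst <- tempfile()
writeBin(c(charToRaw("1 2 3\r\n4 5 6\r\n"), as.raw(26)), src)
removed <- strip_trailing_ctrl_z(src, dst)
result <- rawToChar(readBin(dst, "raw", n = file.info(dst)$size))

# Applying it to many files might look like this ("data" is a placeholder):
# for (f in list.files("data", full.names = TRUE)) {
#   strip_trailing_ctrl_z(f, paste0(f, ".new"))
# }
```

Because the function returns the number of bytes removed, a file that did not end in CONTROL-Z is copied through untouched and reports 0, which addresses the concern about assuming the character is always present.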
Murray Cooper
2009-Feb-05 14:13 UTC
[R] eliminating control characters from formatted data files
David,

This may be a case of "If all you have is a hammer, everything looks like a nail". If all you want to do is remove the last line if it contains a CONTROL-Z, why not use something like perl to process the files?

Murray M Cooper, Ph.D.
Richland Statistics