Hi,

I have a flat file that contains a bunch of strings that look like this. The file was originally created on Unix and brought over to Windows:

E123456E234567E345678E456789E567891E678910E. . . .

Basically, each string starts with E and is followed by 6 digits. One string = E123456, length = 7 characters. The file contains tens of thousands of these strings. I want to separate them into one vector whose length is the number of strings in the flat file, where each string is its own unique value.

> cc<-c(7,7,7,7,7,7,7)
> aa<- file("Master","r", raw=TRUE)
> readChar(aa, cc, useBytes = FALSE)
[1] "E123456" "\nE23456" "7\nE3456" "78\nE456" "789\nE56" "7891\nE6" "78910\nE"
> close(aa)
> unlink("Master")

The biggest issue is that I am getting \n added into the strings. I am not sure where it is coming from, and it splices the strings. Any suggestions on getting rid of the \n and creating an infinite sequence of 7's for the string length in the cc vector? Is there a better way to do this?

Sarah
On Apr 5, 2011, at 7:48 PM, Kalicin, Sarah wrote:

> Hi,
>
> I have a flat file that contains a bunch of strings that look like
> this. The file was originally in Unix and brought over into Windows:
>
> E123456E234567E345678E456789E567891E678910E. . . .
> Basically the string starts with E and is followed with 6 numbers.
> One string=E123456, length=7 characters. This file contains 10,000's
> of these strings. I want to separate them into one vector the length
> of the number of strings in the flat file, where each string is it's
> on unique value.
>
> cc<-c(7,7,7,7,7,7,7)
>> aa<- file("Master","r", raw=TRUE)
>> readChar(aa, cc, useBytes = FALSE)
> [1] "E123456" "\nE23456" "7\nE3456" "78\nE456" "789\nE56"
> "7891\nE6" "78910\nE"
>> close(aa)
>> unlink("Master")

> txt <- "E123456E234567E345678E456789E567891E678910E"
# You could use readLines to bring in from the file
# and assign to a character vector for work in R.
> gsub("(E[[:digit:]]{6})", "\\1\n", txt)
[1] "E123456\nE234567\nE345678\nE456789\nE567891\nE678910\nE"
# Seems to be "working" properly
> ?scan
> scan(textConnection(gsub("(E[[:digit:]]{6})", "\\1\n", txt)), what="character")
Read 7 items
[1] "E123456" "E234567" "E345678" "E456789" "E567891" "E678910" "E"

You might be able to use read.table or variants.

>
> The biggest issue is I am getting \n added into the string, which I
> am not sure where it is coming from, and splices the strings. Any
> suggestions on getting rid of the /n and create an infinite sequence
> of 7's for the string length for the cc vector? Is there a better
> way to do this?
>
> Sarah
>

David Winsemius, MD
West Hartford, CT
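The readLines() and scan() suggestions above can be sketched as follows. This is only an illustration under one assumption: the "\n" in the readChar() output suggests the file actually holds one 7-character string per line, so each record takes 8 characters and fixed reads of 7 drift out of step. The temporary file and the variable names (tmp, ids, ids_scan, ids_regex) are made up for the example and stand in for "Master".

```r
# Assumption: one ID per line. A temporary file stands in for "Master".
tmp <- tempfile()
writeLines(c("E123456", "E234567", "E345678",
             "E456789", "E567891", "E678910"), tmp)

# readLines() splits on line endings, so no "\n" ends up inside the strings
# and no vector of lengths is needed.
ids <- readLines(tmp)

# scan() reading character data splits on whitespace, newlines included.
ids_scan <- scan(tmp, what = "character", quiet = TRUE)

# If the IDs really were run together on one line, a regular expression
# can pull out each "E" followed by six digits; a trailing lone "E" is dropped.
txt <- "E123456E234567E345678E456789E567891E678910E"
ids_regex <- regmatches(txt, gregexpr("E[[:digit:]]{6}", txt))[[1]]

# Remove only the temporary file created for this sketch.
unlink(tmp)
```

Either reader returns a character vector with one element per string, whatever the number of lines, so the question of how many 7's to put in cc does not arise.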
Isn't all you need read.fwf?

________________________________________
From: r-help-bounces at r-project.org [r-help-bounces at r-project.org] On Behalf Of Kalicin, Sarah [sarah.kalicin at intel.com]
Sent: 06 April 2011 09:48
To: r-help at r-project.org
Subject: [R] Pulling strings from a Flat file
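A minimal sketch of the read.fwf() idea, again assuming one 7-character string per line. The temporary file, the column name "id", and the variable names are hypothetical and chosen only for illustration.

```r
# Assumption: one fixed-width record of 7 characters per line.
tmp <- tempfile()
writeLines(c("E123456", "E234567", "E345678"), tmp)

# widths = 7 declares a single 7-character field per line;
# colClasses (passed on to read.table) keeps the field as character.
df <- read.fwf(tmp, widths = 7, colClasses = "character", col.names = "id")

# Extract the single column as a plain character vector.
ids <- df$id

# Remove only the temporary file created for this sketch.
unlink(tmp)
```

If the strings were instead packed onto a single line, a width vector such as rep(7, n) for n strings would be one way to generate the run of 7's, with read.fwf() then returning one row of n columns.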