This might be a primitive question, but I have a text file in which there is no separator between the characters on each line, and the strings on each line have the same length. The format is like the following:

absfjdslf
jfdldskjff
jfsldfjslk

When I read the file with read.table("myfile", colClasses = "character"), instead of putting the strings in a table of (number of rows) x (length of string), read.table stores the file in a table of (number of rows) x 1, and each element seems to be a factor. Why does read.table not take colClasses = "character" into account?

Thanks,

Carol
Hi Carol,

> When I read the file with read.table("myfile", colClasses = "character"), instead of putting the strings in a table of number of rows x length of string, read.table saves the file in a table of number of rows x 1 and each element seems to be a factor. Why does read.table not account for colClasses = "character"?

read.table relies on a separator to differentiate between columns, so it is not appropriate for your file; read.fwf would do the job.

Setting colClasses (in my understanding) tells read.table how to treat input as it comes in, so it disables some testing of data types and makes reading quicker. It does not disable the conversion of character data to factors, which is the default. You need to use the stringsAsFactors=FALSE option for that.

So, for your example (I have added a letter to the first row to make it the same length as the others):

cf <- "absfjdslfx
jfdldskjff
jfsldfjslk"

cdf <- read.fwf(textConnection(cf), widths=rep(1,10), colClasses="character", stringsAsFactors=FALSE)

See ?read.fwf for more information. A width is required for each column (in this case 1 repeated 10 times).

Hope this helps.

Ron.
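As a hedged variation on the read.fwf call above, the widths can be derived from the data rather than hard-coded as 10. This is only a sketch: the vector `txt` and the variable `len` are illustrative names, and the first string is padded with an "x" as in the reply so that all lines are 10 characters long.

```r
# Sample data from the thread, with the first string padded to 10 characters
txt <- c("absfjdslfx", "jfdldskjff", "jfsldfjslk")

# Derive the common line length instead of hard-coding it
len <- unique(nchar(txt))
stopifnot(length(len) == 1)  # read.fwf needs one width per column

# One single-character column per position in the line
cdf <- read.fwf(textConnection(txt),
                widths = rep(1, len),
                colClasses = "character",
                stringsAsFactors = FALSE)

dim(cdf)            # rows x characters per row
sapply(cdf, class)  # class of every column
```

For a real file, the same idea would apply with readLines("myfile") supplying the strings, assuming the file is small enough to read twice.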
Hi Carol,

I cannot reproduce what you're seeing.

  > tmp <- read.table(text = "absfjdslf
  + jfdldskjff
  + jfsldfjslk")
  > str(tmp)
  'data.frame': 3 obs. of 1 variable:
   $ V1: Factor w/ 3 levels "absfjdslf","jfdldskjff",..: 1 2 3
  > tmp <- read.table(text = "absfjdslf
  + jfdldskjff
  + jfsldfjslk",
  + colClasses = "character")
  > str(tmp)
  'data.frame': 3 obs. of 1 variable:
   $ V1: chr "absfjdslf" "jfdldskjff" "jfsldfjslk"

Yours sincerely / Med venlig hilsen

Frede Aakmann Tøgersen

> -----Original Message-----
> From: r-help-bounces at r-project.org On Behalf Of carol white
> Sent: 26 June 2014 09:33
> To: r-help at r-project.org
> Subject: [R] read a file of text with read.table
>
> When I read the file with read.table("myfile", colClasses = "character"),
> instead of putting the strings in a table of number of rows x length of string,
> read.table saves the file in a table of number of rows x 1 and each element
> seems to be a factor. Why does read.table not account for colClasses =
> "character"?
On 26/06/14 19:32, carol white wrote:

> It might be a primitive question

All questions are primitive; some questions are more primitive than others.

> but I have a file of text and there
> is no separator between character on each line and the strings on
> each line have the same length. The format is like the following
>
> absfjdslf
> jfdldskjff
> jfsldfjslk
>
> When I read the file with read.table("myfile", colClasses =
> "character"), instead of putting the strings in a table of number of
> rows x length of string, read.table saves the file in a table of
> number of rows x 1 and each element seems to be a factor. Why does
> read.table not account for colClasses = "character"?

(1) You might try setting stringsAsFactors=FALSE rather than colClasses = "character".

(2) Since your "table" has only one column you might as well use scan() (with what="") and save wear and tear on the system.

(3) In your example the strings do *not* have the same length; the first has 9 characters, the next two have 10 each.

(4) Do you want to get a data frame each column of which is a single character? This was not clear from your email. Do you know how to do this? (It's easy --- when the string lengths are indeed all the same.)

I appended a "g" to the first string and did:

ttt <- scan("temp.txt", what="")
sss <- strsplit(ttt, "")
rrr <- as.data.frame(do.call(rbind, sss))
rrr

  V1 V2 V3 V4 V5 V6 V7 V8 V9 V10
1  a  b  s  f  j  d  s  l  f   g
2  j  f  d  l  d  s  k  j  f   f
3  j  f  s  l  d  f  j  s  l   k

Is this what you want?

cheers,

Rolf Turner
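The strsplit() approach above silently assumes equal string lengths, which point (3) shows is not true of the data as posted. A minimal sketch of a wrapper that checks this first follows; `split_chars` is a hypothetical helper name introduced here for illustration, not something from the thread or from base R.

```r
# Hypothetical helper (not from the thread): split equal-length strings
# into a data frame with one character per column.
split_chars <- function(x) {
  n <- nchar(x)
  if (length(unique(n)) != 1) {
    stop("strings differ in length: ", paste(n, collapse = ", "))
  }
  parts <- strsplit(x, "")  # list of single-character vectors
  as.data.frame(do.call(rbind, parts), stringsAsFactors = FALSE)
}

# Strings padded as in the reply above (a "g" appended to the first one)
ok <- split_chars(c("absfjdslfg", "jfdldskjff", "jfsldfjslk"))
dim(ok)

# The strings exactly as posted have lengths 9, 10 and 10, so the
# helper stops; the error message is captured here instead of aborting.
res <- tryCatch(split_chars(c("absfjdslf", "jfdldskjff", "jfsldfjslk")),
                error = function(e) conditionMessage(e))
res
```

Failing loudly seems preferable here because rbind() on vectors of unequal length would otherwise recycle values and only emit a warning.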
Hello,

Try using option stringsAsFactors = FALSE.

Hope this helps,

Rui Barradas

On 26/06/2014 09:32, carol white wrote:
> When I read the file with read.table("myfile", colClasses = "character"), instead of putting the strings in a table of number of rows x length of string, read.table saves the file in a table of number of rows x 1 and each element seems to be a factor. Why does read.table not account for colClasses = "character"?
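A minimal sketch of what this suggestion does and does not change, assuming the sample strings from the thread with the first one padded to 10 characters (the names `txt` and `d` are illustrative). The column comes back as character rather than factor, but the result is still a single column.

```r
# Sample strings from the thread, one per line
txt <- "absfjdslfg\njfdldskjff\njfsldfjslk"

# stringsAsFactors = FALSE keeps strings as character vectors
d <- read.table(text = txt, stringsAsFactors = FALSE)

str(d)  # structure of the result
dim(d)  # still one column: lines without whitespace are one field each
```

Since R 4.0.0 the default for stringsAsFactors is FALSE, so on current versions the argument is redundant but harmless.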
Hi,

With read.fwf, it works. But I still don't understand why it doesn't work with read.table, since sep is "" by default, which matches my file. In one trial I used read.table("myfile", colClasses = "character", stringsAsFactors=FALSE), and it still didn't work, although it should have.

Regards,

On Thursday, June 26, 2014 9:59 AM, Ron Crump <R.E.Crump@warwick.ac.uk> wrote:
> read.table relies on a separator to differentiate between columns, so it is not appropriate for your file, read.fwf would do the job.
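On the default separator: according to ?read.table, sep = "" means that fields are separated by white space (one or more spaces, tabs, newlines or carriage returns), not that every character is its own field. The sketch below illustrates the difference with made-up three-character inputs; `with_space` and `no_space` are illustrative names.

```r
# With the default sep = "", fields are split on whitespace only
with_space <- read.table(text = "a b s\nj f d", colClasses = "character")
no_space   <- read.table(text = "abs\njfd",     colClasses = "character")

ncol(with_space)  # one column per whitespace-separated field
ncol(no_space)    # the whole line is a single field
```

This is why a file with no whitespace inside each line ends up as a table with one column, whatever colClasses or stringsAsFactors are set to.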