Hi,

I have data like

1, A, 24, The Red House
2, A, 25, King's Home, by the Sea
...

I'd like to read this in as three variables. I first tried

temp <- read.csv(addresses, sep = ",")

It worked, but line 2 was broken after "King's Home," and "by the Sea" was placed on another line, so I ended up with more rows than there are in the data. When I tried

temp <- read.csv(addresses, sep = ",", flush = TRUE)

I got the right number of rows, but column 3 was truncated at the 3rd comma.

Is there a way I can specify to R that "King's Home, by the Sea" is one word?

u r pal, al
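The symptoms are consistent with some lines having more comma-separated fields than others. A minimal diagnostic sketch: count.fields() reports the number of fields on each line. The two sample lines are inlined with textConnection() so the snippet runs on its own (with the real file you would pass its path instead), and quote = "\"" mirrors read.csv's default so the apostrophe in "King's" is not treated as an opening quote.

```r
# Sample lines from the question, inlined so the snippet is self-contained
txt <- c("1, A, 24, The Red House",
         "2, A, 25, King's Home, by the Sea")

con <- textConnection(txt)
# Count comma-separated fields per line; only double quotes act as quotes,
# as in read.csv, so "King's" does not start a quoted string
n_fields <- count.fields(con, sep = ",", quote = "\"")
close(con)

n_fields                          # number of fields found on each line
which(n_fields != n_fields[1])    # lines whose field count differs from line 1
```

Lines that report more fields than the rest are the ones read.csv wraps onto a new row (default) or cuts short (flush = TRUE).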
> 1, A, 24, The Red House
> 2, A, 25, King's Home, by the Sea
> ...
> Is there a way i can specify to R that "King's Home, by the Sea" is one
> word?

Yes: it needs to be quoted in the file:

2, A, 25, "King's Home, by the Sea"

cu
Philipp

-- 
Dr. Philipp Pagel                            Tel.  +49-8161-71 2131
Dept. of Genome Oriented Bioinformatics      Fax.  +49-8161-71 2186
Technical University of Munich
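To illustrate the quoting advice, here is a minimal sketch assuming the address field has already been enclosed in double quotes in the file. The lines are passed through read.csv's text argument instead of a file path so the example is self-contained; header = FALSE is needed because the data has no header row.

```r
# The sample records with the address field quoted, as suggested above
txt <- c('1, A, 24, "The Red House"',
         '2, A, 25, "King\'s Home, by the Sea"')

# read.csv treats commas inside double quotes as part of the field;
# strip.white = TRUE removes the spaces that follow each comma
addr <- read.csv(text = txt, header = FALSE, strip.white = TRUE,
                 stringsAsFactors = FALSE)

# Make sure no stray leading/trailing spaces remain around the address
addr$V4 <- trimws(addr$V4)
str(addr)
```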
Alexander Nervedi wrote:
> Is there a way i can specify to R that "King's Home, by the Sea" is
> one word?

Hi,

If you know that the "guilty" column will be the last one, you can always write your own read function using readLines:

do.call(rbind, lapply(strsplit(readLines("data.txt"), ","), function(x) {
  # keep the first three fields, paste everything after them back together
  # with commas, then remove leading and trailing spaces
  gsub("^[[:space:]]+|[[:space:]]+$", "",
       c(head(x, 3), paste(tail(x, -3), collapse = ",")))
}))

This returns a character matrix with one row per line of data.txt.

Cheers,
Romain

-- 
Mango Solutions
Tel  +44 1249 467 467
Fax  +44 1249 467 468
Mob  +44 7813 526 123
data analysis that delivers
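The readLines()/strsplit() idea can be wrapped into a small reusable helper that returns a data frame and converts numeric-looking columns. This is only a sketch: read_fixed_then_rest and its n_fixed argument are hypothetical names, and it assumes that only the last field may contain commas.

```r
# Hypothetical helper: split each line on commas, keep the first n_fixed
# fields as-is and glue every remaining piece back into one last field
read_fixed_then_rest <- function(lines, n_fixed = 3) {
  fields <- strsplit(lines, ",", fixed = TRUE)
  rows <- lapply(fields, function(x) {
    rest <- paste(x[-seq_len(n_fixed)], collapse = ",")
    trimws(c(x[seq_len(n_fixed)], rest))  # drop spaces around every field
  })
  out <- as.data.frame(do.call(rbind, rows), stringsAsFactors = FALSE)
  # Turn columns such as "24", "25" into numbers; text columns stay character
  out[] <- lapply(out, type.convert, as.is = TRUE)
  out
}

# With a real file: read_fixed_then_rest(readLines("data.txt"))
txt <- c("1, A, 24, The Red House",
         "2, A, 25, King's Home, by the Sea")
addr <- read_fixed_then_rest(txt)
str(addr)
```

Compared with the one-liner, the result is a data frame with typed columns rather than a character matrix, at the cost of a few more lines.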
Preprocess the file, surrounding the last field with double quotes. The xx <- line is only there to make the example self-contained; in reality it would be replaced by the commented line above it. Either way it simply reads the text in. The yy <- line then surrounds the fourth field with double quotes, assuming, as in your example, that each of the first three fields (including its trailing comma) contains no spaces and is followed by exactly one space. Finally, read.table reads the result now that the fourth field is protected by quotes, and closeAllConnections closes the dangling connections.

# test data
Lines <- "1, A, 24, The Red House
2, A, 25, King's Home, by the Sea
"

# xx <- readLines("myfile.dat")
xx <- readLines(textConnection(Lines))
yy <- gsub('^( *[^ ]+ [^ ]+ [^ ]+ )(.*)', '\\1"\\2"', xx)
read.table(textConnection(yy), sep = ",")
closeAllConnections()

On 11/29/06, Alexander Nervedi <alexnerdy at hotmail.com> wrote:
> Is there a way i can specify to R that "King's Home, by the Sea" is one
> word?
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
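A variant of the quoting step that anchors on commas instead of spaces, so it does not depend on exactly one space following each comma. It is a sketch: quote_rest is a hypothetical helper, it assumes the first three fields never contain commas, and it doubles any double quotes already inside the last field, which is how read.csv expects embedded quotes to be escaped. Lines with too few commas are returned unchanged.

```r
# Hypothetical helper: wrap everything after the first n_fixed commas in quotes
quote_rest <- function(lines, n_fixed = 3L) {
  pat <- sprintf("^((?:[^,]*,){%d})\\s*(.*?)\\s*$", n_fixed)
  ok <- grepl(pat, lines, perl = TRUE)
  head_part <- sub(pat, "\\1", lines, perl = TRUE)  # "1, A, 24,"
  rest <- sub(pat, "\\2", lines, perl = TRUE)       # the free-text field
  rest <- gsub('"', '""', rest, fixed = TRUE)       # CSV-escape embedded quotes
  ifelse(ok, paste0(head_part, '"', rest, '"'), lines)
}

txt <- c("1, A, 24, The Red House",
         "2, A, 25, King's Home, by the Sea")
quoted <- quote_rest(txt)
# text = avoids creating a textConnection that would need closing afterwards
addr <- read.csv(text = quoted, header = FALSE, strip.white = TRUE,
                 stringsAsFactors = FALSE)
str(addr)
```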