Hi,

I have data like

1, A, 24, The Red House
2, A, 25, King's Home, by the Sea
...

I'd like to read this in as three variables. I first tried

temp <- read.csv(addresses, sep = ",")

It worked, but line 2 was broken after "King's Home", and "by the Sea" was placed on another line, so I ended up with more rows than there are in the data. When I tried

temp <- read.csv(addresses, sep = ",", flush = TRUE)

I got the right number of rows, but column 3 was truncated at the 3rd comma.

Is there a way I can specify to R that "King's Home, by the Sea" is one word?

Your pal,
Al
> 1, A, 24, The Red House
> 2, A, 25, King's Home, by the Sea
> ...
> Is there a way I can specify to R that "King's Home, by the Sea" is one
> word?

Yes: it needs to be quoted in the file:

2, A, 25, "King's Home, by the Sea"

cu
Philipp

--
Dr. Philipp Pagel
Dept. of Genome Oriented Bioinformatics
Technical University of Munich
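For what it's worth, here is a minimal sketch of reading the data once the last field has been quoted in the file. The inline sample lines stand in for the real file, and the choice of text =, header = FALSE and strip.white = TRUE is an assumption made for this illustration, not something taken from the original post.

```r
# Sample lines standing in for the real file (assumption: no header row).
lines <- c('1, A, 24, The Red House',
           '2, A, 25, "King\'s Home, by the Sea"')

# strip.white = TRUE asks R to drop the blanks around unquoted fields;
# the double quotes protect the comma inside the address.
temp <- read.csv(text = lines, header = FALSE, strip.white = TRUE,
                 stringsAsFactors = FALSE)

str(temp)
```

With the quotes in place the result should have two rows and four columns, with the comma kept inside the last column.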
Alexander Nervedi wrote:
> Is there a way I can specify to R that "King's Home, by the Sea" is
> one word?

Hi,

If you know that the "guilty" column will always be the last one, you can write your own read function using readLines:

do.call(rbind,
  lapply(strsplit(readLines("data.txt"), ","), function(x) {
    # remove leading and trailing spaces from each field
    gsub("^[[:space:]]+|[[:space:]]+$", "",
         c(head(x, 3), paste(tail(x, -3), collapse = ",")))
  })
)

Cheers,
Romain

--
Mango Solutions
data analysis that delivers
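The matrix returned by do.call(rbind, ...) is all character, so a follow-up step is usually needed to get typed columns. Below is a minimal sketch of that step under a few assumptions: the sample vector raw stands in for readLines("data.txt"), the helper split_line and the column names id, group, number and address are made up for the illustration, and the first three fields are taken to contain no commas.

```r
# Sample lines standing in for readLines("data.txt").
raw <- c("1, A, 24, The Red House",
         "2, A, 25, King's Home, by the Sea")

# Hypothetical helper: keep the first n_lead fields as they are and glue
# everything after them back together into one last field.
split_line <- function(line, n_lead = 3) {
  parts <- strsplit(line, ",", fixed = TRUE)[[1]]
  fields <- c(head(parts, n_lead),
              paste(tail(parts, -n_lead), collapse = ","))
  trimws(fields)  # drop leading and trailing blanks
}

mat <- do.call(rbind, lapply(raw, split_line))

# Column names are illustrative only.
addresses <- data.frame(id      = as.integer(mat[, 1]),
                        group   = mat[, 2],
                        number  = as.integer(mat[, 3]),
                        address = mat[, 4],
                        stringsAsFactors = FALSE)
```

The explicit as.integer() calls are one way to do the conversion; type.convert() on each column would be another.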
Preprocess the file, surrounding the last field with quotes. The
xx <- line is there only to make the example self-contained; in
practice it would be replaced with the commented-out line above it,
which simply reads the text in. The yy <- line then surrounds the
fourth field with double quotes, assuming that the first three fields
contain no spaces and that each is followed by a single space, as in
your example. Finally, read.table reads the result now that the fourth
field has been protected with quotes, and closeAllConnections closes
the dangling connections.

# test data
Lines <- "1, A, 24, The Red House
2, A, 25, King's Home, by the Sea
"
# xx <- readLines("myfile.dat")
xx <- readLines(textConnection(Lines))
yy <- gsub('^( *[^ ]+ [^ ]+ [^ ]+ )(.*)',
'\\1"\\2"', xx)
read.table(textConnection(yy), sep = ",")
closeAllConnections()

On 11/29/06, Alexander Nervedi <alexnerdy at hotmail.com> wrote:
> Is there a way I can specify to R that "King's Home, by the Sea" is one
> word?
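
If the fields are not reliably separated by single spaces, the same preprocessing idea can be keyed on the commas instead. This is only a sketch of that variant, not the code posted above: it assumes the first three fields never contain a comma, that the last field contains no double quotes, and that there is no header row. It also uses read.csv with text = rather than textConnection, which is a choice made for this illustration.

```r
# Sample lines standing in for readLines("myfile.dat").
xx <- c("1, A, 24, The Red House",
        "2, A, 25, King's Home, by the Sea")

# Group 1 is the first three comma-terminated fields; group 3 is the rest
# of the line, which gets wrapped in double quotes. Lines with fewer than
# three commas do not match and are left unchanged.
yy <- sub('^(([^,]*,){3}) *(.*)$', '\\1"\\3"', xx)

temp <- read.csv(text = yy, header = FALSE, strip.white = TRUE,
                 stringsAsFactors = FALSE)

str(temp)
```

Because read.csv only treats the double quote as a quoting character by default, the apostrophe in "King's" needs no special handling here.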