Manoranjan Muthusamy
2018-Oct-03 10:05 UTC
[R] R - Reading a horizontally arranged csv file, replacing values and saving again
Thanks for the short but informative answer, Bill. But still, each row has four columns..right? Although the *NA* is replaced by a blank cell, because of the extra comma it still is a four column row. Is there any way to avoid/remove the extra comma when NA is replaced which will make it a three column row? On Mon, Oct 1, 2018 at 4:56 PM William Dunlap <wdunlap at tibco.com> wrote:> Read the help file for write.table > ?write.table > and look at the descriptions of its arguments. > > df <- data.frame(Text=c("Abe","Ben",NA,"David"), Age=c(19, NA, 12, 10)) > > write.table(file=stdout(), t(df), sep=",") > "V1","V2","V3","V4" > "Text","Abe","Ben",NA,"David" > "Age","19",NA,"12","10" > > write.table(file=stdout(), t(df), quote=FALSE, na="", sep=",") > V1,V2,V3,V4 > Text,Abe,Ben,,David > Age,19,,12,10 > > > > > Bill Dunlap > TIBCO Software > wdunlap tibco.com > > On Mon, Oct 1, 2018 at 2:24 AM, Manoranjan Muthusamy < > ranjanmano167 at gmail.com> wrote: > >> I have a horizontally arranged CSV file >> <https://www.dropbox.com/s/0ukyuifvpq1olqk/samplefile.csv?dl=0> ( >> samplefile.csv) with headers are in the first column. Also, each row has a >> different number of columns. I want to read this CSV file, replace one of >> the cell value and save again as a CSV file with the same format as the >> original file with exactly same number of columns and rows. It sounds like >> a simple task, but I am struggling to find a way. I tried to do this with >> the help of this >> < >> https://stackoverflow.com/questions/17288197/reading-a-csv-file-organized-horizontally >> > >> and this >> < >> https://stackoverflow.com/questions/20308621/how-do-i-write-a-csv-file-in-r-where-my-input-is-written-to-the-file-as-row >> > >> posts, >> but still, I can't get the output the way I want. Can somebody help me >> here? >> >> My attempt using the answer in this post >> < >> https://stackoverflow.com/questions/17288197/reading-a-csv-file-organized-horizontally >> > >> to >> read the CSV file (samplefile.csv) gives me the following output where >> headers are kinda messed up and empty cells are replaced with NA values >> which is not what I want >> >> aaa <- read.tcsv("samplefile.csv") >> aaa >> >> Header.1 Header.2..units. Header.3..units. Header.3..units..11 >> Some text 0.0625 0 2648962 >> <NA> 0.0625 1200 6647473 <NA> >> 0.0625 1380 14 <NA> >> 0.2500 1500 15 <NA> >> 0.6250 1620 NA6 <NA> >> 1.3125 1740 NA7 <NA> >> 2.4375 1860 NA8 <NA> >> 3.5625 1980 NA9 <NA> >> 4.6250 2100 NA10 <NA> >> 5.0000 2220 NA11 <NA> >> 5.0000 2340 NA12 <NA> >> 4.6250 2460 NA13 <NA> >> 3.5625 2580 NA14 <NA> >> 2.4375 2700 NA15 <NA> >> 1.3125 2820 NA16 <NA> >> 0.6250 2940 NA17 <NA> >> 0.2500 3060 NA18 <NA> >> 0.0625 3180 NA19 <NA> >> 0.0000 3300 NA20 <NA> >> 0.0000 18000 NA >> >> Also, I am not sure how to go back to original format when I save the file >> again after a modification (for example after replacing a cell value) >> >> I tried saving the file again by using t (transpose) as given below >> >> write.csv(t(aaa), file ="samplefile_e.csv", row.names=T) >> >> but still, there are following issues in the saved file >> >> 1. messed up headers >> 2. empty cells are replaced with NA >> 3. when I open the file in a text editor all the values are shown as >> characters >> >> Thanks, >> Mano >> >> [[alternative HTML version deleted]] >> >> ______________________________________________ >> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide >> http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. >> > >[[alternative HTML version deleted]]
William Dunlap
2018-Oct-03 16:51 UTC
[R] R - Reading a horizontally arranged csv file, replacing values and saving again
Using read.table to read data with a variable number of entries per row is probably a mistake - data.frames (which read.table returns) are not meant for this sort of data. You want to store the data in R as a list with names. E.g., readYourFormat <- function(connection) { # connection is a file name or connection object referring # to text of the form # Header A,Some text # Header B,<number>,<number>,<number> # Header C,<number> # Header D,<number>,<number>,<number>,<number> lines <- readLines(connection) lines <- lines[ nchar(lines)>0 ] # omit empty lines fields <- strsplit(lines, ",") # split lines by commas into fields headers <- vapply(fields, function(x)x[1], FUN.VALUE="") # first entry is header names(fields) <- headers contents <- lapply(fields, function(x)x[-1]) # remaining entries are contents contents[-1] <- lapply(contents[-1], as.numeric) # convert all but first line's entries to numeric contents } After reading the data and fiddling with it (but keeping it a named list), you can write it out in a similar format with writeYourFormat <- function (data, connection = stdout()) { stopifnot(is.list(data), !is.null(names(data))) cat(file = connection, sep = "\n", paste(sep = ",", names(data), vapply(data, paste, collapse = ",", FUN.VALUE = ""))) } Bill Dunlap TIBCO Software wdunlap tibco.com On Wed, Oct 3, 2018 at 3:05 AM, Manoranjan Muthusamy < ranjanmano167 at gmail.com> wrote:> Thanks for the short but informative answer, Bill. But still, each row has > four columns..right? Although the *NA* is replaced by a blank cell, > because of the extra comma it still is a four column row. Is there any way > to avoid/remove the extra comma when NA is replaced which will make it a > three column row? > > > > On Mon, Oct 1, 2018 at 4:56 PM William Dunlap <wdunlap at tibco.com> wrote: > >> Read the help file for write.table >> ?write.table >> and look at the descriptions of its arguments. >> > df <- data.frame(Text=c("Abe","Ben",NA,"David"), Age=c(19, NA, 12, >> 10)) >> > write.table(file=stdout(), t(df), sep=",") >> "V1","V2","V3","V4" >> "Text","Abe","Ben",NA,"David" >> "Age","19",NA,"12","10" >> > write.table(file=stdout(), t(df), quote=FALSE, na="", sep=",") >> V1,V2,V3,V4 >> Text,Abe,Ben,,David >> Age,19,,12,10 >> >> >> >> >> Bill Dunlap >> TIBCO Software >> wdunlap tibco.com >> >> On Mon, Oct 1, 2018 at 2:24 AM, Manoranjan Muthusamy < >> ranjanmano167 at gmail.com> wrote: >> >>> I have a horizontally arranged CSV file >>> <https://www.dropbox.com/s/0ukyuifvpq1olqk/samplefile.csv?dl=0> ( >>> samplefile.csv) with headers are in the first column. Also, each row has >>> a >>> different number of columns. I want to read this CSV file, replace one of >>> the cell value and save again as a CSV file with the same format as the >>> original file with exactly same number of columns and rows. It sounds >>> like >>> a simple task, but I am struggling to find a way. I tried to do this with >>> the help of this >>> <https://stackoverflow.com/questions/17288197/reading-a- >>> csv-file-organized-horizontally> >>> and this >>> <https://stackoverflow.com/questions/20308621/how-do-i- >>> write-a-csv-file-in-r-where-my-input-is-written-to-the-file-as-row> >>> posts, >>> but still, I can't get the output the way I want. Can somebody help me >>> here? >>> >>> My attempt using the answer in this post >>> <https://stackoverflow.com/questions/17288197/reading-a- >>> csv-file-organized-horizontally> >>> to >>> read the CSV file (samplefile.csv) gives me the following output where >>> headers are kinda messed up and empty cells are replaced with NA values >>> which is not what I want >>> >>> aaa <- read.tcsv("samplefile.csv") >>> aaa >>> >>> Header.1 Header.2..units. Header.3..units. Header.3..units..11 >>> Some text 0.0625 0 2648962 >>> <NA> 0.0625 1200 6647473 <NA> >>> 0.0625 1380 14 <NA> >>> 0.2500 1500 15 <NA> >>> 0.6250 1620 NA6 <NA> >>> 1.3125 1740 NA7 <NA> >>> 2.4375 1860 NA8 <NA> >>> 3.5625 1980 NA9 <NA> >>> 4.6250 2100 NA10 <NA> >>> 5.0000 2220 NA11 <NA> >>> 5.0000 2340 NA12 <NA> >>> 4.6250 2460 NA13 <NA> >>> 3.5625 2580 NA14 <NA> >>> 2.4375 2700 NA15 <NA> >>> 1.3125 2820 NA16 <NA> >>> 0.6250 2940 NA17 <NA> >>> 0.2500 3060 NA18 <NA> >>> 0.0625 3180 NA19 <NA> >>> 0.0000 3300 NA20 <NA> >>> 0.0000 18000 NA >>> >>> Also, I am not sure how to go back to original format when I save the >>> file >>> again after a modification (for example after replacing a cell value) >>> >>> I tried saving the file again by using t (transpose) as given below >>> >>> write.csv(t(aaa), file ="samplefile_e.csv", row.names=T) >>> >>> but still, there are following issues in the saved file >>> >>> 1. messed up headers >>> 2. empty cells are replaced with NA >>> 3. when I open the file in a text editor all the values are shown as >>> characters >>> >>> Thanks, >>> Mano >>> >>> [[alternative HTML version deleted]] >>> >>> ______________________________________________ >>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see >>> https://stat.ethz.ch/mailman/listinfo/r-help >>> PLEASE do read the posting guide http://www.R-project.org/ >>> posting-guide.html >>> and provide commented, minimal, self-contained, reproducible code. >>> >> >>[[alternative HTML version deleted]]
Manoranjan Muthusamy
2018-Oct-04 16:03 UTC
[R] R - Reading a horizontally arranged csv file, replacing values and saving again
That's exactly what I was looking for. Never used readLines before which is why I was struggling. Many thanks. I asked the same question in Stack Overflow as well. But no answers yet. I will post the link to your answer. Thanks again. On Wed, Oct 3, 2018 at 5:51 PM William Dunlap <wdunlap at tibco.com> wrote:> Using read.table to read data with a variable number of entries per row is > probably a mistake - data.frames (which read.table returns) are not meant > for this sort of data. You want to store the data in R as a list with > names. E.g., > > readYourFormat <- function(connection) { > # connection is a file name or connection object referring > # to text of the form > # Header A,Some text > # Header B,<number>,<number>,<number> > # Header C,<number> > # Header D,<number>,<number>,<number>,<number> > lines <- readLines(connection) > lines <- lines[ nchar(lines)>0 ] # omit empty lines > fields <- strsplit(lines, ",") # split lines by commas into fields > headers <- vapply(fields, function(x)x[1], FUN.VALUE="") # first entry > is header > names(fields) <- headers > contents <- lapply(fields, function(x)x[-1]) # remaining entries are > contents > contents[-1] <- lapply(contents[-1], as.numeric) # convert all but > first line's entries to numeric > contents > } > > After reading the data and fiddling with it (but keeping it a named list), > you can write it out in a similar format with > > writeYourFormat <- function (data, connection = stdout()) > { > stopifnot(is.list(data), !is.null(names(data))) > cat(file = connection, sep = "\n", paste(sep = ",", names(data), > vapply(data, paste, collapse = ",", FUN.VALUE = ""))) > } > > > Bill Dunlap > TIBCO Software > wdunlap tibco.com > > On Wed, Oct 3, 2018 at 3:05 AM, Manoranjan Muthusamy < > ranjanmano167 at gmail.com> wrote: > >> Thanks for the short but informative answer, Bill. But still, each row >> has four columns..right? Although the *NA* is replaced by a blank cell, >> because of the extra comma it still is a four column row. Is there any way >> to avoid/remove the extra comma when NA is replaced which will make it a >> three column row? >> >> >> >> On Mon, Oct 1, 2018 at 4:56 PM William Dunlap <wdunlap at tibco.com> wrote: >> >>> Read the help file for write.table >>> ?write.table >>> and look at the descriptions of its arguments. >>> > df <- data.frame(Text=c("Abe","Ben",NA,"David"), Age=c(19, NA, 12, >>> 10)) >>> > write.table(file=stdout(), t(df), sep=",") >>> "V1","V2","V3","V4" >>> "Text","Abe","Ben",NA,"David" >>> "Age","19",NA,"12","10" >>> > write.table(file=stdout(), t(df), quote=FALSE, na="", sep=",") >>> V1,V2,V3,V4 >>> Text,Abe,Ben,,David >>> Age,19,,12,10 >>> >>> >>> >>> >>> Bill Dunlap >>> TIBCO Software >>> wdunlap tibco.com >>> >>> On Mon, Oct 1, 2018 at 2:24 AM, Manoranjan Muthusamy < >>> ranjanmano167 at gmail.com> wrote: >>> >>>> I have a horizontally arranged CSV file >>>> <https://www.dropbox.com/s/0ukyuifvpq1olqk/samplefile.csv?dl=0> ( >>>> samplefile.csv) with headers are in the first column. Also, each row >>>> has a >>>> different number of columns. I want to read this CSV file, replace one >>>> of >>>> the cell value and save again as a CSV file with the same format as the >>>> original file with exactly same number of columns and rows. It sounds >>>> like >>>> a simple task, but I am struggling to find a way. I tried to do this >>>> with >>>> the help of this >>>> < >>>> https://stackoverflow.com/questions/17288197/reading-a-csv-file-organized-horizontally >>>> > >>>> and this >>>> < >>>> https://stackoverflow.com/questions/20308621/how-do-i-write-a-csv-file-in-r-where-my-input-is-written-to-the-file-as-row >>>> > >>>> posts, >>>> but still, I can't get the output the way I want. Can somebody help me >>>> here? >>>> >>>> My attempt using the answer in this post >>>> < >>>> https://stackoverflow.com/questions/17288197/reading-a-csv-file-organized-horizontally >>>> > >>>> to >>>> read the CSV file (samplefile.csv) gives me the following output where >>>> headers are kinda messed up and empty cells are replaced with NA values >>>> which is not what I want >>>> >>>> aaa <- read.tcsv("samplefile.csv") >>>> aaa >>>> >>>> Header.1 Header.2..units. Header.3..units. Header.3..units..11 >>>> Some text 0.0625 0 2648962 >>>> <NA> 0.0625 1200 6647473 <NA> >>>> 0.0625 1380 14 <NA> >>>> 0.2500 1500 15 <NA> >>>> 0.6250 1620 NA6 <NA> >>>> 1.3125 1740 NA7 <NA> >>>> 2.4375 1860 NA8 <NA> >>>> 3.5625 1980 NA9 <NA> >>>> 4.6250 2100 NA10 <NA> >>>> 5.0000 2220 NA11 <NA> >>>> 5.0000 2340 NA12 <NA> >>>> 4.6250 2460 NA13 <NA> >>>> 3.5625 2580 NA14 <NA> >>>> 2.4375 2700 NA15 <NA> >>>> 1.3125 2820 NA16 <NA> >>>> 0.6250 2940 NA17 <NA> >>>> 0.2500 3060 NA18 <NA> >>>> 0.0625 3180 NA19 <NA> >>>> 0.0000 3300 NA20 <NA> >>>> 0.0000 18000 NA >>>> >>>> Also, I am not sure how to go back to original format when I save the >>>> file >>>> again after a modification (for example after replacing a cell value) >>>> >>>> I tried saving the file again by using t (transpose) as given below >>>> >>>> write.csv(t(aaa), file ="samplefile_e.csv", row.names=T) >>>> >>>> but still, there are following issues in the saved file >>>> >>>> 1. messed up headers >>>> 2. empty cells are replaced with NA >>>> 3. when I open the file in a text editor all the values are shown as >>>> characters >>>> >>>> Thanks, >>>> Mano >>>> >>>> [[alternative HTML version deleted]] >>>> >>>> ______________________________________________ >>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see >>>> https://stat.ethz.ch/mailman/listinfo/r-help >>>> PLEASE do read the posting guide >>>> http://www.R-project.org/posting-guide.html >>>> and provide commented, minimal, self-contained, reproducible code. >>>> >>> >>> >[[alternative HTML version deleted]]