Hi all,

I am trying to read a csv file, but I have a problem with the row names. After reading, the name of the first column is "row.names" and all the other column names are shifted to the right. The last column then contains only NAs (it appears as an extra column).

My sample data looks as follows (filename = dat.csv). The first row has missing values in columns 3 and 5. The last row has missing values in columns 1 and 5.

x1,x2,x3,x4,x5
12,13,,14,,
22,23,24,25,26
,33,34,34,

To read the file I used this:

dsh <- read.csv(file="dat.csv", sep=",", row.names=NULL, fill=TRUE, header=TRUE,
                comment.char = "", quote = "", stringsAsFactors = FALSE)

The output from the above is

dsh

  row.names x1 x2 x3 x4 x5
1        12 13 NA 14 NA NA
2        22 23 24 25 26 NA
3           33 34 34 NA NA

The name of the first column is row.names and all the values in the last column are NAs.

However, the desired output should be

x1 x2 x3 x4 x5
12 13 NA 14 NA
22 23 24 25 26
NA 33 34 34 NA

How can I fix this?
Thank you in advance
Your file has 5 commas in the first data row, but only 4 in the header. R interprets this to mean that your first column is intended to be row names (it has no corresponding column label) rather than data. (Row names are "outside" the data frame... use str(dsh) to get a better picture.)

Basically, your file does not conform to the usual practice for csv files of having the same number of commas in every row. If at all possible I would eliminate the extra comma. If you have many of these broken files, you might need to read the data in pieces... e.g.

dsh <- read.csv( "dat.csv", header=FALSE, skip=1 )
dsh <- dsh[ , -length( dsh ) ]
dshh <- read.csv( "dat.csv", header=TRUE, nrow=1)
names( dsh ) <- names( dshh )

On Fri, 9 Nov 2018, Val wrote:
> I am trying to read a csv file, but have a problem in the row names.
> After reading, the name of the first column is now "row.names" and
> all other column names are shifted to the right.
> [...]
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

---------------------------------------------------------------------------
Jeff Newmiller                        The     .....       .....  Go Live...
DCN:<jdnewmil at dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
                                      Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
/Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
---------------------------------------------------------------------------
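The mismatch in comma counts described above can be checked before reading the file. The following is a minimal sketch, assuming the sample data from the question is written to a temporary file (the names `csv_lines`, `path`, `n_fields` and `bad_lines` are made up for illustration). It uses base R's `count.fields()` to report how many fields each line has, so the over-long row stands out.

```r
# Recreate the sample file from the question in a temporary location
# (illustrative only; in practice point `path` at the real dat.csv).
csv_lines <- c(
  "x1,x2,x3,x4,x5",
  "12,13,,14,,",
  "22,23,24,25,26",
  ",33,34,34,"
)
path <- tempfile(fileext = ".csv")
writeLines(csv_lines, path)

# Count the comma-separated fields on each line; quote = "" matches the
# read.csv() call used in the question.
n_fields <- count.fields(path, sep = ",", quote = "")
print(n_fields)

# Lines whose field count differs from the header line's count.
bad_lines <- which(n_fields != n_fields[1])
print(bad_lines)
```

For this sample the header has five fields and the first data row has six, which is the situation in which read.csv() treats the first column as row names.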
Hello,

I've just tested Jeff's solution, it works but the second code line should be

dsh <- sh[ , -length( sh ) ]

(dsh doesn't exist yet.)

Hope this helps,

Rui Barradas

At 02:46 on 10/11/2018, Jeff Newmiller wrote:
> [...]
> dsh <- read.csv( "dat.csv", header=FALSE, skip=1 )
> dsh <- dsh[ , -length( dsh ) ]
> dshh <- read.csv( "dat.csv", header=TRUE, nrow=1)
> names( dsh ) <- names( dshh )
> [...]
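For anyone who wants to try the piecewise read without the original file, here is a self-contained sketch. It assumes the sample data from the question, written to a temporary file, and it calls the data frame `dsh` from the first read onward so that the second line can refer to it. The names `csv_lines` and `path` are made up for illustration, and `nrows` is spelled out in full instead of relying on partial argument matching.

```r
# Recreate the sample file from the question (illustrative only).
csv_lines <- c(
  "x1,x2,x3,x4,x5",
  "12,13,,14,,",
  "22,23,24,25,26",
  ",33,34,34,"
)
path <- tempfile(fileext = ".csv")
writeLines(csv_lines, path)

# Step 1: skip the header and read only the data rows. The extra trailing
# comma on the first data row gives the result a sixth, all-NA column.
dsh <- read.csv(path, header = FALSE, skip = 1)

# Step 2: drop that last column.
dsh <- dsh[, -length(dsh)]

# Step 3: read the header plus one row, only to recover the column names.
dshh <- read.csv(path, header = TRUE, nrows = 1)
names(dsh) <- names(dshh)

print(dsh)
```

Two caveats on this sketch. Step 2 removes the last column unconditionally, so it assumes every file really has the surplus column. Also, read.csv() works out the number of columns from the first five lines of input, so a file whose only over-long row appears later than that may need a different approach.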
readr::read_csv produces the desired result by default:

readr::read_csv("x1,x2,x3,x4,x5
12,13,,14,,
22,23,24,25,26
,33,34,34,")

Best,
Ista

On Fri, Nov 9, 2018 at 8:40 PM Val <valkremk at gmail.com> wrote:
> After reading, the name of the first column is now "row.names" and
> all other column names are shifted to the right.
> [...]
Thank you Jeff and all.
My data is very messy, and the trick suggested by Jeff is a nice way to handle it.

On Fri, Nov 9, 2018 at 8:42 PM Jeff Newmiller <jdnewmil at dcn.davis.ca.us> wrote:
> If you have many of these broken files,
> you might need to read the data in pieces...
> [...]