Fix Ace
2017-Aug-28 02:56 UTC
[R] help with read.csv() for files with different number of columns
Hi, Jim,

Thank you very much for pointing out the format issue. Here is the original text:

===
I have a text file (test.txt) whose rows have different numbers of columns:

0610007P14Rik%%% Tcf19 Gtf2i
0610010O12Rik%%% Ivns1abp Etv6
1100001G20Rik%%% Nmi
1500015O10Rik%%% Foxi1 Ascl3 Sirt3
1700003E16Rik%%% Ascl2 Ifnar2
1700028J19Rik%%% Musk Nfe2l3
1810011O10Rik%%% Ppp1r13b Bpnt1 Cdkn2c Foxc1 Sox10 Smarca2
1810019D21Rik%%% Asb8
1810037I17Rik%%% Zfp612
1810055G02Rik%%% Nkx2-3 Maged1 Runx1 Ugp2 Elk4 Spdef Tcf19 Isl2 Gtf2i Ctnnbl1 Tcea3 Ank2 Zfp612 Creb3l1 Nupr1 3632451O06Rik Creb3l4 Lass6

I would like to read it into R using

> test=read.csv("test.txt",sep="\t",header=FALSE)

However, when I check the R object "test", I find that all the rows have 5 columns:

> test
                 V1            V2      V3     V4      V5
1  0610007P14Rik%%%         Tcf19   Gtf2i
2  0610010O12Rik%%%      Ivns1abp    Etv6
3  1100001G20Rik%%%           Nmi
4  1500015O10Rik%%%         Foxi1   Ascl3  Sirt3
5  1700003E16Rik%%%         Ascl2  Ifnar2
6  1700028J19Rik%%%          Musk  Nfe2l3
7  1810011O10Rik%%%      Ppp1r13b   Bpnt1 Cdkn2c   Foxc1
8             Sox10       Smarca2
9  1810019D21Rik%%%          Asb8
10 1810037I17Rik%%%        Zfp612
11 1810055G02Rik%%%        Nkx2-3  Maged1  Runx1    Ugp2
12             Elk4         Spdef   Tcf19   Isl2   Gtf2i
13          Ctnnbl1         Tcea3    Ank2 Zfp612 Creb3l1
14            Nupr1 3632451O06Rik Creb3l4  Lass6

Basically, it breaks some rows into more than one row. For example, row 7 of the original file becomes two rows. It looks like "test" always has 5 columns.

How does this happen? How should I fix it so that one record becomes one row in the R object?
===

Please let me know if it is readable now. Thank you very much for your time!

Kind regards,
Ace

On Sunday, August 27, 2017 7:25 PM, Jim Lemon <drjimlemon at gmail.com> wrote:

> Hi Ace,
> As your example seems to have spaces as separators,
>
> testdf<-read.table("test.txt",header=FALSE,fill=TRUE,
>  col.names=paste("V",1:14,sep=""),stringsAsFactors=FALSE)
>
> By specifying the number of columns with "col.names" and using
> "fill=TRUE" you can get a data frame with zero length strings where
> values are missing in the input file.
>
> Jim
Jim Lemon
2017-Aug-28 04:56 UTC
[R] help with read.csv() for files with different number of columns
Hi Ace,
With tabs as separators:

testdf<-read.table("test.txt",header=FALSE,fill=TRUE,sep="\t",
 col.names=paste("V",1:19,sep=""),stringsAsFactors=FALSE)

Also note that I got the number of columns wrong the first time.

Jim
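On the question "How does this happen?": according to the read.table documentation, when col.names is not supplied the number of columns is taken from the first five lines of input, and with fill=TRUE (the default for read.csv) any longer line then spills onto extra rows. The following is a minimal sketch of that behaviour and of the col.names fix, run on a six-record subset of the data. The temporary file and the object names `wrapped` and `fixed` are illustrative and not from the thread. In this subset the first five lines hold at most four fields, so four columns result here; the five columns in Ace's output suggest the real file had a five-field line among its first five, which is only a guess.

```r
# Six records from the thread, written tab-separated to a temporary file
# (the temporary path is an assumption made for this sketch).
records <- c(
  "0610007P14Rik%%%\tTcf19\tGtf2i",
  "0610010O12Rik%%%\tIvns1abp\tEtv6",
  "1100001G20Rik%%%\tNmi",
  "1500015O10Rik%%%\tFoxi1\tAscl3\tSirt3",
  "1700003E16Rik%%%\tAscl2\tIfnar2",
  "1810011O10Rik%%%\tPpp1r13b\tBpnt1\tCdkn2c\tFoxc1\tSox10\tSmarca2"
)
path <- tempfile(fileext = ".txt")
writeLines(records, path)

# Without col.names, the column count comes from the first five lines
# (at most 4 fields here), so the 7-field record wraps onto a second row.
wrapped <- read.csv(path, sep = "\t", header = FALSE)
dim(wrapped)

# Supplying enough column names up front keeps one record per row;
# short records are padded with zero-length strings.
fixed <- read.table(path, header = FALSE, sep = "\t", fill = TRUE,
                    col.names = paste0("V", 1:7), stringsAsFactors = FALSE)
dim(fixed)
```

Comparing `dim(wrapped)` with `dim(fixed)` shows the extra row created by the wrap and its disappearance once the column count is fixed in advance.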
Fix Ace
2017-Aug-29 16:22 UTC
[R] help with read.csv() for files with different number of columns
Thank you very much! Looks like I have to know the length of each record ahead of time.

Ace
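The record length does not necessarily have to be known in advance: count.fields() from the utils package reports the number of fields on each line, and its maximum can be passed on as the length of col.names. A minimal sketch, assuming a tab-separated file with no quoting or comment characters; `read_ragged` is a hypothetical helper name, not something from the thread or from R itself.

```r
# Hypothetical helper: read a ragged delimited file without hard-coding
# the number of columns.
read_ragged <- function(path, sep = "\t") {
  # count.fields() returns the number of fields found on each line.
  n_cols <- max(count.fields(path, sep = sep, quote = "", comment.char = ""))
  read.table(path, header = FALSE, sep = sep, fill = TRUE,
             quote = "", comment.char = "",
             col.names = paste0("V", seq_len(n_cols)),
             stringsAsFactors = FALSE)
}

# Small demonstration on three records from the thread.
path <- tempfile(fileext = ".txt")
writeLines(c("1100001G20Rik%%%\tNmi",
             "1500015O10Rik%%%\tFoxi1\tAscl3\tSirt3",
             "1810019D21Rik%%%\tAsb8"), path)
ragged <- read_ragged(path)
dim(ragged)

# Alternative that keeps the records ragged: a named list with one
# character vector per record, keyed by the first field.
fields <- strsplit(readLines(path), "\t", fixed = TRUE)
targets <- setNames(lapply(fields, `[`, -1), vapply(fields, `[`, "", 1))
```

The data frame pads short records with empty strings, which is convenient for tabular work; the list form may suit this kind of data better when each record is really a variable-length set of names.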