Fix Ace
2017-Aug-28  02:56 UTC
[R] help with read.csv() for files with different number of columns
Hi, Jim,
Thank you very much for pointing out the format issue. Here is the original
text:
===
I have a text file (test.txt) with different number of columns:
0610007P14Rik%%% Tcf19 Gtf2i
0610010O12Rik%%% Ivns1abp Etv6
1100001G20Rik%%% Nmi
1500015O10Rik%%% Foxi1 Ascl3 Sirt3
1700003E16Rik%%% Ascl2 Ifnar2
1700028J19Rik%%% Musk Nfe2l3
1810011O10Rik%%% Ppp1r13b Bpnt1 Cdkn2c Foxc1 Sox10 Smarca2
1810019D21Rik%%% Asb8
1810037I17Rik%%% Zfp612
1810055G02Rik%%% Nkx2-3 Maged1 Runx1 Ugp2 Elk4 Spdef Tcf19 Isl2 Gtf2i Ctnnbl1 Tcea3 Ank2 Zfp612 Creb3l1 Nupr1 3632451O06Rik Creb3l4 Lass6
I would like to read it into R using
> test=read.csv("test.txt",sep="\t",header=FALSE)
However, when I check the R object "test", I found that all the rows
have 5 columns:
> test
                 V1            V2      V3     V4      V5
1  0610007P14Rik%%%         Tcf19   Gtf2i
2  0610010O12Rik%%%      Ivns1abp    Etv6
3  1100001G20Rik%%%           Nmi
4  1500015O10Rik%%%         Foxi1   Ascl3  Sirt3
5  1700003E16Rik%%%         Ascl2  Ifnar2
6  1700028J19Rik%%%          Musk  Nfe2l3
7  1810011O10Rik%%%      Ppp1r13b   Bpnt1 Cdkn2c   Foxc1
8             Sox10       Smarca2
9  1810019D21Rik%%%          Asb8
10 1810037I17Rik%%%        Zfp612
11 1810055G02Rik%%%        Nkx2-3  Maged1  Runx1    Ugp2
12             Elk4         Spdef   Tcf19   Isl2   Gtf2i
13          Ctnnbl1         Tcea3    Ank2 Zfp612 Creb3l1
14            Nupr1 3632451O06Rik Creb3l4  Lass6
Basically it breaks some rows into more than one row. For example, row 7 in the
original record becomes two rows. It looks like "test" always has 5
columns.
How does this happen? How should I fix it so that each record becomes one row
in the R object?
===
Please let me know if it is readable now. Thank you very much for your time!
Kind regards,
Ace 
    On Sunday, August 27, 2017 7:25 PM, Jim Lemon <drjimlemon at gmail.com> wrote:
 
 Hi Ace,
As your example seems to have spaces as separators,
testdf<-read.table("test.txt",header=FALSE,fill=TRUE,
col.names=paste("V",1:14,sep=""),stringsAsFactors=FALSE)
By specifying the number of columns with "col.names" and using
"fill=TRUE" you can get a data frame with zero length strings where
values are missing in the input file.
Jim
Jim Lemon
2017-Aug-28  04:56 UTC
[R] help with read.csv() for files with different number of columns
Hi Ace,
With tabs as separators:
testdf<-read.table("test.txt",header=FALSE,fill=TRUE,sep="\t",
col.names=paste("V",1:19,sep=""),stringsAsFactors=FALSE)
Also note that I got the number of columns wrong the first time.
Jim
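
A note on why the original call always produced the same number of columns: read.table() (and so read.csv()) documents that, unless col.names says otherwise, the number of columns is taken from the first five lines of input. A record wider than that is wrapped onto extra rows, which matches the split rows in the quoted output. The sketch below reproduces this on a made-up file (the ids and gene names are placeholders, not the poster's data) and then applies the col.names fix.

```r
# Build a small tab-separated file with ragged rows (placeholder values).
path <- tempfile(fileext = ".txt")
records <- list(
  c("id1", "g1"),
  c("id2", "g1", "g2"),
  c("id3", "g1"),
  c("id4", "g1", "g2", "g3"),
  c("id5", "g1", "g2"),
  c("id6", "g1", "g2", "g3", "g4", "g5", "g6")  # wider than the first five lines
)
writeLines(vapply(records, paste, character(1), collapse = "\t"), path)

# Default call: the column count comes from the first five lines (4 here),
# so the 7-field record spills onto an extra row.
wrapped <- read.csv(path, sep = "\t", header = FALSE)
dim(wrapped)

# Supplying enough column names keeps each record on a single row;
# fill = TRUE pads the shorter records with empty strings.
fixed <- read.table(path, header = FALSE, sep = "\t", fill = TRUE,
                    col.names = paste0("V", 1:7), stringsAsFactors = FALSE)
dim(fixed)
```

With the default call the six records come back as seven rows of four columns; with col.names they come back as six rows of seven columns.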
On Mon, Aug 28, 2017 at 12:56 PM, Fix Ace <acefix at rocketmail.com> wrote:
> Hi, Jim,
>
> Thank you very much for pointing out the format issue. Here is the original
> text:
>
> ===
> I have a text file (test.txt) with different number of columns:
>
> 0610007P14Rik%%% Tcf19 Gtf2i
> 0610010O12Rik%%% Ivns1abp Etv6
> 1100001G20Rik%%% Nmi
> 1500015O10Rik%%% Foxi1 Ascl3 Sirt3
> 1700003E16Rik%%% Ascl2 Ifnar2
> 1700028J19Rik%%% Musk Nfe2l3
> 1810011O10Rik%%% Ppp1r13b Bpnt1 Cdkn2c Foxc1 Sox10 Smarca2
> 1810019D21Rik%%% Asb8
> 1810037I17Rik%%% Zfp612
> 1810055G02Rik%%% Nkx2-3 Maged1 Runx1 Ugp2 Elk4 Spdef Tcf19 Isl2 Gtf2i Ctnnbl1 Tcea3 Ank2 Zfp612 Creb3l1 Nupr1 3632451O06Rik Creb3l4 Lass6
>
> I would like to read it into R using
>
>> test=read.csv("test.txt",sep="\t",header=FALSE)
>
> However, when I check the R object "test", I found that all the rows have 5
> columns:
>
>> test
>                  V1            V2      V3     V4      V5
> 1  0610007P14Rik%%%         Tcf19   Gtf2i
> 2  0610010O12Rik%%%      Ivns1abp    Etv6
> 3  1100001G20Rik%%%           Nmi
> 4  1500015O10Rik%%%         Foxi1   Ascl3  Sirt3
> 5  1700003E16Rik%%%         Ascl2  Ifnar2
> 6  1700028J19Rik%%%          Musk  Nfe2l3
> 7  1810011O10Rik%%%      Ppp1r13b   Bpnt1 Cdkn2c   Foxc1
> 8             Sox10       Smarca2
> 9  1810019D21Rik%%%          Asb8
> 10 1810037I17Rik%%%        Zfp612
> 11 1810055G02Rik%%%        Nkx2-3  Maged1  Runx1    Ugp2
> 12             Elk4         Spdef   Tcf19   Isl2   Gtf2i
> 13          Ctnnbl1         Tcea3    Ank2 Zfp612 Creb3l1
> 14            Nupr1 3632451O06Rik Creb3l4  Lass6
>
> Basically it breaks some rows into more than one row. For example, row 7 in
> the original record becomes two rows. It looks like "test" always has 5
> columns.
>
> How does this happen? How should I fix it so that each record becomes one row
> in the R object?
>
> ===
> Please let me know if it is readable now. Thank you very much for your time!
>
> Kind regards,
>
> Ace
Fix Ace
2017-Aug-29  16:22 UTC
[R] help with read.csv() for files with different number of columns
Thank you very much! Looks like I have to know the length of each record ahead
of time.
Ace
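
The widest record does not have to be known in advance: count.fields() reports the number of fields on each line, so its maximum can be passed to col.names. A minimal sketch, assuming a tab-separated file with no quoting or comment characters; read_ragged is a hypothetical helper name, not something from the thread, and the sample file contents are made up.

```r
# Hypothetical helper: read a ragged delimited file without hard-coding
# the number of columns of the widest record.
read_ragged <- function(path, sep = "\t") {
  # count.fields() returns the number of fields found on each line
  n_cols <- max(count.fields(path, sep = sep, quote = "", comment.char = ""))
  read.table(path, header = FALSE, sep = sep, fill = TRUE,
             quote = "", comment.char = "",
             col.names = paste0("V", seq_len(n_cols)),
             stringsAsFactors = FALSE)
}

# Throwaway example file with 3, 2 and 6 fields per line
path <- tempfile(fileext = ".txt")
writeLines(c("id1\tg1\tg2", "id2\tg1", "id3\tg1\tg2\tg3\tg4\tg5"), path)
df <- read_ragged(path)
dim(df)
```

This reads the file twice (once to count, once to parse), which is usually acceptable for files of this size.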
 