thr3ads.net - R help - [R] read.delim skips first column (why?) [Jul 2009]


Giovanni Marco Dall'Olio

2009-Jul-13 17:27 UTC

[R] read.delim skips first column (why?)

Hi people,
I have a text file that looks like this:

snp_id  gene    chromosome      distance_from_gene_center       position        pop1    pop2    pop3    pop4    pop5    pop6    pop7
rs2129081       RAPT2   3       -129993 "upstream"      0.439009        1.169210        NA      0.233020        0.093042        NA      -0.902596
rs1202698       RAPT2   3       -128695 "upstream"      NA      1.815000        NA      0.399079        1.814270        1.382950        NA
rs1163207       RAPT2   3       -128224 "upstream"      NA      NA      NA      NA      NA      NA      NA
rs1834127       RAPT2   3       -128106 "upstream"      NA      NA      NA      NA      NA      NA      2.180670
rs2114211       RAPT2   3       -126738 "upstream"      -0.468279       -1.447620       NA      0.010616        -0.414581       NA      0.550447
rs2113151       RAPT2   3       -124620 "upstream"      -0.897660       -1.971020       NA      -0.920327       -0.764658       NA      0.337127
rs2524130       RAPT2   3       -123029 "upstream"      -0.109795       -0.004646       -0.412059       1.116740        0.667567        -0.924529       0.962841
rs1381318       RAPT2   3       -12818  "upstream"      -0.911662       -1.791580       NA      -0.945716       -1.239640       NA      0.004876
rs2113319       RAPT2   3       -122028 "upstream"      -0.911662       -1.738610       NA      -0.945716       -1.240950       NA      -0.005318

When I use read.delim (or any other read function) on it, R skips the first
column, and I don't understand why.

For example:

$: R
> data = read.delim('snp_file.txt', head=T, sep='\t')

Now, I would expect data$snp_id to contain SNP ids and data$gene to contain
gene names, but that is not what happens:

> data$snp_id
[1] RAPT2 RAPT2 RAPT2 RAPT2 RAPT2 RAPT2 RAPT2 RAPT2 RAPT2
Levels: RAPT2
> data$gene
[1] 3 3 3 3 3 3 3 3 3
> summary(data)
 snp_id       gene     chromosome      distance_from_gene_center
 RAPT2:9   Min.   :3   Min.   :-129993   upstream:9
           1st Qu.:3   1st Qu.:-128224
           Median :3   Median :-126738
           Mean   :3   Mean   :-113806
           3rd Qu.:3   3rd Qu.:-123029
           Max.   :3   Max.   : -12818
....
> data$pop7
[1] NA NA NA NA NA NA NA NA NA


Notice that it did use snp_id as the header for the first column, but it
skips all the data from that column completely: every value is shifted one
column to the left, so the last column is filled with NA values.

What am I doing wrong? Could it be a problem with my data files? I have tried
modifying them a bit (adding new columns, etc.), but it didn't help.
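The symptom matches a documented read.table rule (read.delim is a wrapper around read.table): if there is a header and the header line has one field fewer than the data lines, the first column of the data is used as row names, so the remaining values appear shifted one column to the left of their names. A trailing tab at the end of each data line is one common way to get that extra field, although the thread never confirmed that this is what happened here. The minimal sketch below reproduces the effect with a hypothetical three-column file, not the original data:

```r
# Hypothetical cut-down file (not the original data): the header has
# 3 fields, but every data line ends with a stray tab, giving 4 fields.
lines <- c("snp_id\tgene\tchromosome",
           "rs1\tRAPT2\t3\t",
           "rs2\tRAPT2\t3\t")
path <- tempfile(fileext = ".txt")
writeLines(lines, path)

dat <- read.delim(path)

# The SNP ids were consumed as row names ...
print(rownames(dat))
# ... so the remaining values sit one column to the left of their
# intended names, and the last column is all NA (compare data$pop7).
print(dat)
```

If rownames(data) on the real data frame holds the SNP ids, that would point to the same cause.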

I am running R on an Ubuntu system:

> sessionInfo()
R version 2.9.1 (2009-06-26)
i486-pc-linux-gnu

locale:
LC_CTYPE=it_IT.UTF-8;LC_NUMERIC=C;LC_TIME=it_IT.UTF-8;LC_COLLATE=it_IT.UTF-8;LC_MONETARY=C;LC_MESSAGES=it_IT.UTF-8;LC_PAPER=it_IT.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=it_IT.UTF-8;LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base




-- 
Giovanni Dall'Olio, phd student
Department of Biologia Evolutiva at CEXS-UPF (Barcelona, Spain)

My blog on bioinformatics: http://bioinfoblog.it


Giovanni Marco Dall'Olio

2009-Jul-14 09:11 UTC


[R] read.delim skips first column (why?)

Hi,
I have uploaded a copy of the file here:
- http://pastebin.com/fd0edfab

The file has also been passed through the Unix command-line tool unexpand
(which converts spaces to tabs), but that doesn't solve the problem.

Using head=TRUE instead of head=T has the same effect.

The output of print(names(ngly)) is:

> print(names(ngly), quote=TRUE)
 [1] "snp"                       "gene"
 [3] "chromosome"                "distance_from_gene_center"
 [5] "position"                  "ame"
 [7] "csasia"                    "easia"
 [9] "eur"                       "mena"
[11] "oce"                       "ssafr"
[13] "X"                         "X.1"
[15] "X.2"

Thanks to everyone who replied to me directly by email, but I haven't been
able to solve the problem yet.
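A quick way to check whether the header and the data lines disagree, without depending on how an editor or mail client renders tabs, is count.fields(), which reports how many fields R sees on each line. The names "X", "X.1" and "X.2" printed above are what make.names() produces for empty header fields, which hints that the header line itself may carry extra tabs. The sketch below uses a hypothetical demo file; point `path` at the real file (for example, the one uploaded to pastebin) instead:

```r
# Hypothetical demo file standing in for the real data: a 3-field
# header followed by data lines that each end with a stray tab.
path <- tempfile(fileext = ".txt")
writeLines(c("snp_id\tgene\tchromosome",
             "rs1\tRAPT2\t3\t",
             "rs2\tRAPT2\t3\t"), path)

# Count tab-separated fields on every line, using the same quote and
# comment settings that read.delim uses (quote = "\"", comment.char = "").
n_fields <- count.fields(path, sep = "\t", quote = "\"", comment.char = "")
print(table(n_fields))

# Line numbers whose field count differs from the header (line 1).
bad_lines <- which(n_fields != n_fields[1])
print(bad_lines)
```

A data line with exactly one more field than the header is the case that triggers the row-names behaviour.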


On Tue, Jul 14, 2009 at 12:36 AM, jim holtman <jholtman@gmail.com> wrote:
> Can you send your file as an attachment? It is impossible to see
> where the separator characters are.
>
> --
> Jim Holtman
> Cincinnati, OH
> +1 513 646 9390
>
> What is the problem that you are trying to solve?
>



Paolo Sonego

2009-Jul-15 16:11 UTC


[R] read.delim skips first column (why?)

This should work:


junk <- read.table("fd0edfab.txt", sep = "", header = TRUE,
                   fill = FALSE, quote = " ")
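Splitting on white space (sep = "") sidesteps a trailing tab because any run of spaces or tabs acts as a single separator, so the trailing tab does not add an empty field. The trade-off is that genuinely empty fields between two consecutive tabs collapse, and any field containing a space gets split. If you would rather keep strict tab separation, one alternative sketch is to read the header line on its own, read the data without a header (so no column becomes row names), and keep only as many columns as there are header names. It assumes the extra fields sit at the end of each line; `read_tab_with_header` is a hypothetical helper name and the demo file is made up:

```r
read_tab_with_header <- function(path) {
  # Header line as a character vector, split on tabs only.
  hdr <- scan(path, what = "", sep = "\t", nlines = 1, quiet = TRUE)
  # Drop trailing empty names (trailing tabs in the header itself).
  last <- max(which(nzchar(hdr)))
  hdr <- hdr[seq_len(last)]
  # Data lines without a header, so no column is turned into row names.
  dat <- read.delim(path, header = FALSE, skip = 1)
  # Keep only as many columns as there are header names; extra
  # trailing (empty) fields are discarded.
  dat <- dat[, seq_along(hdr), drop = FALSE]
  names(dat) <- hdr
  dat
}

# Usage with a hypothetical demo file whose data lines end with a tab.
path <- tempfile(fileext = ".txt")
writeLines(c("snp_id\tgene\tchromosome",
             "rs1\tRAPT2\t3\t",
             "rs2\tRAPT2\t3\t"), path)
snps <- read_tab_with_header(path)
print(snps)
```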

HIH

Paolo Sonego

HBaize

2009-Oct-01 13:28 UTC


[R] read.delim skips first column (why?)

I can't determine what is going on in your example, but my approach would be
to open the text file in a text editor that can display hidden characters
(tabs, etc.) so you can see the pattern. It could be that there is an extra
tab in some places. You could then use the editor's replace function to
remove the control characters that are causing the problem.
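The same inspection can also be done from inside R, which removes any doubt about how a particular editor displays tabs. A minimal sketch, assuming a hypothetical demo file: readLines() returns the raw lines, replacing each tab with a visible marker shows where the separators are, a regular expression flags lines that end in a tab, and stripping those trailing tabs plays the role of the editor's replace function. Stripping assumes trailing tabs never stand for meaningful empty fields in your data:

```r
# Hypothetical demo file; replace with the path to the real file.
path <- tempfile(fileext = ".txt")
writeLines(c("snp_id\tgene\tchromosome",
             "rs1\tRAPT2\t3\t",
             "rs2\tRAPT2\t3\t"), path)

txt_lines <- readLines(path)

# Make tabs visible, similar to an editor's "show hidden characters".
cat(gsub("\t", "<TAB>", txt_lines, fixed = TRUE), sep = "\n")

# Which lines end with one or more tabs?
trailing <- grep("\t+$", txt_lines)
print(trailing)

# Strip trailing tabs and write a cleaned copy to a new temporary file.
cleaned <- sub("\t+$", "", txt_lines)
clean_path <- tempfile(fileext = ".txt")
writeLines(cleaned, clean_path)

snps <- read.delim(clean_path)
print(snps)
```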

Giovanni Dall'Olio wrote:
> Hi people,
> I have a text file that looks like this:
> 
> When I use read.delim (or any other read function) on it, R skips the first
> column, and I don't understand why.
-- 
View this message in context:
http://www.nabble.com/read.delim-skips-first-column-%28why-%29-tp24466023p25696875.html
Sent from the R help mailing list archive at Nabble.com.
