Hi,
I have uploaded a copy of the file here:
- http://pastebin.com/fd0edfab
The file has also been passed through the Unix command-line tool unexpand,
but that doesn't solve the problem.
Using head=TRUE instead of head=T has the same effect.
The output of print(names()) is:
> print(names(ngly), quote=TRUE)
[1] "snp" "gene"
[3] "chromosome" "distance_from_gene_center"
[5] "position" "ame"
[7] "csasia" "easia"
[9] "eur" "mena"
[11] "oce" "ssafr"
[13] "X" "X.1"
[15] "X.2"
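The trailing names X, X.1 and X.2 are what read.delim generates for empty header fields (make.names() turns "" into "X" and then makes the names unique), which suggests the header line ends with extra tab characters. A quick way to see how many tab-separated fields each line really has is base R's count.fields(). The sketch below uses a hypothetical miniature file, made up to mimic the suspected layout (it is not the pastebin copy): the header has three fields and each data line ends with a trailing tab. On the real file you would pass its path instead of tmp.

```r
# Hypothetical miniature file (assumption): 3 header fields, but every
# data line ends with a trailing tab, so it splits into 4 fields.
lines <- c("snp_id\tgene\tchromosome",
           "rs2129081\tRAPT2\t3\t",
           "rs1202698\tRAPT2\t3\t")
tmp <- tempfile(fileext = ".txt")
writeLines(lines, tmp)

# Count the tab-separated fields on each line; quote = "\"" matches
# the default quoting used by read.delim
n_fields <- count.fields(tmp, sep = "\t", quote = "\"")
print(n_fields)

# More than one distinct count means the lines are ragged
print(table(n_fields))
```

If the header count is exactly one less than the count on the data lines, that mismatch alone is enough to shift the columns.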
Thank you to everyone who replied directly to my e-mail address, but I
haven't been able to solve the problem yet.
On Tue, Jul 14, 2009 at 12:36 AM, jim holtman <jholtman@gmail.com> wrote:
> Can you send your file as an attachment since it is impossible to see
> where the separator characters are.
>
> On Mon, Jul 13, 2009 at 1:27 PM, Giovanni Marco Dall'Olio <dalloliogm@gmail.com> wrote:
> > Hi people,
> > I have a text file like this one posted:
> >
> > snp_id gene chromosome distance_from_gene_center position pop1 pop2 pop3 pop4 pop5 pop6 pop7
> > rs2129081 RAPT2 3 -129993 "upstream" 0.439009 1.169210 NA 0.233020 0.093042 NA -0.902596
> > rs1202698 RAPT2 3 -128695 "upstream" NA 1.815000 NA 0.399079 1.814270 1.382950 NA
> > rs1163207 RAPT2 3 -128224 "upstream" NA NA NA NA NA NA NA
> > rs1834127 RAPT2 3 -128106 "upstream" NA NA NA NA NA NA 2.180670
> > rs2114211 RAPT2 3 -126738 "upstream" -0.468279 -1.447620 NA 0.010616 -0.414581 NA 0.550447
> > rs2113151 RAPT2 3 -124620 "upstream" -0.897660 -1.971020 NA -0.920327 -0.764658 NA 0.337127
> > rs2524130 RAPT2 3 -123029 "upstream" -0.109795 -0.004646 -0.412059 1.116740 0.667567 -0.924529 0.962841
> > rs1381318 RAPT2 3 -12818 "upstream" -0.911662 -1.791580 NA -0.945716 -1.239640 NA 0.004876
> > rs2113319 RAPT2 3 -122028 "upstream" -0.911662 -1.738610 NA -0.945716 -1.240950 NA -0.005318
> >
> > When I use read.delim (or any other read function) on it, R skips the
> > first column, and I don't understand why.
> >
> > For example:
> > $: R
> >> data = read.delim('snp_file.txt', head=T, sep='\t')
> >
> > Now, I would expect data$snp_id to contain SNP ids and data$gene to
> > contain gene names, but that is not what happens:
> >
> >> data$snp_id
> > [1] RAPT2 RAPT2 RAPT2 RAPT2 RAPT2 RAPT2 RAPT2 RAPT2 RAPT2
> > Levels: RAPT2
> >> data$gene
> > [1] 3 3 3 3 3 3 3 3 3
> >
> >> summary(data)
> > snp_id gene chromosome distance_from_gene_center
> > RAPT2:9 Min. :3 Min. :-129993 upstream:9
> > 1st Qu.:3 1st Qu.:-128224
> > Median :3 Median :-126738
> > Mean :3 Mean :-113806
> > 3rd Qu.:3 3rd Qu.:-123029
> > Max. :3 Max. : -12818
> > ....
> >
> >> data$pop7
> > [1] NA NA NA NA NA NA NA NA NA
> >
> >
> > Notice that it does use snp_id as the header for the first column, but it
> > completely skips all the data from that column, and all the fields are
> > shifted, so the last column is filled with NA values.
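This symptom matches a documented read.table rule (read.delim is a wrapper around read.table): if there is a header and the first row contains one fewer field than the number of columns, the first column in the input is used for the row names. The sketch below is a hypothetical reproduction, assuming the data lines carry one more (trailing) tab than the header; the thread does not confirm that the real file has exactly this defect.

```r
# Hypothetical file (assumption): header with 3 fields,
# data lines with a trailing tab, i.e. 4 fields each
lines <- c("snp_id\tgene\tchromosome",
           "rs2129081\tRAPT2\t3\t",
           "rs1202698\tRAPT2\t3\t")
tmp <- tempfile(fileext = ".txt")
writeLines(lines, tmp)

snps <- read.delim(tmp, header = TRUE)

# The SNP ids are silently turned into row names...
print(rownames(snps))
# ...so each column name now labels the data of the next input column
print(snps$snp_id)      # holds the gene names
print(snps$gene)        # holds the chromosome numbers
print(snps$chromosome)  # the empty trailing field, read as NA
```

Passing row.names = NULL keeps the ids as data, but read.table then names that column "row.names" and the remaining names stay shifted, so it does not really fix the alignment.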
> >
> > What am I doing wrong? Could it be a problem with my data files? I have
> > tried modifying them a bit (adding new columns, etc.) but it didn't help.
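If trailing tabs do turn out to be the cause, one possible workaround is to strip them from every line before parsing, so that the header and the data rows end up with the same number of fields. A minimal sketch, again on a hypothetical miniature file; the regular expression removes only tabs at the very end of a line, which assumes no real column is legitimately left empty at the end of a row (in the posted data missing values are written as NA, so that seems to hold).

```r
# Hypothetical file reproducing the suspected defect (assumption)
raw <- c("snp_id\tgene\tchromosome",
         "rs2129081\tRAPT2\t3\t",
         "rs1202698\tRAPT2\t3\t")
tmp <- tempfile(fileext = ".txt")
writeLines(raw, tmp)

# Read the raw lines and drop any run of tabs at the end of each line
cleaned <- sub("\t+$", "", readLines(tmp))

# Parse the cleaned lines; header and data now have 3 fields each
con <- textConnection(cleaned)
snps <- read.delim(con, header = TRUE)
# read.delim does not close a connection that was already open, so close it here
close(con)

print(snps$snp_id)  # the SNP ids, back in their own column
print(snps$gene)
```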
> >
> > I am running R from an Ubuntu system:
> >> sessionInfo()
> > R version 2.9.1 (2009-06-26)
> > i486-pc-linux-gnu
> >
> > locale:
> > LC_CTYPE=it_IT.UTF-8;LC_NUMERIC=C;LC_TIME=it_IT.UTF-8;LC_COLLATE=it_IT.UTF-8;LC_MONETARY=C;LC_MESSAGES=it_IT.UTF-8;LC_PAPER=it_IT.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=it_IT.UTF-8;LC_IDENTIFICATION=C
> >
> > attached base packages:
> > [1] stats graphics grDevices utils datasets methods base
> >
> >
> >
> >
> > --
> > Giovanni Dall'Olio, phd student
> > Department of Biologia Evolutiva at CEXS-UPF (Barcelona, Spain)
> >
> > My blog on bioinformatics: http://bioinfoblog.it
> >
> >
> > ______________________________________________
> > R-help@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
>
>
>
> --
> Jim Holtman
> Cincinnati, OH
> +1 513 646 9390
>
> What is the problem that you are trying to solve?
>