thr3ads.net - R help - [R] big data file geting truncated [Aug 2003]


Dibakar Ray

2003-Aug-13 07:03 UTC

[R] big data file geting truncated

I am very new to R. I was trying to load some publicly available expression
data into R.
I used the following command:
 mydata <- read.table("dataALLAMLtrain.txt", header=TRUE, sep="\t", row.names=NULL)
It reads the data without any error.
Now if I use
edit(mydata)
it shows only 3916 entries, whereas the actual file contains 7129 entries.
My data looks something like this:
Gene Description	Gene Accession
Number	1	2	3	4	5	6	7	8	9	10	11	12	13	14	15	16	17	18	19	20	21	22	23	24	25	26	27
34	35	36	37	38	28	29	30	31	32	33
AFFX-BioB-5_at (endogenous
control)	AFFX-BioB-5_at	-214	-139	-76	-135	-106	-138	-72	-413	5	-88	-165	-67	-92
-113	-107	-117	-476	-81	-44	17	-144	-247	-74	-120	-81	-112	-273	-20	7	-213	-25
-72	-4	15	-318	-32	-124	-135
So it seems R is truncating the data. How  can I load the complete file?
Thanks in advance
Dibakar

Peter Dalgaard BSA

2003-Aug-13 07:26 UTC


Dibakar Ray <dibakar at hub.nic.in> writes:
> I am very new to R. I was trying to load some publicly available Expression
> data in to R.
> I used the following commands
>  mydata<-read.table("dataALLAMLtrain.txt", header=TRUE, sep="\t",row.names=NULL)
> It reads data without any error
> Now if I use
> edit(mydata)
> It shows only 3916 entries, whereas the actual file contains 7129 entries)
> [...]
> So it seems R is truncating the data. How  can I load the complete file?

First isolate the source. edit() could have a bug, so what does
dim(mydata) report? Does the input file really have 7130 lines
(7129 entries plus a header line)? length(readLines("foobar.txt"))
should tell you if you don't have "wc" on your system.
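
The check suggested above can be sketched as follows. This is only an illustration: the small temporary file and its column names are made up here as a stand-in for the real data file, which is not available in the thread.

```r
# Hypothetical stand-in for the real file: a header line plus 5 data rows.
path <- tempfile(fileext = ".txt")
writeLines(c("id\tvalue", paste0("g", 1:5, "\t", 1:5)), path)

mydata <- read.table(path, header = TRUE, sep = "\t", row.names = NULL)

n_lines <- length(readLines(path))  # physical lines in the file, header included
n_rows  <- dim(mydata)[1]           # rows that R actually parsed

# With header = TRUE, a complete read should give n_rows == n_lines - 1.
cat("lines in file:", n_lines, " rows in data frame:", n_rows, "\n")
```

If the two numbers disagree by more than the header line, the problem is in the read step rather than in edit().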

-- 
   O__  ---- Peter Dalgaard             Blegdamsvej 3  
  c/ /'_ --- Dept. of Biostatistics     2200 Cph. N   
 (*) \(*) -- University of Copenhagen   Denmark      Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)             FAX: (+45) 35327907

Martin Maechler

2003-Aug-13 07:46 UTC


>>>>> "Dibakar" == Dibakar Ray <dibakar at hub.nic.in>
>>>>>     on Wed 13 Aug 2003 12:33:21 +0530 (IST) writes:
    Dibakar> I am very new to R. I was trying to load some
    Dibakar> publicly available Expression data in to R.

    Dibakar> I used the following commands
    Dibakar> mydata<-read.table("dataALLAMLtrain.txt", header=TRUE,
    Dibakar>                    sep="\t",row.names=NULL)
    Dibakar> It reads data without any error

(Really? How do you know?
 It seems you are trying to check this via the following:)
    Dibakar> Now if I use
    Dibakar> edit(mydata)
    Dibakar> It shows only 3916 entries, whereas the actual file
    Dibakar> contains 7129 entries). My data is something like

    Dibakar> Gene Description	Gene Accession
    Dibakar> Number	1	2	3	4	5	6	7	8	9	10	11	12	13	14	15	16	17	18	19	20	21	22
    Dibakar> 23	24	25	26	27	34	35	36	37	38	28	29	30	31	32	33
    Dibakar> AFFX-BioB-5_at (endogenous
    Dibakar> control)	AFFX-BioB-5_at	-214	-139	-76	-135	-106	-138	-72	-413	5
    Dibakar> -88	-165	-67	-92	-113	-107	-117	-476	-81	-44	17	-144	-247	-74	-120	-81	-112	-273
    Dibakar> -20	7	-213	-25	-72	-4	15	-318	-32	-124	-135

(this probably has an extraneous  "wrap-around" in your post).

    Dibakar> So it seems R is truncating the data. How can I
    Dibakar> load the complete file?

edit() has had problems with large files, but only
with more than 65535 rows.

HOWEVER, using edit() after read.table() to check your data is
not recommended.
Use 	dim(mydata)
	str(mydata)
and possibly also
	names(mydata)
	summary(mydata)

to check whether the data frame is okay *before* you edit it
using edit().
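
As a minimal sketch of these inspection calls, assuming a tiny made-up data frame in place of the real expression data (the column names gene, s1 and s2 are hypothetical):

```r
# Made-up miniature of an expression table: one label column, two samples.
mydata <- data.frame(gene = c("a", "b", "c"),
                     s1   = c(-214, -139, -76),
                     s2   = c(5, -88, -165))

dim(mydata)      # number of rows and columns actually present
str(mydata)      # type of each column; a numeric column read as text is a warning sign
names(mydata)    # column names as R parsed them from the header
summary(mydata)  # per-column summaries, useful for spotting NAs or odd ranges
```

None of these calls opens an editor or modifies the data, so they are safe to run on an object of any size.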

Martin Maechler <maechler at stat.math.ethz.ch>
http://stat.ethz.ch/~maechler/
Seminar fuer Statistik, ETH-Zentrum  LEO C16	Leonhardstr. 27
ETH (Federal Inst. Technology)	8092 Zurich	SWITZERLAND
phone: x-41-1-632-3408		fax: ...-1228			<><

Rafael A. Irizarry

2003-Aug-13 07:49 UTC


Without seeing the file it's hard to tell, but one possibility that comes to
mind is that there is a # character in your
file. read.table considers this a comment character.
Use the argument comment.char="" and see what happens...
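
A small sketch of the effect, using a made-up four-line file (the descriptions are invented for illustration). count.fields() is used here to show how many fields each line appears to have under each setting:

```r
# Hypothetical tab-separated file with a '#' inside one description.
path <- tempfile(fileext = ".txt")
writeLines(c("desc\tvalue",
             "gene A\t1",
             "gene #2 (made up)\t2",
             "gene C\t3"), path)

# Default comment.char = "#": the rest of line 3 after the '#' is ignored.
fields_default <- count.fields(path, sep = "\t")

# comment.char = "" switches comment handling off.
fields_off <- count.fields(path, sep = "\t", comment.char = "")

print(fields_default)
print(fields_off)

# Reading with comment handling off keeps the full line.
no_comments <- read.table(path, header = TRUE, sep = "\t", comment.char = "")
print(no_comments)
```

A line that starts with # would be dropped entirely under the default setting, which is one way rows can go missing without an error.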





Philipp Pagel

2003-Aug-13 08:33 UTC


Hi!
> I used the following commands
>  mydata<-read.table("dataALLAMLtrain.txt", header=TRUE, sep="\t",row.names=NULL)
> It reads data without any error
> Now if I use
> edit(mydata)
> It shows only 3916 entries, whereas the actual file contains 7129 entries)
> [...]
> So it seems R is truncating the data. How  can I load the complete file?

Others have already recommended checking the length of the data.frame
using dim() and the length of the file using wc. If it turns out that
there really is a difference in size, the next step is to find out which
lines are affected: are "random" lines missing, or is everything OK up to
line 3916 and then it stops? In either case, take a close look at the
missing lines, or at the last line present plus the first one missing: is
there anything special about them?
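
One way to find the affected lines is to compare an identifier column in the parsed data frame with the same field taken from the raw lines. The sketch below assumes a made-up eight-line file whose first column is a unique id; the file contents and the names raw_ids and parsed_ids are hypothetical.

```r
# Hypothetical file: ids g1..g7, with apostrophes in two descriptions.
path <- tempfile(fileext = ".txt")
writeLines(c("id\tdesc",
             "g1\talpha", "g2\tbeta", "g3\tgamma", "g4\tdelta",
             "g5\tbinds 5' end",
             "g6\tbinds 3' end",
             "g7\tomega"), path)

mydata <- read.table(path, header = TRUE, sep = "\t")

raw        <- readLines(path)[-1]     # raw data lines, header dropped
raw_ids    <- sub("\t.*$", "", raw)   # first tab-separated field of each line
parsed_ids <- as.character(mydata$id)

missing_ids <- setdiff(raw_ids, parsed_ids)
# File line numbers of the missing records (+1 because line 1 is the header).
missing_lines <- which(raw_ids %in% missing_ids) + 1

print(missing_ids)
print(missing_lines)
```

Looking at the reported lines and their immediate neighbours in a text editor usually shows what they have in common.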

But actually I have a feeling that this may be your problem:

read.table uses both '"' and "'" for quoting by default. Gene
descriptions love to contain things like "5'" and "3'".
=> Try quote='' in the read.table call.
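
A minimal sketch of the difference, again on a made-up file (the ids and descriptions are invented; only the read.table arguments come from the thread):

```r
# Hypothetical file in which two descriptions contain an apostrophe.
path <- tempfile(fileext = ".txt")
writeLines(c("id\tdesc",
             "g1\talpha", "g2\tbeta", "g3\tgamma", "g4\tdelta",
             "g5\tbinds 5' end",
             "g6\tbinds 3' end",
             "g7\tomega"), path)

# Default quoting: the apostrophe on the g5 line opens a quoted string
# that is only closed by the apostrophe on the g6 line.
bad <- read.table(path, header = TRUE, sep = "\t")

# quote = "" disables quoting, so every physical line is one record.
good <- read.table(path, header = TRUE, sep = "\t", quote = "")

cat("rows with default quoting:", nrow(bad), "\n")
cat("rows with quote = \"\":   ", nrow(good), "\n")
```

With an odd number of apostrophes in the file, R would typically warn about an unterminated quoted string instead; with an even number, as here, rows can be swallowed silently.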

cu
	Philipp

-- 
Dr. Philipp Pagel                                Tel.  +49-89-3187-3675
Institute for Bioinformatics / MIPS              Fax.  +49-89-3187-3585
GSF - National Research Center for Environment and Health
Ingolstaedter Landstrasse 1
85764 Neuherberg, Germany
