I am very new to R. I was trying to load some publicly available expression data into R. I used the following command:

    mydata <- read.table("dataALLAMLtrain.txt", header=TRUE, sep="\t", row.names=NULL)

It reads the data without any error. Now if I use

    edit(mydata)

it shows only 3916 entries, whereas the actual file contains 7129 entries. My data looks something like this:

    Gene Description    Gene Accession Number    1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 34 35 36 37 38 28 29 30 31 32 33
    AFFX-BioB-5_at (endogenous control)    AFFX-BioB-5_at    -214 -139 -76 -135 -106 -138 -72 -413 5 -88 -165 -67 -92 -113 -107 -117 -476 -81 -44 17 -144 -247 -74 -120 -81 -112 -273 -20 7 -213 -25 -72 -4 15 -318 -32 -124 -135

So it seems R is truncating the data. How can I load the complete file?

Thanks in advance,
Dibakar
Dibakar Ray <dibakar at hub.nic.in> writes:

> I am very new to R. I was trying to load some publicly available Expression
> data in to R.
> I used the following commands
> mydata<-read.table("dataALLAMLtrain.txt", header=TRUE, sep
> ="\t",row.names=NULL)
> It reads data without any error
> Now if I use
> edit(mydata)
> It shows only 3916 entries, whereas the actual file contains 7129 entries)
...
> So it seems R is truncating the data. How can I load the complete file?

First isolate the source. edit() could have a bug, so what is dim(mydata)? Does the input file really have 7130 lines? length(readLines("foobar.txt")) should tell you if you don't have "wc" on your system.

-- 
   O__  ---- Peter Dalgaard             Blegdamsvej 3
  c/ /'_ --- Dept. of Biostatistics     2200 Cph. N
 (*) \(*) -- University of Copenhagen   Denmark      Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)             FAX: (+45) 35327907
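The check Peter describes, comparing the number of physical lines in the file with the number of rows R actually parsed, might look like the sketch below. The small temporary file and its contents are made up for illustration so that the snippet runs anywhere; in practice `infile` would point at the real data file.

```r
# Hypothetical stand-in for the real file: a small tab-separated table.
infile <- tempfile(fileext = ".txt")
writeLines(c("Gene Description\tGene Accession Number\t1\t2",
             "probe A\tA_at\t-214\t-139",
             "probe B\tB_at\t-58\t12",
             "probe C\tC_at\t88\t-7"), infile)

mydata <- read.table(infile, header = TRUE, sep = "\t", row.names = NULL)

n_lines <- length(readLines(infile))  # physical lines, header included
n_rows  <- nrow(mydata)               # rows read.table actually parsed

cat("lines in file:", n_lines, " rows read:", n_rows, "\n")

# With one header line, a cleanly parsed file has n_lines - 1 data rows.
if (n_rows != n_lines - 1) {
  message("read.table dropped or merged some lines")
}
```

If the two numbers disagree, the problem is in how the file is parsed rather than in edit().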
>>>>> "Dibakar" == Dibakar Ray <dibakar at hub.nic.in>
>>>>>     on Wed 13 Aug 2003 12:33:21 +0530 (IST) writes:

    Dibakar> I am very new to R. I was trying to load some
    Dibakar> publicly available Expression data in to R.
    Dibakar> I used the following commands
    Dibakar> mydata<-read.table("dataALLAMLtrain.txt", header=TRUE, sep
    Dibakar> ="\t",row.names=NULL)
    Dibakar> It reads data without any error

(Really? How do you know? It seems you are trying to check this via the following?)

    Dibakar> Now if I use
    Dibakar> edit(mydata)
    Dibakar> It shows only 3916 entries, whereas the actual file
    Dibakar> contains 7129 entries). My data is something like

    Dibakar> Gene Description Gene Accession
    Dibakar> Number 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 34 35 36 37 38 28 29 30 31 32 33
    Dibakar> AFFX-BioB-5_at (endogenous
    Dibakar> control) AFFX-BioB-5_at -214 -139 -76 -135 -106 -138 -72 -413 5 -88 -165 -67 -92 -113 -107 -117 -476 -81 -44 17 -144 -247 -74 -120 -81 -112 -273 -20 7 -213 -25 -72 -4 15 -318 -32 -124 -135

(This probably has an extraneous "wrap-around" in your post.)

    Dibakar> So it seems R is truncating the data. How can I
    Dibakar> load the complete file?

edit() has had problems with large files, but only with more than 65535 rows.

HOWEVER, using edit() after read.table() to check your data is not recommended. Use

    dim(mydata)
    str(mydata)

and possibly also

    names(mydata)
    summary(mydata)

to check whether the data frame was okay *before* you edited it using edit().

Martin Maechler <maechler at stat.math.ethz.ch>  http://stat.ethz.ch/~maechler/
Seminar fuer Statistik, ETH-Zentrum  LEO C16    Leonhardstr. 27
ETH (Federal Inst. Technology)       8092 Zurich     SWITZERLAND
phone: x-41-1-632-3408               fax: ...-1228                 <><
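As a rough illustration of the four inspection calls Martin lists, here is a sketch on a tiny made-up data frame. The column names and values are assumptions standing in for the real expression data, not taken from the actual file.

```r
# Hypothetical miniature of the expression table.
mydata <- data.frame(
  Gene.Description      = c("probe A", "probe B", "probe C"),
  Gene.Accession.Number = c("A_at", "B_at", "C_at"),
  X1                    = c(-214L, -58L, 88L),
  X2                    = c(-139L, 12L, -7L),
  stringsAsFactors      = FALSE
)

print(dim(mydata))      # number of rows and columns
str(mydata)             # type and first values of every column
print(names(mydata))    # column names as R stored them
print(summary(mydata))  # per-column summaries; odd ranges hint at misparsed columns
```

None of these calls open an editor or modify the data, so they are safe to run on a freshly read table of any size.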
Without seeing the file it's hard to tell, but one possibility that comes to mind is that there is a # character in your file. read.table treats this as a comment character. Use the argument comment.char="" and see what happens...

On Wed, 13 Aug 2003, Dibakar Ray wrote:

> I am very new to R. I was trying to load some publicly available Expression
> data in to R.
> I used the following commands
> mydata<-read.table("dataALLAMLtrain.txt", header=TRUE, sep
> ="\t",row.names=NULL)
> It reads data without any error
> Now if I use
> edit(mydata)
> It shows only 3916 entries, whereas the actual file contains 7129 entries)
> My data is something like
> Gene Description Gene Accession
> Number 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 34 35 36 37 38 28 29 30 31 32 33
> AFFX-BioB-5_at (endogenous
> control) AFFX-BioB-5_at -214 -139 -76 -135 -106 -138 -72 -413 5 -88 -165 -67 -92 -113 -107 -117 -476 -81 -44 17 -144 -247 -74 -120 -81 -112 -273 -20 7 -213 -25 -72 -4 15 -318 -32 -124 -135
> So it seems R is truncating the data. How can I load the complete file?
> Thanks in advance
> Dibakar
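A minimal sketch of the comment-character effect, using a made-up three-row file in which one description happens to start with #. Whether the real file contains any # at all is not known from the thread; this only shows what would happen if it did.

```r
# Hypothetical file: the second data line begins with '#'.
infile <- tempfile(fileext = ".txt")
writeLines(c("Description\tAccession\tValue",
             "probe one\tA_at\t1",
             "#2 probe\tB_at\t2",
             "probe three\tC_at\t3"), infile)

# Default comment.char = "#": the line starting with '#' is treated as a comment.
with_comments <- read.table(infile, header = TRUE, sep = "\t")

# comment.char = "" turns comment handling off, so every line is data.
no_comments <- read.table(infile, header = TRUE, sep = "\t", comment.char = "")

cat("rows with default comment.char:", nrow(with_comments), "\n")
cat("rows with comment.char = \"\": ", nrow(no_comments), "\n")
```

Note that a # in the middle of a line would cut the line short instead of removing it, which read.table normally reports as an error about the number of elements unless fill=TRUE is set.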
Hi!

> I used the following commands
> mydata<-read.table("dataALLAMLtrain.txt", header=TRUE, sep
> ="\t",row.names=NULL)
> It reads data without any error
> Now if I use
> edit(mydata)
> It shows only 3916 entries, whereas the actual file contains 7129 entries)
[...]
> So it seems R is truncating the data. How can I load the complete file?

Others have already recommended checking the length of the data.frame using dim() and the file using wc. If it turns out that there really is a difference in size, the next thing would be to get an idea of which lines are affected: are "random" lines missing, or is everything ok up to line 3916 and then it stops? In either case, have a close look at the missing lines, or at the last line present plus the first one missing: is there anything special about them?

But actually I have a feeling that this may be your problem: read.table uses both '"' and "'" for quoting by default. Gene descriptions love to contain things like "5'" and "3'".

=> Try quote='' in the read.table call.

cu
	Philipp

-- 
Dr. Philipp Pagel                            Tel.  +49-89-3187-3675
Institute for Bioinformatics / MIPS          Fax.  +49-89-3187-3585
GSF - National Research Center for Environment and Health
Ingolstaedter Landstrasse 1
85764 Neuherberg, Germany
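Philipp's suggestion can be tried on a small made-up file. The sketch below assumes gene descriptions containing "5'" and "3'" as he describes; the file contents are invented for illustration. With the default quoting, everything between the two apostrophes is read as one quoted string, so several lines collapse into a single row.

```r
# Hypothetical file with apostrophes inside two of the descriptions.
infile <- tempfile(fileext = ".txt")
writeLines(c("Description\tAccession\tValue",
             "5' UTR probe\tA_at\t1",
             "control probe\tB_at\t2",
             "3' UTR probe\tC_at\t3",
             "other probe\tD_at\t4"), infile)

# Default quote = "\"'": the text between the two apostrophes, including
# the tabs and newlines inside it, is swallowed into a single field.
quoted <- read.table(infile, header = TRUE, sep = "\t")

# quote = "" disables quoting, so every physical line becomes one row.
unquoted <- read.table(infile, header = TRUE, sep = "\t", quote = "")

cat("rows with default quoting:", nrow(quoted), "\n")
cat("rows with quote = \"\":    ", nrow(unquoted), "\n")
```

Printing `quoted$Description` shows the merged lines directly, which is a quick way to confirm that quoting is the cause. Descriptions could in principle also contain #, in which case quote="" and comment.char="" would be combined in one call.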