Hello!

I have a tab-delimited .txt file (800 MB) with about 3.4 million rows and 41 columns. About 15 of the columns contain strings. I tried to read it into R 2.12.2 on a laptop running Windows XP:

    mydata <- read.delim(file="FileName.TXT", sep="\t")

R did not complain (!) and I got:

    dim(mydata)
    1692063 41

I looked at the same file in two other programs, one of which was SPSS. Both show that I have 3,374,050 rows and 41 columns. And rows 1692063 and 1692064 are in no way different from each other.

Then I went to a large desktop with a lot of memory, running 64-bit Windows 7, and tried the same thing with 64-bit R 2.12.2. Again, I got no complaints from R and got the same number of rows (1692063)!

Then I tried to read in more rows (into a second data frame), with the same code but with skip=1692064. It keeps reading in progressively fewer and fewer rows (maybe because memory is full?).

Any advice? Is there any chance for me to read in the whole file?

Thank you very much!

-- 
Dimitri Liakhovitski
Ninah Consulting
On Tue, Mar 29, 2011 at 06:58:59PM -0400, Dimitri Liakhovitski wrote:
> I have a tab-delimited .txt file (size 800MB) with about 3.4 million
> rows and 41 columns. About 15 columns contain strings.
> Tried to read it in in R 2.12.2 on a laptop that has Windows XP:
> mydata<-read.delim(file="FileName.TXT",sep="\t")
> R did not complain (!) and I got: dim(mydata) 1692063 41.

My guess would be that there are (unexpected) single and/or double quotes in your file, so R thinks that rather large blocks of your file are actually very long strings. This routinely happens in situations like this:

    ID   x     description
    1    0.4   my first measurement
    2    1.6   Normal 5" object
    3    0.4   Some measurement
    4    0.7   A 4" long sample

R thinks that the description in row 2 ends in row 4, and you lose data. That would also explain why you get the same row count on both machines: it is a parsing issue, not a memory issue.

Try read.delim(..., quote="").

cu
Philipp

-- 
Dr. Philipp Pagel
Lehrstuhl für Genomorientierte Bioinformatik
Technische Universität München
Wissenschaftszentrum Weihenstephan
Maximus-von-Imhof-Forum 3
85354 Freising, Germany
http://webclu.bio.wzw.tum.de/~pagel/
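
For anyone who wants to try this out, here is a minimal sketch of the suggestion above. It writes the four sample rows from the reply to a temporary file (the temp file and the variable names are just for illustration, not the poster's real data), reads it once with the defaults and once with quote="", and then checks the row count against the number of lines in the file. Note that read.delim() defaults to quote="\"", so in this sketch only the double quotes matter. What exactly the default call returns depends on where the quotes fall in the file, so no particular row count is claimed for it.

```r
# Sample rows from the reply, written as a tab-delimited file.
# tempfile() is a stand-in for the real "FileName.TXT".
lines <- c(
  "ID\tx\tdescription",
  "1\t0.4\tmy first measurement",
  "2\t1.6\tNormal 5\" object",
  "3\t0.4\tSome measurement",
  "4\t0.7\tA 4\" long sample"
)
tmp <- tempfile(fileext = ".txt")
writeLines(lines, tmp)

# Default quote handling: the inch marks are treated as string
# delimiters, so text between them can be swallowed into one field.
# suppressWarnings() hides a possible "EOF within quoted string" warning.
bad <- suppressWarnings(read.delim(tmp, sep = "\t"))

# Quote handling disabled: every double quote is ordinary text.
good <- read.delim(tmp, sep = "\t", quote = "", stringsAsFactors = FALSE)

nrow(bad)   # typically fewer rows than the file has data lines
nrow(good)  # one row per data line

# Cross-check against the raw line count (minus the header line).
n_data_lines <- length(readLines(tmp)) - 1
stopifnot(nrow(good) == n_data_lines)
```

On the real file the same cross-check applies: after reading with quote="", dim(mydata) should report the 3,374,050 rows that SPSS shows. If it still falls short, something other than quotes is probably involved.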