Dear All,

Hope I am not bumping into a FAQ, but so far my online search has been fruitless. I need to read a data file using R. I am using the (I think) standard command:

    data_150<-read.table("y_complete06000", header=FALSE)

where y_complete06000 is a 6000 by 40 table of numbers. I am puzzled that R is taking several minutes to read this file. At first I thought it might be due to its shape, but even re-expressing and saving the matrix as a 1D array does not help. The file is not small, but it is not huge either (about 5 MB of text). Is there anything I can do to speed up the file reading?

Many thanks,
Lorenzo
If it's a matrix, use scan(). If the columns are not all the same type, use the colClasses argument to read.table() to specify their types, instead of leaving it to R to guess. That will speed things up quite a lot.

Andy

______________________________________________
R-help at stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
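For readers following the thread, here is a minimal sketch of the two suggestions above. It does not use the poster's file: the temporary file, its 6 x 4 size, and the variable names are assumptions made only for illustration.

```r
# Hypothetical stand-in for the poster's file: a small all-numeric table.
path <- tempfile("y_demo")
set.seed(1)
m <- matrix(round(rnorm(6 * 4), 3), nrow = 6, ncol = 4)
write.table(m, path, row.names = FALSE, col.names = FALSE)

# Option 1: scan() returns one long numeric vector in file (row) order,
# so fill the matrix by row to restore the original shape.
x <- matrix(scan(path, what = numeric(), quiet = TRUE), ncol = 4, byrow = TRUE)

# Option 2: read.table() with the column types stated up front.
# An unnamed colClasses value is recycled over all columns.
d <- read.table(path, header = FALSE, colClasses = "numeric")
```

With the real file, the same calls would presumably take ncol = 40 and the actual file name; the result of scan() is a plain numeric matrix, while read.table() still returns a data frame.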
On Tue, 15 May 2007, Lorenzo Isella wrote:

> [...]
> Is there anything I can do to speed up the file reading?

You could try reading the help page or the 'R Data Import/Export' manual. Both point out that 'read.table' is not the right tool for reading large matrices, especially those with many columns: it is designed to read _data frames_, which may have columns of very different classes. Use 'scan' instead.

On the other hand, I am surprised at several minutes, but as you haven't even told us your OS, it is hard to know what to expect. My Linux box took 3 secs for a 6000x40 matrix with read.table, 0.8 sec with scan.

-- 
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595
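The comparison described above can be reproduced roughly as follows. This is only a sketch: the file is randomly generated rather than the poster's data, the names are hypothetical, and the timings will depend on the machine, OS, and R version.

```r
# Generate a 6000 x 40 numeric matrix and write it as plain text
# (an assumed stand-in for y_complete06000).
path <- tempfile("y_complete_demo")
set.seed(42)
m <- matrix(rnorm(6000 * 40), nrow = 6000, ncol = 40)
write.table(m, path, row.names = FALSE, col.names = FALSE)

# Time read.table(), which has to work out the class of each column.
t_table <- system.time(d <- read.table(path, header = FALSE))

# Time scan(), which reads everything as doubles; reshape by row afterwards.
t_scan <- system.time(
  x <- matrix(scan(path, quiet = TRUE), ncol = 40, byrow = TRUE)
)

# Elapsed wall-clock seconds for each approach.
print(c(read.table = t_table[["elapsed"]], scan = t_scan[["elapsed"]]))
```

If both calls finish in seconds on the generated file but the real file still takes minutes, the cause is more likely the file itself or the environment than read.table() as such.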