davidek at zla-ryba.cz
2006-Jun-27 16:07 UTC
[R] Very slow read.table on Linux, compared to Win2000
Dear all,

I read.table a 17 MB tab-separated table with 483 variables (mostly numeric) and 15000 observations into R. This takes a few seconds with R 2.3.1 on Windows 2000, but several minutes on my Linux machine. The Linux machine is Ubuntu 6.06, 256 MB RAM, Athlon 1600 processor. The Windows hardware is better (Pentium 4, 512 MB RAM), but it shouldn't make such a difference.

The strange thing is that even doing something with the data (say, a histogram of a variable, or transforming integers into a factor) takes a really long time on the Linux box, and the computer seems to work extensively with the hard disk. Could this be caused by swapping? Can I increase the memory allocated to R somehow? I have checked the manual, but the memory options allowed for Linux don't seem to help me (I may be doing it wrong, though ...).

The code I run:

    TBO <- read.table(file="TBO.dat", sep="\t", header=TRUE, dec=",")  # this takes forever
    TBO$sexe <- factor(TBO$sexe, labels=c("man","vrouw"))  # even this takes like 30 seconds, compared to almost nothing on Win2000

I'd be grateful for any suggestions.

Regards,
David Vonka

------------------------------------------------------------------
David Vonka (Netspar, Universiteit van Tilburg, room B-623)
CZ: Ovci Hajek 42, Praha 5, Czech Republic, tel: +420777022926
NL: Telefoonstraat 1, 5038DL Tilburg, The Netherlands, tel: +31638083064
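[A standard read.table tuning trick, not mentioned in the thread itself but documented in the R Data Import/Export manual, is to pre-declare the column types so R can skip its type-guessing pass and allocate the result in one go. A minimal sketch, reusing the file name, separator, and row count from the post above; that nearly all columns are numeric is only an assumption:

    ## Read a handful of rows cheaply to learn the column classes.
    head5 <- read.table("TBO.dat", sep = "\t", header = TRUE, dec = ",",
                        nrows = 5)
    classes <- sapply(head5, class)

    ## Re-read the full file with the types fixed and an upper bound on
    ## rows (15000, per the post), so R need not repeatedly grow the
    ## result while reading.
    TBO <- read.table("TBO.dat", sep = "\t", header = TRUE, dec = ",",
                      colClasses = classes, nrows = 15000)

On a machine that is already swapping this will not cure the problem, but it reduces both the time and the transient memory read.table needs.]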
Peter Dalgaard
2006-Jun-28 12:28 UTC
[R] Very slow read.table on Linux, compared to Win2000
<davidek at zla-ryba.cz> writes:

> Dear all,
>
> I read.table a 17 MB tab-separated table with 483 variables (mostly
> numeric) and 15000 observations into R. This takes a few seconds with
> R 2.3.1 on Windows 2000, but several minutes on my Linux machine. The
> Linux machine is Ubuntu 6.06, 256 MB RAM, Athlon 1600 processor. The
> Windows hardware is better (Pentium 4, 512 MB RAM), but it shouldn't
> make such a difference.
>
> The strange thing is that even doing something with the data (say, a
> histogram of a variable, or transforming integers into a factor) takes
> a really long time on the Linux box, and the computer seems to work
> extensively with the hard disk. Could this be caused by swapping? Can
> I increase the memory allocated to R somehow? I have checked the
> manual, but the memory options allowed for Linux don't seem to help me
> (I may be doing it wrong, though ...).
>
> The code I run:
>
> TBO <- read.table(file="TBO.dat", sep="\t", header=TRUE, dec=",")  # this takes forever
> TBO$sexe <- factor(TBO$sexe, labels=c("man","vrouw"))  # even this takes like 30 seconds, compared to almost nothing on Win2000
>
> I'd be grateful for any suggestions.

Almost surely, the fix is to insert more RAM chips. 256 MB leaves you very little room for actual work these days, and a 17 MB file will be expanded to several times its original size during reading and data manipulation. Using a lightweight window manager can help, but you usually regret the switch for other reasons.

-- 
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark      Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)              FAX: (+45) 35327907
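[To see how much of the expansion described above actually lands in RAM, one can ask R directly; a small check along these lines, where TBO is the data frame from the original post and the exact figures will of course vary:

    ## In-memory footprint of the data frame, in megabytes, to compare
    ## against the 17 MB the file occupies on disk.
    as.numeric(object.size(TBO)) / 1024^2

    ## Trigger a garbage collection and report R's current memory use;
    ## running `top` in another terminal during the read will show
    ## whether the machine is dipping into swap.
    gc()

If the reported footprint approaches the machine's physical RAM, the disk thrashing during the read and during later operations is swapping, as suggested above.]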