rivercode
2010-Oct-11 21:39 UTC
[R] Slow reading multiple tick data files into list of dataframes
Hi,

I am trying to find the best way to read 85 tick data files of the format:

> head(nbbo)
1 bid CON 09:30:00.722 09:30:00.722 32.71  98
2 ask CON 09:30:00.782 09:30:00.810 33.14 300
3 ask CON 09:30:00.809 09:30:00.810 33.14 414
4 bid CON 09:30:00.783 09:30:00.810 33.06 200

Each file has between 100,000 and 300,300 rows.

Currently I am doing

    nbbo.list <- lapply(filePath, read.csv)

to create a list of 85 data.frame objects, but it takes minutes to read in the data, and afterwards I get the following message on the console when taking further actions (though it does then stop):

    The R Engine is busy. Please wait, and try your command again later.

filePath in the above example is a vector of filenames:

> head(filePath)
[1] "C:/work/A/A_2010-10-07_nbbo.csv"
[2] "C:/work/AAPL/AAPL_2010-10-07_nbbo.csv"
[3] "C:/work/ADBE/ADBE_2010-10-07_nbbo.csv"
[4] "C:/work/ADI/ADI_2010-10-07_nbbo.csv"

Is there a better/quicker or more R way of doing this?

Thanks,
Chris

--
View this message in context: http://r.789695.n4.nabble.com/Slow-reading-multiple-tick-data-files-into-list-of-dataframes-tp2990723p2990723.html
Sent from the R help mailing list archive at Nabble.com.
Gabor Grothendieck
2010-Oct-11 21:48 UTC
[R] Slow reading multiple tick data files into list of dataframes
On Mon, Oct 11, 2010 at 5:39 PM, rivercode <aquanyc at gmail.com> wrote:
>
> Hi,
>
> I am trying to find the best way to read 85 tick data files of format:
>
>> head(nbbo)
> 1 bid CON 09:30:00.722 09:30:00.722 32.71  98
> 2 ask CON 09:30:00.782 09:30:00.810 33.14 300
> 3 ask CON 09:30:00.809 09:30:00.810 33.14 414
> 4 bid CON 09:30:00.783 09:30:00.810 33.06 200
>
> Each file has between 100,000 to 300,300 rows.
>
> Currently doing nbbo.list<- lapply(filePath, read.csv) to create list
> with 85 data.frame objects...but it is taking minutes to read in the data
> and afterwards I get the following message on the console when taking
> further actions (though it does then stop):
>
>    The R Engine is busy. Please wait, and try your command again later.
>
> filePath in the above example is a vector of filenames:
>> head(filePath)
> [1] "C:/work/A/A_2010-10-07_nbbo.csv"
> [2] "C:/work/AAPL/AAPL_2010-10-07_nbbo.csv"
> [3] "C:/work/ADBE/ADBE_2010-10-07_nbbo.csv"
> [4] "C:/work/ADI/ADI_2010-10-07_nbbo.csv"
>
> Is there a better/quicker or more R way of doing this ?
>

You could try (possibly with suitable additional arguments):

    library(sqldf)
    lapply(filePath, read.csv.sql)

--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
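For comparison, here is a minimal base-R sketch of what "suitable additional arguments" might look like when staying with read.csv. The column names and types are guesses from the head(nbbo) output in the question, the files are assumed to have no header row, and the sample files are generated in a temporary directory; none of that comes from the thread. Declaring colClasses spares read.csv from guessing each column's type, which is usually one of the cheaper ways to speed up large reads.

```r
# Hypothetical column layout guessed from the head(nbbo) output; adjust to the real files.
col_names   <- c("side", "cond", "exch_time", "recv_time", "price", "size")
col_classes <- c("character", "character", "character", "character",
                 "numeric", "integer")

# Write two small sample files so the sketch runs on its own.
sample_rows <- c("bid,CON,09:30:00.722,09:30:00.722,32.71,98",
                 "ask,CON,09:30:00.782,09:30:00.810,33.14,300")
tmp_dir <- tempfile("nbbo_")
dir.create(tmp_dir)
filePath <- file.path(tmp_dir, c("A_nbbo.csv", "AAPL_nbbo.csv"))
for (f in filePath) writeLines(sample_rows, f)

# Name the paths so that lapply returns a list named by symbol.
names(filePath) <- sub("_nbbo\\.csv$", "", basename(filePath))

# Reader with explicit column types (assumes no header row in the files).
read_nbbo <- function(path) {
  read.csv(path, header = FALSE, col.names = col_names,
           colClasses = col_classes)
}

nbbo.list <- lapply(filePath, read_nbbo)
str(nbbo.list[["AAPL"]])
```

Naming the path vector is optional, but it lets later code refer to nbbo.list[["AAPL"]] rather than to a position in the list.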
Mike Marchywka
2010-Oct-11 21:48 UTC
[R] Slow reading multiple tick data files into list of dataframes
----------------------------------------
> Date: Mon, 11 Oct 2010 14:39:54 -0700
> From: aquanyc at gmail.com
> To: r-help at r-project.org
> Subject: [R] Slow reading multiple tick data files into list of dataframes
[...]
> Is there a better/quicker or more R way of doing this ?

While there may be an obvious R-related answer, it usually helps to determine where the bottleneck is in terms of resources on your platform. On older machines you often run out of real memory, and then all the time is spent paging data out to virtual memory on disk and reading it back. Can you tell from Task Manager whether you are CPU or memory limited? It could in fact be that the best solution involves not trying to hold your entire data set in memory at once; it is hard to know without knowing your platform, etc.

In the past, I've found that sorting the data first, a slow process in itself, can speed things up a lot because the later analysis thrashes the memory hierarchy less. I doubt that helps your immediate problem, but it does point to one possible non-obvious "optimization", depending on what is slowing you down.

>
> Thanks,
> Chris
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
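As a rough complement to Task Manager, the same question can be probed from inside R. The sketch below times one read with system.time, estimates the memory footprint with object.size, and shows one way to avoid holding every file in memory by reducing each file to a summary as it is read. The synthetic file, its three columns, and the summarise_file helper are illustrative assumptions, not the poster's data or code.

```r
# Synthetic stand-in for one tick file (hypothetical layout; not the poster's data).
n <- 50000
tick <- data.frame(side  = sample(c("bid", "ask"), n, replace = TRUE),
                   price = round(runif(n, 32, 34), 2),
                   size  = sample(1:500, n, replace = TRUE))
path <- tempfile(fileext = ".csv")
write.csv(tick, path, row.names = FALSE)

# Time one read. An "elapsed" time far above "user" + "system" hints that
# the process is waiting on disk or paging rather than on the CPU.
timing <- system.time(one <- read.csv(path))
print(timing)

# Memory used by one data frame; scale by the file count for a rough total.
bytes_one <- as.numeric(object.size(one))
cat("one file:", format(object.size(one), units = "MB"), "\n")
cat("85 files: about", round(85 * bytes_one / 2^20), "MB\n")

# If the total would not fit in RAM, keep only a reduced result per file.
summarise_file <- function(p) {
  d <- read.csv(p)
  aggregate(size ~ side, data = d, FUN = sum)  # small summary; raw rows are dropped
}
small <- summarise_file(path)
print(small)
```

Whether a per-file summary is acceptable depends on the later analysis; if the raw ticks are needed, the timing and size figures at least indicate whether 85 files can reasonably fit in memory at once.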