Hi all scientists,

Recently I have been dealing with big data (> 3 GB in txt or csv format) on my desktop (Windows 7, 64-bit), but I cannot read it in quickly, even though I have searched the internet for advice. [I have tried defining colClasses for read.table, and the colbycol and limma packages, but none of them is fast enough.]

Could you share your methods for reading big data into R faster?

This may be an odd question, but we really need an answer.

Any suggestions appreciated. Thank you very much.

kevin
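(For reference, the colClasses trick Kevin mentions looks roughly like the sketch below. The file name, the 100-row sample size, and the 11e6 row-count hint are placeholders; the idea is to let R infer the column types once, on a small sample, and then fix them for the full read so read.table can skip its type-detection pass.)

    ## Infer column types from a small sample of the file.
    peek    <- read.table("big.csv", header = TRUE, sep = ",", nrows = 100)
    classes <- sapply(peek, class)

    ## Re-read the whole file with the types fixed up front; a generous
    ## nrows hint and comment.char = "" also help read.table allocate
    ## memory and scan lines faster.
    big <- read.table("big.csv", header = TRUE, sep = ",",
                      colClasses = classes, nrows = 11e6,
                      comment.char = "")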
On Fri, Apr 26, 2013 at 8:09 AM, Kevin Hao <rfans4chemo@gmail.com> wrote:
> Could you share your methods for reading big data into R faster?

Have you thought of building a database and then letting R read the data through that database, instead of from flat files on your desktop?
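(As a concrete sketch of the database route, one lightweight option is SQLite via the sqldf package; no server needed. Everything here is illustrative: the file name, the column named "value", and the filter condition are made up. read.csv.sql imports the file into a temporary SQLite database and hands R only the rows the query selects, so the full file never has to fit in memory at once.)

    library(sqldf)  # uses RSQLite/DBI under the hood

    ## Import big.csv into a temporary SQLite db and pull back a subset;
    ## inside the query the file is referred to by the table name "file".
    wanted <- read.csv.sql("big.csv",
                           sql = "select * from file where value > 100")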
On Fri, Apr 26, 2013 at 8:09 AM, Kevin Hao <rfans4chemo@gmail.com> wrote:
> Could you share your methods for reading big data into R faster?

Do you really need to load all of the data into memory? With a large data set, most people read in just a chunk of it while developing the analysis pipeline; once that is done, the finished script simply iterates through the entire data set. For example, read.table has 'nrows' and 'skip' parameters to control which chunk of the file is read:

    read.table(file, nrows = -1, skip = 0, ...)

Another tip: you can split the large file into smaller ones.
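(A minimal sketch of that chunk-wise iteration, reading from an open connection so each call to read.table resumes where the previous one stopped. The file name, the 100,000-row chunk size, and the process() function are all placeholders.)

    con <- file("big.csv", open = "r")
    hdr <- strsplit(readLines(con, n = 1), ",")[[1]]  # column names

    repeat {
      ## read.table errors once the connection is exhausted, so a failed
      ## read serves as the end-of-file signal.
      chunk <- tryCatch(
        read.table(con, sep = ",", nrows = 100000, col.names = hdr),
        error = function(e) NULL)
      if (is.null(chunk)) break
      process(chunk)  # hypothetical per-chunk analysis step
    }
    close(con)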
On 04/26/2013 08:09 AM, Kevin Hao wrote:
> Recently I have been dealing with big data (> 3 GB in txt or csv format)
> on my desktop ... [I have tried defining colClasses for read.table, and
> the colbycol and limma packages, but none of them is fast enough.]

You mention limma; if this is sequence or microarray data, then asking on the Bioconductor mailing list http://bioconductor.org/help/mailing-list/ (no subscription necessary) may be more appropriate, but you will need to provide more information about what you want to do, e.g., a code chunk illustrating the problem.

Martin

--
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N., PO Box 19024, Seattle, WA 98109
Location: Arnold Building M1 B861
Phone: (206) 667-2793
On Fri, 26 Apr 2013 23:19:12 +0530, Kevin Hao wrote:
> Could you share your methods for reading big data into R faster?

We recently benchmarked our R server (Intel Xeon 2.2 GHz, 128 GB RAM, CentOS 6.2 running R 2.15.2, 64-bit), testing various read / write / data-manipulation times. A 6 GB dataset of roughly 10 million rows and 14 columns took around 15 minutes to read without colClasses. Were your times comparable to this?

Regards,
Indrajit
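(If you want to produce comparable numbers on your own machine, system.time is the simplest tool. The file name and the 14-column type vector below are assumptions for illustration, not details of Indrajit's benchmark.)

    ## Elapsed time when read.table must infer every column type itself.
    system.time(x <- read.table("big.csv", header = TRUE, sep = ","))

    ## Elapsed time with the types supplied up front (here: one integer
    ## column followed by 13 numeric ones, matching a 14-column file).
    system.time(y <- read.table("big.csv", header = TRUE, sep = ",",
                                colClasses = c("integer", rep("numeric", 13))))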