Hi all scientists,

Recently I have been dealing with big data (> 3 GB in txt or csv format) on my desktop (Windows 7, 64-bit), but I cannot read it in quickly, even though I have searched the internet for advice. [I have tried defining colClasses for read.table, and the colbycol and limma packages, but none of them is fast enough.]

Could you share your methods for reading big data into R faster?

This may be an odd question, but we really need an answer.

Any suggestions appreciated. Thank you very much.

kevin
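(For reference, the colClasses trick Kevin mentions looks roughly like the sketch below. The file name, the 100-row sample size, and the 11e6 row-count hint are placeholders; the idea is to let R infer the column types once, on a small sample, and then fix them for the full read so read.table can skip its type-detection pass.)

    ## Infer column types from a small sample of the file.
    peek    <- read.table("big.csv", header = TRUE, sep = ",", nrows = 100)
    classes <- sapply(peek, class)

    ## Re-read the whole file with the types fixed up front; a generous
    ## nrows hint and comment.char = "" also help read.table allocate
    ## memory and scan lines faster.
    big <- read.table("big.csv", header = TRUE, sep = ",",
                      colClasses = classes, nrows = 11e6,
                      comment.char = "")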
On Fri, Apr 26, 2013 at 8:09 AM, Kevin Hao <rfans4chemo@gmail.com> wrote:
> Could you share your methods for reading big data into R faster?

Have you thought of building a database and then letting R read the data through that database, instead of from flat files on your desktop?
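(As a concrete sketch of the database route, one lightweight option is SQLite via the sqldf package; no server needed. Everything here is illustrative: the file name, the column named "value", and the filter condition are made up. read.csv.sql imports the file into a temporary SQLite database and hands R only the rows the query selects, so the full file never has to fit in memory at once.)

    library(sqldf)  # uses RSQLite/DBI under the hood

    ## Import big.csv into a temporary SQLite db and pull back a subset;
    ## inside the query the file is referred to by the table name "file".
    wanted <- read.csv.sql("big.csv",
                           sql = "select * from file where value > 100")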
On Fri, Apr 26, 2013 at 8:09 AM, Kevin Hao <rfans4chemo@gmail.com> wrote:
> Could you share your methods for reading big data into R faster?

Do you really need to load all of the data into memory? With a large data set, most people read in just a chunk of it while developing the analysis pipeline; once that is done, the finished script simply iterates through the entire data set. For example, read.table has 'nrows' and 'skip' parameters to control which chunk of the file is read:

    read.table(file, nrows = -1, skip = 0, ...)

Another tip: you can split the large file into smaller ones.
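(A minimal sketch of that chunk-wise iteration, reading from an open connection so each call to read.table resumes where the previous one stopped. The file name, the 100,000-row chunk size, and the process() function are all placeholders.)

    con <- file("big.csv", open = "r")
    hdr <- strsplit(readLines(con, n = 1), ",")[[1]]  # column names

    repeat {
      ## read.table errors once the connection is exhausted, so a failed
      ## read serves as the end-of-file signal.
      chunk <- tryCatch(
        read.table(con, sep = ",", nrows = 100000, col.names = hdr),
        error = function(e) NULL)
      if (is.null(chunk)) break
      process(chunk)  # hypothetical per-chunk analysis step
    }
    close(con)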
On 04/26/2013 08:09 AM, Kevin Hao wrote:
> Recently I have been dealing with big data (> 3 GB in txt or csv format)
> on my desktop ... [I have tried defining colClasses for read.table, and
> the colbycol and limma packages, but none of them is fast enough.]

You mention limma; if this is sequence or microarray data, then asking on the Bioconductor mailing list http://bioconductor.org/help/mailing-list/ (no subscription necessary) may be more appropriate, but you will need to provide more information about what you want to do, e.g., a code chunk illustrating the problem.

Martin

--
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N., PO Box 19024, Seattle, WA 98109
Location: Arnold Building M1 B861
Phone: (206) 667-2793
On Fri, 26 Apr 2013 23:19:12 +0530, Kevin Hao wrote:
> Could you share your methods for reading big data into R faster?

We recently benchmarked our R server (Intel Xeon 2.2 GHz, 128 GB RAM, CentOS 6.2 running R 2.15.2, 64-bit), testing various read / write / data-manipulation times. A 6 GB dataset of roughly 10 million rows and 14 columns took around 15 minutes to read without colClasses. Were your times comparable to this?

Regards,
Indrajit
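(If you want to produce comparable numbers on your own machine, system.time is the simplest tool. The file name and the 14-column type vector below are assumptions for illustration, not details of Indrajit's benchmark.)

    ## Elapsed time when read.table must infer every column type itself.
    system.time(x <- read.table("big.csv", header = TRUE, sep = ","))

    ## Elapsed time with the types supplied up front (here: one integer
    ## column followed by 13 numeric ones, matching a 14-column file).
    system.time(y <- read.table("big.csv", header = TRUE, sep = ",",
                                colClasses = c("integer", rep("numeric", 13))))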