Dubravko Dolic
2005-Jun-06 13:30 UTC
[R] Reading huge chunks of data from MySQL into Windows R
Dear List,

I'm trying to use R under Windows on a huge database in MySQL via ODBC (technical reasons for this...). Now I want to read tables with some 160,000,000 entries into R. I would be grateful if anyone out there has some good hints on what to consider concerning memory management. I'm not sure about the best methods for reading such huge tables into R. For the moment I split the whole table into readable parts and stick them together again in R.

Any hints welcome.

Dubravko Dolic
Statistical Analyst
Email: dubravko.dolic@komdat.com
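A minimal sketch of the split-and-reassemble approach described above, assuming the RODBC package, a DSN named "mydsn", and a hypothetical table "bigtable"; the names and chunk size are illustrative only:

    library(RODBC)
    ch <- odbcConnect("mydsn")                  # DSN name is an assumption
    chunk.size <- 100000                        # rows per query; tune to available RAM
    n <- sqlQuery(ch, "SELECT COUNT(*) FROM bigtable")[1, 1]
    pieces <- vector("list", ceiling(n / chunk.size))
    for (i in seq(along = pieces)) {
        qry <- sprintf("SELECT * FROM bigtable LIMIT %d OFFSET %d",
                       chunk.size, (i - 1) * chunk.size)
        pieces[[i]] <- sqlQuery(ch, qry)        # read one readable part
    }
    big <- do.call("rbind", pieces)             # reassembly still needs RAM for everything
    odbcClose(ch)

Note that large OFFSET values force MySQL to scan and discard all preceding rows, so on a table this size paging by an indexed id column (WHERE id > last_id) tends to scale better.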
Duncan Murdoch
2005-Jun-06 13:49 UTC
[R] Reading huge chunks of data from MySQL into Windows R
On 6/6/2005 9:30 AM, Dubravko Dolic wrote:

> I'm trying to use R under Windows on a huge database in MySQL via ODBC
> (technical reasons for this...). Now I want to read tables with some
> 160,000,000 entries into R. I would be grateful if anyone out there has
> some good hints on what to consider concerning memory management. I'm
> not sure about the best methods for reading such huge tables into R.
> For the moment I split the whole table into readable parts and stick
> them together again in R.
>
> Any hints welcome.

Most values in R are stored in 8 byte doubles, so 160,000,000 entries will take roughly a gigabyte of storage. (Half that if they are integers or factors.) You are likely to run into problems manipulating something that big in Windows, because users are normally only allowed 2 GB of the memory address space, and it can be fragmented.

I'd suggest developing algorithms that can work on the data a block at a time, so that you never need to stick the whole thing together in R at once.

Alternatively, switch to a 64 bit platform and install lots of memory -- but there are still various 4 GB limits in R, so you may still run into trouble.

Duncan Murdoch
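To make the arithmetic above explicit (a rough check only, assuming plain numeric columns and ignoring data frame overhead):

    160e6 * 8 / 2^30              # about 1.2 GiB if stored as 8-byte doubles
    160e6 * 4 / 2^30              # about 0.6 GiB if stored as integers or factors
    object.size(numeric(1e6))     # ~8 MB per million doubles, measured directly

R also frequently copies objects while manipulating them, so peak usage can be a multiple of the object size, which is what makes the 2 GB Windows address space tight.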
bogdan romocea
2005-Jun-06 14:26 UTC
[R] Reading huge chunks of data from MySQL into Windows R
You don't say what you want to do with the data, how many columns you have etc. However, I would suggest proceeding in this order:

1. Avoid R; do everything in MySQL.
2. Use random samples.
3. If for some reason you need to process all 160 million rows in R, do it in a loop, pulling no more than, say, 50-100k rows at a time (a sketch of such a loop follows below). This approach would allow you to process billions of rows without the memory and disk requirements going through the roof.

hth,
b.
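A sketch of point 3 above, assuming RODBC, a DSN named "mydsn", and a hypothetical table "bigtable" with a numeric column "amount"; each chunk is summarised and then discarded rather than accumulated:

    library(RODBC)
    ch <- odbcConnect("mydsn")                  # connection details are assumptions
    chunk.size <- 50000
    offset <- 0
    total <- 0                                  # running result built up chunk by chunk
    repeat {
        chunk <- sqlQuery(ch, sprintf("SELECT amount FROM bigtable LIMIT %d OFFSET %d",
                                      chunk.size, offset))
        if (nrow(chunk) == 0) break
        total <- total + sum(chunk$amount)      # process this chunk, then let it go
        offset <- offset + chunk.size
    }
    odbcClose(ch)

Only one chunk is ever held in memory at a time, which is what keeps the memory and disk requirements from going through the roof.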
bogdan romocea
2005-Jun-06 16:02 UTC
[R] Reading huge chunks of data from MySQL into Windows R
On June 06, 2005 Dubravko Dolic wrote:

> So I think the other way round will serve best:
> Do everything in R and avoid using SQL on the database...

I'm not so sure. How about your MySQL experience? What types of queries do you run most often? What indexes do you have? What kind of indexes? What storage engine(s)? How about the query cache? What kinds of joins do you do? In what order? Etc etc etc. Optimizing MySQL queries is a (much) more complex job than it may appear at first sight. An index on userid would cause your query to run in seconds instead of minutes (the full table scan would be avoided). If you're talking about counting stuff from very large tables, R can't possibly be faster than MySQL.

-----Original Message-----
From: Dubravko Dolic [mailto:Dubravko.Dolic at komdat.com]
Sent: Monday, June 06, 2005 10:43 AM
To: br44114 at gmail.com
Cc: r-help at stat.math.ethz.ch
Subject: AW: [R] Reading huge chunks of data from MySQL into Windows R

In my (limited) experience R is more powerful concerning data manipulation. An example: I have a vector holding user ids. Some user ids can appear more than once. Doing SELECT COUNT(DISTINCT userid) on MySQL will take approx. 15 min. Doing length(unique(userid)) will take (almost) no time... So I think the other way round will serve best: Do everything in R and avoid using SQL on the database...
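A sketch of the indexing point above, again driven from R via RODBC; the connection, the table name "hits", and the column "userid" are assumptions, and building the index is a one-off cost:

    library(RODBC)
    ch <- odbcConnect("mydsn")
    # one-off: let MySQL build an index so the distinct count can avoid a full table scan
    sqlQuery(ch, "CREATE INDEX idx_userid ON hits (userid)")
    # with the index in place this should come back far faster than the 15 minutes reported
    sqlQuery(ch, "SELECT COUNT(DISTINCT userid) AS n_users FROM hits")
    odbcClose(ch)

Part of the reason length(unique(userid)) looks instantaneous is that the vector is already sitting in R's memory; the cost of pulling 160 million ids across ODBC is simply paid elsewhere.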