thr3ads.net - Rails - Multiple Database Connections [Aug 2007]

If this information is useful, please help other people find it:
Share via:

matt

2007-Aug-10 11:06 UTC

Multiple Database Connections

Hi all,

We currently have a potentially large scale rails project.  There is
one table (call it ''updates'') that will potentially have
somewhere in
the region of 2500 records created every minute (ie 200 million
records every 6 months).  The entries in this table belong to
different accounts (about 50 accounts).

So a possible solution is to split each customer account up and give
them their own ''updates'' table.  This would reduce the number
of
records being created in different tables.  This would mean hijacking
the rails ActiveRecord::Base connection mechanism, overriding the
database connector, and redirecting the connection to the appropriate
database, depending upon the account the request is coming from.

My worry is in caching/pooling connections.  If Account A calls action
"get_updates", Rails connects to database "A", then Account
B calls
action "get_updates", Rails connects to database "B"... THEN
Account A
comes back and calls action "get_updates": the first connection to
database "A" will presumabely have been lost.  Is this correct?  If
so, is there a way around this?  The problem would be that many many
connections would be opened once, and never reused which is a
potential nightmare.

Has anyone else had any experience in this multiple databases
approach?  The alternative is to create a new "updates" table in the
database for each customer, call them something like "updates_a",
"updates_b" etc..  However doing this approach it looks likely that
the database will grow to be very large, and scalability may become a
problem?

Thanks for any thoughts on this.


--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups
"Ruby on Rails: Talk" group.
To post to this group, send email to
rubyonrails-talk-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org
To unsubscribe from this group, send email to
rubyonrails-talk-unsubscribe-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org
For more options, visit this group at
http://groups.google.com/group/rubyonrails-talk?hl=en
-~----------~----~----~----~------~----~------~--~---

matt

2007-Aug-10 13:21 UTC

head link

Re: Multiple Database Connections

Just realised that Rails does not perform connection pooling: so the
pooling question is of course rubbish.

But would still like to hear anybody else''s thoughts on the multiple
databases approach?

Cheers

On Aug 10, 12:06 pm, matt
<matthewjsumm...-gM/Ye1E23mwN+BqQ9rBEUg@public.gmane.org>
wrote:> Hi all,
>
> We currently have a potentially large scale rails project.  There is
> one table (call it ''updates'') that will potentially have
somewhere in
> the region of 2500 records created every minute (ie 200 million
> records every 6 months).  The entries in this table belong to
> different accounts (about 50 accounts).
>
> So a possible solution is to split each customer account up and give
> them their own ''updates'' table.  This would reduce the
number of
> records being created in different tables.  This would mean hijacking
> the rails ActiveRecord::Base connection mechanism, overriding the
> database connector, and redirecting the connection to the appropriate
> database, depending upon the account the request is coming from.
>
> My worry is in caching/pooling connections.  If Account A calls action
> "get_updates", Rails connects to database "A", then
Account B calls
> action "get_updates", Rails connects to database "B"...
THEN Account A
> comes back and calls action "get_updates": the first connection
to
> database "A" will presumabely have been lost.  Is this correct? 
If
> so, is there a way around this?  The problem would be that many many
> connections would be opened once, and never reused which is a
> potential nightmare.
>
> Has anyone else had any experience in this multiple databases
> approach?  The alternative is to create a new "updates" table in
the
> database for each customer, call them something like "updates_a",
> "updates_b" etc..  However doing this approach it looks likely
that
> the database will grow to be very large, and scalability may become a
> problem?
>
> Thanks for any thoughts on this.

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups
"Ruby on Rails: Talk" group.
To post to this group, send email to
rubyonrails-talk-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org
To unsubscribe from this group, send email to
rubyonrails-talk-unsubscribe-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org
For more options, visit this group at
http://groups.google.com/group/rubyonrails-talk?hl=en
-~----------~----~----~----~------~----~------~--~---

Lloyd Linklater

2007-Aug-10 13:52 UTC

head link

Re: Multiple Database Connections

Matt Su wrote:> Just realised that Rails does not perform connection pooling: so the
> pooling question is of course rubbish.
> 
> But would still like to hear anybody else''s thoughts on the
multiple
> databases approach?
> 
> Cheers
You mention over 40 records per second being generated.  How many 
customers?  How many tables?

Notion from the Ruby side: Could you do your own connection pooling? 
What if you were to open one connection and just keep it connected and 
have customers just send their request down a particular "pipe"?

Notion from the database side:  What if you used a stored procedure that 
took the request and customer ID as parameters?  No new connection would 
have to be set up each time and such databases can cache the internal 
procedures so that they can connect VERY fast and efficiently.  You 
would not need to make any new connection and the server would route 
things as needed without you having to do much of anything.

Is that the kind of thing you were asking?
-- 
Posted via http://www.ruby-forum.com/.

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups
"Ruby on Rails: Talk" group.
To post to this group, send email to
rubyonrails-talk-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org
To unsubscribe from this group, send email to
rubyonrails-talk-unsubscribe-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org
For more options, visit this group at
http://groups.google.com/group/rubyonrails-talk?hl=en
-~----------~----~----~----~------~----~------~--~---

Lloyd Linklater

2007-Aug-10 13:53 UTC

head link

Re: Multiple Database Connections

Lloyd Linklater wrote:> Notion from the Ruby side: Could you do your own connection pooling? 
> What if you were to open one connection and just keep it connected and 
> have customers just send their request down a particular "pipe"?
I meant one connection per table.  oops
-- 
Posted via http://www.ruby-forum.com/.

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups
"Ruby on Rails: Talk" group.
To post to this group, send email to
rubyonrails-talk-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org
To unsubscribe from this group, send email to
rubyonrails-talk-unsubscribe-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org
For more options, visit this group at
http://groups.google.com/group/rubyonrails-talk?hl=en
-~----------~----~----~----~------~----~------~--~---

Piet Hadermann

2007-Aug-10 14:32 UTC

head link

Re: Multiple Database Connections

Instead of different databases per customer, why not use different
tables in the same database per customer ? 
No need to mess around with connections, just do something like
Model.table_name = ''table_''+session[:customer_id] before doing
anything
else.

I think it would work. Not sure where to put this so you don''t have to
repeat this every time you need to use a model. Maybe there''s some sort
of hook that can be used ?

Or would this be a bad idea ?

Piet.
> -----Original Message-----
> From: rubyonrails-talk-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org 
> [mailto:rubyonrails-talk-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org] On Behalf
Of matt
> Sent: vrijdag 10 augustus 2007 13:07
> To: Ruby on Rails: Talk
> Subject: [Rails] Multiple Database Connections
> 
> So a possible solution is to split each customer account up 
> and give them their own ''updates'' table.  This would
reduce
> the number of records being created in different tables.  
> This would mean hijacking the rails ActiveRecord::Base 
> connection mechanism, overriding the database connector, and 
> redirecting the connection to the appropriate database, 
> depending upon the account the request is coming from.
> 
--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups
"Ruby on Rails: Talk" group.
To post to this group, send email to
rubyonrails-talk-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org
To unsubscribe from this group, send email to
rubyonrails-talk-unsubscribe@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/rubyonrails-talk?hl=en
-~----------~----~----~----~------~----~------~--~---

matt

2007-Aug-10 19:03 UTC

head link

Re: Multiple Database Connections

Hi,

Lloyd:
We are looking at potentially 50 accounts each with 50 users, sending
a request every minute. Therefore it is 2500 requests a minute.  I am
hesitant about using stored procedures as they don''t seem to be
recommended in rails: http://wiki.rubyonrails.org/rails/pages/StoredProcedures.

I am looking at connection pooling at the moment.  We thought of using
a separate database for each customer.  Then you would switch
connections based on the customer to connect to the customer''s
database.  I am looking at caching the connection to each customer''s
DB, however Rails does not like it.  If I try and get a connection to
a current customer''s DB, and store it in a hash:

@@connections[:account_id] = ActiveRecord::Base.connection

later (i.e. several requests down the line) when I try and restore it
to avoid creating a new connection:

ActiveRecord::Base.connection = @@connections[:account_id]

the error "Lost Connection to MySQL" appears.  It seems that rails
only keeps the last connection used by ActiveRecord::Base open.  If i
assign the connection for ActiveRecord to say connection "B", when I
attempt to reassign a connection "A" i put in @@connections to
ActiveRecord::Base the connection is lost. I am not sure what is going
on here but any ideas would be great?

Piet: Thanks, that is something we are looking at as you can indeed
set the table name for a model.  It may just become slightly messy
with different tables scattered throughout the database although it is
certainly a viable option.

Does anyone have any experience with this type of design in rails:
i.e. a database per customer type architecture?

Cheers

On Aug 10, 3:32 pm, "Piet Hadermann"
<piet.haderm...-/1BeJOKqmzrQT0dZR+AlfA@public.gmane.org>
wrote:> Instead of different databases per customer, why not use different
> tables in the same database per customer ?
> No need to mess around with connections, just do something like
> Model.table_name = ''table_''+session[:customer_id] before
doing anything
> else.
>
> I think it would work. Not sure where to put this so you don''t
have to
> repeat this every time you need to use a model. Maybe there''s some
sort
> of hook that can be used ?
>
> Or would this be a bad idea ?
>
> Piet.
>
>
>
> > -----Original Message-----
> > From: rubyonrails-talk-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org
> > [mailto:rubyonrails-talk-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org] On
Behalf Of matt
> > Sent: vrijdag 10 augustus 2007 13:07
> > To: Ruby on Rails: Talk
> > Subject: [Rails] Multiple Database Connections
>
> > So a possible solution is to split each customer account up
> > and give them their own ''updates'' table.  This would
reduce
> > the number of records being created in different tables.  
> > This would mean hijacking the rails ActiveRecord::Base
> > connection mechanism, overriding the database connector, and
> > redirecting the connection to the appropriate database,
> > depending upon the account the request is coming from.- Hide quoted
text -
>
> - Show quoted text -

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups
"Ruby on Rails: Talk" group.
To post to this group, send email to
rubyonrails-talk-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org
To unsubscribe from this group, send email to
rubyonrails-talk-unsubscribe-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org
For more options, visit this group at
http://groups.google.com/group/rubyonrails-talk?hl=en
-~----------~----~----~----~------~----~------~--~---

danlunde

2007-Aug-10 21:10 UTC

head link

Re: Multiple Database Connections

Maybe this plugin would help.

http://www.elctech.com/2007/3/8/using-and-testing-rails-with-multiple-databases

Daniel

On Aug 10, 7:06 am, matt
<matthewjsumm...-gM/Ye1E23mwN+BqQ9rBEUg@public.gmane.org>
wrote:> Hi all,
>
> We currently have a potentially large scale rails project.  There is
> one table (call it ''updates'') that will potentially have
somewhere in
> the region of 2500 records created every minute (ie 200 million
> records every 6 months).  The entries in this table belong to
> different accounts (about 50 accounts).
>
> So a possible solution is to split each customer account up and give
> them their own ''updates'' table.  This would reduce the
number of
> records being created in different tables.  This would mean hijacking
> the rails ActiveRecord::Base connection mechanism, overriding the
> database connector, and redirecting the connection to the appropriate
> database, depending upon the account the request is coming from.
>
> My worry is in caching/pooling connections.  If Account A calls action
> "get_updates", Rails connects to database "A", then
Account B calls
> action "get_updates", Rails connects to database "B"...
THEN Account A
> comes back and calls action "get_updates": the first connection
to
> database "A" will presumabely have been lost.  Is this correct? 
If
> so, is there a way around this?  The problem would be that many many
> connections would be opened once, and never reused which is a
> potential nightmare.
>
> Has anyone else had any experience in this multiple databases
> approach?  The alternative is to create a new "updates" table in
the
> database for each customer, call them something like "updates_a",
> "updates_b" etc..  However doing this approach it looks likely
that
> the database will grow to be very large, and scalability may become a
> problem?
>
> Thanks for any thoughts on this.

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups
"Ruby on Rails: Talk" group.
To post to this group, send email to
rubyonrails-talk-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org
To unsubscribe from this group, send email to
rubyonrails-talk-unsubscribe-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org
For more options, visit this group at
http://groups.google.com/group/rubyonrails-talk?hl=en
-~----------~----~----~----~------~----~------~--~---

matt

2007-Aug-11 12:44 UTC

head link

Re: Multiple Database Connections

Thanks,

It is a good concept, but really I don''t need to change the databse
for all my models in each model, I can just put a before_filter on the
ApplicationController, because all models will point at a certain DB.

Has no one had any experience of using Multiple/Customer Databases in
a single rails app?  Is it to be discouraged?

Thanks in advance for anybody''s replies :)

On Aug 10, 10:10 pm, danlunde
<danlu...-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
wrote:> Maybe this plugin would help.
>
> http://www.elctech.com/2007/3/8/using-and-testing-rails-with-multiple...
>
> Daniel
>
> On Aug 10, 7:06 am, matt
<matthewjsumm...-gM/Ye1E23mwN+BqQ9rBEUg@public.gmane.org> wrote:
>
>
>
> > Hi all,
>
> > We currently have a potentially large scale rails project.  There is
> > one table (call it ''updates'') that will potentially
have somewhere in
> > the region of 2500 records created every minute (ie 200 million
> > records every 6 months).  The entries in this table belong to
> > different accounts (about 50 accounts).
>
> > So a possible solution is to split each customer account up and give
> > them their own ''updates'' table.  This would reduce
the number of
> > records being created in different tables.  This would mean hijacking
> > the rails ActiveRecord::Base connection mechanism, overriding the
> > database connector, and redirecting the connection to the appropriate
> > database, depending upon the account the request is coming from.
>
> > My worry is in caching/pooling connections.  If Account A calls action
> > "get_updates", Rails connects to database "A",
then Account B calls
> > action "get_updates", Rails connects to database
"B"... THEN Account A
> > comes back and calls action "get_updates": the first
connection to
> > database "A" will presumabely have been lost.  Is this
correct?  If
> > so, is there a way around this?  The problem would be that many many
> > connections would be opened once, and never reused which is a
> > potential nightmare.
>
> > Has anyone else had any experience in this multiple databases
> > approach?  The alternative is to create a new "updates"
table in the
> > database for each customer, call them something like
"updates_a",
> > "updates_b" etc..  However doing this approach it looks
likely that
> > the database will grow to be very large, and scalability may become a
> > problem?
>
> > Thanks for any thoughts on this.- Hide quoted text -
>
> - Show quoted text -

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups
"Ruby on Rails: Talk" group.
To post to this group, send email to
rubyonrails-talk-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org
To unsubscribe from this group, send email to
rubyonrails-talk-unsubscribe-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org
For more options, visit this group at
http://groups.google.com/group/rubyonrails-talk?hl=en
-~----------~----~----~----~------~----~------~--~---

matt

2007-Aug-12 16:32 UTC

head link

Re: Multiple Database Connections

Hi Isak,

The RDBMS is MySQL 5.  The records themselves are relatively small:
about 6 fields each containing a 20 character string.  The data will
primarily be used in a ''live'' view - it is tracking data for
vehicles;
whereby only the records from the last day is of interest, and for
reporting, whereby records from up to 6 months will be searched for
and accessed.  Partitioning may help: we could split the main Updates
table up into tables per tenant, or on a date basis: however I am
reluctant to use this approach because it is not particularly suited
to the way Rails deals with models and ActiveRecord.  By partitioning
the data into tables per tenant or on a historical basis would mean I
would need to be changing table names in ActiveRecord.  The other
thought is long term scalability: using a database per customer would
mean the system was self scaling.  Even partitioning the data in a
single database could mean the database grew to the point where it
needed to be spread across multiple servers.  Load will be constant
during daylight hours: vehicles send updates of their locations every
minute, and customers have users logged in to view this data in
realtime via a mapping interface.  So load on both inserts and selects
on the Updates table will be pretty high during working hours, and non
existent outside these hours.

I think this is a rails specific problem: although it is a multi-
tenant architecture problem, Rails adds its own unique twists:
particularly because of its ''single database'' approach.

Thanks in advance for any response

On Aug 12, 5:07 pm, "Isak Hansen"
<isak.han...-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
wrote:> On 8/10/07, matt
<matthewjsumm...-gM/Ye1E23mwN+BqQ9rBEUg@public.gmane.org> wrote:
>
>
>
> > Hi,
>
> > Lloyd:
> > We are looking at potentially 50 accounts each with 50 users, sending
> > a request every minute. Therefore it is 2500 requests a minute.
>
> How large are the records? Can your rdbms engine of choice handle the
> load and further growth on a single server, or do you need to split it
> across multiple boxes anyway?
>
> How will you be using the data? Would partitioning help if your db supports
it?
> Do you expect load to be constant or have spikes?
>
> I''d make a new post on a support forum for the particular db
you''re
> using to figure out the ideal way of doing it, and only then start
> looking for rails specific solutions.
>
> Isak

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups
"Ruby on Rails: Talk" group.
To post to this group, send email to
rubyonrails-talk-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org
To unsubscribe from this group, send email to
rubyonrails-talk-unsubscribe-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org
For more options, visit this group at
http://groups.google.com/group/rubyonrails-talk?hl=en
-~----------~----~----~----~------~----~------~--~---

Rusty Burchfield

2007-Aug-12 16:37 UTC

head link

Re: Multiple Database Connections

On Aug 10, 7:06 am, matt
<matthewjsumm...-gM/Ye1E23mwN+BqQ9rBEUg@public.gmane.org>
wrote:> We currently have a potentially large scale rails project.  There is
> one table (call it ''updates'') that will potentially have
somewhere in
> the region of 2500 records created every minute (ie 200 million
> records every 6 months).  The entries in this table belong to
> different accounts (about 50 accounts).
Have you actually /tried/ this and found it to be a problem?  You
could spend a month working on all this multiple connection stuff,
only to find out the bottleneck is somewhere else.  Create a test
script that submits similarly sized records, 2500 per minute and see
if it works.  If you use script/runner you can even run it in the
rails framework with an activerecord model.

If you split each customer onto a different database, are the
databases going to be on different machines?  If not, it seems like
you would run into the same problems and perhaps even faster, due to
memory, disk and network congestion.  If the customer databases would
be on different machines, why not run a separate rails installation
for each customer?  Then you just set the db config for each rails
installation, rather than all this multiple connection hacking.

~Rusty


--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups
"Ruby on Rails: Talk" group.
To post to this group, send email to
rubyonrails-talk-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org
To unsubscribe from this group, send email to
rubyonrails-talk-unsubscribe-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org
For more options, visit this group at
http://groups.google.com/group/rubyonrails-talk?hl=en
-~----------~----~----~----~------~----~------~--~---

matt

2007-Aug-12 17:18 UTC

head link

Re: Multiple Database Connections

Rusty,

Unfortunately we can''t deploy a new rails application for each
customer; if we could that would be great but we can''t.

Inserts are not the problem: we have run spike tests and a pack of
mongrels and MySQL can easily handle 2500 inserts a minute.  The
problem is in retrieval of data.

Potentially, after a month, the Updates table could have 3 million
records.  Each record belongs to a tenant, and doing a search for all
the Updates for a particular tenant on a 3 million big recordset takes
13 seconds.  This is using 50 tenants.  Using the multiple databases
approach, the same select for a tenant''s Updates takes about 0.3
seconds.  After 6 months, you are looking at holding 20 million
records, with average searches just to get a list of Updates taking
over a minute and a half (that''s with some careful indexing too).  Now
obviously partitioning data is one way to reduce this lookup time: in
rails partitioning data would mean splitting up the Updates table
either into tenants or by date...but from a scalability point of view
is this desirable?  Is it not better to tackle the multiple databases
approach as a means of improving search times and creating an
architecture that scales ''automatically''?

Thanks in advance for anyone''s thoughts

On Aug 12, 5:37 pm, Rusty Burchfield
<n...-KboFZ2jX1NmsTnJN9+BGXg@public.gmane.org>
wrote:> On Aug 10, 7:06 am, matt
<matthewjsumm...-gM/Ye1E23mwN+BqQ9rBEUg@public.gmane.org> wrote:
>
> > We currently have a potentially large scale rails project.  There is
> > one table (call it ''updates'') that will potentially
have somewhere in
> > the region of 2500 records created every minute (ie 200 million
> > records every 6 months).  The entries in this table belong to
> > different accounts (about 50 accounts).
>
> Have you actually /tried/ this and found it to be a problem?  You
> could spend a month working on all this multiple connection stuff,
> only to find out the bottleneck is somewhere else.  Create a test
> script that submits similarly sized records, 2500 per minute and see
> if it works.  If you use script/runner you can even run it in the
> rails framework with an activerecord model.
>
> If you split each customer onto a different database, are the
> databases going to be on different machines?  If not, it seems like
> you would run into the same problems and perhaps even faster, due to
> memory, disk and network congestion.  If the customer databases would
> be on different machines, why not run a separate rails installation
> for each customer?  Then you just set the db config for each rails
> installation, rather than all this multiple connection hacking.
>
> ~Rusty

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups
"Ruby on Rails: Talk" group.
To post to this group, send email to
rubyonrails-talk-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org
To unsubscribe from this group, send email to
rubyonrails-talk-unsubscribe-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org
For more options, visit this group at
http://groups.google.com/group/rubyonrails-talk?hl=en
-~----------~----~----~----~------~----~------~--~---

Rusty Burchfield

2007-Aug-12 17:51 UTC

head link

Re: Multiple Database Connections

On Aug 12, 1:18 pm, matt
<matthewjsumm...-gM/Ye1E23mwN+BqQ9rBEUg@public.gmane.org>
wrote:> Potentially, after a month, the Updates table could have 3 million
> records.  Each record belongs to a tenant, and doing a search for all
> the Updates for a particular tenant on a 3 million big recordset takes
> 13 seconds.  This is using 50 tenants.  Using the multiple databases
> approach, the same select for a tenant''s Updates takes about 0.3
> seconds.
I would be curious to see how PostgreSQL 8.2 handles that dataset.
> After 6 months, you are looking at holding 20 million
> records, with average searches just to get a list of Updates taking
> over a minute and a half (that''s with some careful indexing too). 
Now
> obviously partitioning data is one way to reduce this lookup time: in
> rails partitioning data would mean splitting up the Updates table
> either into tenants or by date...but from a scalability point of view
> is this desirable?  Is it not better to tackle the multiple databases
> approach as a means of improving search times and creating an
> architecture that scales ''automatically''?
If retrieval is the issue, I would side with splitting the table
rather than the entire database.  What is gained from multiple
databases over splitting the table?  It seems like additional overhead
and complexity with performance similar to splitting the table.

What do you mean by automatically?  How is either method more
''automatic'' than the other?

~Rusty


--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups
"Ruby on Rails: Talk" group.
To post to this group, send email to
rubyonrails-talk-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org
To unsubscribe from this group, send email to
rubyonrails-talk-unsubscribe-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org
For more options, visit this group at
http://groups.google.com/group/rubyonrails-talk?hl=en
-~----------~----~----~----~------~----~------~--~---

matt

2007-Aug-12 19:11 UTC

head link

Re: Multiple Database Connections

> What do you mean by automatically?  How is either method more
> ''automatic'' than the other?
By automatic, I refer to the differences in architecture.  With a
single database approach, as new tenants are added the system does not
scale out to accommodate this new data: recordsets in the schema
grow.  With a multiple database approach, the system scales out
horizontally to cope with the new tenant and new data.  This is a
''side-effect'' as it were of using a multi-database
architecture: that
horizontal scale out is ''automatically'' or rather *inherently*
achieved.  There is no *inherent* or ''automatic'' scale out
with the
single database approach.

Out of interest, are there any performance issues with having a large
number of tables in a MySQL database?  Large being in the hundreds to
thousands range.

Matt

On Aug 12, 6:51 pm, Rusty Burchfield
<n...-KboFZ2jX1NmsTnJN9+BGXg@public.gmane.org>
wrote:> On Aug 12, 1:18 pm, matt
<matthewjsumm...-gM/Ye1E23mwN+BqQ9rBEUg@public.gmane.org> wrote:
>
> > Potentially, after a month, the Updates table could have 3 million
> > records.  Each record belongs to a tenant, and doing a search for all
> > the Updates for a particular tenant on a 3 million big recordset takes
> > 13 seconds.  This is using 50 tenants.  Using the multiple databases
> > approach, the same select for a tenant''s Updates takes about
0.3
> > seconds.
>
> I would be curious to see how PostgreSQL 8.2 handles that dataset.
>
> > After 6 months, you are looking at holding 20 million
> > records, with average searches just to get a list of Updates taking
> > over a minute and a half (that''s with some careful indexing
too).  Now
> > obviously partitioning data is one way to reduce this lookup time: in
> > rails partitioning data would mean splitting up the Updates table
> > either into tenants or by date...but from a scalability point of view
> > is this desirable?  Is it not better to tackle the multiple databases
> > approach as a means of improving search times and creating an
> > architecture that scales ''automatically''?
>
> If retrieval is the issue, I would side with splitting the table
> rather than the entire database.  What is gained from multiple
> databases over splitting the table?  It seems like additional overhead
> and complexity with performance similar to splitting the table.
>
> What do you mean by automatically?  How is either method more
> ''automatic'' than the other?
>
> ~Rusty

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups
"Ruby on Rails: Talk" group.
To post to this group, send email to
rubyonrails-talk-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org
To unsubscribe from this group, send email to
rubyonrails-talk-unsubscribe-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org
For more options, visit this group at
http://groups.google.com/group/rubyonrails-talk?hl=en
-~----------~----~----~----~------~----~------~--~---

Lionel Bouton

2007-Aug-12 19:16 UTC

head link

Re: Multiple Database Connections

matt wrote the following on 12.08.2007 21:11 :>> What do you mean by automatically?  How is either method more
>> ''automatic'' than the other?
>>     
>
> By automatic, I refer to the differences in architecture.  With a
> single database approach, as new tenants are added the system does not
> scale out to accommodate this new data: recordsets in the schema
> grow.  With a multiple database approach, the system scales out
> horizontally to cope with the new tenant and new data.  This is a
> ''side-effect'' as it were of using a multi-database
architecture: that
> horizontal scale out is ''automatically'' or rather
*inherently*
> achieved.  There is no *inherent* or ''automatic'' scale
out with the
> single database approach.
>
> Out of interest, are there any performance issues with having a large
> number of tables in a MySQL database?  Large being in the hundreds to
> thousands range.
>   
The only one I remember from experience is slowness of the command line
client startup unless you tell it not to autocomplete table and column
names (by default it lists all tables and column on startup to provide
autocompletion).
The database itself doesn''t seem affected (this was on a mix of MyIsam
and InnoDB tables, most tables where MyIsam).

Lionel

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups
"Ruby on Rails: Talk" group.
To post to this group, send email to
rubyonrails-talk-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org
To unsubscribe from this group, send email to
rubyonrails-talk-unsubscribe-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org
For more options, visit this group at
http://groups.google.com/group/rubyonrails-talk?hl=en
-~----------~----~----~----~------~----~------~--~---

glennswest

2007-Aug-13 03:33 UTC

head link

Re: Multiple Database Connections

Actually your better approach may be to use oracle 10g
It can do this "break" down behind the schenes, running
a single table across a "rack" of machines.
Then ror and your code does not have to worry about it.


On Aug 10, 7:06 pm, matt
<matthewjsumm...-gM/Ye1E23mwN+BqQ9rBEUg@public.gmane.org>
wrote:> Hi all,
>
> We currently have a potentially large scale rails project.  There is
> one table (call it ''updates'') that will potentially have
somewhere in
> the region of 2500 records created every minute (ie 200 million
> records every 6 months).  The entries in this table belong to
> different accounts (about 50 accounts).
>
> So a possible solution is to split each customer account up and give
> them their own ''updates'' table.  This would reduce the
number of
> records being created in different tables.  This would mean hijacking
> the rails ActiveRecord::Base connection mechanism, overriding the
> database connector, and redirecting the connection to the appropriate
> database, depending upon the account the request is coming from.
>
> My worry is in caching/pooling connections.  If Account A calls action
> "get_updates", Rails connects to database "A", then
Account B calls
> action "get_updates", Rails connects to database "B"...
THEN Account A
> comes back and calls action "get_updates": the first connection
to
> database "A" will presumabely have been lost.  Is this correct? 
If
> so, is there a way around this?  The problem would be that many many
> connections would be opened once, and never reused which is a
> potential nightmare.
>
> Has anyone else had any experience in this multiple databases
> approach?  The alternative is to create a new "updates" table in
the
> database for each customer, call them something like "updates_a",
> "updates_b" etc..  However doing this approach it looks likely
that
> the database will grow to be very large, and scalability may become a
> problem?
>
> Thanks for any thoughts on this.

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups
"Ruby on Rails: Talk" group.
To post to this group, send email to
rubyonrails-talk-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org
To unsubscribe from this group, send email to
rubyonrails-talk-unsubscribe-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org
For more options, visit this group at
http://groups.google.com/group/rubyonrails-talk?hl=en
-~----------~----~----~----~------~----~------~--~---

snacktime

2007-Aug-13 06:22 UTC

head link

Re: Multiple Database Connections

On 8/12/07, matt <matthewjsummers-gM/Ye1E23mwN+BqQ9rBEUg@public.gmane.org>
wrote:>
> > What do you mean by automatically?  How is either method more
> > ''automatic'' than the other?
>
> By automatic, I refer to the differences in architecture.  With a
> single database approach, as new tenants are added the system does not
> scale out to accommodate this new data: recordsets in the schema
> grow.  With a multiple database approach, the system scales out
> horizontally to cope with the new tenant and new data.  This is a
> ''side-effect'' as it were of using a multi-database
architecture: that
> horizontal scale out is ''automatically'' or rather
*inherently*
> achieved.  There is no *inherent* or ''automatic'' scale
out with the
> single database approach.
>
You are making assumptions which are most likely not correct.  Without
knowing the data structure, usage patterns, and lifespan it''s
difficult to give any specific advice, but multiple databases is most
likely not the best solution.   Table partitioning comes to mind as a
possible solution.  A lot depends on the overall usage
characteristics.  It also sounds like the data might have a relatively
short lifespan?  I would also suggest taking a look at postgresql
instead of mysql for something like this. Mvcc architectures often do
better under heavy insert/update activity.

Chris

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups
"Ruby on Rails: Talk" group.
To post to this group, send email to
rubyonrails-talk-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org
To unsubscribe from this group, send email to
rubyonrails-talk-unsubscribe-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org
For more options, visit this group at
http://groups.google.com/group/rubyonrails-talk?hl=en
-~----------~----~----~----~------~----~------~--~---

matt

2007-Aug-13 07:30 UTC

head link

Re: Multiple Database Connections

Hi Chris,

Thanks for your response.  Do you have any specific reasons why
multiple databases may not be the best approach?

Cheers!

On Aug 13, 7:22 am, snacktime
<snackt...-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
wrote:> On 8/12/07, matt
<matthewjsumm...-gM/Ye1E23mwN+BqQ9rBEUg@public.gmane.org> wrote:
>
>
>
> > > What do you mean by automatically?  How is either method more
> > > ''automatic'' than the other?
>
> > By automatic, I refer to the differences in architecture.  With a
> > single database approach, as new tenants are added the system does not
> > scale out to accommodate this new data: recordsets in the schema
> > grow.  With a multiple database approach, the system scales out
> > horizontally to cope with the new tenant and new data.  This is a
> > ''side-effect'' as it were of using a multi-database
architecture: that
> > horizontal scale out is ''automatically'' or rather
*inherently*
> > achieved.  There is no *inherent* or ''automatic''
scale out with the
> > single database approach.
>
> You are making assumptions which are most likely not correct.  Without
> knowing the data structure, usage patterns, and lifespan it''s
> difficult to give any specific advice, but multiple databases is most
> likely not the best solution.   Table partitioning comes to mind as a
> possible solution.  A lot depends on the overall usage
> characteristics.  It also sounds like the data might have a relatively
> short lifespan?  I would also suggest taking a look at postgresql
> instead of mysql for something like this. Mvcc architectures often do
> better under heavy insert/update activity.
>
> Chris

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups
"Ruby on Rails: Talk" group.
To post to this group, send email to
rubyonrails-talk-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org
To unsubscribe from this group, send email to
rubyonrails-talk-unsubscribe-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org
For more options, visit this group at
http://groups.google.com/group/rubyonrails-talk?hl=en
-~----------~----~----~----~------~----~------~--~---

snacktime

2007-Aug-13 08:43 UTC

head link

Re: Multiple Database Connections

On 8/13/07, matt <matthewjsummers-gM/Ye1E23mwN+BqQ9rBEUg@public.gmane.org>
wrote:>
> Hi Chris,
>
> Thanks for your response.  Do you have any specific reasons why
> multiple databases may not be the best approach?
The main reason is that it''s usually just not necessary.  The other
reason is that it''s a management nightmare to partition data like that
as you scale up.  The amount of data is usually not a major issue,
it''s the usage patterns.  If you do need to partition for whatever
reason, first look at the partitioning features of the database.

Also, the one thing you are leaving out that makes it difficult to
give any specifics is the overall usage patterns.  What kind of
selects will you be doing on the data?

Chris

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups
"Ruby on Rails: Talk" group.
To post to this group, send email to
rubyonrails-talk-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org
To unsubscribe from this group, send email to
rubyonrails-talk-unsubscribe-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org
For more options, visit this group at
http://groups.google.com/group/rubyonrails-talk?hl=en
-~----------~----~----~----~------~----~------~--~---

matt

2007-Aug-13 09:08 UTC

head link

Re: Multiple Database Connections

Chris

It would normally be doing a select to get a list of Updates by
tenant, for the past day (these would potentially be done every minute
for each tenant user logged in).  Then, when reporting was required,
there would be doing a select all per tenant.

Usage patterns would mean that for potentially 100 - 500 concurrent
users (100 typical case, 500 worst case) each user would be doing a
( Select where tenant_id="" and date="" ) every minute or
so.

Does that clarify things a bit?

Thanks for your response

Matt

On Aug 13, 9:43 am, snacktime
<snackt...-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
wrote:> On 8/13/07, matt
<matthewjsumm...-gM/Ye1E23mwN+BqQ9rBEUg@public.gmane.org> wrote:
>
>
>
> > Hi Chris,
>
> > Thanks for your response.  Do you have any specific reasons why
> > multiple databases may not be the best approach?
>
> The main reason is that it''s usually just not necessary.  The
other
> reason is that it''s a management nightmare to partition data like
that
> as you scale up.  The amount of data is usually not a major issue,
> it''s the usage patterns.  If you do need to partition for whatever
> reason, first look at the partitioning features of the database.
>
> Also, the one thing you are leaving out that makes it difficult to
> give any specifics is the overall usage patterns.  What kind of
> selects will you be doing on the data?
>
> Chris

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups
"Ruby on Rails: Talk" group.
To post to this group, send email to
rubyonrails-talk-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org
To unsubscribe from this group, send email to
rubyonrails-talk-unsubscribe-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org
For more options, visit this group at
http://groups.google.com/group/rubyonrails-talk?hl=en
-~----------~----~----~----~------~----~------~--~---

Michael Wang

2007-Aug-13 09:10 UTC

head link

Re: Multiple Database Connections

snacktime wrote:> On 8/13/07, matt
<matthewjsummers-gM/Ye1E23mwN+BqQ9rBEUg@public.gmane.org> wrote:
>> Hi Chris,
>>
>> Thanks for your response.  Do you have any specific reasons why
>> multiple databases may not be the best approach?
> 
> The main reason is that it''s usually just not necessary.  The
other
> reason is that it''s a management nightmare to partition data like
that
> as you scale up.  The amount of data is usually not a major issue,
> it''s the usage patterns.  If you do need to partition for whatever
> reason, first look at the partitioning features of the database.
> 
> Also, the one thing you are leaving out that makes it difficult to
> give any specifics is the overall usage patterns.  What kind of
> selects will you be doing on the data?
It may or may not be necessary for the original poster''s situation but 
managing connections to multiple databases is something that very large 
Web applications need to do in order to scale the DB backend.

Current best practices for scaling the DB portion of a Web app include 
creating multiple read-only slaves through replication so the "master"
database(s) only has to deal with the writes. This requires the app to 
be aware of "read-only" database connections and
"write-only" database
connections.

Another best practice that requires being able to manage multiple 
database connection is "data federation" where a very large
"logical"
dataset is split among multiple databases and servers.

Sites like eBay, Flickr and LiveJournal use both of these techniques to 
scale their database backends.


-- 
Michael Wang

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups
"Ruby on Rails: Talk" group.
To post to this group, send email to
rubyonrails-talk-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org
To unsubscribe from this group, send email to
rubyonrails-talk-unsubscribe-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org
For more options, visit this group at
http://groups.google.com/group/rubyonrails-talk?hl=en
-~----------~----~----~----~------~----~------~--~---

snacktime

2007-Aug-13 10:02 UTC

head link

Re: Multiple Database Connections

On 8/13/07, matt <matthewjsummers-gM/Ye1E23mwN+BqQ9rBEUg@public.gmane.org>
wrote:>
> Chris
>
> It would normally be doing a select to get a list of Updates by
> tenant, for the past day (these would potentially be done every minute
> for each tenant user logged in).  Then, when reporting was required,
> there would be doing a select all per tenant.
>
> Usage patterns would mean that for potentially 100 - 500 concurrent
> users (100 typical case, 500 worst case) each user would be doing a
> ( Select where tenant_id="" and date="" ) every minute
or so.
>
> Does that clarify things a bit?
In that case I''d probably take a look at table partitioning by date
range.  Implementation depends on the database, I''m most familiar with
postgresql not sure if it works like this with mysql.

The basic idea is that you create tables named by a date range, say by
month.  Then you use something like a rule to rewrite the queries to
the appropriate table.  Probably a good rule of thumb for partitioning
is whatever size is small enough so you can keep the indexes in
memory.  It''s going to take some time to create the rules and auto
create the tables, etc.., but it''s much less work in the long run then
multiple databases and it''s transparent if done correctly.

That''s just a general idea though, there are several variations on
that, and they will all be database specific.  Where you have
complicated selects that need to hit the database, then partitioning
can work great.  But you might also look at some creative ways of
caching since your selects are fairly simple and not even worry about
partitioning for now.   For example, on each insert/update do a
memcache set, and have another key that you increment as a counter.
Then on select  get the counter key which contains the number of
entries for the date range, and then a multi get to grab all the data.
 Your keys would use a naming scheme of some type so you know which
ones to get.

In any case I think you will find a more elegant solution then using
multiple databases.  I''ve gone that route before and regretted it.

Chris

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups
"Ruby on Rails: Talk" group.
To post to this group, send email to
rubyonrails-talk-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org
To unsubscribe from this group, send email to
rubyonrails-talk-unsubscribe-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org
For more options, visit this group at
http://groups.google.com/group/rubyonrails-talk?hl=en
-~----------~----~----~----~------~----~------~--~---

Rails - Aug 2007 - Multiple Database Connections

Multiple Database Connections

Re: Multiple Database Connections

Re: Multiple Database Connections

Re: Multiple Database Connections

Re: Multiple Database Connections

Re: Multiple Database Connections

Re: Multiple Database Connections

Re: Multiple Database Connections

Re: Multiple Database Connections

Re: Multiple Database Connections

Re: Multiple Database Connections

Re: Multiple Database Connections

Re: Multiple Database Connections

Re: Multiple Database Connections

Re: Multiple Database Connections

Re: Multiple Database Connections

Re: Multiple Database Connections

Re: Multiple Database Connections

Re: Multiple Database Connections

Re: Multiple Database Connections

Re: Multiple Database Connections