Hi all, We currently have a potentially large scale rails project. There is one table (call it ''updates'') that will potentially have somewhere in the region of 2500 records created every minute (ie 200 million records every 6 months). The entries in this table belong to different accounts (about 50 accounts). So a possible solution is to split each customer account up and give them their own ''updates'' table. This would reduce the number of records being created in different tables. This would mean hijacking the rails ActiveRecord::Base connection mechanism, overriding the database connector, and redirecting the connection to the appropriate database, depending upon the account the request is coming from. My worry is in caching/pooling connections. If Account A calls action "get_updates", Rails connects to database "A", then Account B calls action "get_updates", Rails connects to database "B"... THEN Account A comes back and calls action "get_updates": the first connection to database "A" will presumabely have been lost. Is this correct? If so, is there a way around this? The problem would be that many many connections would be opened once, and never reused which is a potential nightmare. Has anyone else had any experience in this multiple databases approach? The alternative is to create a new "updates" table in the database for each customer, call them something like "updates_a", "updates_b" etc.. However doing this approach it looks likely that the database will grow to be very large, and scalability may become a problem? Thanks for any thoughts on this. --~--~---------~--~----~------------~-------~--~----~ You received this message because you are subscribed to the Google Groups "Ruby on Rails: Talk" group. To post to this group, send email to rubyonrails-talk-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org To unsubscribe from this group, send email to rubyonrails-talk-unsubscribe-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org For more options, visit this group at http://groups.google.com/group/rubyonrails-talk?hl=en -~----------~----~----~----~------~----~------~--~---
Just realised that Rails does not perform connection pooling: so the pooling question is of course rubbish. But would still like to hear anybody else''s thoughts on the multiple databases approach? Cheers On Aug 10, 12:06 pm, matt <matthewjsumm...-gM/Ye1E23mwN+BqQ9rBEUg@public.gmane.org> wrote:> Hi all, > > We currently have a potentially large scale rails project. There is > one table (call it ''updates'') that will potentially have somewhere in > the region of 2500 records created every minute (ie 200 million > records every 6 months). The entries in this table belong to > different accounts (about 50 accounts). > > So a possible solution is to split each customer account up and give > them their own ''updates'' table. This would reduce the number of > records being created in different tables. This would mean hijacking > the rails ActiveRecord::Base connection mechanism, overriding the > database connector, and redirecting the connection to the appropriate > database, depending upon the account the request is coming from. > > My worry is in caching/pooling connections. If Account A calls action > "get_updates", Rails connects to database "A", then Account B calls > action "get_updates", Rails connects to database "B"... THEN Account A > comes back and calls action "get_updates": the first connection to > database "A" will presumabely have been lost. Is this correct? If > so, is there a way around this? The problem would be that many many > connections would be opened once, and never reused which is a > potential nightmare. > > Has anyone else had any experience in this multiple databases > approach? The alternative is to create a new "updates" table in the > database for each customer, call them something like "updates_a", > "updates_b" etc.. However doing this approach it looks likely that > the database will grow to be very large, and scalability may become a > problem? > > Thanks for any thoughts on this.--~--~---------~--~----~------------~-------~--~----~ You received this message because you are subscribed to the Google Groups "Ruby on Rails: Talk" group. To post to this group, send email to rubyonrails-talk-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org To unsubscribe from this group, send email to rubyonrails-talk-unsubscribe-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org For more options, visit this group at http://groups.google.com/group/rubyonrails-talk?hl=en -~----------~----~----~----~------~----~------~--~---
Matt Su wrote:> Just realised that Rails does not perform connection pooling: so the > pooling question is of course rubbish. > > But would still like to hear anybody else''s thoughts on the multiple > databases approach? > > CheersYou mention over 40 records per second being generated. How many customers? How many tables? Notion from the Ruby side: Could you do your own connection pooling? What if you were to open one connection and just keep it connected and have customers just send their request down a particular "pipe"? Notion from the database side: What if you used a stored procedure that took the request and customer ID as parameters? No new connection would have to be set up each time and such databases can cache the internal procedures so that they can connect VERY fast and efficiently. You would not need to make any new connection and the server would route things as needed without you having to do much of anything. Is that the kind of thing you were asking? -- Posted via http://www.ruby-forum.com/. --~--~---------~--~----~------------~-------~--~----~ You received this message because you are subscribed to the Google Groups "Ruby on Rails: Talk" group. To post to this group, send email to rubyonrails-talk-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org To unsubscribe from this group, send email to rubyonrails-talk-unsubscribe-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org For more options, visit this group at http://groups.google.com/group/rubyonrails-talk?hl=en -~----------~----~----~----~------~----~------~--~---
Lloyd Linklater wrote:> Notion from the Ruby side: Could you do your own connection pooling? > What if you were to open one connection and just keep it connected and > have customers just send their request down a particular "pipe"?I meant one connection per table. oops -- Posted via http://www.ruby-forum.com/. --~--~---------~--~----~------------~-------~--~----~ You received this message because you are subscribed to the Google Groups "Ruby on Rails: Talk" group. To post to this group, send email to rubyonrails-talk-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org To unsubscribe from this group, send email to rubyonrails-talk-unsubscribe-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org For more options, visit this group at http://groups.google.com/group/rubyonrails-talk?hl=en -~----------~----~----~----~------~----~------~--~---
Instead of different databases per customer, why not use different tables in the same database per customer ? No need to mess around with connections, just do something like Model.table_name = ''table_''+session[:customer_id] before doing anything else. I think it would work. Not sure where to put this so you don''t have to repeat this every time you need to use a model. Maybe there''s some sort of hook that can be used ? Or would this be a bad idea ? Piet.> -----Original Message----- > From: rubyonrails-talk-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org > [mailto:rubyonrails-talk-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org] On Behalf Of matt > Sent: vrijdag 10 augustus 2007 13:07 > To: Ruby on Rails: Talk > Subject: [Rails] Multiple Database Connections > > So a possible solution is to split each customer account up > and give them their own ''updates'' table. This would reduce > the number of records being created in different tables. > This would mean hijacking the rails ActiveRecord::Base > connection mechanism, overriding the database connector, and > redirecting the connection to the appropriate database, > depending upon the account the request is coming from. >--~--~---------~--~----~------------~-------~--~----~ You received this message because you are subscribed to the Google Groups "Ruby on Rails: Talk" group. To post to this group, send email to rubyonrails-talk-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org To unsubscribe from this group, send email to rubyonrails-talk-unsubscribe@googlegroups.com For more options, visit this group at http://groups.google.com/group/rubyonrails-talk?hl=en -~----------~----~----~----~------~----~------~--~---
Hi, Lloyd: We are looking at potentially 50 accounts each with 50 users, sending a request every minute. Therefore it is 2500 requests a minute. I am hesitant about using stored procedures as they don''t seem to be recommended in rails: http://wiki.rubyonrails.org/rails/pages/StoredProcedures. I am looking at connection pooling at the moment. We thought of using a separate database for each customer. Then you would switch connections based on the customer to connect to the customer''s database. I am looking at caching the connection to each customer''s DB, however Rails does not like it. If I try and get a connection to a current customer''s DB, and store it in a hash: @@connections[:account_id] = ActiveRecord::Base.connection later (i.e. several requests down the line) when I try and restore it to avoid creating a new connection: ActiveRecord::Base.connection = @@connections[:account_id] the error "Lost Connection to MySQL" appears. It seems that rails only keeps the last connection used by ActiveRecord::Base open. If i assign the connection for ActiveRecord to say connection "B", when I attempt to reassign a connection "A" i put in @@connections to ActiveRecord::Base the connection is lost. I am not sure what is going on here but any ideas would be great? Piet: Thanks, that is something we are looking at as you can indeed set the table name for a model. It may just become slightly messy with different tables scattered throughout the database although it is certainly a viable option. Does anyone have any experience with this type of design in rails: i.e. a database per customer type architecture? Cheers On Aug 10, 3:32 pm, "Piet Hadermann" <piet.haderm...-/1BeJOKqmzrQT0dZR+AlfA@public.gmane.org> wrote:> Instead of different databases per customer, why not use different > tables in the same database per customer ? > No need to mess around with connections, just do something like > Model.table_name = ''table_''+session[:customer_id] before doing anything > else. > > I think it would work. Not sure where to put this so you don''t have to > repeat this every time you need to use a model. Maybe there''s some sort > of hook that can be used ? > > Or would this be a bad idea ? > > Piet. > > > > > -----Original Message----- > > From: rubyonrails-talk-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org > > [mailto:rubyonrails-talk-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org] On Behalf Of matt > > Sent: vrijdag 10 augustus 2007 13:07 > > To: Ruby on Rails: Talk > > Subject: [Rails] Multiple Database Connections > > > So a possible solution is to split each customer account up > > and give them their own ''updates'' table. This would reduce > > the number of records being created in different tables. > > This would mean hijacking the rails ActiveRecord::Base > > connection mechanism, overriding the database connector, and > > redirecting the connection to the appropriate database, > > depending upon the account the request is coming from.- Hide quoted text - > > - Show quoted text ---~--~---------~--~----~------------~-------~--~----~ You received this message because you are subscribed to the Google Groups "Ruby on Rails: Talk" group. To post to this group, send email to rubyonrails-talk-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org To unsubscribe from this group, send email to rubyonrails-talk-unsubscribe-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org For more options, visit this group at http://groups.google.com/group/rubyonrails-talk?hl=en -~----------~----~----~----~------~----~------~--~---
Maybe this plugin would help. http://www.elctech.com/2007/3/8/using-and-testing-rails-with-multiple-databases Daniel On Aug 10, 7:06 am, matt <matthewjsumm...-gM/Ye1E23mwN+BqQ9rBEUg@public.gmane.org> wrote:> Hi all, > > We currently have a potentially large scale rails project. There is > one table (call it ''updates'') that will potentially have somewhere in > the region of 2500 records created every minute (ie 200 million > records every 6 months). The entries in this table belong to > different accounts (about 50 accounts). > > So a possible solution is to split each customer account up and give > them their own ''updates'' table. This would reduce the number of > records being created in different tables. This would mean hijacking > the rails ActiveRecord::Base connection mechanism, overriding the > database connector, and redirecting the connection to the appropriate > database, depending upon the account the request is coming from. > > My worry is in caching/pooling connections. If Account A calls action > "get_updates", Rails connects to database "A", then Account B calls > action "get_updates", Rails connects to database "B"... THEN Account A > comes back and calls action "get_updates": the first connection to > database "A" will presumabely have been lost. Is this correct? If > so, is there a way around this? The problem would be that many many > connections would be opened once, and never reused which is a > potential nightmare. > > Has anyone else had any experience in this multiple databases > approach? The alternative is to create a new "updates" table in the > database for each customer, call them something like "updates_a", > "updates_b" etc.. However doing this approach it looks likely that > the database will grow to be very large, and scalability may become a > problem? > > Thanks for any thoughts on this.--~--~---------~--~----~------------~-------~--~----~ You received this message because you are subscribed to the Google Groups "Ruby on Rails: Talk" group. To post to this group, send email to rubyonrails-talk-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org To unsubscribe from this group, send email to rubyonrails-talk-unsubscribe-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org For more options, visit this group at http://groups.google.com/group/rubyonrails-talk?hl=en -~----------~----~----~----~------~----~------~--~---
Thanks, It is a good concept, but really I don''t need to change the databse for all my models in each model, I can just put a before_filter on the ApplicationController, because all models will point at a certain DB. Has no one had any experience of using Multiple/Customer Databases in a single rails app? Is it to be discouraged? Thanks in advance for anybody''s replies :) On Aug 10, 10:10 pm, danlunde <danlu...-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:> Maybe this plugin would help. > > http://www.elctech.com/2007/3/8/using-and-testing-rails-with-multiple... > > Daniel > > On Aug 10, 7:06 am, matt <matthewjsumm...-gM/Ye1E23mwN+BqQ9rBEUg@public.gmane.org> wrote: > > > > > Hi all, > > > We currently have a potentially large scale rails project. There is > > one table (call it ''updates'') that will potentially have somewhere in > > the region of 2500 records created every minute (ie 200 million > > records every 6 months). The entries in this table belong to > > different accounts (about 50 accounts). > > > So a possible solution is to split each customer account up and give > > them their own ''updates'' table. This would reduce the number of > > records being created in different tables. This would mean hijacking > > the rails ActiveRecord::Base connection mechanism, overriding the > > database connector, and redirecting the connection to the appropriate > > database, depending upon the account the request is coming from. > > > My worry is in caching/pooling connections. If Account A calls action > > "get_updates", Rails connects to database "A", then Account B calls > > action "get_updates", Rails connects to database "B"... THEN Account A > > comes back and calls action "get_updates": the first connection to > > database "A" will presumabely have been lost. Is this correct? If > > so, is there a way around this? The problem would be that many many > > connections would be opened once, and never reused which is a > > potential nightmare. > > > Has anyone else had any experience in this multiple databases > > approach? The alternative is to create a new "updates" table in the > > database for each customer, call them something like "updates_a", > > "updates_b" etc.. However doing this approach it looks likely that > > the database will grow to be very large, and scalability may become a > > problem? > > > Thanks for any thoughts on this.- Hide quoted text - > > - Show quoted text ---~--~---------~--~----~------------~-------~--~----~ You received this message because you are subscribed to the Google Groups "Ruby on Rails: Talk" group. To post to this group, send email to rubyonrails-talk-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org To unsubscribe from this group, send email to rubyonrails-talk-unsubscribe-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org For more options, visit this group at http://groups.google.com/group/rubyonrails-talk?hl=en -~----------~----~----~----~------~----~------~--~---
Hi Isak, The RDBMS is MySQL 5. The records themselves are relatively small: about 6 fields each containing a 20 character string. The data will primarily be used in a ''live'' view - it is tracking data for vehicles; whereby only the records from the last day is of interest, and for reporting, whereby records from up to 6 months will be searched for and accessed. Partitioning may help: we could split the main Updates table up into tables per tenant, or on a date basis: however I am reluctant to use this approach because it is not particularly suited to the way Rails deals with models and ActiveRecord. By partitioning the data into tables per tenant or on a historical basis would mean I would need to be changing table names in ActiveRecord. The other thought is long term scalability: using a database per customer would mean the system was self scaling. Even partitioning the data in a single database could mean the database grew to the point where it needed to be spread across multiple servers. Load will be constant during daylight hours: vehicles send updates of their locations every minute, and customers have users logged in to view this data in realtime via a mapping interface. So load on both inserts and selects on the Updates table will be pretty high during working hours, and non existent outside these hours. I think this is a rails specific problem: although it is a multi- tenant architecture problem, Rails adds its own unique twists: particularly because of its ''single database'' approach. Thanks in advance for any response On Aug 12, 5:07 pm, "Isak Hansen" <isak.han...-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:> On 8/10/07, matt <matthewjsumm...-gM/Ye1E23mwN+BqQ9rBEUg@public.gmane.org> wrote: > > > > > Hi, > > > Lloyd: > > We are looking at potentially 50 accounts each with 50 users, sending > > a request every minute. Therefore it is 2500 requests a minute. > > How large are the records? Can your rdbms engine of choice handle the > load and further growth on a single server, or do you need to split it > across multiple boxes anyway? > > How will you be using the data? Would partitioning help if your db supports it? > Do you expect load to be constant or have spikes? > > I''d make a new post on a support forum for the particular db you''re > using to figure out the ideal way of doing it, and only then start > looking for rails specific solutions. > > Isak--~--~---------~--~----~------------~-------~--~----~ You received this message because you are subscribed to the Google Groups "Ruby on Rails: Talk" group. To post to this group, send email to rubyonrails-talk-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org To unsubscribe from this group, send email to rubyonrails-talk-unsubscribe-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org For more options, visit this group at http://groups.google.com/group/rubyonrails-talk?hl=en -~----------~----~----~----~------~----~------~--~---
On Aug 10, 7:06 am, matt <matthewjsumm...-gM/Ye1E23mwN+BqQ9rBEUg@public.gmane.org> wrote:> We currently have a potentially large scale rails project. There is > one table (call it ''updates'') that will potentially have somewhere in > the region of 2500 records created every minute (ie 200 million > records every 6 months). The entries in this table belong to > different accounts (about 50 accounts).Have you actually /tried/ this and found it to be a problem? You could spend a month working on all this multiple connection stuff, only to find out the bottleneck is somewhere else. Create a test script that submits similarly sized records, 2500 per minute and see if it works. If you use script/runner you can even run it in the rails framework with an activerecord model. If you split each customer onto a different database, are the databases going to be on different machines? If not, it seems like you would run into the same problems and perhaps even faster, due to memory, disk and network congestion. If the customer databases would be on different machines, why not run a separate rails installation for each customer? Then you just set the db config for each rails installation, rather than all this multiple connection hacking. ~Rusty --~--~---------~--~----~------------~-------~--~----~ You received this message because you are subscribed to the Google Groups "Ruby on Rails: Talk" group. To post to this group, send email to rubyonrails-talk-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org To unsubscribe from this group, send email to rubyonrails-talk-unsubscribe-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org For more options, visit this group at http://groups.google.com/group/rubyonrails-talk?hl=en -~----------~----~----~----~------~----~------~--~---
Rusty, Unfortunately we can''t deploy a new rails application for each customer; if we could that would be great but we can''t. Inserts are not the problem: we have run spike tests and a pack of mongrels and MySQL can easily handle 2500 inserts a minute. The problem is in retrieval of data. Potentially, after a month, the Updates table could have 3 million records. Each record belongs to a tenant, and doing a search for all the Updates for a particular tenant on a 3 million big recordset takes 13 seconds. This is using 50 tenants. Using the multiple databases approach, the same select for a tenant''s Updates takes about 0.3 seconds. After 6 months, you are looking at holding 20 million records, with average searches just to get a list of Updates taking over a minute and a half (that''s with some careful indexing too). Now obviously partitioning data is one way to reduce this lookup time: in rails partitioning data would mean splitting up the Updates table either into tenants or by date...but from a scalability point of view is this desirable? Is it not better to tackle the multiple databases approach as a means of improving search times and creating an architecture that scales ''automatically''? Thanks in advance for anyone''s thoughts On Aug 12, 5:37 pm, Rusty Burchfield <n...-KboFZ2jX1NmsTnJN9+BGXg@public.gmane.org> wrote:> On Aug 10, 7:06 am, matt <matthewjsumm...-gM/Ye1E23mwN+BqQ9rBEUg@public.gmane.org> wrote: > > > We currently have a potentially large scale rails project. There is > > one table (call it ''updates'') that will potentially have somewhere in > > the region of 2500 records created every minute (ie 200 million > > records every 6 months). The entries in this table belong to > > different accounts (about 50 accounts). > > Have you actually /tried/ this and found it to be a problem? You > could spend a month working on all this multiple connection stuff, > only to find out the bottleneck is somewhere else. Create a test > script that submits similarly sized records, 2500 per minute and see > if it works. If you use script/runner you can even run it in the > rails framework with an activerecord model. > > If you split each customer onto a different database, are the > databases going to be on different machines? If not, it seems like > you would run into the same problems and perhaps even faster, due to > memory, disk and network congestion. If the customer databases would > be on different machines, why not run a separate rails installation > for each customer? Then you just set the db config for each rails > installation, rather than all this multiple connection hacking. > > ~Rusty--~--~---------~--~----~------------~-------~--~----~ You received this message because you are subscribed to the Google Groups "Ruby on Rails: Talk" group. To post to this group, send email to rubyonrails-talk-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org To unsubscribe from this group, send email to rubyonrails-talk-unsubscribe-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org For more options, visit this group at http://groups.google.com/group/rubyonrails-talk?hl=en -~----------~----~----~----~------~----~------~--~---
On Aug 12, 1:18 pm, matt <matthewjsumm...-gM/Ye1E23mwN+BqQ9rBEUg@public.gmane.org> wrote:> Potentially, after a month, the Updates table could have 3 million > records. Each record belongs to a tenant, and doing a search for all > the Updates for a particular tenant on a 3 million big recordset takes > 13 seconds. This is using 50 tenants. Using the multiple databases > approach, the same select for a tenant''s Updates takes about 0.3 > seconds.I would be curious to see how PostgreSQL 8.2 handles that dataset.> After 6 months, you are looking at holding 20 million > records, with average searches just to get a list of Updates taking > over a minute and a half (that''s with some careful indexing too). Now > obviously partitioning data is one way to reduce this lookup time: in > rails partitioning data would mean splitting up the Updates table > either into tenants or by date...but from a scalability point of view > is this desirable? Is it not better to tackle the multiple databases > approach as a means of improving search times and creating an > architecture that scales ''automatically''?If retrieval is the issue, I would side with splitting the table rather than the entire database. What is gained from multiple databases over splitting the table? It seems like additional overhead and complexity with performance similar to splitting the table. What do you mean by automatically? How is either method more ''automatic'' than the other? ~Rusty --~--~---------~--~----~------------~-------~--~----~ You received this message because you are subscribed to the Google Groups "Ruby on Rails: Talk" group. To post to this group, send email to rubyonrails-talk-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org To unsubscribe from this group, send email to rubyonrails-talk-unsubscribe-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org For more options, visit this group at http://groups.google.com/group/rubyonrails-talk?hl=en -~----------~----~----~----~------~----~------~--~---
> What do you mean by automatically? How is either method more > ''automatic'' than the other?By automatic, I refer to the differences in architecture. With a single database approach, as new tenants are added the system does not scale out to accommodate this new data: recordsets in the schema grow. With a multiple database approach, the system scales out horizontally to cope with the new tenant and new data. This is a ''side-effect'' as it were of using a multi-database architecture: that horizontal scale out is ''automatically'' or rather *inherently* achieved. There is no *inherent* or ''automatic'' scale out with the single database approach. Out of interest, are there any performance issues with having a large number of tables in a MySQL database? Large being in the hundreds to thousands range. Matt On Aug 12, 6:51 pm, Rusty Burchfield <n...-KboFZ2jX1NmsTnJN9+BGXg@public.gmane.org> wrote:> On Aug 12, 1:18 pm, matt <matthewjsumm...-gM/Ye1E23mwN+BqQ9rBEUg@public.gmane.org> wrote: > > > Potentially, after a month, the Updates table could have 3 million > > records. Each record belongs to a tenant, and doing a search for all > > the Updates for a particular tenant on a 3 million big recordset takes > > 13 seconds. This is using 50 tenants. Using the multiple databases > > approach, the same select for a tenant''s Updates takes about 0.3 > > seconds. > > I would be curious to see how PostgreSQL 8.2 handles that dataset. > > > After 6 months, you are looking at holding 20 million > > records, with average searches just to get a list of Updates taking > > over a minute and a half (that''s with some careful indexing too). Now > > obviously partitioning data is one way to reduce this lookup time: in > > rails partitioning data would mean splitting up the Updates table > > either into tenants or by date...but from a scalability point of view > > is this desirable? Is it not better to tackle the multiple databases > > approach as a means of improving search times and creating an > > architecture that scales ''automatically''? > > If retrieval is the issue, I would side with splitting the table > rather than the entire database. What is gained from multiple > databases over splitting the table? It seems like additional overhead > and complexity with performance similar to splitting the table. > > What do you mean by automatically? How is either method more > ''automatic'' than the other? > > ~Rusty--~--~---------~--~----~------------~-------~--~----~ You received this message because you are subscribed to the Google Groups "Ruby on Rails: Talk" group. To post to this group, send email to rubyonrails-talk-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org To unsubscribe from this group, send email to rubyonrails-talk-unsubscribe-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org For more options, visit this group at http://groups.google.com/group/rubyonrails-talk?hl=en -~----------~----~----~----~------~----~------~--~---
matt wrote the following on 12.08.2007 21:11 :>> What do you mean by automatically? How is either method more >> ''automatic'' than the other? >> > > By automatic, I refer to the differences in architecture. With a > single database approach, as new tenants are added the system does not > scale out to accommodate this new data: recordsets in the schema > grow. With a multiple database approach, the system scales out > horizontally to cope with the new tenant and new data. This is a > ''side-effect'' as it were of using a multi-database architecture: that > horizontal scale out is ''automatically'' or rather *inherently* > achieved. There is no *inherent* or ''automatic'' scale out with the > single database approach. > > Out of interest, are there any performance issues with having a large > number of tables in a MySQL database? Large being in the hundreds to > thousands range. >The only one I remember from experience is slowness of the command line client startup unless you tell it not to autocomplete table and column names (by default it lists all tables and column on startup to provide autocompletion). The database itself doesn''t seem affected (this was on a mix of MyIsam and InnoDB tables, most tables where MyIsam). Lionel --~--~---------~--~----~------------~-------~--~----~ You received this message because you are subscribed to the Google Groups "Ruby on Rails: Talk" group. To post to this group, send email to rubyonrails-talk-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org To unsubscribe from this group, send email to rubyonrails-talk-unsubscribe-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org For more options, visit this group at http://groups.google.com/group/rubyonrails-talk?hl=en -~----------~----~----~----~------~----~------~--~---
Actually your better approach may be to use oracle 10g It can do this "break" down behind the schenes, running a single table across a "rack" of machines. Then ror and your code does not have to worry about it. On Aug 10, 7:06 pm, matt <matthewjsumm...-gM/Ye1E23mwN+BqQ9rBEUg@public.gmane.org> wrote:> Hi all, > > We currently have a potentially large scale rails project. There is > one table (call it ''updates'') that will potentially have somewhere in > the region of 2500 records created every minute (ie 200 million > records every 6 months). The entries in this table belong to > different accounts (about 50 accounts). > > So a possible solution is to split each customer account up and give > them their own ''updates'' table. This would reduce the number of > records being created in different tables. This would mean hijacking > the rails ActiveRecord::Base connection mechanism, overriding the > database connector, and redirecting the connection to the appropriate > database, depending upon the account the request is coming from. > > My worry is in caching/pooling connections. If Account A calls action > "get_updates", Rails connects to database "A", then Account B calls > action "get_updates", Rails connects to database "B"... THEN Account A > comes back and calls action "get_updates": the first connection to > database "A" will presumabely have been lost. Is this correct? If > so, is there a way around this? The problem would be that many many > connections would be opened once, and never reused which is a > potential nightmare. > > Has anyone else had any experience in this multiple databases > approach? The alternative is to create a new "updates" table in the > database for each customer, call them something like "updates_a", > "updates_b" etc.. However doing this approach it looks likely that > the database will grow to be very large, and scalability may become a > problem? > > Thanks for any thoughts on this.--~--~---------~--~----~------------~-------~--~----~ You received this message because you are subscribed to the Google Groups "Ruby on Rails: Talk" group. To post to this group, send email to rubyonrails-talk-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org To unsubscribe from this group, send email to rubyonrails-talk-unsubscribe-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org For more options, visit this group at http://groups.google.com/group/rubyonrails-talk?hl=en -~----------~----~----~----~------~----~------~--~---
On 8/12/07, matt <matthewjsummers-gM/Ye1E23mwN+BqQ9rBEUg@public.gmane.org> wrote:> > > What do you mean by automatically? How is either method more > > ''automatic'' than the other? > > By automatic, I refer to the differences in architecture. With a > single database approach, as new tenants are added the system does not > scale out to accommodate this new data: recordsets in the schema > grow. With a multiple database approach, the system scales out > horizontally to cope with the new tenant and new data. This is a > ''side-effect'' as it were of using a multi-database architecture: that > horizontal scale out is ''automatically'' or rather *inherently* > achieved. There is no *inherent* or ''automatic'' scale out with the > single database approach. >You are making assumptions which are most likely not correct. Without knowing the data structure, usage patterns, and lifespan it''s difficult to give any specific advice, but multiple databases is most likely not the best solution. Table partitioning comes to mind as a possible solution. A lot depends on the overall usage characteristics. It also sounds like the data might have a relatively short lifespan? I would also suggest taking a look at postgresql instead of mysql for something like this. Mvcc architectures often do better under heavy insert/update activity. Chris --~--~---------~--~----~------------~-------~--~----~ You received this message because you are subscribed to the Google Groups "Ruby on Rails: Talk" group. To post to this group, send email to rubyonrails-talk-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org To unsubscribe from this group, send email to rubyonrails-talk-unsubscribe-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org For more options, visit this group at http://groups.google.com/group/rubyonrails-talk?hl=en -~----------~----~----~----~------~----~------~--~---
Hi Chris, Thanks for your response. Do you have any specific reasons why multiple databases may not be the best approach? Cheers! On Aug 13, 7:22 am, snacktime <snackt...-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:> On 8/12/07, matt <matthewjsumm...-gM/Ye1E23mwN+BqQ9rBEUg@public.gmane.org> wrote: > > > > > > What do you mean by automatically? How is either method more > > > ''automatic'' than the other? > > > By automatic, I refer to the differences in architecture. With a > > single database approach, as new tenants are added the system does not > > scale out to accommodate this new data: recordsets in the schema > > grow. With a multiple database approach, the system scales out > > horizontally to cope with the new tenant and new data. This is a > > ''side-effect'' as it were of using a multi-database architecture: that > > horizontal scale out is ''automatically'' or rather *inherently* > > achieved. There is no *inherent* or ''automatic'' scale out with the > > single database approach. > > You are making assumptions which are most likely not correct. Without > knowing the data structure, usage patterns, and lifespan it''s > difficult to give any specific advice, but multiple databases is most > likely not the best solution. Table partitioning comes to mind as a > possible solution. A lot depends on the overall usage > characteristics. It also sounds like the data might have a relatively > short lifespan? I would also suggest taking a look at postgresql > instead of mysql for something like this. Mvcc architectures often do > better under heavy insert/update activity. > > Chris--~--~---------~--~----~------------~-------~--~----~ You received this message because you are subscribed to the Google Groups "Ruby on Rails: Talk" group. To post to this group, send email to rubyonrails-talk-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org To unsubscribe from this group, send email to rubyonrails-talk-unsubscribe-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org For more options, visit this group at http://groups.google.com/group/rubyonrails-talk?hl=en -~----------~----~----~----~------~----~------~--~---
On 8/13/07, matt <matthewjsummers-gM/Ye1E23mwN+BqQ9rBEUg@public.gmane.org> wrote:> > Hi Chris, > > Thanks for your response. Do you have any specific reasons why > multiple databases may not be the best approach?The main reason is that it''s usually just not necessary. The other reason is that it''s a management nightmare to partition data like that as you scale up. The amount of data is usually not a major issue, it''s the usage patterns. If you do need to partition for whatever reason, first look at the partitioning features of the database. Also, the one thing you are leaving out that makes it difficult to give any specifics is the overall usage patterns. What kind of selects will you be doing on the data? Chris --~--~---------~--~----~------------~-------~--~----~ You received this message because you are subscribed to the Google Groups "Ruby on Rails: Talk" group. To post to this group, send email to rubyonrails-talk-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org To unsubscribe from this group, send email to rubyonrails-talk-unsubscribe-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org For more options, visit this group at http://groups.google.com/group/rubyonrails-talk?hl=en -~----------~----~----~----~------~----~------~--~---
Chris It would normally be doing a select to get a list of Updates by tenant, for the past day (these would potentially be done every minute for each tenant user logged in). Then, when reporting was required, there would be doing a select all per tenant. Usage patterns would mean that for potentially 100 - 500 concurrent users (100 typical case, 500 worst case) each user would be doing a ( Select where tenant_id="" and date="" ) every minute or so. Does that clarify things a bit? Thanks for your response Matt On Aug 13, 9:43 am, snacktime <snackt...-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:> On 8/13/07, matt <matthewjsumm...-gM/Ye1E23mwN+BqQ9rBEUg@public.gmane.org> wrote: > > > > > Hi Chris, > > > Thanks for your response. Do you have any specific reasons why > > multiple databases may not be the best approach? > > The main reason is that it''s usually just not necessary. The other > reason is that it''s a management nightmare to partition data like that > as you scale up. The amount of data is usually not a major issue, > it''s the usage patterns. If you do need to partition for whatever > reason, first look at the partitioning features of the database. > > Also, the one thing you are leaving out that makes it difficult to > give any specifics is the overall usage patterns. What kind of > selects will you be doing on the data? > > Chris--~--~---------~--~----~------------~-------~--~----~ You received this message because you are subscribed to the Google Groups "Ruby on Rails: Talk" group. To post to this group, send email to rubyonrails-talk-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org To unsubscribe from this group, send email to rubyonrails-talk-unsubscribe-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org For more options, visit this group at http://groups.google.com/group/rubyonrails-talk?hl=en -~----------~----~----~----~------~----~------~--~---
snacktime wrote:> On 8/13/07, matt <matthewjsummers-gM/Ye1E23mwN+BqQ9rBEUg@public.gmane.org> wrote: >> Hi Chris, >> >> Thanks for your response. Do you have any specific reasons why >> multiple databases may not be the best approach? > > The main reason is that it''s usually just not necessary. The other > reason is that it''s a management nightmare to partition data like that > as you scale up. The amount of data is usually not a major issue, > it''s the usage patterns. If you do need to partition for whatever > reason, first look at the partitioning features of the database. > > Also, the one thing you are leaving out that makes it difficult to > give any specifics is the overall usage patterns. What kind of > selects will you be doing on the data?It may or may not be necessary for the original poster''s situation but managing connections to multiple databases is something that very large Web applications need to do in order to scale the DB backend. Current best practices for scaling the DB portion of a Web app include creating multiple read-only slaves through replication so the "master" database(s) only has to deal with the writes. This requires the app to be aware of "read-only" database connections and "write-only" database connections. Another best practice that requires being able to manage multiple database connection is "data federation" where a very large "logical" dataset is split among multiple databases and servers. Sites like eBay, Flickr and LiveJournal use both of these techniques to scale their database backends. -- Michael Wang --~--~---------~--~----~------------~-------~--~----~ You received this message because you are subscribed to the Google Groups "Ruby on Rails: Talk" group. To post to this group, send email to rubyonrails-talk-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org To unsubscribe from this group, send email to rubyonrails-talk-unsubscribe-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org For more options, visit this group at http://groups.google.com/group/rubyonrails-talk?hl=en -~----------~----~----~----~------~----~------~--~---
On 8/13/07, matt <matthewjsummers-gM/Ye1E23mwN+BqQ9rBEUg@public.gmane.org> wrote:> > Chris > > It would normally be doing a select to get a list of Updates by > tenant, for the past day (these would potentially be done every minute > for each tenant user logged in). Then, when reporting was required, > there would be doing a select all per tenant. > > Usage patterns would mean that for potentially 100 - 500 concurrent > users (100 typical case, 500 worst case) each user would be doing a > ( Select where tenant_id="" and date="" ) every minute or so. > > Does that clarify things a bit?In that case I''d probably take a look at table partitioning by date range. Implementation depends on the database, I''m most familiar with postgresql not sure if it works like this with mysql. The basic idea is that you create tables named by a date range, say by month. Then you use something like a rule to rewrite the queries to the appropriate table. Probably a good rule of thumb for partitioning is whatever size is small enough so you can keep the indexes in memory. It''s going to take some time to create the rules and auto create the tables, etc.., but it''s much less work in the long run then multiple databases and it''s transparent if done correctly. That''s just a general idea though, there are several variations on that, and they will all be database specific. Where you have complicated selects that need to hit the database, then partitioning can work great. But you might also look at some creative ways of caching since your selects are fairly simple and not even worry about partitioning for now. For example, on each insert/update do a memcache set, and have another key that you increment as a counter. Then on select get the counter key which contains the number of entries for the date range, and then a multi get to grab all the data. Your keys would use a naming scheme of some type so you know which ones to get. In any case I think you will find a more elegant solution then using multiple databases. I''ve gone that route before and regretted it. Chris --~--~---------~--~----~------------~-------~--~----~ You received this message because you are subscribed to the Google Groups "Ruby on Rails: Talk" group. To post to this group, send email to rubyonrails-talk-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org To unsubscribe from this group, send email to rubyonrails-talk-unsubscribe-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org For more options, visit this group at http://groups.google.com/group/rubyonrails-talk?hl=en -~----------~----~----~----~------~----~------~--~---