Frank Poo
2009-Aug-17 23:56 UTC
Potential concurrency issues with a global hash to reduce DB
Hi folks:

I'm building something akin to Google Analytics, and currently I'm doing real-time database updates. However, this isn't really scaling anymore, so I need to stop doing synchronous DB operations. In the short term I'm trying to reduce DB writes, so I'm thinking about a global hash (say, declared in environment.rb) that is accessible from my controllers and models, and that I can write to in lieu of writing to the DB. Every so often a task can take the accumulated updates and write them to the DB.

Questions:
1) Does this sound reasonable?
2) Will I run into any concurrency problems? Which ones?
3) How does this compare with writing to the file system and importing later?

Thanks so much in advance, you ROCK!
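To make the idea concrete, the naive version would look something like this (a minimal sketch; the constant name and the current_user helper are illustrative, not actual app code). Question 2 is the catch: the increment is a read-modify-write, so it is only safe while Rails serializes requests within a process.

  API_COUNTS = Hash.new(0)   # e.g. declared in config/environment.rb

  # somewhere in a controller action:
  API_COUNTS[current_user.id] += 1   # not atomic: read, add, write back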
Maurício Linhares
2009-Aug-18 00:12 UTC
Re: Potential concurrency issues with a global hash to reduce DB
If you're not running in Rails's threadsafe! mode, you'll have no concurrency issues -- without it, Rails handles one request at a time per process. (Keep in mind, though, that each app server process would hold its own private copy of the hash.) A message queue could also be a solution to your problem.

- Maurício Linhares
http://codeshooter.wordpress.com/ | http://twitter.com/mauriciojr
Marnen Laibow-Koser
2009-Aug-18 02:45 UTC
Re: Potential concurrency issues with a global hash to reduce DB
Frank Poo wrote:
> I'm thinking about a global hash (say, declared in environment.rb) that
> is accessible from my controllers and models that I can write to in lieu
> of writing to the DB. [...]
> 1) Does this sound reasonable?

No! Databases should scale as much as you need them to -- that's what they are designed for. If your DB is not scaling, there is something wrong, either in your DB setup or in the way the app is using the DB. Find the problem and fix it rather than ripping out the DB.

Best,
--
Marnen Laibow-Koser
http://www.marnen.org
marnen-sbuyVjPbboAdnm+yROfE0A@public.gmane.org
Frank Poo
2009-Aug-18 03:12 UTC
Re: Potential concurrency issues with a global hash to reduce DB
Marnen: here are some more details. I need to be able to handle at least 10 API requests/second. Let me clarify what happens:

1) A user makes a RESTful API request.
2) I find a record in the database and return it as JSON.
3) I record the request counter for the user in the database (i.e., if a user makes 2 API calls, I record '2').

#1 and #2 are really fast in SQL -- they are SELECTs. #3 is really slow, because it's an UPDATE. In the real world, my database (MySQL) is NOT scaling. According to New Relic, #3 is taking most of the time.

Any suggestions?
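To make #3 concrete: it boils down to a single counter bump per request, which in ActiveRecord is essentially one update_counters call (a sketch -- the StatisticsApi model name and the id variable are illustrative):

  # Issues a single UPDATE of the form:
  #   UPDATE `table` SET `counter` = COALESCE(`counter`, 0) + 1 WHERE `id` = ...
  StatisticsApi.update_counters(stats_record_id, :count_request => 1)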
Marnen Laibow-Koser
2009-Aug-18 03:22 UTC
Re: Potential concurrency issues with a global hash to reduce DB
Frank Poo wrote:
> #1 and #2 are really fast in SQL -- they are SELECTs. #3 is really slow,
> because it's an UPDATE. In the real world, my database (MySQL) is NOT
> scaling.

Well, there's your first problem -- you're using MySQL. :) Try PostgreSQL instead.

(That was admittedly snarky. Although PostgreSQL is certainly the better choice, MySQL should still work.)

> According to New Relic, #3 is taking most of the time.

What does that query look like? A simple UPDATE should not be a major performance hit, so there must be something else going on.

Best,
--
Marnen Laibow-Koser
http://www.marnen.org
marnen-sbuyVjPbboAdnm+yROfE0A@public.gmane.org
Frank Poo
2009-Aug-18 03:39 UTC
Re: Potential concurrency issues with a global hash to reduce DB
The query is just an UPDATE with a COALESCE() call in it. All it does is increment a counter:

  UPDATE `statistics_api` SET `count_request` = COALESCE(`count_request`, ?) + ? WHERE (`id` = ?)

This one took 8,553 ms according to New Relic. The weird part is that there is an index on `id` (as well as a few other columns) on this table. I just can't figure out why the query is taking so long...
Marnen Laibow-Koser
2009-Aug-18 03:45 UTC
Re: Potential concurrency issues with a global hash to reduce DB
Frank Poo wrote:
> This one took 8,553 ms according to New Relic. The weird part is that
> there is an index on `id` (as well as a few other columns) on this table.
> I just can't figure out why the query is taking so long...

8.5 seconds? For that?!? You might want to ask someone who knows more about MySQL configuration than I do, but clearly your DB setup has problems. If you run that query when your DB is not otherwise being hit by your app, how long does it take?

BTW, what does the first placeholder represent? I'd assume it would always be 0, in which case you can hard-code it.

Best,
--
Marnen Laibow-Koser
http://www.marnen.org
marnen-sbuyVjPbboAdnm+yROfE0A@public.gmane.org
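For example, from script/console, something like this would time the statement in isolation (the id value is illustrative):

  require 'benchmark'
  sql = "UPDATE `statistics_api` SET `count_request` = " \
        "COALESCE(`count_request`, 0) + 1 WHERE (`id` = 1)"
  puts Benchmark.realtime { ActiveRecord::Base.connection.update(sql) }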
Frank Poo
2009-Aug-18 05:08 UTC
Re: Potential concurrency issues with a global hash to reduce DB
This query is lightning fast when my server is NOT under load -- just dead slow when I'm seeing a lot of load. You're right, the first placeholder is probably redundant.

Marnen Laibow-Koser wrote:
> If you run that query when your DB is not otherwise being hit by your
> app, how long does it take?
Greg Willits
2009-Aug-18 07:25 UTC
Re: Potential concurrency issues with a global hash to reduce DB
Frank Poo wrote:
> This query is lightning fast when my server is NOT under load -- just
> dead slow when I'm seeing a lot of load.

That would likely indicate that the DB is pretty dern busy fulfilling reads (though someone should analyse that to prove it). When doing this kind of logging, it often becomes necessary to have a second database, optimized for writes, to take the continual updates, so that your read-mostly data set is left alone. Those log writes should also be fired off as an asynchronous task, so that the rest of the page/request is not slowed down waiting for the write to occur.

I agree that yanking the DB in favor of an in-memory workload doesn't seem appropriate yet.

--
def gw
  writes_at 'www.railsdev.ws'
end
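To sketch the async idea (names are illustrative, and a real setup would more likely use a proper message/job queue -- note also that each app server process gets its own queue and worker thread here):

  require 'thread'

  LOG_QUEUE = Queue.new

  # A single background thread drains the queue, so the request cycle
  # never waits on the UPDATE.
  Thread.new do
    loop do
      stats_record_id = LOG_QUEUE.pop   # blocks until work arrives
      StatisticsApi.update_counters(stats_record_id, :count_request => 1)
    end
  end

  # in the controller, instead of writing synchronously:
  LOG_QUEUE << stats_record_id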
Frank Poo
2009-Aug-18 12:25 UTC
Re: Potential concurrency issues with a global hash to reduce DB
Greg, what do you recommend for doing async writes? I'm a newbie to async tasks in Rails, but doesn't something like delayed_job ALSO use the database to keep track of jobs?

Greg Willits wrote:
> Those log writes should also be fired off as an asynchronous task, so
> that the rest of the page/request is not slowed down waiting for the
> write to occur.
>
> I agree that yanking the DB in favor of an in-memory workload doesn't
> seem appropriate yet.
Marnen Laibow-Koser
2009-Aug-18 15:29 UTC
Re: Potential concurrency issues with a global hash to reduce DB
Frank Poo wrote:
> Greg, what do you recommend for doing async writes? I'm a newbie to
> async tasks in Rails, but doesn't something like delayed_job ALSO use
> the database to keep track of jobs?

It may be possible to use the database's own clustering features to do this and still let your app treat it as if it were one simple DB. If not, would the masochism plugin help?

Best,
--
Marnen Laibow-Koser
http://www.marnen.org
marnen-sbuyVjPbboAdnm+yROfE0A@public.gmane.org
Wojciech Piekutowski
2009-Aug-19 12:42 UTC
Re: Potential concurrency issues with a global hash to reduce DB
Which MySQL engine do you use for this table? MyISAM is faster for read operations, but it locks the whole table for an update or an insert, and that can hurt performance a lot under write load. The InnoDB engine uses row-level locking. (Note that PostgreSQL does too.)

Have you thought about using document DBs like Tokyo Cabinet, MongoDB, or others? They can handle huge loads.
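You can check the engine from script/console; a quick sketch using the table name quoted earlier in the thread:

  ActiveRecord::Base.connection.select_one(
    "SHOW TABLE STATUS LIKE 'statistics_api'")['Engine']
  # => "InnoDB" or "MyISAM"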
Frank Poo
2009-Aug-19 15:40 UTC
Re: Potential concurrency issues with a global hash to reduce DB
We use InnoDB. I did some more research on this, and it turns out that my app was literally overloading the (non-clustered) DB with UPDATEs. Since all these updates do is increment counters, I now cache the increments in memory and flush them every once in a while. This dramatically raised my app's performance without the need for a complicated database setup.

Wojciech Piekutowski wrote:
> Which MySQL engine do you use for this table?
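In outline, the approach looks something like this (a simplified sketch rather than the production code; the module and the StatisticsApi model name are illustrative). Each app process keeps its own cache, and a crash loses whatever hasn't been flushed yet -- an acceptable trade-off for analytics-style counters:

  require 'thread'

  module RequestCounter
    @pending = Hash.new(0)
    @lock    = Mutex.new

    # Called on every API request; cheap, in-memory only.
    def self.increment(id)
      @lock.synchronize { @pending[id] += 1 }
    end

    # Called periodically (timer thread, cron-driven rake task, etc.).
    # Drains the cache and issues one UPDATE per id instead of one per request.
    def self.flush!
      batch = @lock.synchronize do
        drained, @pending = @pending, Hash.new(0)
        drained
      end
      batch.each do |id, n|
        StatisticsApi.update_counters(id, :count_request => n)
      end
    end
  end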