Alpha Blue
2009-Aug-06 20:36 UTC
Speeding up ActiveRecord Querying on very large joins?
When I first started my project, I thought long and hard about the exact information I would need. Because my site works largely on statistical analysis, and virtual matchups can be run under any number of scenarios, it is very difficult to come up with a common solution.

The simple facts are that at any time 'any two' teams can be compared against one another, and the data used for the comparison is very extensive. When I started with one large join, the return time was so long that at times it would freeze up my development machine. So I carefully and systematically broke it down into several smaller joins to ease the load.

This worked fairly well, considering the type of data I'm pulling. That said, the best time I can get when running this particular query is 540 ms, and as more and more workload hits the test server, hours later this lone query can reach up to 14,000 ms (14 seconds is just too long).

So, I'm trying to find a way to speed up my query, optimize it, or figure out what else I can do to improve this result. The model method I'm about to show you is in fact enormous, but if you take it apart piece by piece you'll see that I'm selecting specific columns, joining specific tables, and trying to do several smaller queries to speed things up.

My question is: what can I do to improve this?

def self.compareteams(teamone, teamtwo)
  week_start_date = Time.now.beginning_of_week.to_date.strftime('%Y-%m-%d')
  week_end_date   = Time.now.end_of_week.to_date.strftime('%Y-%m-%d')
  compiled_on     = week_start_date..week_end_date

  tsrs = Team.find(teamone, teamtwo,
    :select => "teams.name, teams.mascot, teams.formal_name, teams.image,
                tsrs_ratings.tsrs",
    :joins => [:tsrs_ratings],
    :conditions => {:tsrs_ratings => {:compiled_on => compiled_on}},
    :order => "teams.id")

  tsos_offense = Team.find(teamone, teamtwo,
    :select => "tsos_offenses.totoff, tsos_offenses.rushoff,
                tsos_offenses.passoff, tsos_offenses.scoroff,
                tsos_offenses.rzonoff, tsos_offenses.fumlost,
                tsos_offenses.passhint, tsos_offenses.tolost,
                tsos_offenses.sacksall, tsos_offenses.tackflossall,
                tsos_offenses.passeff, tsos_offenses.firdwns,
                tsos_offenses.thrdwncon, tsos_offenses.fthdwncon,
                tsos_offenses.totals",
    :joins => [:tsos_offenses],
    :conditions => {:tsos_offenses => {:compiled_on => compiled_on}},
    :order => "teams.id")

  tsos_defense = Team.find(teamone, teamtwo,
    :select => "tsos_defenses.totdef, tsos_defenses.rushdef,
                tsos_defenses.passdef, tsos_defenses.scordef,
                tsos_defenses.rzondef, tsos_defenses.fumgain,
                tsos_defenses.passint, tsos_defenses.togain,
                tsos_defenses.sacks AS allsacks, tsos_defenses.tackfloss,
                tsos_defenses.passeffdef, tsos_defenses.firdwnsdef,
                tsos_defenses.thrdwncondef, tsos_defenses.fthdwncon,
                tsos_defenses.totals",
    :joins => [:tsos_defenses],
    :conditions => {:tsos_defenses => {:compiled_on => compiled_on}},
    :order => "teams.id")

  tsos_steams = Team.find(teamone, teamtwo,
    :select => "tsos_steams.kickret, tsos_steams.puntret,
                tsos_steams.netpunt, tsos_steams.kickretdef,
                tsos_steams.puntretdef, tsos_steams.totals",
    :joins => [:tsos_steams],
    :conditions => {:tsos_steams => {:compiled_on => compiled_on}},
    :order => "teams.id")

  offense = Team.find(teamone, teamtwo,
    :select => "total_offenses.ydspgm AS offtotal,
                rushing_offenses.ydspg AS offrush,
                passing_offenses.ydspgm AS offpass,
                scoring_offenses.avg AS offscoring,
                red_zone_offenses.pct AS offrzone,
                fumbles_losts.fumbles_lost AS fumlost,
                passes_had_intercepteds.passes_had_intercepted AS passhint,
                sacks_alloweds.pg AS sacksall,
                tackles_for_loss_alloweds.pg AS tflossall,
                passing_efficiencies.rating AS pasoffeff,
                first_downs_offenses.pg AS firdwnoff,
                third_down_conversion_percentages.pct AS thirdwnpctoff,
                fourth_down_conversion_percentages.pct AS fourdwnpctoff",
    :joins => [:total_offenses, :rushing_offenses, :passing_offenses,
               :scoring_offenses, :red_zone_offenses, :fumbles_losts,
               :passes_had_intercepteds, :sacks_alloweds,
               :tackles_for_loss_alloweds, :passing_efficiencies,
               :first_downs_offenses, :third_down_conversion_percentages,
               :fourth_down_conversion_percentages],
    :conditions => {:total_offenses => {:compiled_on => compiled_on},
                    :rushing_offenses => {:compiled_on => compiled_on},
                    :passing_offenses => {:compiled_on => compiled_on},
                    :scoring_offenses => {:compiled_on => compiled_on},
                    :red_zone_offenses => {:compiled_on => compiled_on},
                    :fumbles_losts => {:compiled_on => compiled_on},
                    :passes_had_intercepteds => {:compiled_on => compiled_on},
                    :sacks_alloweds => {:compiled_on => compiled_on},
                    :tackles_for_loss_alloweds => {:compiled_on => compiled_on},
                    :passing_efficiencies => {:compiled_on => compiled_on},
                    :first_downs_offenses => {:compiled_on => compiled_on},
                    :third_down_conversion_percentages => {:compiled_on => compiled_on},
                    :fourth_down_conversion_percentages => {:compiled_on => compiled_on}},
    :order => "teams.id")

  defense = Team.find(teamone, teamtwo,
    :select => "total_defenses.ydspgm AS deftotal,
                rushing_defenses.ydspg AS defrush,
                pass_defenses.ydspgm AS defpass,
                scoring_defenses.ptspgm AS defscoring,
                red_zone_defenses.pct AS defrzone,
                fumbles_recovereds.fumbles_recovered AS fumrec,
                passes_intercepteds.passes_intercepted AS passint,
                sacks.pg AS sacksdef,
                tackles_for_losses.pg AS tfloss,
                pass_efficiency_defenses.rating AS pasdefeff,
                first_downs_defenses.pg AS firdwndef,
                third_down_percentage_defenses.pct AS thirdwnpctdef,
                fourth_down_percentage_defenses.pct AS fourdwnpctdef",
    :joins => [:total_defenses, :rushing_defenses, :pass_defenses,
               :scoring_defenses, :red_zone_defenses, :fumbles_recovereds,
               :passes_intercepteds, :sacks, :tackles_for_losses,
               :pass_efficiency_defenses, :first_downs_defenses,
               :third_down_percentage_defenses,
               :fourth_down_percentage_defenses],
    :conditions => {:total_defenses => {:compiled_on => compiled_on},
                    :rushing_defenses => {:compiled_on => compiled_on},
                    :pass_defenses => {:compiled_on => compiled_on},
                    :scoring_defenses => {:compiled_on => compiled_on},
                    :red_zone_defenses => {:compiled_on => compiled_on},
                    :fumbles_recovereds => {:compiled_on => compiled_on},
                    :passes_intercepteds => {:compiled_on => compiled_on},
                    :sacks => {:compiled_on => compiled_on},
                    :tackles_for_losses => {:compiled_on => compiled_on},
                    :pass_efficiency_defenses => {:compiled_on => compiled_on},
                    :first_downs_defenses => {:compiled_on => compiled_on},
                    :third_down_percentage_defenses => {:compiled_on => compiled_on},
                    :fourth_down_percentage_defenses => {:compiled_on => compiled_on}},
    :order => "teams.id")

  turnovers = Team.find(teamone, teamtwo,
    :select => "turnover_margins.margin AS tomargin,
                tsos_turnover_margins.margin AS tomarginrating",
    :joins => [:turnover_margins, :tsos_turnover_margins],
    :conditions => {:turnover_margins => {:compiled_on => compiled_on},
                    :tsos_turnover_margins => {:compiled_on => compiled_on}},
    :order => "teams.id")

  special_teams = Team.find(teamone, teamtwo,
    :select => "kickoff_returns.avg AS kickydsper,
                kickoff_return_yardage_defenses.ydsperret AS kickydsperdef,
                punt_returns.avg AS puntydsper,
                punt_return_yardage_defenses.ydsperret AS puntydsperdef,
                net_puntings.net_avg AS netpunt",
    :joins => [:kickoff_returns, :kickoff_return_yardage_defenses,
               :punt_returns, :punt_return_yardage_defenses,
               :net_puntings],
    :conditions => {:kickoff_returns => {:compiled_on => compiled_on},
                    :kickoff_return_yardage_defenses => {:compiled_on => compiled_on},
                    :punt_returns => {:compiled_on => compiled_on},
                    :punt_return_yardage_defenses => {:compiled_on => compiled_on},
                    :net_puntings => {:compiled_on => compiled_on}},
    :order => "teams.id")

  return tsrs, tsos_offense, tsos_defense, tsos_steams, offense, defense,
         turnovers, special_teams
end

Eight variables are returned. The bottleneck begins to occur on offense = ... (around 30 ms) and then deepens on defense = ... (around 460 ms). These are indeed large, but everything listed here is absolutely required for my app to work. What can I do to optimize this?

Thanks.
Alpha Blue
2009-Aug-06 20:42 UTC
Re: Speeding up ActiveRecord Querying on very large joins?
For readability, this is easier to look at here: http://pastie.org/574570
bill walton
2009-Aug-06 20:51 UTC
Re: Speeding up ActiveRecord Querying on very large joins?
On Thu, 2009-08-06 at 22:42 +0200, Alpha Blue wrote:
> For readability, this is easier to look at here:
>
> http://pastie.org/574570

Are you already using Memcached? If not, you'll want to. Your data doesn't change very often, so one strategy you might consider is kicking off a job that runs your queries in the 'wee hours' of the morning so they're cached for your users.

HTH,
Bill
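A minimal sketch of that strategy, assuming Rails 2.1+ with a memcached store; the cached_compareteams wrapper and the key scheme are illustrative, not part of the original app:

# config/environment.rb -- point Rails' cache at memcached
config.cache_store = :mem_cache_store, 'localhost:11211'

# In the Team model: a read-through wrapper around the expensive
# comparison. The key includes both team ids and the current week, so
# each newly compiled week of stats gets fresh cache entries.
def self.cached_compareteams(teamone, teamtwo)
  week = Time.now.beginning_of_week.to_date.to_s
  Rails.cache.fetch("compare/#{teamone}/#{teamtwo}/#{week}") do
    compareteams(teamone, teamtwo)
  end
end

A nightly job would then just call cached_compareteams for each matchup to warm the cache before users arrive.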
Philip Hallstrom
2009-Aug-06 21:04 UTC
Re: Speeding up ActiveRecord Querying on very large joins?
On Aug 6, 2009, at 1:36 PM, Alpha Blue wrote:
> When I first started my project, I thought long and hard about the
> exact information I would need. [...]
>
> My question is: what can I do to improve this?

As someone else mentioned, if you can cache it, cache it.

That said... am I reading that right, that in one of your queries you are joining on 13 different tables?

- Find the SQL produced for that query and run it through your database using whatever is necessary to get the query plan (i.e. EXPLAIN SELECT ...). Make sure you've got indexes on the columns that need them.

- Play around with breaking that query up into smaller pieces. It may be that splitting it into individual queries and "building the join" yourself turns out to be faster. This is particularly true if your DB isn't picking up the indexes and you have massive tables but are only returning a very small set of data (and, again, your DB isn't picking up the indexes and lopping things off from the get-go).

If you're using MySQL, it can be very frustrating when you have the right indexes but it chooses not to use them. Look to see whether you can force it to use them, or give it hints.
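For instance, a rough way to check the plan from script/console; the SQL here is illustrative (paste the real statement from log/development.log), and the team_id column name is an assumption:

sql = <<-SQL
  SELECT teams.id, total_offenses.ydspgm
  FROM teams
  INNER JOIN total_offenses ON total_offenses.team_id = teams.id
  WHERE teams.id IN (1, 2)
    AND total_offenses.compiled_on BETWEEN '2009-08-03' AND '2009-08-09'
SQL

# MySQL's EXPLAIN reports, per table, which index was chosen ('key')
# and how many rows it expects to scan ('rows')
ActiveRecord::Base.connection.select_all("EXPLAIN #{sql}").each do |row|
  puts row.values_at('table', 'possible_keys', 'key', 'rows', 'Extra').join(' | ')
end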
Frederick Cheung
2009-Aug-06 21:05 UTC
Re: Speeding up ActiveRecord Querying on very large joins?
On Aug 6, 9:36 pm, Alpha Blue <rails-mailing-l...-ARtvInVfO7ksV2N9l4h3zg@public.gmane.org> wrote:
> The bottleneck begins to occur on offense = ... (around 30 ms) and then
> deepens on defense = ... (around 460 ms).
>
> These are indeed large, but everything listed here is absolutely
> required for my app to work. What can I do to optimize this?

First off, use EXPLAIN ... in your database's query browser/console to find out how the database is executing your query. You may find that it's doing something really stupid, or that there are indices missing.

Fred
Alpha Blue
2009-Aug-06 21:07 UTC
Re: Speeding up ActiveRecord Querying on very large joins?
bill walton wrote:
> Are you already using Memcached? If not, you'll want to. Your data
> doesn't change very often, so one strategy you might consider is
> kicking off a job that runs your queries in the 'wee hours' of the
> morning so they're cached for your users.

Hi Bill, thanks for the response, mate. I haven't checked out memcached and will go look at that now.

The problem with pre-running my queries is that there are 120 teams, so that would mean approximately 14,400 queries (120 x 120 pairings), since any two teams can be matched up and compared to one another. I'm not sure that would work. Even without any stoppage time, I'm probably looking at around 90 minutes to run all those queries? Maybe I can do something like that.

Beyond that, I can't quite think of anything further I can do to optimize (unless memcached works out). The only other thing I think would help is a multi-tiered server setup, with the database sitting on a server by itself, nginx as a front end on another server, and my content on yet another. I've been contemplating this, but right now it would be fairly expensive and I don't have enough time to implement it before the start of this season.
Alpha Blue
2009-Aug-06 21:09 UTC
Re: Speeding up ActiveRecord Querying on very large joins?
Philip and Fred, many thanks for those suggestions as well. I'm on it as we speak.
Alpha Blue
2009-Aug-06 21:25 UTC
Re: Speeding up ActiveRecord Querying on very large joins?
Here is the EXPLAIN output for the table where the bottleneck is deepening: http://pastie.org/574630
Philip Hallstrom
2009-Aug-06 21:27 UTC
Re: Speeding up ActiveRecord Querying on very large joins?
> Here is the EXPLAIN output for the table where the bottleneck is
> deepening:
>
> http://pastie.org/574630

You need to add quite a few indexes to those tables, then.
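A sketch of one way to add them, as a migration; the team_id column name and the table list shown here are assumptions to be checked against the actual schema and EXPLAIN output:

class AddStatTableIndexes < ActiveRecord::Migration
  # Every stats table is filtered on its team foreign key (via the join)
  # plus compiled_on, so a composite index covers both at once.
  TABLES = [:total_offenses, :rushing_offenses, :passing_offenses,
            :total_defenses, :rushing_defenses, :pass_defenses]

  def self.up
    TABLES.each do |table|
      add_index table, [:team_id, :compiled_on],
                :name => "idx_#{table}_team_week"
    end
  end

  def self.down
    TABLES.each do |table|
      remove_index table, :name => "idx_#{table}_team_week"
    end
  end
end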
Alpha Blue
2009-Aug-06 21:35 UTC
Re: Speeding up ActiveRecord Querying on very large joins?
Yeah, I'm using InnoDB, so I thought that indexes were automatically set up when specifying _id on any column. I was mistaken.
Marnen Laibow-Koser
2009-Aug-06 21:41 UTC
Re: Speeding up ActiveRecord Querying on very large joins?
Alpha Blue wrote:
> Yeah, I'm using InnoDB, so I thought that indexes were automatically
> set up when specifying _id on any column. I was mistaken.

Um, yeah. :)

Also, consider switching to PostgreSQL. You'll get better performance without sacrificing referential integrity.

Best,
--
Marnen Laibow-Koser
http://www.marnen.org
marnen-sbuyVjPbboAdnm+yROfE0A@public.gmane.org
E. Litwin
2009-Aug-06 21:42 UTC
Re: Speeding up ActiveRecord Querying on very large joins?
If you set up foreign key constraints, then indexes will automatically be created on the [parent]_id columns. (And InnoDB doesn't let you remove them even if you want to.)

On Aug 6, 2:35 pm, Alpha Blue <rails-mailing-l...-ARtvInVfO7ksV2N9l4h3zg@public.gmane.org> wrote:
> Yeah, I'm using InnoDB, so I thought that indexes were automatically
> set up when specifying _id on any column. I was mistaken.
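Rails 2.x has no built-in helper for foreign keys, so this would be raw SQL in a migration; a sketch assuming MySQL/InnoDB, a team_id column, and one example table:

class AddTeamForeignKeys < ActiveRecord::Migration
  def self.up
    # InnoDB requires an index on the referencing column and will
    # create one automatically if it does not already exist.
    execute <<-SQL
      ALTER TABLE total_offenses
        ADD CONSTRAINT fk_total_offenses_team
        FOREIGN KEY (team_id) REFERENCES teams(id)
    SQL
  end

  def self.down
    execute "ALTER TABLE total_offenses DROP FOREIGN KEY fk_total_offenses_team"
  end
end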
Alpha Blue
2009-Aug-06 22:46 UTC
Re: Speeding up ActiveRecord Querying on very large joins?
Philip Hallstrom wrote:
> That said... am I reading that right, that in one of your queries you
> are joining on 13 different tables?
> [...]
> If you're using MySQL, it can be very frustrating when you have the
> right indexes but it chooses not to use them. Look to see whether you
> can force it to use them, or give it hints.

Here's the new pastie: http://pastie.org/574754

It appears that the indexes are being used now? The row estimates are down to 1-3 rows per table, which is much better than the 400+ rows I was seeing on each table in the first segment. However, for the rest the Extra column only says "Using where"; it doesn't specifically say the index is being used. I see the keys it chose, but shouldn't it be saying "Using where; Using index"?

The time is a bit better - down from 14 seconds to 5 seconds - but it still bottlenecks. I might have to do as you said and break up the two queries that join 13 tables each:

Offense joined with 13 tables (bottleneck begins to occur)
Defense joined with 13 tables (bottleneck is massively apparent)

All the other joins are 0.0 or 0.1 ms. So I think my best bet is to break these up into smaller queries...
Alpha Blue
2009-Aug-06 23:01 UTC
Re: Speeding up ActiveRecord Querying on very large joins?
I broke up the offense and defense joins in half and ran a new test... 20 ms (winner). I still get all my data, so with the indexes in place and the joins split, it looks good. I'll have to do some extensive testing, though, to see how it load-balances in production.

Many thanks, everyone. I really appreciate the help with this optimization.
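For reference, the shape of that split, reusing the compiled_on range and the associations from the original method (only the offense half is shown, and the grouping of tables into halves is illustrative):

# First half: yardage, scoring, and efficiency (7 joins instead of 13)
offense_yards = Team.find(teamone, teamtwo,
  :select => "total_offenses.ydspgm AS offtotal,
              rushing_offenses.ydspg AS offrush,
              passing_offenses.ydspgm AS offpass,
              scoring_offenses.avg AS offscoring,
              red_zone_offenses.pct AS offrzone,
              passing_efficiencies.rating AS pasoffeff,
              first_downs_offenses.pg AS firdwnoff",
  :joins => [:total_offenses, :rushing_offenses, :passing_offenses,
             :scoring_offenses, :red_zone_offenses,
             :passing_efficiencies, :first_downs_offenses],
  :conditions => {:total_offenses => {:compiled_on => compiled_on},
                  :rushing_offenses => {:compiled_on => compiled_on},
                  :passing_offenses => {:compiled_on => compiled_on},
                  :scoring_offenses => {:compiled_on => compiled_on},
                  :red_zone_offenses => {:compiled_on => compiled_on},
                  :passing_efficiencies => {:compiled_on => compiled_on},
                  :first_downs_offenses => {:compiled_on => compiled_on}},
  :order => "teams.id")

# Second half: turnovers, sacks, and down conversions (the remaining
# six associations), built the same way.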
Matt Jones
2009-Aug-07 16:47 UTC
Re: Speeding up ActiveRecord Querying on very large joins?
Maybe not directly relevant, but have you considered that you may have normalized your database into nonfunctionality? From the queries you've shown, most of the tables have one data field (?!), a date, and a foreign key for the team. That would be an appropriate structure for a system where each type of statistic might be generated at a different time, but it seems completely inappropriate for your application, where all the stats are (obviously) generated simultaneously.

Why did you decide to go this way, versus a single 'stats' table for everything?

--Matt Jones
Alpha Blue
2009-Aug-07 18:26 UTC
Re: Speeding up ActiveRecord Querying on very large joins?
Matt Jones wrote:
> Maybe not directly relevant, but have you considered that you may have
> normalized your database into nonfunctionality? [...]
> Why did you decide to go this way, versus a single 'stats' table for
> everything?

Hi Matt,

There are approx. 37 different categories of statistics for NCAA football. Each category houses up to 14 columns. I would never attempt to build one table that housed 518 fields.

All of the stats are not generated simultaneously. They are compiled in separate weeks. Each statistics table houses 120 rows of data paired to the given week it was compiled. So, if there are, say, 17 weeks (counting bowls) of college football, there will be 17 weeks' worth of data (17 x 120 rows).

Yes, I could simply place all of that into one giant table (1 stats table with 518 columns containing 2,040 rows per year). That's not going to happen. It's inefficient and difficult to manage. If the table gets corrupted - very likely, given that type of scenario - it kills my entire site at once. You also have to account for 3x /tmp space for the largest table in your database. Given that these stats will not be deleted for years, that can hit a server pretty hard.

My tables are normalized properly. Thanks.
Matt Jones
2009-Aug-08 19:38 UTC
Re: Speeding up ActiveRecord Querying on very large joins?
On Aug 7, 2:26 pm, Alpha Blue <rails-mailing-l...-ARtvInVfO7ksV2N9l4h3zg@public.gmane.org> wrote:
> There are approx. 37 different categories of statistics for NCAA
> football. Each category houses up to 14 columns. I would never attempt
> to build one table that housed 518 fields.

Why not? Even MySQL's fairly restrictive limits are more than 5 times that (see http://dev.mysql.com/doc/refman/5.0/en/column-count-limit.html). The general argument against it is that tables with that many columns aren't correctly separating concerns (see this post for a prime example of doing it wrong: http://forums.digitalpoint.com/showpost.php?s=c6f4b98064317f2ad75a532f0e1268ee&p=8255322&postcount=11). However, in this case a single event (a game) generates all 518 values.

> All of the stats are not generated simultaneously. They are compiled in
> separate weeks. Each statistics table houses 120 rows of data paired to
> the given week it was compiled.

Nothing controversial there.

> Yes, I could simply place all of that into one giant table (1 stats
> table with 518 columns containing 2,040 rows per year). That's not
> going to happen. It's inefficient and difficult to manage.

Some quick math: 518 columns x 8 bytes per column (could be less, but assume some overhead) comes to 4,144 bytes per row. MySQL's row-size limit is just shy of 65 KB, so no issue there. 2,040 rows x 4,144 bytes is roughly 8.1 MB per year. Given that there are live applications managing hundreds if not thousands of GB of data in MySQL, I think this is an exceptionally premature optimization, unless you plan on running the server unaltered for the next 500 or so years. The 3x /tmp space is similarly beneath consideration, unless you're serving from an embedded microcontroller or something. Worrying about "table corruption" makes even less sense - if ANY data gets corrupted, it's time to restore from backup.

Finally, I'll point out that while a combined row is way under the row-size limit, the mondo-JOIN query you're headed toward is getting close to another limit: the maximum of 61 tables allowed in a join.

--Matt Jones
Alpha Blue
2009-Aug-08 19:48 UTC
Re: Speeding up ActiveRecord Querying on very large joins?
Hi Matt,

Again, you bring up some solid points - some I agree with, others I don't. However, at this stage in my app (at least for this season) I can't alter or test any of what you speak about.

I will definitely test a single table in development as the season progresses and see how much more optimized it is. I'm just skeptical, but I'm sure I'm not the only one who is when it comes to a table this large. I know it's not the biggest table ever, but a table that size still has me concerned. However, you could be right! Maybe it's the real way to go. I don't know at this point.

I just wanted to let you know that I do hear what you are saying, and that if you are correct in everything you say, it would help me for each new season. So, my thanks for the input and feedback, and most of all for following up on your comments with some solid rebuttals!

Take care, mate.
Marnen Laibow-Koser
2009-Aug-09 06:09 UTC
Re: Speeding up ActiveRecord Querying on very large joins?
Alpha Blue wrote:
> Again, you bring up some solid points - some I agree with, others I
> don't. However, at this stage in my app (at least for this season) I
> can't alter or test any of what you speak about.

Poppycock. You might not be willing to alter your production DB this season, but you *can* (and probably should) set up a testing environment to see what you can do.

> I will definitely test a single table in development as the season
> progresses and see how much more optimized it is. I'm just skeptical,
> but I'm sure I'm not the only one who is when it comes to a table this
> large.

I think a 518-field table sounds odd, but perhaps right for what you're doing. However, a large number of single-column tables sounds far odder.

> I know it's not the biggest table ever, but a table that size still has
> me concerned.

Why?

Best,
--
Marnen Laibow-Koser
http://www.marnen.org
marnen-sbuyVjPbboAdnm+yROfE0A@public.gmane.org
Alpha Blue
2009-Aug-09 14:59 UTC
Re: Speeding up ActiveRecord Querying on very large joins?
Marnen Laibow-Koser wrote:
> Poppycock. You might not be willing to alter your production DB this
> season, but you *can* (and probably should) set up a testing
> environment to see what you can do.

I mentioned in paragraph two that I will be setting up a test environment for it. But, again, I'm not going to alter my production DB at this time. The season is just a couple of weeks away, and that wouldn't allow enough time to implement, test, and complete the change. I won't risk any issues.

> Why?

I talked about table corruption previously. Matt simply said, well, you can restore from backup. If I had 50,000 subscribers all expecting a service, and a table this large - one that would basically represent all of the data provided in the service - failed, there would be a critical outage. Table corruption has a larger chance of occurring on larger tables. I like to ensure whatever I provide works 100%.

My current situation is much improved (query times are really good right now) and, with caching, will only get better. What I'm doing right now is not wrong; it can just be made more efficient later on (as Matt pointed out). At this point in time, we can agree to disagree on this topic.

Thanks.
Marnen Laibow-Koser
2009-Aug-09 15:19 UTC
Re: Speeding up ActiveRecord Querying on very large joins?
Alpha Blue wrote:
[...]
> I mentioned in paragraph two that I will be setting up a test
> environment for it.

OK; I missed that.

> But, again, I'm not going to alter my production DB at this time. The
> season is just a couple of weeks away, and that wouldn't allow enough
> time to implement, test, and complete the change.

Sure it would. This is not a big change from the point of view of your app.

> I talked about table corruption previously. [...] Table corruption has
> a larger chance of occurring on larger tables.

Um, what? Table corruption is simply not an issue in any DB environment I have ever worked with. And I'm not sure why you say it has a greater chance of occurring on larger tables. If it's simply that there's more data, well, that's true of many small tables too. Basically, if you can't trust your DB server not to corrupt your tables, then don't use that DB server. It's that simple.

> What I'm doing right now is not wrong; it can just be made more
> efficient later on (as Matt pointed out).

Perhaps. I haven't examined your queries in detail.

> At this point in time, we can agree to disagree on this topic.

But we shouldn't. That's a sign that more discussion is necessary, and it should not simply be ignored. It's lazy, and not in the Larry Wall sense. With what you're doing, I doubt that you can afford that kind of laziness.

Best,
--
Marnen Laibow-Koser
http://www.marnen.org
marnen-sbuyVjPbboAdnm+yROfE0A@public.gmane.org
Alpha Blue
2009-Aug-09 16:38 UTC
Re: Speeding up ActiveRecord Querying on very large joins?
In order to make the DB change work, I would have to:

- change how my parser pulls and uploads data
- change the querying for all MVCs in my project

...and that's just for starters. It is a lot of work.

I look at things from a priority perspective. Right now, all of the priority mechanisms in my app work 100% and have very good query times. Nothing is broken. Everything passes internal tests, and everything passes external tests. I finished up all my mailers and observers and fully implemented my payment gateway. However, I'm working on another segment for the payment gateway, and perhaps another type of payment plan (Paymo or Boku for mobile payments) to cover those who don't have credit cards or PayPal accounts. I still have a lot of extra things to get done before I'm happy with my app (clubhouse pages, reporting cycles for upcoming bowl games, etc.).

Because the DB situation is fine now (again, realizing that it can be made more optimal down the road), there's no reason for me to reinvent things and add more potential problems to my app when I still have other work to do. So, when I say we can agree to disagree, I'm referring to the "timing" of the optimizations. I believe they should occur "after this season", with proper testing and implementation. I'm very systematic about how I work on things.
Marnen Laibow-Koser
2009-Aug-09 16:45 UTC
Re: Speeding up ActiveRecord Querying on very large joins?
Alpha Blue wrote:
[...]
> So, when I say we can agree to disagree, I'm referring to the "timing"
> of the optimizations. I believe they should occur "after this season",
> with proper testing and implementation.

I don't think there's necessarily a disagreement there - I agree that you should be careful about what you do to your production DB - although I wouldn't be surprised if you have a DB meltdown in mid-season, since your current schema is apparently so weird. (And BTW, you might want to review the usage of quotation marks. :) )

> I'm very systematic about how I work on things.

Good.

Best,
--
Marnen Laibow-Koser
http://www.marnen.org
marnen-sbuyVjPbboAdnm+yROfE0A@public.gmane.org
Alpha Blue
2009-Aug-09 19:11 UTC
Re: Speeding up ActiveRecord Querying on very large joins?
Hi Marnen,

Right now, the maximum time I'm seeing from the query routines - now that they have proper indexes and are broken down into manageable pieces - is anywhere from 18 ms to 24 ms. I don't think that would create a DB meltdown, considering these numbers are without caching enabled. Once I have caching turned on, the number of hits the database receives should be greatly reduced.

What I've decided to do (after talking to Slicehost staff) is to prepare for a larger slice upgrade (which would take approx. 30 minutes or less) in the wee hours of the morning if I get hit by too much load. I'm also seriously considering running a cache routine on all matchups each week so that they are pre-cached (this was a pretty good suggestion). I think, given this scenario, I should be good to go for now. However, I'm already looking at how to implement the larger table and test it against a double-year data load (just for simulation purposes).

My problem before was that my tables were not properly indexed and I had two queries that used 13 table joins each. It was just not efficient. The largest join I have now is 7 tables on indexed results. Those queries are only taking 3 ms apiece.
Marnen Laibow-Koser
2009-Aug-10 00:02 UTC
Re: Speeding up ActiveRecord Querying on very large joins?
Alpha Blue wrote:
[...]
> The largest join I have now is 7 tables on indexed results. Those
> queries are only taking 3 ms apiece.

OK, that sounds *way* better.

Best,
--
Marnen Laibow-Koser
http://www.marnen.org
marnen-sbuyVjPbboAdnm+yROfE0A@public.gmane.org
Timothy N. Tsvetkov
2009-Aug-10 07:49 UTC
Re: Speeding up ActiveRecord Querying on very large joins?
One more vote for PostgreSQL (it can use indexes for sorting - an Index Scan Backward instead of sorting all the data, as MySQL would - plus better usage of indexes, composite indexes, etc.). Also one vote for using EXPLAIN. And one vote for using pastie.org. %)

To avoid a lot of joins, I use de-normalization.
Alpha Blue
2009-Aug-10 15:42 UTC
Re: Speeding up ActiveRecord Querying on very large joins?
Well, I found a way to reach a middle ground. I created a cache table that houses the exact fields for the data I pull in the VM (virtual matchup) pages. The methods that pull the data were moved into a task, so I can pre-populate the cache table with the data for each week.

My cache table will have 120 rows and 67 fields of data for each week. When people run VM matchups, they are only querying that one table - no joins - and it's a lot smaller than 500+ fields.
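A sketch of what that weekly task might look like; VmCache, its week_of column, and gather_week_stats are hypothetical names, not from the original app:

# lib/tasks/vm_cache.rake
namespace :vm do
  desc "Pre-populate the weekly virtual-matchup cache table"
  task :rebuild_cache => :environment do
    week = Time.now.beginning_of_week.to_date
    Team.find(:all, :order => "teams.id").each do |team|
      # gather_week_stats would run the small indexed queries once and
      # return a hash of the 67 cached columns for this team and week
      row = VmCache.find_or_initialize_by_team_id_and_week_of(team.id, week)
      row.update_attributes!(team.gather_week_stats(week))
    end
  end
end

Run it weekly after the stats are compiled (rake vm:rebuild_cache), and the matchup pages then only ever read two rows from the cache table.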
Alpha Blue
2009-Aug-10 20:42 UTC
Re: Speeding up ActiveRecord Querying on very large joins?
Well, good news, and I want to thank Matt first and Marnen second for their input on this.

My old setup, even with the low query times, was doing a full render of my VM matchup page in 540-620 ms from start to completion. With the new single-table setup using the cache table, the full render of the page from start to completion is 220-240 ms, which is nearly a 3x improvement.

I just wanted to say thanks to you all for getting me to look at the code - you were both right, and it wasn't hard to implement. Because the test worked out so well, I will more than likely start working on implementing the very large table for my stats views.

Thanks again!