Victor Lin
2011-Jul-06 18:17 UTC
Need help porting limited eager loading optimization to Rails 3
I'm fairly certain this 2.x feature never made it to Rails 3. You can view the original ticket and patches here: http://web.archive.org/web/20081005125758/http://dev.rubyonrails.org/ticket/9560

The gist of it is, when you run something like this:

    Article.includes(:comments, :author).where('authors.id = 1').limit(10)

you end up with two queries. The first is a "SELECT DISTINCT authors.id...", and the next actually loads the comments and authors associations. In Rails 2.x, AR was smart enough to join only against the tables that actually limited the result set (e.g., anything in the where or order clauses). Rails 3 blindly joins all the tables, which kills performance when you have several eager-loaded associations.

I started working on a patch to apply_join_dependency but ran into a problem with table aliasing. The diff is here: https://gist.github.com/1067917

The approach is basically to scan the order and where clauses for table names, then scan the included associations for those table names, adding them (and any intermediate joins) to a list, and joining only those associations. The problem is that when the arel object is built for clean_relation, it has fewer joins than the original. AR builds a JoinDependency object and grafts all my joins onto it with JoinDependency#graft. That object never actually sees any of the original joins, so the alias tracker hands out the default table name instead of the aliased table name.

I don't know how to deal with this without hacking up more of the source than I want to. Does anyone have ideas about how to deal with this? Or maybe a completely different approach?

--
You received this message because you are subscribed to the Google Groups "Ruby on Rails: Core" group. To view this discussion on the web visit https://groups.google.com/d/msg/rubyonrails-core/-/As1BiWORxRAJ. To post to this group, send email to rubyonrails-core@googlegroups.com.
To unsubscribe from this group, send email to rubyonrails-core+unsubscribe@googlegroups.com. For more options, visit this group at http://groups.google.com/group/rubyonrails-core?hl=en.
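The table-scanning approach described in the message above can be sketched in plain Ruby, outside of ActiveRecord. This is only an illustration of the idea, not the code from the gist: the `ASSOCIATIONS` hash, `referenced_tables`, and `associations_to_join` are hypothetical names, and the nested `:profile` association is an assumption added to show how intermediate joins get pulled in.

```ruby
# Hypothetical sketch: none of these names exist in ActiveRecord.
# Each eager-loaded association maps to its table and its parent association.
ASSOCIATIONS = {
  comments: { table: "comments", parent: nil },
  author:   { table: "authors",  parent: nil },
  profile:  { table: "profiles", parent: :author } # assumed nested association
}.freeze

# Collect every identifier that appears as "table.column" in the SQL fragments.
def referenced_tables(*sql_fragments)
  sql_fragments.compact
               .flat_map { |sql| sql.scan(/\b([a-z_]\w*)\.\w+/i).flatten }
               .map(&:downcase)
               .uniq
end

# Return the associations that must be joined: those whose table is referenced,
# preceded by any intermediate (parent) associations needed to reach them.
def associations_to_join(associations, tables)
  needed = []
  associations.each do |name, info|
    next unless tables.include?(info[:table])

    chain = []
    current = name
    while current
      chain.unshift(current)
      current = associations.fetch(current)[:parent]
    end
    needed.concat(chain)
  end
  needed.uniq
end

# where clause references only the authors table; there is no order clause.
tables = referenced_tables("authors.id = 1", nil)
p associations_to_join(ASSOCIATIONS, tables)
```

With the where clause from the example query, only `:author` is selected for joining and `:comments` is left to be loaded afterwards. A condition on `profiles` would select both `:author` and `:profile`, since the former is the intermediate join.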
Oriol Gual
2011-Jul-06 19:35 UTC
Re: Need help porting limited eager loading optimization to Rails 3
I'm not really sure, but maybe hacking on Arel (https://github.com/rails/arel) directly, instead of AR, would be an easier way to solve this issue.
Ernie Miller
2011-Jul-06 20:26 UTC
Re: Need help porting limited eager loading optimization to Rails 3
Victor,

I feel your pain, but in my opinion, moving further down the road of scanning queries for strings to determine a table's inclusion takes things in the wrong direction. The trend in core (and one that I hope continues) has been toward using ARel objects to represent the various parts of a query. For instance, your order-scanning code is subject to the same bug that this commit fixes: https://github.com/rails/rails/commit/08f3f30994d37f6f44acfac801f82fc43127fc78

I don't think there's a quick (and also correct) way to fix this behavior without modifying more than this one method. Jon or Aaron might tell me I'm wrong, though. :)

--
Ernie Miller
http://metautonomo.us
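The general fragility of string scanning raised in this reply can be shown with a small plain-Ruby sketch. It does not reproduce the specific bug fixed by the linked commit; `scanned_tables` is a hypothetical helper that applies a naive "table.column" regex, and the two SQL fragments are made-up inputs.

```ruby
# Hypothetical helper, not part of ActiveRecord: a naive "table.column" scan.
def scanned_tables(sql)
  sql.scan(/\b([a-z_]\w*)\.\w+/i).flatten.map(&:downcase).uniq
end

# False positive: text inside a string literal looks like a column reference.
p scanned_tables("articles.title = 'see comments.txt'")
# prints ["articles", "comments"]

# False negative: quoted identifiers are not matched at all.
p scanned_tables('"authors"."id" = 1')
# prints []
```

Working from structured query objects instead of raw strings avoids both failure modes, because table references are then known rather than guessed.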
Victor Lin
2011-Jul-06 21:21 UTC
Re: Need help porting limited eager loading optimization to Rails 3
I totally agree: scanning SQL strings is pretty crappy. Extracting the necessary tables from the ARel object should be more robust; this was just a quick hack to get things going.

I think there's a related performance issue here too: the second query uses the LEFT OUTER JOIN method of preloading associations instead of issuing separate queries. We already have the IDs from the prequery, so there's no need to use the join strategy anymore, right?
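The idea of reusing the prequery IDs and loading each association separately can be illustrated with in-memory data. This is a toy model under stated assumptions, not ActiveRecord code: the `ARTICLES`, `AUTHORS`, and `COMMENTS` arrays stand in for tables, and `limited_article_ids` and `load_with_separate_queries` are hypothetical names.

```ruby
# In-memory stand-ins for database tables (illustrative data only).
ARTICLES = [
  { id: 1, author_id: 1 }, { id: 2, author_id: 2 }, { id: 3, author_id: 1 }
].freeze
AUTHORS = [{ id: 1, name: "A" }, { id: 2, name: "B" }].freeze
COMMENTS = [
  { id: 10, article_id: 1 }, { id: 11, article_id: 3 }, { id: 12, article_id: 2 }
].freeze

# Step 1: the limiting "prequery" returns only the matching article ids.
def limited_article_ids(author_id, limit)
  ARTICLES.select { |a| a[:author_id] == author_id }.first(limit).map { |a| a[:id] }
end

# Step 2: one lookup per association, keyed on the ids from step 1
# (the equivalent of "WHERE article_id IN (...)"), with no joins involved.
def load_with_separate_queries(ids)
  articles   = ARTICLES.select { |a| ids.include?(a[:id]) }
  author_ids = articles.map { |a| a[:author_id] }.uniq
  authors    = AUTHORS.select { |au| author_ids.include?(au[:id]) }
  comments   = COMMENTS.select { |c| ids.include?(c[:article_id]) }
                       .group_by { |c| c[:article_id] }

  articles.map do |a|
    a.merge(author: authors.find { |au| au[:id] == a[:author_id] },
            comments: comments.fetch(a[:id], []))
  end
end

ids = limited_article_ids(1, 10)
result = load_with_separate_queries(ids)
p ids
p result.map { |r| r[:comments].map { |c| c[:id] } }
```

Each association is fetched once for the whole set of IDs, so the number of lookups grows with the number of associations rather than with the width of a joined result set. Whether ActiveRecord can safely switch strategies at this point is the open question in the thread.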