Hi! I''m trying to speed up some queries in my app, running Rails 3.2 and need some help. I''m running the app at Heroku on postgresql. I''m new to postgresql and need some help to optimize a query so it effectively uses indices. This is the query I''m currently working on: http://snipt.net/Linuus/rails-query?key=ee73173643e8d21a5a487d8a329c7a26 In the link above you can also see the full EXPLAIN (query plan) as well as my current indices. It doesn''t seem to use any of my indices at the moment... So, any help is very appreciated. Regards Linus -- You received this message because you are subscribed to the Google Groups "Ruby on Rails: Talk" group. To view this discussion on the web visit https://groups.google.com/d/msg/rubyonrails-talk/-/kWghe778QIAJ. To post to this group, send email to rubyonrails-talk-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org To unsubscribe from this group, send email to rubyonrails-talk+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org For more options, visit this group at http://groups.google.com/group/rubyonrails-talk?hl=en.
Would probably help if you posted the generated SQL as well as the output of EXPLAIN ANALYZE, which shows not just what the planner thinks the costs are, but what the costs actually were. You could also try running ANALYZE on the tables in question, or ANALYZE VERBOSE and checking if the estimates are pretty close to reality, e.g. ANALYZE VERBOSE categories; Jim Crate -- You received this message because you are subscribed to the Google Groups "Ruby on Rails: Talk" group. To post to this group, send email to rubyonrails-talk-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org To unsubscribe from this group, send email to rubyonrails-talk+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org For more options, visit this group at http://groups.google.com/group/rubyonrails-talk?hl=en.
Thank you. I have added the generated SQL to the snipt in my first post. Regarding the EXPLAIN ANALYZE, can I somehow run that on Heroku? I''m using their shared database so I don''t have access to any psql console... I did run it on my local machine though. The thing is that it differs in database version (Local is PostgreSQL 9.1.2 while Heroku still uses 8.3 or something) and also in size. Anyway, the EXPLAIN ANALYZE from my local machine can be found here: http://snipt.net/Linuus/postgresql?key=e4728f444b24839e3f80adf3829bcba9 Regards Linus -- You received this message because you are subscribed to the Google Groups "Ruby on Rails: Talk" group. To view this discussion on the web visit https://groups.google.com/d/msg/rubyonrails-talk/-/UqLl5MkpgPcJ. To post to this group, send email to rubyonrails-talk-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org To unsubscribe from this group, send email to rubyonrails-talk+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org For more options, visit this group at http://groups.google.com/group/rubyonrails-talk?hl=en.
On Wed, Jan 25, 2012 at 8:20 PM, Linus Pettersson < linus.pettersson-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:> Thank you. I have added the generated SQL to the snipt in my first post. > > Regarding the EXPLAIN ANALYZE, can I somehow run that on Heroku? I''m using > their shared database so I don''t have access to any psql console... > > I did run it on my local machine though. The thing is that it differs in > database version (Local is PostgreSQL 9.1.2 while Heroku still uses 8.3 or > something) and also in size. >On Debian/Ubuntu you typically have a choice of an 8.4 or a 9.1 ... Try $ apt-cache search postgresql-8.4 and then you could install that alongside 9.1. It will run on a different port (one on 5432 and one on 5433 IIRC, check $ netstat -atn). That said, I have no information that the optimization would be different for such a "simple" query.> Anyway, the EXPLAIN ANALYZE from my local machine can be found here: > http://snipt.net/Linuus/postgresql?key=e4728f444b24839e3f80adf3829bcba9 > ><speculative> Just out of curiosity ... what problem are you trying to solve ? Did you actually see any slow queries ? I don''t see obvious problems with the query ... Not sure if this is the area that deserves your first attention wrt performance optimization ... there is this whole story about "premature optimization" (Google it to find many rants about it ...). </speculative> HTH, Peter -- You received this message because you are subscribed to the Google Groups "Ruby on Rails: Talk" group. To post to this group, send email to rubyonrails-talk-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org To unsubscribe from this group, send email to rubyonrails-talk+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org For more options, visit this group at http://groups.google.com/group/rubyonrails-talk?hl=en.
Well, maybe it''s not necessary... It is the slowest of my queries as far as I can see anyway. I''m using Rails 3.2 and I have enabled automatic EXPLAIN on slow queries (>0.5s). This query is currently being "explained" in my console when I run it on my localhost. "SQL (556.5ms) ..." 556.5ms seems a bit slow to me, isn''t it? Then a bunch of other stuff is happening and I get this: "Completed 200 OK in 2737ms (Views: 813.5ms | ActiveRecord: 703.2ms)" So, 550ms of the total 703ms is the above query. Maybe I''m just picky? :) -- You received this message because you are subscribed to the Google Groups "Ruby on Rails: Talk" group. To view this discussion on the web visit https://groups.google.com/d/msg/rubyonrails-talk/-/kDHHedgd5AcJ. To post to this group, send email to rubyonrails-talk-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org To unsubscribe from this group, send email to rubyonrails-talk+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org For more options, visit this group at http://groups.google.com/group/rubyonrails-talk?hl=en.
On Wed, Jan 25, 2012 at 8:51 PM, Linus Pettersson < linus.pettersson-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:> Well, maybe it''s not necessary... It is the slowest of my queries as far > as I can see anyway. > > I''m using Rails 3.2 and I have enabled automatic EXPLAIN on slow queries > (>0.5s). This query is currently being "explained" in my console when I run > it on my localhost. "SQL (556.5ms) ..." > > 556.5ms seems a bit slow to me, isn''t it? >Indeed. TL;DR * On my system, a similar query seems a lot faster * you may have a bad "sort" on categories.name, subcategories.name without index ? I had a similar database structure here, filled it with 10,000 records (you seem to have around 2,200 ?), postgresql 9.1 on Ubuntu and ran a similar query: cl1 = ContactLine.includes(:contact) ; nil cl2 = cl1.includes(:contact => :person) ; nil cl3 = cl2.where("people.first_name = ''M'' OR people.first_name ''longer_word''") ; nil # 8 times ''M'' and 8 times ''longer_word'' cl4 = cl3.order("contacts.email") ; nil 167:0> puts cl4.explain SQL (16.9ms) SELECT "contact_lines"."id" AS ... FROM "contact_lines" LEFT OUTER JOIN "contacts" ON "contacts"."id" = "contact_lines"."contact_id" LEFT OUTER JOIN "people" ON "people"."id" = "contacts"."person_id" WHERE (people.first_name = ''M'' OR people.first_name = ''longer_word'') ORDER BY contacts.email EXPLAIN (1.4ms) EXPLAIN SELECT "contact_lines"."id" ... <same query description> EXPLAIN for: SELECT "contact_lines"."id" ... QUERY PLAN -------------------------------------------------------------------------------------------------------------------------- Sort (cost=691.05..691.06 rows=1 width=2699) Sort Key: contacts.email -> Hash Join (cost=455.53..691.04 rows=1 width=2699) Hash Cond: (contact_lines.contact_id = contacts.id) -> Seq Scan on contact_lines (cost=0.00..198.00 rows=10000 width=50) -> Hash (cost=455.52..455.52 rows=1 width=2649) -> Hash Join (cost=229.01..455.52 rows=1 width=2649) Hash Cond: (contacts.person_id = people.id) -> Seq Scan on contacts (cost=0.00..189.00 rows=10000 width=41) -> Hash (cost=229.00..229.00 rows=1 width=2608) -> Seq Scan on people (cost=0.00..229.00 rows=1 width=2608) Filter: (((first_name)::text = ''M''::text) OR ((first_name)::text = ''longer_word''::text)) (12 rows) => nil The log on a development server (plain `rails s`) is: Started GET "/contact_lines" for 127.0.0.1 at 2012-01-25 22:26:41 +0100 Processing by ContactLinesController#index as HTML SQL (8.6ms) SELECT "contact_lines"."id" AS t0_r0... ... ORDER BY contacts.email Rendered contact_lines/index.html.haml within layouts/application (99.4ms) Completed 200 OK in 105ms (Views: 89.5ms | ActiveRecord: 14.7ms) About indexes ... The above is without explicitly set indexes. Assuming that the first_name::text filtering would be the most expensive, I added this migration: class AddIndexFirstName < ActiveRecord::Migration def change add_index :people, :first_name end end This worked: # \d people Table "public.people" Column | Type | Modifiers --------------+-----------------------------+----------------------------------------------------- id | integer | not null default nextval(''people_id_seq''::regclass) full_name | character varying(255) | first_name | character varying(255) | ... Indexes: "people_pkey" PRIMARY KEY, btree (id) "index_people_on_first_name" btree (first_name) The EXPLAIN changed: SQL (12.7ms) SELECT "contact_lines"."id" AS t0_r0, ... QUERY PLAN -------------------------------------------------------------------------------------------------------------------------------- Sort (cost=474.59..474.59 rows=1 width=2699) Sort Key: contacts.email -> Hash Join (cost=239.07..474.58 rows=1 width=2699) Hash Cond: (contact_lines.contact_id = contacts.id) -> Seq Scan on contact_lines (cost=0.00..198.00 rows=10000 width=50) -> Hash (cost=239.05..239.05 rows=1 width=2649) -> Hash Join (cost=12.54..239.05 rows=1 width=2649) Hash Cond: (contacts.person_id = people.id) -> Seq Scan on contacts (cost=0.00..189.00 rows=10000 width=41) -> Hash (cost=12.53..12.53 rows=1 width=2608) -> Bitmap Heap Scan on people (cost=8.52..12.53 rows=1 width=2608) Recheck Cond: (((first_name)::text ''M''::text) OR ((first_name)::text = ''longer_word''::text)) -> BitmapOr (cost=8.52..8.52 rows=1 width=0) -> Bitmap Index Scan on index_people_on_first_name (cost=0.00..4.26 rows=1 width=0) Index Cond: ((first_name)::text = ''M''::text) -> Bitmap Index Scan on index_people_on_first_name (cost=0.00..4.26 rows=1 width=0) Index Cond: ((first_name)::text = ''longer_word''::text) (17 rows) But the performance was the same. A log of a hit to the server. Processing by ContactLinesController#index as HTML SQL (11.7ms) SELECT "contact_lines"."id" AS t0_r0, ... ... WHERE (people.first_name = ''M'' OR people.first_name = ''longer_word'') Rendered contact_lines/index.html.haml within layouts/application (104.1ms) Completed 200 OK in 112ms (Views: 90.8ms | ActiveRecord: 19.8ms) Maybe it is related to something in your query plan that I do not see here and is suspicious: from http://snipt.net/Linuus/postgresql?key=e4728f444b24839e3f80adf3829bcba9 Sort (cost=3267.32..3272.25 rows=1972 width=2104) (actual time=39.308..39.468 rows=1880 loops=1) Sort Key: categories.name, subcategories.name Sort Method: quicksort Memory: 313kB ####### ?? from http://snipt.net/Linuus/rails-query?key=ee73173643e8d21a5a487d8a329c7a26 you seem to not have an index on those 2 `name` columns ? What happens to performance when you remove try sort ? Category.eager_load(:subcategories) .joins("INNER JOIN products AS p ON resellercategories.id p.resellercategory_id") ## WITHOUT SORT .order("categories.name ASC, subcategories.name ASC") .where("p.gender = ''unisex'' OR p.gender = ''male''") If that makes a significant difference, what happens when you add indices on those 2 columns (and possibly remove some indices on other columns) ?> Then a bunch of other stuff is happening and I get this: > "Completed 200 OK in 2737ms (Views: 813.5ms | ActiveRecord: 703.2ms)" > > So, 550ms of the total 703ms is the above query. > > Maybe I''m just picky? :) >Those indeed seem "large" numbers for a new/fresh application... Peter -- You received this message because you are subscribed to the Google Groups "Ruby on Rails: Talk" group. To post to this group, send email to rubyonrails-talk-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org To unsubscribe from this group, send email to rubyonrails-talk+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org For more options, visit this group at http://groups.google.com/group/rubyonrails-talk?hl=en.
Hi I tested to remove the .order(...) part and indeed, the query time goes down to ~100ms. However, it doesn''t help to add indices, at least not as I did :) add_index :categories, :name add_index :subcategories, :name Did some more testing and if I keep the .order... but don''t join the products table I get a query that runs at about ~55ms. So the bottleneck seems to be the products table. The query that I''m running looks like this: Category.eager_load(:subcategories) .joins("INNER JOIN products AS p ON resellercategories.id = p.resellercategory_id") .order("categories.name ASC, subcategories.name ASC") (Skipping the gender here...) What I have is Categories and Subcategories. They are related to each other through a Resellercategories table. Products are related to Resellercategories. So, the reason that I want to join the products as well is because I only want to show categories and subcategories that actually have some products (there are some empty categories/subcategories still). So the above query is what we came up with in another thread here in the group. - Maybe there is a better way to check if a category/subcategory has products without joining the entire products table? BTW, in my dev database there is about 8700 rows in products table, 2200 rows in resellercategories, 5 rows in categories and 45 rows in subcategories. -- You received this message because you are subscribed to the Google Groups "Ruby on Rails: Talk" group. To view this discussion on the web visit https://groups.google.com/d/msg/rubyonrails-talk/-/1AqHbcGn9n8J. To post to this group, send email to rubyonrails-talk-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org To unsubscribe from this group, send email to rubyonrails-talk+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org For more options, visit this group at http://groups.google.com/group/rubyonrails-talk?hl=en.
Also, another reason for the query that I forgot to mention is that if a user filters the products for female products only for instance, it should only show categories and subcategories that contains products for that gender. The gender is specified in the products table... -- You received this message because you are subscribed to the Google Groups "Ruby on Rails: Talk" group. To view this discussion on the web visit https://groups.google.com/d/msg/rubyonrails-talk/-/KXfbWuombtcJ. To post to this group, send email to rubyonrails-talk-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org To unsubscribe from this group, send email to rubyonrails-talk+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org For more options, visit this group at http://groups.google.com/group/rubyonrails-talk?hl=en.
On Thu, Jan 26, 2012 at 6:15 PM, Linus Pettersson < linus.pettersson-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:> Hi > > I tested to remove the .order(...) part and indeed, the query time goes > down to ~100ms. However, it doesn''t help to add indices, at least not as I > did :) > > add_index :categories, :name > add_index :subcategories, :name > > Did some more testing and if I keep the .order... but don''t join the > products table I get a query that runs at about ~55ms. So the bottleneck > seems to be the products table. > The query that I''m running looks like this: > > Category.eager_load(:subcategories) > .joins("INNER JOIN products AS p ON resellercategories.id > p.resellercategory_id") > .order("categories.name ASC, subcategories.name ASC") > > (Skipping the gender here...) > > What I have is Categories and Subcategories. They are related to each > other through a Resellercategories table. Products are related to > Resellercategories. > So, the reason that I want to join the products as well is because I only > want to show categories and subcategories that actually have some products > (there are some empty categories/subcategories still). > > So the above query is what we came up with in another thread here in the > group. > > - Maybe there is a better way to check if a category/subcategory has > products without joining the entire products table? >It is possible to add a :counter_cache , but then you need to make sure you use the proper methods for each product that you add or remove from the association. Alternative to the default counter cache (from Rails), you could build your own logic as in: * has_male_products * ... changing you query to ... Category.eager_load(:subcategories). where(:has_male_products => true). order(...) Then you would need to set the cache on the appropriate categories in an after_save on the product you are creating/updating/deactivating/(deleting ?). Both ideas would probably be faster for querying, but certainly more complex for making sure that cache is always correct. HTH, Peter -- You received this message because you are subscribed to the Google Groups "Ruby on Rails: Talk" group. To post to this group, send email to rubyonrails-talk-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org To unsubscribe from this group, send email to rubyonrails-talk+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org For more options, visit this group at http://groups.google.com/group/rubyonrails-talk?hl=en.
On Thu, Jan 26, 2012 at 6:37 PM, Peter Vandenabeele <peter-jNuWw7i2w7syMbTcgqFhxg@public.gmane.org>wrote:> On Thu, Jan 26, 2012 at 6:15 PM, Linus Pettersson < > linus.pettersson-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote: > >> Hi >> >> I tested to remove the .order(...) part and indeed, the query time goes >> down to ~100ms. However, it doesn''t help to add indices, at least not as I >> did :) >> >> add_index :categories, :name >> add_index :subcategories, :name >> >> Did some more testing and if I keep the .order... but don''t join the >> products table I get a query that runs at about ~55ms. So the bottleneck >> seems to be the products table. >> The query that I''m running looks like this: >> >> Category.eager_load(:subcategories) >> .joins("INNER JOIN products AS p ON resellercategories.id >> p.resellercategory_id") >> .order("categories.name ASC, subcategories.name ASC") >> >> (Skipping the gender here...) >> >> What I have is Categories and Subcategories. They are related to each >> other through a Resellercategories table. Products are related to >> Resellercategories. >> So, the reason that I want to join the products as well is because I only >> want to show categories and subcategories that actually have some products >> (there are some empty categories/subcategories still). >> >> So the above query is what we came up with in another thread here in the >> group. >> >> - Maybe there is a better way to check if a category/subcategory has >> products without joining the entire products table? >> > > > It is possible to add a :counter_cache , but then you need to make sure you > use the proper methods for each product that you add or remove from the > association. > > Alternative to the default counter cache (from Rails), you could build > your own > logic as in: > > * has_male_products > * ... > > changing you query to ... > > Category.eager_load(:subcategories). > where(:has_male_products => true). > order(...) > > Then you would need to set the cache on the appropriate categories in > an after_save on the product you are > creating/updating/deactivating/(deleting ?). > > Both ideas would probably be faster for querying, but certainly more > complex for making sure that cache is always correct. >Sorry to reply to my own post. TL;DR Is there pagination? Then a smaller set may return much faster. I was thinking over my reply and may have forgotten a fundamental aspect ... If you say 2200 categories, 8000 products. How many entries does you query return ? (replace .add with .count at the end). How many do you need ? What happens when you add .limit(20) to your query ? By which "primary object" do you want to sort and paginate ? (I will assume ''Product'' in the discussion below). With the includes that are currently implemented, you may have to redo the whole query into 2 qeuries ... 1) for fetching the "primary objects" (e.g. exactly 20 Products, no additional "has_many" data, because that would increase the number of returned rows for 1 product and make proper pagination in the database impossible; including "belongs_to" here is no problem) 2) a second query for fetching eventual "has_many" data on those 20 "primary products" (is that "Reification" ?) If the performance problem could be solved by taking the pagination into account, that would be a _much_ better solution that building cache columns in this early phase of your project. HTH, Peter -- You received this message because you are subscribed to the Google Groups "Ruby on Rails: Talk" group. To post to this group, send email to rubyonrails-talk-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org To unsubscribe from this group, send email to rubyonrails-talk+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org For more options, visit this group at http://groups.google.com/group/rubyonrails-talk?hl=en.
This query is only to get the appropriate Categories & Subcategories. There are 5 Categories and 45 Subcategories. My products are imported from webshops which are using different categories for the same things ("tshirt", "t-shirts", "t-shirt", "short sleeved shirts" may all be the same). To cope with this issue I have the "Resellercategories" that I relate to a specific Category and Subcategory. Category/Subcategory -> Resellercategory -> Product (Resellercategories are never seen in the frontend) If I filter the products for "male" I only want to display (non-empty) Categories/Subcategories that have male products. To answer your question, there is no pagination on the Categories and Subcategories that I''m fetching here with this query. The query only returns the appropriate Categories/Subcategories. So if filtered by "male" it returns 3 Categories and 12 Subcategories (In my dev environment). I am using pagination for the actual products though, but that is a separate query. -- You received this message because you are subscribed to the Google Groups "Ruby on Rails: Talk" group. To view this discussion on the web visit https://groups.google.com/d/msg/rubyonrails-talk/-/ZGBq4G24MIUJ. To post to this group, send email to rubyonrails-talk-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org To unsubscribe from this group, send email to rubyonrails-talk+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org For more options, visit this group at http://groups.google.com/group/rubyonrails-talk?hl=en.
On Thu, Jan 26, 2012 at 8:51 PM, Linus Pettersson < linus.pettersson-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:> This query is only to get the appropriate Categories & Subcategories. > There are 5 Categories and 45 Subcategories. > > My products are imported from webshops which are using different > categories for the same things ("tshirt", "t-shirts", "t-shirt", "short > sleeved shirts" may all be the same). To cope with this issue I have the > "Resellercategories" that I relate to a specific Category and Subcategory. > > Category/Subcategory -> Resellercategory -> Product > > (Resellercategories are never seen in the frontend) > > If I filter the products for "male" I only want to display (non-empty) > Categories/Subcategories that have male products. > > > To answer your question, there is no pagination on the Categories and > Subcategories that I''m fetching here with this query. The query only > returns the appropriate Categories/Subcategories. So if filtered by "male" > it returns 3 Categories and 12 Subcategories (In my dev environment). >OK, I understand (small amount of categories, much more products). If you find no other way, then the caching the "has_male_products" etc in the Category/Subcategory may be the best remaining way. I was also thinking, maybe it is acceptable that the set of Categories with "male_products" etc. is only update every hour or so. That would avoid the complexity of real-time update of that cache column. On the other hand, an after_save on products is not _that_ difficult. Just as a test, does your performance improve significantly if you add a "has_male_products" column on categories and filter on that? HTH, Peter -- You received this message because you are subscribed to the Google Groups "Ruby on Rails: Talk" group. To post to this group, send email to rubyonrails-talk-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org To unsubscribe from this group, send email to rubyonrails-talk+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org For more options, visit this group at http://groups.google.com/group/rubyonrails-talk?hl=en.