Hi guys I have a problem with a fairly standard Rails / mySQL query. I must be doing something very wrong. I was hoping somebody could shed some light on this. First model: article, has_many comments Second model: comment, belongs_to article The number of articles and comments can be big (one to two millions). Every once in a while, I "clean" the DB and remove older articles and comments: Articles.delete_all(["a_date < ?", xxxx] Comments.delete_all("comments.article_id not in (select id from articles)") The first query is executed with no problems. But the second one hangs and completely loads my database server. Notes: - I am not using :dependent => :delete since I was thinking doing a destroy instead of delete on Articles would be too slow (my understanding is when you "destroy", Rails loads the object first) - I have killed all other queries on the DB, just to make sure there are no locks somewhere. Thanks! Pierre -- You received this message because you are subscribed to the Google Groups "Ruby on Rails: Talk" group. To post to this group, send email to rubyonrails-talk-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org To unsubscribe from this group, send email to rubyonrails-talk+unsubscribe@googlegroups.com. For more options, visit this group at http://groups.google.com/group/rubyonrails-talk?hl=en.
On 14 September 2011 12:16, PierreW <wamrewam-gM/Ye1E23mwN+BqQ9rBEUg@public.gmane.org> wrote:> The number of articles and comments can be big (one to two millions). > Every once in a while, I "clean" the DB and remove older articles and > comments: > > Articles.delete_all(["a_date < ?", xxxx] > Comments.delete_all("comments.article_id not in (select id from > articles)") > > The first query is executed with no problems. But the second one hangs > and completely loads my database server.Delete the comments first? Comments.delete_all("comments.article_id in (select id from articles where a_date < ?)", xxxx) -- You received this message because you are subscribed to the Google Groups "Ruby on Rails: Talk" group. To post to this group, send email to rubyonrails-talk-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org To unsubscribe from this group, send email to rubyonrails-talk+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org For more options, visit this group at http://groups.google.com/group/rubyonrails-talk?hl=en.
On Sep 14, 12:16 pm, PierreW <wamre...-gM/Ye1E23mwN+BqQ9rBEUg@public.gmane.org> wrote:> > Articles.delete_all(["a_date < ?", xxxx] > Comments.delete_all("comments.article_id not in (select id from > articles)") > > The first query is executed with no problems. But the second one hangs > and completely loads my database server. > > Notes: > - I am not using :dependent => :delete since I was thinking doing a > destroy instead of delete on Articles would be too slow (my > understanding is when you "destroy", Rails loads the object first) > - I have killed all other queries on the DB, just to make sure there > are no locks somewhere. >Do you have an index on article_id? Id yo don''t this will be super slow. A subselect that this is probably going to be slow - i''d try a left join instead, i.e. something like Comment.joins("left join articles on articles.id article_id").where("articles.id is null").delete_all An index on article id is still advisable Fred -- You received this message because you are subscribed to the Google Groups "Ruby on Rails: Talk" group. To post to this group, send email to rubyonrails-talk-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org To unsubscribe from this group, send email to rubyonrails-talk+unsubscribe@googlegroups.com. For more options, visit this group at http://groups.google.com/group/rubyonrails-talk?hl=en.
> I have a problem with a fairly standard Rails / mySQL query. I must be > doing something very wrong. I was hoping somebody could shed some > light on this. > > First model: article, has_many comments > Second model: comment, belongs_to article > > The number of articles and comments can be big (one to two millions). > Every once in a while, I "clean" the DB and remove older articles and > comments: > > Articles.delete_all(["a_date < ?", xxxx] > Comments.delete_all("comments.article_id not in (select id from > articles)") > > The first query is executed with no problems. But the second one hangs > and completely loads my database server.Innodb or myisam table types? If it''s the latter, the table in question is going to get locked while the delete happens. If you''re removing a lot of rows, regardless of how you index it, it''s going to be slow. In addition to the other advice, you might clean the table up more often (so you''re removing fewer rows). Another option if you''re willing to stick with mysql is to use their ???? (can''t remember the name) feature. It lets you create what appears to be a normal table, but it actually splits it up into multiple tables based on one of the columns -- in your case... the date. So when you remove the old entries you''re not touching the "latest table". At least if I''m remembering things right. -philip -- You received this message because you are subscribed to the Google Groups "Ruby on Rails: Talk" group. To post to this group, send email to rubyonrails-talk-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org To unsubscribe from this group, send email to rubyonrails-talk+unsubscribe@googlegroups.com. For more options, visit this group at http://groups.google.com/group/rubyonrails-talk?hl=en.
On Wed, Sep 14, 2011 at 11:03 AM, Philip Hallstrom <philip-LSG90OXdqQE@public.gmane.org> wrote:> > Another option if you''re willing to stick with mysql is to use their ???? (can''t remember the name) feature. > It lets you create what appears to be a normal table, but it actually splits it up into multiple tables based > on one of the columns -- in your case... the date. So when you remove the old entries you''re not touching > the "latest table". At least if I''m remembering things right. > > -philip >I think you''re talking about partitioning: http://dev.mysql.com/doc/refman/5.1/en/partitioning.html -J -- You received this message because you are subscribed to the Google Groups "Ruby on Rails: Talk" group. To post to this group, send email to rubyonrails-talk-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org To unsubscribe from this group, send email to rubyonrails-talk+unsubscribe@googlegroups.com. For more options, visit this group at http://groups.google.com/group/rubyonrails-talk?hl=en.
Thanks a lot guys for your help. I apologize I did not mention it, but I had the index on article_id unfortunately. Fred: I tried your LEFT JOIN method and indeed, when I EXPLAIN it the second select is of type eq_ref (vs unique_subquery with my method) so it should be faster. Yet, when I try it on a smaller test DB, the difference seems only marginal I am afraid. It still goes through all the comments rows it seems. I just made a simple test directly in mySQL on my instance: "select count(*) from comments" and even that one does not want to return! I tried to restart mysql: same. I did not know this was even possible. I realize it is now more a mySQL issue than a Rails one, but just in case: have you guys ever been confronted to something like that? I will try partitioning as well. Thanks Pierre On Sep 14, 5:13 pm, Jason Stover <jason.sto...-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:> On Wed, Sep 14, 2011 at 11:03 AM, Philip Hallstrom <phi...-LSG90OXdqQE@public.gmane.org> wrote: > > > Another option if you''re willing to stick with mysql is to use their ???? (can''t remember the name) feature. > > It lets you create what appears to be a normal table, but it actually splits it up into multiple tables based > > on one of the columns -- in your case... the date. So when you remove the old entries you''re not touching > > the "latest table". At least if I''m remembering things right. > > > -philip > > I think you''re talking about partitioning: > > http://dev.mysql.com/doc/refman/5.1/en/partitioning.html > > -J-- You received this message because you are subscribed to the Google Groups "Ruby on Rails: Talk" group. To post to this group, send email to rubyonrails-talk-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org To unsubscribe from this group, send email to rubyonrails-talk+unsubscribe@googlegroups.com. For more options, visit this group at http://groups.google.com/group/rubyonrails-talk?hl=en.
On 14 Sep 2011, at 19:35, PierreW <wamrewam-gM/Ye1E23mwN+BqQ9rBEUg@public.gmane.org> wrote:> Thanks a lot guys for your help. > > I apologize I did not mention it, but I had the index on article_id > unfortunately. > > Fred: I tried your LEFT JOIN method and indeed, when I EXPLAIN it the > second select is of type eq_ref (vs unique_subquery with my method) so > it should be faster. Yet, when I try it on a smaller test DB, the > difference seems only marginal I am afraid. It still goes through all > the comments rows it seems. > > I just made a simple test directly in mySQL on my instance: "select > count(*) from comments" and even that one does not want to return! I > tried to restart mysql: same. I did not know this was even possible. >Count(*) isn''t magically fast in innodb - mysql has to do an index scan (versus being able to just read some table metadata for some db types). You might try using show innodb status to see what is going on (there''s another thing you can do to see more detailed info about locks being held, but I don''t remember off the top of my head. Another option might be to delete the comments before you delete the articles (so inner joining comments & articles with whatever condition on articles you use to determine what to delete). Fred> I realize it is now more a mySQL issue than a Rails one, but just in > case: have you guys ever been confronted to something like that? > > I will try partitioning as well. > > Thanks > Pierre > > > > On Sep 14, 5:13 pm, Jason Stover <jason.sto...-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote: >> On Wed, Sep 14, 2011 at 11:03 AM, Philip Hallstrom <phi...-LSG90OXdqQE@public.gmane.org> wrote: >> >>> Another option if you''re willing to stick with mysql is to use their ???? (can''t remember the name) feature. >>> It lets you create what appears to be a normal table, but it actually splits it up into multiple tables based >>> on one of the columns -- in your case... the date. So when you remove the old entries you''re not touching >>> the "latest table". At least if I''m remembering things right. >> >>> -philip >> >> I think you''re talking about partitioning: >> >> http://dev.mysql.com/doc/refman/5.1/en/partitioning.html >> >> -J > > -- > You received this message because you are subscribed to the Google Groups "Ruby on Rails: Talk" group. > To post to this group, send email to rubyonrails-talk-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org > To unsubscribe from this group, send email to rubyonrails-talk+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org > For more options, visit this group at http://groups.google.com/group/rubyonrails-talk?hl=en. >-- You received this message because you are subscribed to the Google Groups "Ruby on Rails: Talk" group. To post to this group, send email to rubyonrails-talk-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org To unsubscribe from this group, send email to rubyonrails-talk+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org For more options, visit this group at http://groups.google.com/group/rubyonrails-talk?hl=en.
i don''t think partitioning is going to help you if count(*) didn''t even work. some basic questions for you on the comments table. is there a primary key? is there a separately indexed article_id? i''m sure you have them, but better to rule out the easy stuff first. i was going to suggest using the exists() function rather than an outer join, but again it doesn''t sound like your main issue. (you should learn to use the exists() function if you don''t know it. i''ve found it to work better.) for the order of operations you are trying to do, i agree with an earlier suggestion to delete comments first, then articles. also don''t use "not in". instead delete comments.* from comments where comments.article_id = @article" (sorry if this shows up like 3 times, it is my first time posting to this group and it wasn''t working.) -- You received this message because you are subscribed to the Google Groups "Ruby on Rails: Talk" group. To view this discussion on the web visit https://groups.google.com/d/msg/rubyonrails-talk/-/jLhHswr_E7QJ. To post to this group, send email to rubyonrails-talk-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org To unsubscribe from this group, send email to rubyonrails-talk+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org For more options, visit this group at http://groups.google.com/group/rubyonrails-talk?hl=en.
Guys, Thanks a lot for all your help and input. Just in case it can help someone else one day: it turned out that my table was corrupted. It was a real surprise to me since I was still able to "use it" (via joins) and mySQL would not "complain". I found out doing a "check table": it did not return an error but it was crashing mysqld every time. Thanks Pierre On Sep 15, 1:49 pm, JayC <jaycinco...-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:> i don''t think partitioning is going to help you if count(*) didn''t even > work. > > some basic questions for you on the comments table. is there a primary key? > is there a separately indexed article_id? i''m sure you have them, but better > to rule out the easy stuff first. > > i was going to suggest using the exists() function rather than an outer > join, but again it doesn''t sound like your main issue. (you should learn to > use the exists() function if you don''t know it. i''ve found it to work > better.) > > for the order of operations you are trying to do, i agree with an earlier > suggestion to delete comments first, then articles. also don''t use "not > in". instead delete comments.* from comments where comments.article_id > @article" > > (sorry if this shows up like 3 times, it is my first time posting to this > group and it wasn''t working.)-- You received this message because you are subscribed to the Google Groups "Ruby on Rails: Talk" group. To post to this group, send email to rubyonrails-talk-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org To unsubscribe from this group, send email to rubyonrails-talk+unsubscribe@googlegroups.com. For more options, visit this group at http://groups.google.com/group/rubyonrails-talk?hl=en.