Hello everyone! I''ve a question that I hope someone in here would be able to answer. Today I was writing a simple test for a method that takes the last "n" (where n is an integer) users that signed up on my web app. I know it''s a simple method, but then I started to think about the optimal way to query the DB, and I tried different options. The first thing I did was create 1000 users through a simple: 1000.times {Factory(:student)} The difference between the first and the last records is about 2 minutes. first created_at: 2010-12-15 20:35:41 last created_at: 2010-12-15 20:37:51 And this is what I''ve got on the console when using ActiveRecord::Base.uncached: *User.order("created_at DESC").limit(5).reverse* User Load (*7.4ms*) SELECT "users".* FROM "users" ORDER BY created_at DESC LIMIT 5 *User.order("created_at ASC").last(5)* User Load (*1179.1ms*) SELECT "users".* FROM "users" ORDER BY created_at ASC *User.last(5)* *# I didn''t even know I could add a parameter to last* User Load (*1171.2ms*) SELECT "users".* FROM "users" *User.all.last(5)* User Load (*1177.3ms*) SELECT "users".* FROM "users" Well, there''s obviously a huge performance impact between querying the DB with the first one and the rest. There''re some things I would love to understand. - If the users are created in order, this means the last user is the last that signed up. This means that the first query would have had to order all the 1000 users by date, then take only 5 and then reverse the array right??? So, why is it so fast??? - The second query, in this case would not have to order the users, as by default they are all ordered right? So, it would only have to take the last 5 elements of the array. But why is it slower than the first one? - The third and fourth queries, I guess load all the users in memory and then return only the last 5. Any information about this would be really helpful. Thanks. -- You received this message because you are subscribed to the Google Groups "Ruby on Rails: Talk" group. To post to this group, send email to rubyonrails-talk-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org To unsubscribe from this group, send email to rubyonrails-talk+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org For more options, visit this group at http://groups.google.com/group/rubyonrails-talk?hl=en.
Marnen Laibow-Koser
2010-Dec-15 21:37 UTC
Re: Mysql query optimization for order condition
Daniel A. wrote in post #968711:> Hello everyone! I''ve a question that I hope someone in here would be > able to > answer. Today I was writing a simple test for a method that takes the > last > "n" (where n is an integer) users that signed up on my web app. > > I know it''s a simple method, but then I started to think about the > optimal > way to query the DB, and I tried different options. The first thing I > did > was create 1000 users through a simple: > > 1000.times {Factory(:student)} > > The difference between the first and the last records is about 2 > minutes. > > first created_at: 2010-12-15 20:35:41 > last created_at: 2010-12-15 20:37:51 > > And this is what I''ve got on the console when using > ActiveRecord::Base.uncached: > > *User.order("created_at DESC").limit(5).reverse* > User Load (*7.4ms*) SELECT "users".* FROM "users" ORDER BY created_at > DESC > LIMIT 5 > > *User.order("created_at ASC").last(5)* > User Load (*1179.1ms*) SELECT "users".* FROM "users" ORDER BY > created_at > ASC > > *User.last(5)* *# I didn''t even know I could add a parameter to last* > User Load (*1171.2ms*) SELECT "users".* FROM "users" > > *User.all.last(5)* > User Load (*1177.3ms*) SELECT "users".* FROM "users" > > Well, there''s obviously a huge performance impact between querying the > DB > with the first one and the rest. There''re some things I would love to > understand.Herewith my speculations.> > - If the users are created in order, this means the last user is the > last > that signed up. This means that the first query would have had to order > all > the 1000 users by date, then take only 5 and then reverse the array > right??? > So, why is it so fast???Probably because most of the work is being done in the database, which is then only returning 5 records for Ruby to work with in memory. (If you were using PostgreSQL, you could make it even faster by adding an index on created_at. Come to think of it, maybe Rails already does this, though I doubt it. However, MySQL is too stupid to do backward scans of indices, IIRC.)> > - The second query, in this case would not have to order the users, as > by > default they are all ordered right? So, it would only have to take the > last > 5 elements of the array. But why is it slower than the first one?Well, the DB has to do the same operation (sort all the users by date), though this time in reverse. However, then it returns a bigger result set to Ruby, which means that Ruby has more data to manipulate in memory. I''m surprised that the time difference is that big, though.> > - The third and fourth queries, I guess load all the users in memory and > then return only the last 5.Yes -- but they won''t do what you want. Since there''s no order specified in your query, the DB will return records in an unpredictable order, and so taking the last 5 of them won''t necessarily yield the 5 newest records.> > Any information about this would be really helpful. Thanks.Best, -- Marnen Laibow-Koser http://www.marnen.org marnen-sbuyVjPbboAdnm+yROfE0A@public.gmane.org -- Posted via http://www.ruby-forum.com/. -- You received this message because you are subscribed to the Google Groups "Ruby on Rails: Talk" group. To post to this group, send email to rubyonrails-talk-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org To unsubscribe from this group, send email to rubyonrails-talk+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org For more options, visit this group at http://groups.google.com/group/rubyonrails-talk?hl=en.
Daniel Salmer?n Amselem
2010-Dec-15 23:20 UTC
Re: Mysql query optimization for order condition
Thanks Marnen. I think the times are only related to the time taken to execute the SQL query, not to put that in memory and then sort the array. In the first case, I''m curious about how MySQL can sort all the elements so fast. I''ve read (http://www.igvita.com/2009/03/26/ruby-algorithms-sorting-trie-heaps/) that Ruby''s Array sort method implementation is pretty good. It uses Quick Sort. I''ve tried to to sort an array of strings an another one of integers, already sorted both of them, with *irb* and the result it''s almost instantaneous. Maybe MySQL uses quick sort too.. About the second query you say it has to sort the users in the opposite order which MySQL should take the same amount of time than before, but this is not the case. I see what you mean about Ruby, but I think that time is only for MySQL. About the third and fourth queries, I''ve compared both results with queries one and two and I get the same array. Why would the DB return the records in an unpredictable order? In the Wikipedia <http://en.wikipedia.org/wiki/Quicksort>, the right sidebar explains the reason why Quick Sort is slow when the set is already sorted. Oh, this is also a pretty nice website<http://www.sorting-algorithms.com/reversed-initial-order>, it has animations of different sort algorithms. -- You received this message because you are subscribed to the Google Groups "Ruby on Rails: Talk" group. To post to this group, send email to rubyonrails-talk-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org To unsubscribe from this group, send email to rubyonrails-talk+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org For more options, visit this group at http://groups.google.com/group/rubyonrails-talk?hl=en.
Marnen Laibow-Koser
2010-Dec-15 23:48 UTC
Re: Mysql query optimization for order condition
Pleaee quote when replying Daniel A. wrote in post #968730:> Thanks Marnen. > > I think the times are only related to the time taken to execute the SQL > query, not to put that in memory and then sort the array.You''re right that the times don''t include in-memory sorting. But on reflection, those times are ActiveRecord load times. That means they most likely include the time to retrieve records from the DB -- and then to instantiate Ruby objects for each record! Try this experiment: run those two queries from your favorite database client, outside of Rails. See if the time difference persists.> > In the first case, I''m curious about how MySQL can sort all the elements > so > fast.Why not look up the MySQL source code and find out?> I''ve read > (http://www.igvita.com/2009/03/26/ruby-algorithms-sorting-trie-heaps/) > that > Ruby''s Array sort method implementation is pretty good. It uses Quick > Sort. > I''ve tried to to sort an array of strings an another one of integers, > already sorted both of them, with *irb* and the result it''s > almost instantaneous. Maybe MySQL uses quick sort too..Perhaps. It probably has other tricks.> > About the second query you say it has to sort the users in the opposite > order which MySQL should take the same amount of time than before, but > this > is not the case. I see what you mean about Ruby, but I think that time > is > only for MySQL.See above.> > About the third and fourth queries, I''ve compared both results with > queries > one and two and I get the same array.Sure, you will sometimes, but there''s no guarantee.> Why would the DB return the > records in > an unpredictable order?Because that''s the way SQL databases work. The records within each table have no intrinsic ordering, and the DB is allowed to return them to you in whatever order it likes, unless you specify otherwise. If you need a particular order, then you always, without exception, need to specify what that order is.> > > In the Wikipedia <http://en.wikipedia.org/wiki/Quicksort>, the right > sidebar > explains the reason why Quick Sort is slow when the set is already > sorted. > Oh, this is also a pretty nice > website<http://www.sorting-algorithms.com/reversed-initial-order>, > it has animations of different sort algorithms.I''ll check those out. I do not think, however, that the difference in time you''re seeing is due to sort algorithms as such. Best, -- Marnen Laibow-Koser http://www.marnen.org marnen-sbuyVjPbboAdnm+yROfE0A@public.gmane.org Sent from my iPhone -- Posted via http://www.ruby-forum.com/. -- You received this message because you are subscribed to the Google Groups "Ruby on Rails: Talk" group. To post to this group, send email to rubyonrails-talk-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org To unsubscribe from this group, send email to rubyonrails-talk+unsubscribe@googlegroups.com. For more options, visit this group at http://groups.google.com/group/rubyonrails-talk?hl=en.
Daniel A. wrote in post #968711:> Hello everyone! I''ve a question that I hope someone in here would be > able to > answer. Today I was writing a simple test for a method that takes the > last > "n" (where n is an integer) users that signed up on my web app. > > I know it''s a simple method, but then I started to think about the > optimal > way to query the DB, and I tried different options. The first thing I > did > was create 1000 users through a simple: > > 1000.times {Factory(:student)} > > The difference between the first and the last records is about 2 > minutes. > > first created_at: 2010-12-15 20:35:41 > last created_at: 2010-12-15 20:37:51 > > And this is what I''ve got on the console when using > ActiveRecord::Base.uncached: > > *User.order("created_at DESC").limit(5).reverse* > User Load (*7.4ms*) SELECT "users".* FROM "users" ORDER BY created_at > DESC > LIMIT 5 > > *User.order("created_at ASC").last(5)* > User Load (*1179.1ms*) SELECT "users".* FROM "users" ORDER BY > created_at > ASC > > *User.last(5)* *# I didn''t even know I could add a parameter to last* > User Load (*1171.2ms*) SELECT "users".* FROM "users" > > *User.all.last(5)* > User Load (*1177.3ms*) SELECT "users".* FROM "users" > > Well, there''s obviously a huge performance impact between querying the > DB with the first one and the rest.Only the first of your four tested queries has a "LIMIT 5" in the SQL. With that SQL, only 5 records are transfered to the Ruby array. This is because the "limit(5)" function does that (I assume on an ActiveRecord::Relation object) and can thus modify the SQL query. When the last(5) is called on the resulting ActiveRecord::Relation object, it acts more like the Array#last method and operates on the data that is already fully loaded from the database. So, this would transfer all 1000 records to a Ruby array and then in Ruby select 5 records out of that array. I presume that is the problem. For a database, processing 1000 records should not be a real problem. But transferring them to the ruby program may take some time. In almost all cases, the construct which triggers a LIMIT on the database is the better one, _if_ you can do an ORDER BY that can be expressed in the database (you can use database functions in the database to add some flexibility there). This is typically used when you want to paginate through a very large dataset, because you do not want to load all the records into ruby, only the small subset to render that 1 paginated page. So, I see two proper solutions where you clearly see that the order and the limit are applied.> User.all(:order => "created_at DESC", :limit => 5) > User.order("created_at DESC").limit(5)They both result in: User Load (0.5ms) SELECT "users".* FROM "users" ORDER BY created_at DESC LIMIT 5 User Load (0.3ms) SELECT "users".* FROM "users" ORDER BY created_at DESC LIMIT 5 -- Posted via http://www.ruby-forum.com/. -- You received this message because you are subscribed to the Google Groups "Ruby on Rails: Talk" group. To post to this group, send email to rubyonrails-talk-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org To unsubscribe from this group, send email to rubyonrails-talk+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org For more options, visit this group at http://groups.google.com/group/rubyonrails-talk?hl=en.
Daniel Salmer?n Amselem
2010-Dec-16 15:12 UTC
Re: Mysql query optimization for order condition
Thanks! That makes much more sense. -- You received this message because you are subscribed to the Google Groups "Ruby on Rails: Talk" group. To post to this group, send email to rubyonrails-talk-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org To unsubscribe from this group, send email to rubyonrails-talk+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org For more options, visit this group at http://groups.google.com/group/rubyonrails-talk?hl=en.
Had same issue, this is what did it for me: apt-get update apt-get install libreadline5-dev rvm package install readline rvm package install zlib # probably not necessary rvm install 1.9.2 -C --with-zlib-dir=/usr/local/rvm/usr/lib --with-readline-dir=/usr/local/rvm/usr/lib -- Posted via http://www.ruby-forum.com/. -- You received this message because you are subscribed to the Google Groups "Ruby on Rails: Talk" group. To post to this group, send email to rubyonrails-talk-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org To unsubscribe from this group, send email to rubyonrails-talk+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org For more options, visit this group at http://groups.google.com/group/rubyonrails-talk?hl=en.