Hi guys. On my laptop, I''m finding that it''s taking about 5 minutes to save 25,000 model instances. The model has 5 validations, and I''m saving all of the model instances in a single transaction: http://pastie.org/509281 25,000 model instances is a lot, but 5 minutes seems too long. Any suggestions for how to speed this up? Thanks, Nick
Nick Hoffman wrote: [...]> 25,000 model instances is a lot, but 5 minutes seems too long. Any > suggestions for how to speed this up? > > Thanks, > NickSince you''re calling k.save each time through the loop, you''re making 25,000 database queries. Don''t do that -- instead, build one big insert query with all the data (or at least do it in batches of 1000 or so). ar-extensions might help here, or there may be another batch insert plugin; failing that, building the SQL directly wouldn''t be too difficult. In general, a query within a loop is a sure sign that something is wrong, or at least inefficient. Databases work best when you give them all the data at once and let them sort out how to handle it. Best, -- Marnen Laibow-Koser http://www.marnen.org marnen-sbuyVjPbboAdnm+yROfE0A@public.gmane.org -- Posted via http://www.ruby-forum.com/.
On Thu, Jun 11, 2009 at 11:24 PM, Ashwin Mangale<ashwin.mangale-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:> Try commenting out the uniqueness constraint and check the result. To be > fair, the validations listed are fairly standard and can be done before the > keyword object is created ( client-side using something like ajax + jquery > ). > > Hope this helps. > > -AshwinHi Ashwin. I''m finding that the uniqueness constraint doesn''t add much additional processing time. You''re right that the validations are quite standard. However, you can''t rely on input data; it must be validated server-side. Thanks, Nick
On Jun 11, 11:29 pm, Marnen Laibow-Koser <rails-mailing-l...@andreas- s.net> wrote:> Nick Hoffman wrote: > > [...] > > > 25,000 model instances is a lot, but 5 minutes seems too long. Any > > suggestions for how to speed this up? > > > Thanks, > > Nick > > Since you''re calling k.save each time through the loop, you''re making > 25,000 database queries. Don''t do that -- instead, build one big insert > query with all the data (or at least do it in batches of 1000 or so). > ar-extensions might help here, or there may be another batch insert > plugin; failing that, building the SQL directly wouldn''t be too > difficult. > > In general, a query within a loop is a sure sign that something is > wrong, or at least inefficient. Databases work best when you give them > all the data at once and let them sort out how to handle it. > > Best, > -- > Marnen Laibow-Koserhttp://www.marnen.org > mar...-sbuyVjPbboAdnm+yROfE0A@public.gmane.orgHi Marnen. I''m surprised I haven''t come across ar-extensions yet. It looks like the exact solution I need to this problem. Thanks for recommending it! -Nick
On Friday 12 June 2009, Nick Hoffman wrote:> On Thu, Jun 11, 2009 at 11:24 PM, Ashwin > > Mangale<ashwin.mangale-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote: > > Try commenting out the uniqueness constraint and check the result. > > To be fair, the validations listed are fairly standard and can be > > done before the keyword object is created ( client-side using > > something like ajax + jquery ). > > > > Hope this helps. > > > > -Ashwin > > Hi Ashwin. I''m finding that the uniqueness constraint doesn''t add > much additional processing time.It''s a bogus constraint, however, because...> You''re right that the validations are quite standard. However, you > can''t rely on input data; it must be validated server-side.it checks uniqueness in the wrong place. validates_uniqueness_of does not preclude an interleaving of operations of operations from two processes that looks like this A B create(:keyword => ''foo'') validate uniqueness create(:keyword => ''foo'') validate uniqueness save save To be safe, you must have a constraint in the database. If you have that, the additional app-level validation just wastes performance. add_index :keywords, :keyword, :unique => true Michael -- Michael Schuerig mailto:michael-q5aiKMLteq4b1SvskN2V4Q@public.gmane.org http://www.schuerig.de/michael/
On Jun 12, 4:54 am, Michael Schuerig <mich...-q5aiKMLteq4b1SvskN2V4Q@public.gmane.org> wrote:> On Friday 12 June 2009, Nick Hoffman wrote: > > > On Thu, Jun 11, 2009 at 11:24 PM, Ashwin > > > Mangale<ashwin.mang...-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote: > > > Try commenting out the uniqueness constraint and check the result. > > > To be fair, the validations listed are fairly standard and can be > > > done before the keyword object is created ( client-side using > > > something like ajax + jquery ). > > > > Hope this helps. > > > > -Ashwin > > > Hi Ashwin. I''m finding that the uniqueness constraint doesn''t add > > much additional processing time. > > It''s a bogus constraint, however, because... > > > You''re right that the validations are quite standard. However, you > > can''t rely on input data; it must be validated server-side. > > it checks uniqueness in the wrong place. validates_uniqueness_of does > not preclude an interleaving of operations of operations from two > processes that looks like this > > A B > create(:keyword => ''foo'') > validate uniqueness > create(:keyword => ''foo'') > validate uniqueness > save > save > > To be safe, you must have a constraint in the database. If you have > that, the additional app-level validation just wastes performance. > > add_index :keywords, :keyword, :unique => true > > Michael > > -- > Michael Schuerig > mailto:mich...-q5aiKMLteq5BV9CJdY2HSA@public.gmane.org://www.schuerig.de/michael/Are you sure about that?: http://pastie.org/509569
I had a similar situation where I had to import groups of around 2,500 objects at a time based upon the files found within a directory, back before ar-extensions was around, and by outputing a text file and then telling mysql to use that as an infile, the time dropped from a total of 45 seconds on average to about 2 seconds, for the whole action. I expected some improvment, but I had no idea how much faster it would really be until I did it. Carl On Jun 12, 5:20 am, Nick Hoffman <n...-qGbiljoI0DQkmLvzuZlaBw@public.gmane.org> wrote:> On Jun 12, 4:54 am, Michael Schuerig <mich...-q5aiKMLteq4b1SvskN2V4Q@public.gmane.org> wrote: > > > > > On Friday 12 June 2009, Nick Hoffman wrote: > > > > On Thu, Jun 11, 2009 at 11:24 PM, Ashwin > > > > Mangale<ashwin.mang...-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote: > > > > Try commenting out the uniqueness constraint and check the result. > > > > To be fair, the validations listed are fairly standard and can be > > > > done before the keyword object is created ( client-side using > > > > something like ajax + jquery ). > > > > > Hope this helps. > > > > > -Ashwin > > > > Hi Ashwin. I''m finding that the uniqueness constraint doesn''t add > > > much additional processing time. > > > It''s a bogus constraint, however, because... > > > > You''re right that the validations are quite standard. However, you > > > can''t rely on input data; it must be validated server-side. > > > it checks uniqueness in the wrong place. validates_uniqueness_of does > > not preclude an interleaving of operations of operations from two > > processes that looks like this > > > A B > > create(:keyword => ''foo'') > > validate uniqueness > > create(:keyword => ''foo'') > > validate uniqueness > > save > > save > > > To be safe, you must have a constraint in the database. If you have > > that, the additional app-level validation just wastes performance. > > > add_index :keywords, :keyword, :unique => true > > > Michael > > > -- > > Michael Schuerig > > mailto:mich...-q5aiKMLteq5BV9CJdY2HSA@public.gmane.org://www.schuerig.de/michael/ > > Are you sure about that?:http://pastie.org/509569
Quoting Nick Hoffman <nick-qGbiljoI0DQkmLvzuZlaBw@public.gmane.org>:> > On Jun 12, 4:54 am, Michael Schuerig <mich...-q5aiKMLteq4b1SvskN2V4Q@public.gmane.org> wrote: > > On Friday 12 June 2009, Nick Hoffman wrote: > > > > > On Thu, Jun 11, 2009 at 11:24 PM, Ashwin > > > > > Mangale<ashwin.mang...-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote: > > > > Try commenting out the uniqueness constraint and check the result. > > > > To be fair, the validations listed are fairly standard and can be > > > > done before the keyword object is created ( client-side using > > > > something like ajax + jquery ). > > > > > > Hope this helps. > > > > > > -Ashwin > > > > > Hi Ashwin. I''m finding that the uniqueness constraint doesn''t add > > > much additional processing time. > > > > It''s a bogus constraint, however, because... > > > > > You''re right that the validations are quite standard. However, you > > > can''t rely on input data; it must be validated server-side. > > > > it checks uniqueness in the wrong place. validates_uniqueness_of does > > not preclude an interleaving of operations of operations from two > > processes that looks like this > > > > A B > > create(:keyword => ''foo'') > > validate uniqueness > > create(:keyword => ''foo'') > > validate uniqueness > > save > > save > > > > To be safe, you must have a constraint in the database. If you have > > that, the additional app-level validation just wastes performance. > > > > add_index :keywords, :keyword, :unique => true > > > > Michael > > > > -- > > Michael Schuerig > > mailto:mich...-q5aiKMLteq5BV9CJdY2HSA@public.gmane.org://www.schuerig.de/michael/ > > Are you sure about that?: > http://pastie.org/509569Your example is two saves in the same thread/process. The counter-example REQUIRES two concurrent processes/threads. And yes his counter-example is correct. Unless you can guarantee that your application will NEVER run multi-process/multi-threaded, e.g., Webrick. If your application will ever be run by multiple people at once, it will almost certainly require this. Such is the price of success. Also pushing the uniqueness validation into the database will cut your run time. The uniqueness validation is implemented by doing a lookup on the unique field before doing the insert. Letting the DB do it cuts the number of database calls in half. Doing the inserts in batches will also increase the speed significantly. Yeah, the code isn''t as pretty but if it is already clear you need the speed, do it. It is doubtful that it will be the ugliest bit of code in your app. HTH, Jeffrey
Try to use bulk insert for save all of your instances. http://www.igvita.com/2007/07/11/efficient-updates-data-import-in-rails/ On Jun 12, 6:14 am, Nick <n...-qGbiljoI0DQkmLvzuZlaBw@public.gmane.org> wrote:> Hi guys. On my laptop, I''m finding that it''s taking about 5 minutes to > save 25,000 model instances. > > The model has 5 validations, and I''m saving all of the model instances > in a single transaction:http://pastie.org/509281 > > 25,000 model instances is a lot, but 5 minutes seems too long. Any > suggestions for how to speed this up? > > Thanks, > Nick
> Try to use bulk insert for save all of your instances. > http://www.igvita.com/2007/07/11/efficient-updates-data-import-in-rails/ > > On Jun 12, 6:14 am, Nick <n...-qGbiljoI0DQkmLvzuZlaBw@public.gmane.org> wrote: >> Hi guys. On my laptop, I''m finding that it''s taking about 5 minutes >> to >> save 25,000 model instances. >> >> The model has 5 validations, and I''m saving all of the model >> instances >> in a single transaction:http://pastie.org/509281 >> >> 25,000 model instances is a lot, but 5 minutes seems too long. Any >> suggestions for how to speed this up?You might also look at your database to see if it can handle a transaction that large. Depending on how it does it may need to store quite a lot in memory...