May be I am missing something obvious, but I do not understand why the method find_in_batches cannot take the option[:order]. I have created a ticket http://is.gd/lUwX with a patch to remove the error raised accordingly. Thanks. --~--~---------~--~----~------------~-------~--~----~ You received this message because you are subscribed to the Google Groups "Ruby on Rails: Core" group. To post to this group, send email to rubyonrails-core@googlegroups.com To unsubscribe from this group, send email to rubyonrails-core+unsubscribe@googlegroups.com For more options, visit this group at http://groups.google.com/group/rubyonrails-core?hl=en -~----------~----~----~----~------~----~------~--~---
Ryan Angilly
2009-Mar-05 13:39 UTC
Re: ActiverRecord find_in_batches to take option[:order]
If, for example, you order by ''Name'', and then while batch processing you insert a new record with the name "Angilly", it is possible (probable actually) that your batch processing will miss records. ID''s are not very likely to change, and the auto-incrementing means it''s much more difficult to miss records. This is also the reason that it requires integer ID''s (and not UUIDs). Cheers, Ryan On Thu, Mar 5, 2009 at 8:27 AM, blj <braghavan@gmail.com> wrote:> > May be I am missing something obvious, but I do not understand why the > method find_in_batches cannot take the option[:order]. > > I have created a ticket http://is.gd/lUwX with a patch to remove the > error raised accordingly. > > Thanks. > > >--~--~---------~--~----~------------~-------~--~----~ You received this message because you are subscribed to the Google Groups "Ruby on Rails: Core" group. To post to this group, send email to rubyonrails-core@googlegroups.com To unsubscribe from this group, send email to rubyonrails-core+unsubscribe@googlegroups.com For more options, visit this group at http://groups.google.com/group/rubyonrails-core?hl=en -~----------~----~----~----~------~----~------~--~---
Brian Durand
2009-Mar-05 22:21 UTC
Re: ActiverRecord find_in_batches to take option[:order]
I created a patch to allow :order clauses on this topic http://groups.google.com/group/rubyonrails-core/browse_thread/thread/66df9e4241fdc0a7 --~--~---------~--~----~------------~-------~--~----~ You received this message because you are subscribed to the Google Groups "Ruby on Rails: Core" group. To post to this group, send email to rubyonrails-core@googlegroups.com To unsubscribe from this group, send email to rubyonrails-core+unsubscribe@googlegroups.com For more options, visit this group at http://groups.google.com/group/rubyonrails-core?hl=en -~----------~----~----~----~------~----~------~--~---
Jack Christensen
2009-Mar-05 22:28 UTC
Re: ActiverRecord find_in_batches to take option[:order]
Ryan Angilly wrote:> If, for example, you order by ''Name'', and then while batch processing > you insert a new record with the name "Angilly", it is possible > (probable actually) that your batch processing will miss records. > ID''s are not very likely to change, and the auto-incrementing means > it''s much more difficult to miss records. This is also the reason > that it requires integer ID''s (and not UUIDs).I haven''t looked closely at the code, but wouldn''t it be possible to just select all the ID''s in one query with whatever order criteria were desired, then use those ID''s to select the full records in batches. It would be something like how eager loading does multiple selects in some cases. Jack> > Cheers, > Ryan > > On Thu, Mar 5, 2009 at 8:27 AM, blj <braghavan@gmail.com > <mailto:braghavan@gmail.com>> wrote: > > > May be I am missing something obvious, but I do not understand why the > method find_in_batches cannot take the option[:order]. > > I have created a ticket http://is.gd/lUwX with a patch to remove the > error raised accordingly. > > Thanks. > > > > >--~--~---------~--~----~------------~-------~--~----~ You received this message because you are subscribed to the Google Groups "Ruby on Rails: Core" group. To post to this group, send email to rubyonrails-core@googlegroups.com To unsubscribe from this group, send email to rubyonrails-core+unsubscribe@googlegroups.com For more options, visit this group at http://groups.google.com/group/rubyonrails-core?hl=en -~----------~----~----~----~------~----~------~--~---
Ryan Angilly
2009-Mar-05 22:36 UTC
Re: ActiverRecord find_in_batches to take option[:order]
Yeah that''s possible. But then you could run into the same memory bloat problem. Just a million records with UUID pk''s gives you over 35 megabytes of data to store in memory (not including any of the overhead due to Ruby''s Array and String classes). Maybe that''s enough of an edge case where it''s worth ignoring for now, but then I still come back to the original question: can you think of a case where you really need to order it? On Thu, Mar 5, 2009 at 5:28 PM, Jack Christensen <jack@jncsoftware.com>wrote:> Ryan Angilly wrote: > > If, for example, you order by ''Name'', and then while batch processing you > insert a new record with the name "Angilly", it is possible (probable > actually) that your batch processing will miss records. ID''s are not very > likely to change, and the auto-incrementing means it''s much more difficult > to miss records. This is also the reason that it requires integer ID''s (and > not UUIDs). > > I haven''t looked closely at the code, but wouldn''t it be possible to just > select all the ID''s in one query with whatever order criteria were desired, > then use those ID''s to select the full records in batches. It would be > something like how eager loading does multiple selects in some cases. > > Jack > > > Cheers, > Ryan > > On Thu, Mar 5, 2009 at 8:27 AM, blj <braghavan@gmail.com> wrote: > >> >> May be I am missing something obvious, but I do not understand why the >> method find_in_batches cannot take the option[:order]. >> >> I have created a ticket http://is.gd/lUwX with a patch to remove the >> error raised accordingly. >> >> Thanks. >> >> > > > > > > >--~--~---------~--~----~------------~-------~--~----~ You received this message because you are subscribed to the Google Groups "Ruby on Rails: Core" group. To post to this group, send email to rubyonrails-core@googlegroups.com To unsubscribe from this group, send email to rubyonrails-core+unsubscribe@googlegroups.com For more options, visit this group at http://groups.google.com/group/rubyonrails-core?hl=en -~----------~----~----~----~------~----~------~--~---
Brian Durand
2009-Mar-05 22:59 UTC
Re: ActiverRecord find_in_batches to take option[:order]
If you need to process a million records, you may want to process them in a specific order like most recently update first since those may be the most important and it may take a while to finish the batch. Processing them in primary key order may well mean that the most important records are processed last. The patch I submitted does just what Jack suggests. While you will end up with 35Mb for a million records, it will be a constant memory size and will not bloat as the batch is processed if you load associations. While 35Mb may be too much for an environment with limited resources, you could break it up using :limit and :offset into smaller batches like any other ActiveRecord find. On Mar 5, 4:36 pm, Ryan Angilly <angi...@gmail.com> wrote:> Yeah that''s possible. But then you could run into the same memory bloat > problem. Just a million records with UUID pk''s gives you over 35 megabytes > of data to store in memory (not including any of the overhead due to Ruby''s > Array and String classes). Maybe that''s enough of an edge case where it''s > worth ignoring for now, but then I still come back to the original question: > can you think of a case where you really need to order it? > > On Thu, Mar 5, 2009 at 5:28 PM, Jack Christensen <j...@jncsoftware.com>wrote: > > > Ryan Angilly wrote: > > > If, for example, you order by ''Name'', and then while batch processing you > > insert a new record with the name "Angilly", it is possible (probable > > actually) that your batch processing will miss records. ID''s are not very > > likely to change, and the auto-incrementing means it''s much more difficult > > to miss records. This is also the reason that it requires integer ID''s (and > > not UUIDs). > > > I haven''t looked closely at the code, but wouldn''t it be possible to just > > select all the ID''s in one query with whatever order criteria were desired, > > then use those ID''s to select the full records in batches. It would be > > something like how eager loading does multiple selects in some cases. > > > Jack > > > Cheers, > > Ryan > > > On Thu, Mar 5, 2009 at 8:27 AM, blj <bragha...@gmail.com> wrote: > > >> May be I am missing something obvious, but I do not understand why the > >> method find_in_batches cannot take the option[:order]. > > >> I have created a tickethttp://is.gd/lUwXwith a patch to remove the > >> error raised accordingly. > > >> Thanks.--~--~---------~--~----~------------~-------~--~----~ You received this message because you are subscribed to the Google Groups "Ruby on Rails: Core" group. To post to this group, send email to rubyonrails-core@googlegroups.com To unsubscribe from this group, send email to rubyonrails-core+unsubscribe@googlegroups.com For more options, visit this group at http://groups.google.com/group/rubyonrails-core?hl=en -~----------~----~----~----~------~----~------~--~---
Mislav Marohnić
2009-Mar-06 01:20 UTC
Re: ActiverRecord find_in_batches to take option[:order]
On Thu, Mar 5, 2009 at 14:27, blj <braghavan@gmail.com> wrote:> > May be I am missing something obvious, but I do not understand why the > method find_in_batches cannot take the option[:order].Ordering is forced to the "id" column because: 1. mysql optimizes iterating over huge datasets when you use the "WHERE id > ?" clause much better than with LIMIT and OFFSET; 2. you don''t need ordering -- every single record gets yielded. --~--~---------~--~----~------------~-------~--~----~ You received this message because you are subscribed to the Google Groups "Ruby on Rails: Core" group. To post to this group, send email to rubyonrails-core@googlegroups.com To unsubscribe from this group, send email to rubyonrails-core+unsubscribe@googlegroups.com For more options, visit this group at http://groups.google.com/group/rubyonrails-core?hl=en -~----------~----~----~----~------~----~------~--~---
Brian Durand
2009-Mar-06 15:21 UTC
Re: ActiverRecord find_in_batches to take option[:order]
I don''t think the implementation core functionality should be based on on how MySql handles something. :limit, :offset, and :order are only options parameters that I''m exposing in case the developer needs them. If you want to iterate over the entire data set you can certainly do so and these options will not be sent to the database. In fact, in some circumstances, the query will end up more efficient. If you construct a horrendously complex query that the optimizer doesn''t come up with a good plan for, the code in my patch will execute the horrible query once and afterward do a simple "WHERE id IN (?, ?, ...)" query for every batch. Using "id> ?" will execute the same query for every batch.On Mar 5, 7:20 pm, Mislav Marohnić <mislav.maroh...@gmail.com> wrote:> Ordering is forced to the "id" column because: > > 1. mysql optimizes iterating over huge datasets when you use the "WHERE > id > ?" clause much better than with LIMIT and OFFSET; > 2. you don''t need ordering -- every single record gets yielded.--~--~---------~--~----~------------~-------~--~----~ You received this message because you are subscribed to the Google Groups "Ruby on Rails: Core" group. To post to this group, send email to rubyonrails-core@googlegroups.com To unsubscribe from this group, send email to rubyonrails-core+unsubscribe@googlegroups.com For more options, visit this group at http://groups.google.com/group/rubyonrails-core?hl=en -~----------~----~----~----~------~----~------~--~---