Is there any scalable alternative to iterating over all the records of a
certain model? #find_all seems to load everything into memory. With
500,000 records it will be a swap storm.

Pedro.
Pedro Côrte-Real wrote:
> Is there any scalable alternative to iterating over all the records of a
> certain model? #find_all seems to load everything into memory. With
> 500,000 records it will be a swap storm.
>
> Pedro.

It is hard to answer this question without knowing what exactly you want
to do with the results.

Matthew Margolis
blog.mattmargolis.net
On 7/5/06, Matthew Margolis <mrmargolis@wisc.edu> wrote:
> It is hard to answer this question without knowing what exactly you want
> to do with the results.

I'm trying to fix acts_as_ferret#rebuild_index so that it doesn't consume
a bunch of memory when rebuilding a large index. The code is very simple:
it just iterates over all the model objects, adding them to the Ferret
index. They're only used once and can then be forgotten.

Before I used acts_as_ferret I had my own indexing solution. What I did in
my rebuild_index was to fetch 1000 records, index them, and then fetch the
next 1000. But this is probably wrong if there are concurrent
modifications.

Pedro.
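(A minimal sketch of the fetch-index-repeat approach described above;
Article and index are placeholder names, not from the thread, and as noted
above nothing here guards against concurrent modifications:)

  batch_size = 1000
  offset = 0
  loop do
    batch = Article.find(:all, :limit => batch_size, :offset => offset)
    break if batch.empty?
    batch.each { |article| index << article }  # hypothetical indexing call
    offset += batch_size
  end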
Pedro Côrte-Real wrote:
> The code is very simple: it just iterates over all the model objects,
> adding them to the Ferret index. They're only used once and can then be
> forgotten.

It sounds as though you should not be creating model objects in the first
place. This is an unnecessary overhead. I am not familiar with low-level
SQL access from Ruby, but I'd start with the ActiveRecord source code for
the find_by_sql method. Look for the code that loads up record objects and
make a copy, adding whatever you need to load your index instead.

Julian
On 7/5/06, Julian Gall <julian.gall@gmail.com> wrote:
> It sounds as though you should not be creating model objects in the first
> place. This is an unnecessary overhead. I am not familiar with low-level
> SQL access from Ruby, but I'd start with the ActiveRecord source code for
> the find_by_sql method. Look for the code that loads up record objects
> and make a copy, adding whatever you need to load your index instead.

The index fields are taken from the model and aren't necessarily database
fields; some might be model methods that calculate something. During the
initial data import the indexing works fine, because each record is
indexed as it is sent to the database. Reindexing is slow because of all
the memory wasted by #find_all. Creating model objects is a small
overhead, and it's required to respect DRY.

Pedro.
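(A minimal sketch of what such a model might look like, assuming the
acts_as_ferret :fields option; Article and the field names are
placeholders, not taken from the thread:)

  class Article < ActiveRecord::Base
    # :summary is not a database column; acts_as_ferret calls the method
    # when building the Ferret document, which is why indexing from raw
    # SQL rows alone would not be enough here.
    acts_as_ferret :fields => [:title, :summary]

    def summary
      body.to_s[0, 200]
    end
  end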
On 7/5/06, Pedro Côrte-Real <pedrocr@gmail.com> wrote:
> On 7/5/06, Matthew Margolis <mrmargolis@wisc.edu> wrote:
> > It is hard to answer this question without knowing what exactly you
> > want to do with the results.
>
> I'm trying to fix acts_as_ferret#rebuild_index so that it doesn't consume
> a bunch of memory when rebuilding a large index. The code is very simple:
> it just iterates over all the model objects, adding them to the Ferret
> index. They're only used once and can then be forgotten.
>
> Before I used acts_as_ferret I had my own indexing solution. What I did
> in my rebuild_index was to fetch 1000 records, index them, and then fetch
> the next 1000. But this is probably wrong if there are concurrent
> modifications.

You probably want select_all.

  ActiveRecord::Base.connection.select_all "select a, b, c, d from blah where x = 1"

http://api.rubyonrails.org/classes/ActiveRecord/ConnectionAdapters/DatabaseStatements.html#M000622
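(For illustration: select_all returns plain hashes rather than model
objects, and find_by_sql, which Julian pointed at, is roughly select_all
plus one instantiated model object per row. A minimal sketch of feeding
those hashes to an index; the column names and the index object are
placeholders:)

  # select_all returns an Array of Hashes keyed by column name (strings);
  # no ActiveRecord objects are built, but every row is still fetched at once.
  rows = ActiveRecord::Base.connection.select_all("SELECT id, title FROM articles")
  rows.each do |row|
    index << { :id => row["id"], :title => row["title"] }  # hypothetical index call
  end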
On 7/5/06, Wilson Bilkovich <wilsonb@gmail.com> wrote:
> You probably want select_all.
>
>   ActiveRecord::Base.connection.select_all "select a, b, c, d from blah
>   where x = 1"

I index based on the actual model objects and not the raw database
contents, so this won't help me directly. I guess I could use it to fetch
the list of ids in a single query and then fetch the model objects for
them in blocks.

Pedro.
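(A minimal sketch of that two-step idea, assuming a Rails 1.x-style finder;
Article and index are placeholder names:)

  require 'enumerator'  # provides each_slice on older Ruby 1.8

  # Fetch only the ids in one lightweight query...
  ids = Article.connection.select_all("SELECT id FROM articles").map { |row| row["id"] }

  # ...then load and index the full model objects 1000 at a time.
  ids.each_slice(1000) do |batch_ids|
    Article.find(:all, :conditions => ["id IN (?)", batch_ids]).each do |article|
      index << article  # hypothetical: hand each record to the Ferret index
    end
  end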
On 7/5/06, Pedro Côrte-Real <pedrocr@gmail.com> wrote:
> On 7/5/06, Wilson Bilkovich <wilsonb@gmail.com> wrote:
> > You probably want select_all.
> >
> >   ActiveRecord::Base.connection.select_all "select a, b, c, d from blah
> >   where x = 1"
>
> I index based on the actual model objects and not the raw database
> contents, so this won't help me directly. I guess I could use it to fetch
> the list of ids in a single query and then fetch the model objects for
> them in blocks.

You could always use :limit and :offset in your find :all query, and
manually paginate through your ActiveRecord objects, e.g.:

  .find :all, :limit => 500, :offset => 0    # first 500
  .find :all, :limit => 500, :offset => 500  # second 500

Tom
On 7/5/06, Tom Ward <tom@popdog.net> wrote:
> > I index based on the actual model objects and not the raw database
> > contents, so this won't help me directly. I guess I could use it to
> > fetch the list of ids in a single query and then fetch the model
> > objects for them in blocks.
>
> You could always use :limit and :offset in your find :all query, and
> manually paginate through your ActiveRecord objects, e.g.:
>
>   .find :all, :limit => 500, :offset => 0    # first 500
>   .find :all, :limit => 500, :offset => 500  # second 500

Yes, that's what I did before, but does this run inside a transaction, or
will it break if concurrent modifications are happening?

Pedro.
On 7/5/06, Pedro Côrte-Real <pedrocr@gmail.com> wrote:
> On 7/5/06, Tom Ward <tom@popdog.net> wrote:
> > > I index based on the actual model objects and not the raw database
> > > contents, so this won't help me directly. I guess I could use it to
> > > fetch the list of ids in a single query and then fetch the model
> > > objects for them in blocks.
> >
> > You could always use :limit and :offset in your find :all query, and
> > manually paginate through your ActiveRecord objects, e.g.:
> >
> >   .find :all, :limit => 500, :offset => 0    # first 500
> >   .find :all, :limit => 500, :offset => 500  # second 500
>
> Yes, that's what I did before, but does this run inside a transaction, or
> will it break if concurrent modifications are happening?

You could do it this way:

  SomeModel.transaction do
    batch_size = 500
    0.step(SomeModel.count, batch_size) do |i|
      batch = SomeModel.find :all, :limit => batch_size, :offset => i
      # do something with batch here.
    end
  end

This will wrap the whole thing in a transaction, and then fetch the
records 500 at a time. Actually, that should probably be
SomeModel.count - 1, but it shouldn't matter.
On 7/5/06, Wilson Bilkovich <wilsonb@gmail.com> wrote:
> You could do it this way:
>
>   SomeModel.transaction do
>     batch_size = 500
>     0.step(SomeModel.count, batch_size) do |i|
>       batch = SomeModel.find :all, :limit => batch_size, :offset => i
>       # do something with batch here.
>     end
>   end
>
> This will wrap the whole thing in a transaction, and then fetch the
> records 500 at a time. Actually, that should probably be
> SomeModel.count - 1, but it shouldn't matter.

Yep, this is basically what I had before, but with the transaction
protecting it. I didn't know about this method. Thanks.

Pedro.