Tony McMahon
2006-May-10 18:51 UTC
[Rails] What to do with HUGE instance variables in Rails?
I'm learning Rails and I can successfully use the following in the controller:

    @var1 = Var.find :all
    @var2 = Var2.find :all

The problem is that the DB has about 260,000 rows, which slows everything down considerably if I load everything into @var1.

Isn't there a way to load those items progressively? I treat them separately (i.e. no interactions between them) in the program, so it should be possible.

Thanks guys.

-T
--
Posted via http://www.ruby-forum.com/.
Kevin Olbrich
2006-May-10 18:55 UTC
[Rails] What to do with HUGE instance variables in Rails?
Why would you need to load all of your records at once? Usually you can use the database to select just the relevant subset to operate on.

_Kevin
--
Posted with http://DevLists.com. Sign up and save your mailbox.
Aye, it's called pagination, yo. :)
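Pagination here means asking the database for one page of rows at a time instead of the whole table. A minimal sketch of the page arithmetic in plain Ruby follows; `page_window` and `page_count` are hypothetical helpers, not Rails APIs, and the resulting values are what one would pass as :offset and :limit to a find call.

```ruby
# Hypothetical helper (not part of Rails): turn a 1-based page number
# into the :offset/:limit pair for a find call.
def page_window(page, per_page)
  raise ArgumentError, "page must be >= 1" if page < 1
  raise ArgumentError, "per_page must be >= 1" if per_page < 1
  { :offset => (page - 1) * per_page, :limit => per_page }
end

# Number of pages needed to cover total_rows (integer ceiling division).
def page_count(total_rows, per_page)
  (total_rows + per_page - 1) / per_page
end

window = page_window(3, 1000)        # third page of 1000 rows
pages  = page_count(260_000, 1000)   # pages needed for the whole table
```

With a page size of 1000, only 1000 rows are ever held in memory at once, at the cost of one query per page.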
Tony McMahon
2006-May-10 19:43 UTC
[Rails] Re: What to do with HUGE instance variables in Rails?
> Why would you need to load all of your records at once?
> Usually you can use the database to select just the relevant subset to
> operate on.

Because my program is essentially an iterative loop that builds a small file for each of the items in @var. So I'm loading everything into @var and then doing a for ... each in the program. It works perfectly with a small number of rows, but not when I load the huge production DB.

Maybe I should do something else, but I'm just learning Rails and most of the examples load everything into a @var...

-T
Do it once and cache the output?! Or do the processing _before_ you put the data in the database, and the next time, just show the result.

If you really need to modify 260k items for every view... well, except for using in-db functions or maybe a custom db system (and even this might be slow), I don't see a way to do this in a performant way.
Lionel Bouton
2006-May-10 22:49 UTC
[Rails] Re: What to do with HUGE instance variables in Rails?
Colin wrote the following on 10.05.2006 23:06:
> If you really need to modify 260k items for every view... well, except
> for using in-db functions or maybe a custom db system (and even this
> might be slow), I don't see a way to do this in a performant way.

You could use something like the following (I had to in another context where I deleted huge numbers of rows from a table; PostgreSQL simply barfed when sent a DELETE FROM table WHERE id IN <huge_list>).

    Var.transaction {
      count = Var.count
      batchsize = 1000 # or whatever suits your case
      offset = 0
      while offset < count
        # :order keeps the offsets stable from one query to the next
        vars = Var.find :all, :order => 'id', :offset => offset, :limit => batchsize
        # do whatever you need to do with vars
        offset += batchsize
      end
    }

Note that if entries can be added or deleted while you execute this, you'll most probably need the transaction (if you can't afford to process the same entry twice and/or miss entries)...

Lionel.
Stefan Kaes
2006-May-11 04:50 UTC
[Rails] What to do with HUGE instance variables in Rails?
My first answer is: don't do it.
Second: Never ever.
Third: are you serious?

;-)

That said: if you need to process 260,000 objects, drop down into SQL, write a stored procedure and retrieve the results.

This isn't a Rails or Ruby problem. Your approach would be slow in any language.

-- stefan

--
Rails performance tuning: http://railsexpress.de/blog
Subscription: http://railsexpress.de/blog/xml/rss20/feed.xml
Alex Young
2006-May-11 07:22 UTC
[Rails] What to do with HUGE instance variables in Rails?
Stefan Kaes wrote:
> That said: if you need to process 260,000 objects, drop down into SQL,
> write a stored procedure and retrieve the results.

I can think of a few cases where that's just not going to work... particularly where the processing involves some sort of meaningful interaction with things outside the database.

> This isn't a Rails or Ruby problem. Your approach would be slow in any
> language.

It should be slow, but manageable - not impossible. Isn't this sort of thing why we have cursors?

How would this look as syntax:

    @var = Foo.find(:first)
    while @var
      do_stuff_with @var
      @var = Foo.next(@var)
    end

And as a generic solution, I think something like this would do it:

    class Foo < ActiveRecord::Base
      def Foo.next(foo)
        Foo.find(:first, :conditions => ['id > ?', foo.id])
      end
    end

Untested, but that's the general idea...

--
Alex
Ross Dawson
2006-May-11 07:42 UTC
[Rails] What to do with HUGE instance variables in Rails?
> Stefan Kaes wrote:
> > My first answer is: don't do it.
> > Second: Never ever.
> > Third: are you serious?

Well, let's qualify Stefan's comments a bit.

A task processing 260K rows in a database is not a job to be done during a web request. If you were doing a mail merge, for example, I'd put that off as a scheduled task. Whichever way you go, this is going to take considerable processing. You want a web page to respond in a couple of seconds, not a few minutes.

In a web page, reading 260K rows into memory at once is bad (a long time to process, and it consumes lots of RAM). Reading 260K rows one at a time (find :first / find :next) is also bad, as it takes a long time to process.

So my first answer is: don't do it in Rails, but you could still do it in Ruby, using ActiveRecord, scheduled with cron.

Second: Never do something like this in any web framework. Even if you process 1000 rows/sec, that's about 4 1/3 minutes; who'd wait that long for a web page?

Third: do it as an offline or overnight batch job, perhaps using a web page to trigger the start of processing and an email notifying you of its completion so you can download the result from the server.

Ross
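The estimate above is easy to redo for other table sizes or throughputs. A small sketch in plain Ruby; `estimated_duration` is a hypothetical helper, and the 1000 rows/sec rate is the figure assumed in the message, not a measurement.

```ruby
# Rough duration estimate for processing a table row by row.
# rows_per_second is an assumed throughput, not a measured one.
def estimated_duration(rows, rows_per_second)
  total_seconds = (rows.to_f / rows_per_second).ceil
  minutes, seconds = total_seconds.divmod(60)
  { :minutes => minutes, :seconds => seconds }
end

estimate      = estimated_duration(260_000, 1000)  # the rate quoted above
slow_estimate = estimated_duration(260_000, 100)   # a tenfold slower rate
```

At the quoted rate the job takes a little over four minutes; at a tenth of that rate it is well over half an hour, which is why a scheduled job is the safer home for it.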
Lionel Bouton
2006-May-11 12:10 UTC
[Rails] What to do with HUGE instance variables in Rails?
Alex Young wrote the following on 11.05.2006 09:21:
> And as a generic solution, I think something like this would do it:
>
> class Foo < ActiveRecord::Base
>   def Foo.next(foo)
>     Foo.find(:first, :conditions => ['id > ?', foo.id])
>   end
> end
>
> Untested, but that's the general idea...

Give the find ":order => 'id'" and it should work.

Note that while the interface is elegant, from a performance point of view this is horrible. I wonder if we shouldn't use Ruby blocks to make it cleaner and more scalable.

What about (reusing the code I proposed earlier):

    class Foo < ActiveRecord::Base
      # Class method, so that it can be called as Foo.apply
      def self.apply(conditions = nil)
        Foo.transaction {
          count = Foo.count
          batchsize = 1000 # or whatever suits your case
          offset = 0
          while offset < count
            # :order keeps the offsets stable from one query to the next
            foos = Foo.find :all, :conditions => conditions, :order => 'id',
                            :offset => offset, :limit => batchsize
            for foo in foos
              yield foo
            end
            offset += batchsize
          end
        }
      end
    end

Then all you have to do is

    # process each row
    Foo.apply { |foo|
      # do whatever you want to do with foo
    }

    # process a subset
    Foo.apply('attr = value') { |foo|
      # do whatever you want to do with foo
    }

You can then modify the apply method if you find a better way to get your Foo instances (if ActiveRecord implements a nice Cursor interface in the future, for example). You won't need to modify the Foo.apply calls.

Lionel.
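The block-based batching idea can be tried without Rails or a database. The sketch below is plain Ruby: `FakeTable`, `find_batch` and `each_in_batches` are hypothetical stand-ins for the model and its find call, and, unlike the code above, the loop stops when a batch comes back empty instead of counting rows up front.

```ruby
# FakeTable is a hypothetical in-memory stand-in for an ActiveRecord model,
# used only so this sketch runs without Rails or a database.
class FakeTable
  def initialize(rows)
    @rows = rows.sort   # pretend the rows always come back ordered by id
  end

  # Mimics find(:all, :offset => offset, :limit => limit) on an ordered table.
  def find_batch(offset, limit)
    @rows[offset, limit] || []
  end

  # Yield every row to the block, holding at most batch_size rows at a time.
  def each_in_batches(batch_size = 1000)
    offset = 0
    loop do
      batch = find_batch(offset, batch_size)
      break if batch.empty?          # no rows left to process
      batch.each { |row| yield row }
      offset += batch_size
    end
  end
end

table = FakeTable.new((1..2500).to_a)
seen = []
table.each_in_batches(1000) { |row| seen << row }
```

The caller only supplies the per-row block, so the fetching strategy can change later without touching the call sites, which is the point made above about Foo.apply.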
Alex Young
2006-May-11 12:40 UTC
[Rails] What to do with HUGE instance variables in Rails?
Lionel Bouton wrote:
> Give the find ":order => 'id'" and it should work.

Oh yes - best have that for safety. Although I *think* I'm right in saying that under AR's semantics, ids are guaranteed to be monotonically increasing, so the :first with the condition should be enough. Belt and braces can't hurt, though.

> Note that while the interface is elegant, from a performance point of
> view this is horrible.

Well, yes - it's a time/space trade-off. I like your batching, though - quite neat, and gives more control over the trade-off ratio.

--
Alex
Nicolas Buet
2006-May-11 12:59 UTC
[Rails] What to do with HUGE instance variables in Rails?
Hi Tony,

Could you describe your expectations a little more? For example:

1) Do you display this data?
2) Why does it need to be fast?
3) Is it better to spend more time overall (e.g. loading and processing the data one row at a time) but use less memory, or less time overall (load everything) but with no "progress status" and huge memory use?
4) ....

Maybe that could help us find a trade-off solution ;-)

Regards,
Nicolas
Jeremy Evans
2006-May-11 15:53 UTC
[Rails] What to do with HUGE instance variables in Rails?
On 5/11/06, Alex Young wrote:
> Lionel Bouton wrote:
> > Give the find ":order => 'id'" and it should work.
> Oh yes - best have that for safety. Although I *think* I'm right in
> saying that under AR's semantics, ids are guaranteed to be monotonically
> increasing, so the :first with the condition should be enough. Belt and
> braces can't hurt, though.

ids are monotonically increasing, but if you select without ordering, you can't be sure to retrieve the next id, only one with an id greater than the current record's. The :first condition is not enough, as the ordering of rows without a specific ORDER BY clause is arbitrary (and so the first element returned would not necessarily be the next element). The only situation I can think of where this is likely to work without :order is if you are only using SELECT and INSERT, and not UPDATE or DELETE, and even then it is probably database dependent.

Jeremy
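The difference can be simulated in plain Ruby. In this sketch the array stands in for rows coming back in arbitrary order; the ids, their order, and the helper names (`next_without_order`, `next_with_order`, `walk`) are all made up for illustration and are not ActiveRecord calls.

```ruby
# Mimics find(:first, :conditions => ['id > ?', current]) with no :order:
# returns whichever matching row happens to come back first.
def next_without_order(ids, current)
  ids.find { |id| id > current }
end

# Mimics the same find with :order => 'id': returns the smallest matching id.
def next_with_order(ids, current)
  ids.select { |id| id > current }.min
end

# Follow "next" links from start until there is no next row.
def walk(ids, start)
  visited = []
  current = start
  while current
    visited << current
    current = yield(ids, current)
  end
  visited
end

# Rows as a database might return them without ORDER BY (made-up order).
unordered_ids = [3, 1, 5, 2, 4]

unordered_walk = walk(unordered_ids, 1) { |ids, cur| next_without_order(ids, cur) }
ordered_walk   = walk(unordered_ids, 1) { |ids, cur| next_with_order(ids, cur) }
```

Here the unordered walk visits 1, 3, 5 and silently skips 2 and 4, while the ordered walk visits every id from 1 to 5.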