any rails developers out there know why the select_* and find_* methods are
not iterator methods? i'd have assumed definitions like

  def select_all(sql, name = nil, &block)
    select(sql, name, &block)
  end

  ...

  def find_by_sql(sql, &block)
    query = sanitize_sql sql
    if block
      connection.select_all(query, "#{name} Load") do |record|
        yield(instantiate(record))
      end
    else
      connection.select_all(query, "#{name} Load").inject([]) do |objects, record|
        objects << instantiate(record)
      end
    end
  end

  ...

i realize you would most often need the array of results for display, but
consider these scenarios:

  def custom_find_by
    sql = query_which_will_result_in_fantastically_large_result_set_for_which_no_sql_can_limit

    objects = []

    find_by_sql(sql) do |record|
      objects << record if some_cond(record)  # limit query further
    end

    objects
  end

imagine that objects will, ultimately, contain only 10 records. can this
currently be done?

the other reason i'm wondering about this is that, as i said, it makes good
sense that one would generally want a list of all records back (for
display) - but what if you were sharing your ActiveRecord class with
console apps, using the before_validation_on_update() method to implement a
trigger required to maintain data integrity, because you might be using a
silly db that does not support triggers and whose normal
referential-integrity (ri) facilities are not adequate for your needs? you
write this into your web app and all updates preserve your data integrity.
you'd like to leverage this code in console apps (using your custom
ActiveRecord classes) to perform bulk updates late in the evening. you
already wrote the logic in your web app and would like to use it over
(DRY). the problem is your bulk updates operate on subsets of your data
that contain 10,000,000, or more, records. the lack of iterator methods is
raising ENOMEM every night at 2am.

am i way off base? is there another way to leverage the rails db
abstraction as an iterator for huge result sets?
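to make the proposed dual-mode api concrete, here is a self-contained sketch
of the calling convention - note that fetch_rows below is just a
hypothetical stand-in for a db driver's row-streaming call, not a real
rails or driver method:

```ruby
# hypothetical driver call: pretend each row comes off the wire one at a
# time, so only one row need exist in memory per yield
def fetch_rows
  (1..5).each { |i| yield({ "id" => i }) }
end

# the proposed dual-mode method: stream rows when given a block,
# materialize the whole array otherwise
def select_all(&block)
  if block
    fetch_rows(&block)   # one row in memory at a time
    nil
  else
    rows = []
    fetch_rows { |row| rows << row }
    rows                 # whole result set allocated at once
  end
end

streamed = []
select_all { |row| streamed << row["id"] if row["id"].odd? }
p streamed          # => [1, 3, 5] - only the filtered rows were kept

p select_all.size   # => 5 - array form still works for display
```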
is using before_*/after_* methods considered 'good form' when one must
maintain data integrity that requires smarts beyond the
referential-integrity facilities (or lack thereof) provided by your db?

kind regards.

-a
--
==============================================================================
| EMAIL   :: Ara [dot] T [dot] Howard [at] noaa [dot] gov
| PHONE   :: 303.497.6469
| When you do something, you should burn yourself completely, like a good
| bonfire, leaving no trace of yourself.  --Shunryu Suzuki
==============================================================================
Ara.T.Howard wrote:
>
> any rails developers out there know why the select_* and find_* methods are
> not iterator methods?

There is no real reason. I'm sure a patch would be welcome.
On Fri, 25 Feb 2005, Andreas Schwarz wrote:

> Ara.T.Howard wrote:
>>
>> any rails developers out there know why the select_* and find_* methods
>> are not iterator methods?
>
> There is no real reason. I'm sure a patch would be welcome.

i realize this would entail adapting all the db plugins as well - but
surely all of them support this too (i know sqlite, postgresql, and mysql
do). i'll look into it some more...

-a
> i realize you would most often need the array of results for display, but
> consider these scenarios:
>
>   def custom_find_by
>     sql = query_which_will_result_in_fantastically_large_result_set_for_which_no_sql_can_limit
>
>     objects = []
>
>     find_by_sql(sql) do |record|
>       objects << record if some_cond(record)  # limit query further
>     end
>
>     objects
>   end

Why wouldn't you just do find_by_sql(sql).filter{|r| some_cond(r)} ? Or do
you mean you want to retrieve one record at a time and filter them as you
go? Wouldn't that be rather expensive in terms of queries? Although I
suppose you could buffer by fetching, say, 20 at a time.

Hadley
On Fri, 25 Feb 2005, hadley wickham wrote:

> Why wouldn't you just do find_by_sql(sql).filter{|r| some_cond(r)} ?

this starts by allocating the entire result set - what if this contains
1,000,000 records and you have 128MB of ram? what if 20 of these requests
come into your webserver at once? it's an extremely common ruby idiom to
provide an iterator method when the returned value could potentially be a
memory killer. all the database apis rails uses already provide this
ability - it's simply not currently exposed.

> Or do you mean you want to retrieve one record at a time and filter them
> as you go?

no. here is an example using sqlite:

this technique uses sqlite's ability to pass a block with which to handle
each tuple as it is returned from the db.

  jib:~ > cat a.rb
  #!/usr/bin/env ruby
  require 'sqlite'
  db = SQLite::Database::new(ARGV.shift, 0)
  db.use_array = true
  db.execute('select * from jobs'){|tuple| p tuple}

  jib:~ > time a.rb db > /dev/null

  real    0m1.510s
  user    0m1.420s
  sys     0m0.080s

this technique returns all tuples at once (the way rails does) and then
iterates over this set.

  jib:~ > cat b.rb
  #!/usr/bin/env ruby
  require 'sqlite'
  db = SQLite::Database::new(ARGV.shift, 0)
  db.use_array = true
  tuples = db.execute('select * from jobs')
  tuples.each{|tuple| p tuple}

  jib:~ > time b.rb db > /dev/null

  real    0m2.506s
  user    0m2.370s
  sys     0m0.140s

note that the block method is nearly twice as fast and, much, much more
importantly, only allocates memory for ONE tuple at a time. if your query
returns ALL tuples at once, imagine what will happen to your webserver if
you run a query that returns 50,000 tuples and you get 10 requests for this
page at once...

now, it's REQUIRED by rails that, at some point, you have a set of tuples
to render via eruby - but by no means will this always be the exact set of
tuples returned by your sql (though for obvious reasons you should try to
keep your sql and (de)normalization such that it is).
for example, say you are using a db with lame regex facilities and want to
do some pattern matching on your result set to further filter it.
something like

  matching = []

  tuples = recipes.find_by_sql sql

  tuples.each do |tuple|
    matching << tuple if tuple.description =~ /saucy/
  end

now, further imagine that the set of tuples will be 50,000,000 big. your
code will explode with an Errno::ENOMEM exception being thrown. if rails
exposed the (already existing) iterator methods of all the database apis
one could

  matching = []

  recipes.find_by_sql(sql) do |tuple|
    matching << tuple if tuple.description =~ /saucy/
  end

and ONLY those tuples matching 'saucy' would be allocated. (of course this
is not strictly true since ruby is garbage collected - it's more
appropriate to say that all but the saved tuples would be candidates for
subsequent gc runs, which of course get triggered if malloc ever fails)

> Wouldn't that be rather expensive in terms of queries? Although I suppose
> you could buffer by fetching, say, 20 at a time.

again, the whole reason to do it is that it is more efficient in both space
and time. play with the builtin 'find' module of ruby using the following
two techniques in a directory hierarchy with several million files and
you'll see what i mean. compare

  require 'find'
  entries = Find::find('.')
  entries.each{|entry| p entry}

with

  require 'find'
  Find::find('.'){|entry| p entry}

have top running beforehand for the first one because it'll grind your
terminal to a halt pretty quickly unless you have tons of RAM.

kind regards.

-a
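as an aside, the buffering idea hadley raised can be done against rails as
it exists today by paging with LIMIT/OFFSET, so only one batch is resident
at a time. a minimal sketch - run_query below is a hypothetical stand-in
for find_by_sql (a real version would append "LIMIT #{size} OFFSET
#{offset}" to the sql):

```ruby
ROWS = (1..23).to_a   # pretend this is the huge table

# hypothetical stand-in for find_by_sql with LIMIT/OFFSET applied
def run_query(limit, offset)
  ROWS[offset, limit] || []
end

# yield rows one batch at a time; at most `size` rows in memory at once
def find_in_batches(size = 5)
  offset = 0
  loop do
    batch = run_query(size, offset)
    break if batch.empty?
    batch.each { |row| yield row }
    offset += size
  end
end

matching = []
find_in_batches { |row| matching << row if row % 7 == 0 }
p matching   # => [7, 14, 21]
```

the caveat is that deep OFFSETs force the db to re-scan skipped rows, so
for really big sets it is better to key each batch on "id > last_seen_id" -
but either way memory stays bounded, which is the point here.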
> no. here is an example using sqlite:
>
> this technique uses sqlite's ability to pass a block with which to handle
> each tuple as it is returned from the db.
>
>   jib:~ > cat a.rb
>   #!/usr/bin/env ruby
>   require 'sqlite'
>   db = SQLite::Database::new(ARGV.shift, 0)
>   db.use_array = true
>   db.execute('select * from jobs'){|tuple| p tuple}
>
>   jib:~ > time a.rb db > /dev/null
>
>   real    0m1.510s
>   user    0m1.420s
>   sys     0m0.080s

Aaaaah. Now I see - I was actually doing something similar (dumping 30,000
records into a CSV file) a couple of days ago and thought there must be a
better way than loading them all into memory at once (and this obviously is
that better way). Thanks for the detailed explanation and please do look
into adding this functionality to rails!

Hadley
On Fri, 25 Feb 2005, hadley wickham wrote:

>> no. here is an example using sqlite:
>>
>> this technique uses sqlite's ability to pass a block with which to handle
>> each tuple as it is returned from the db.
>>
>>   jib:~ > cat a.rb
>>   #!/usr/bin/env ruby
>>   require 'sqlite'
>>   db = SQLite::Database::new(ARGV.shift, 0)
>>   db.use_array = true
>>   db.execute('select * from jobs'){|tuple| p tuple}
>>
>>   jib:~ > time a.rb db > /dev/null
>>
>>   real    0m1.510s
>>   user    0m1.420s
>>   sys     0m0.080s
>
> Aaaaah. Now I see - I was actually doing something similar (dumping
> 30,000 records into a CSV file) a couple of days ago and thought there
> must be a better way than loading them all into memory at once (and
> this obviously is that better way).

the csv module that comes with ruby supports iterator-type usage both for
reading and writing - i was just using it today ;-)

> Thanks for the detailed explanation and please do look into adding this
> functionality to rails!

it would be very, very easy to do - the only barriers would be time and the
ability to test things like DB2, which i do not have access to.

cheers.

-a
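for the csv case specifically, a small sketch of that iterator-type usage -
rows are written and read back one at a time, so only a single row is ever
held in memory (the file name here is arbitrary, just for the example):

```ruby
require 'csv'
require 'tmpdir'

path = File.join(Dir.tmpdir, "jobs_example.csv")

# writing: append one row per << call instead of building a giant string
CSV.open(path, "w") do |csv|
  3.times { |i| csv << ["job-#{i}", i * 10] }
end

# reading: CSV.foreach yields each row without slurping the whole file
names = []
CSV.foreach(path) { |name, value| names << name }
p names   # => ["job-0", "job-1", "job-2"]
```

the same shape as the sqlite block example above: the consumer sees one
tuple per yield, and peak memory is independent of how many rows the file
holds.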