-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 I have been working on a temporary tables plugin/extension for ActiveRecord since early January 2006 as I have needed to support hundreds of thousands of rows being inserted at a time every few hours. I mentioned it on the list and got a few private emails which has led me to believe other people have this problem. As time has progressed I have extended AR to support faster and more operations. I am a MySQL user so these only apply to MySQL. I know that DHH has made public comments about trying to stay implementation agnostic when it comes to the different databases that AR supports, so perhaps my plugin is best as a plugin only. Thoughts? I want to ensure that the API interface is easy and friendly. I don''t want to change interfaces for each update, i know how much of a pain in the arse that can be. I''d like to do the following things in this email: - describe issues I have with AR - describe the API I have used to solve them - ask for feedback Here are the issues I''ve had with AR, the ones I''ve set out to fix. 1 - to slow for large insertions of data, due to creating too many objects in ruby 2 - no easy support for multiple value insertions via one statemente. This has two drawbacks: a. more then needed data is sending over the mysql client/server connection b. mysql has to go through sending query and parsing query for each insert statement. This is to much overhead for large insertions of data 3 - no in memory temporary table support 4 - no fast ability copy data from one table to another 5 - no ability to enable/disable foreign keys 6 - no ability to drop primary keys (yes there are times when this is needed) Here is the API I have implemented to solve these. # Create a temporary in memory table off from an existing model temp_model = MyTable.create_temporary_table # temp_model can also be referenced as TempMyTable # Create a temporary in memory table off from an existing model and specify it''s name temp_model = MyTable.create_temporary_table :name=>''MyTemporaryTable'' # temp_model can also be referenced as MyTemporaryTable # drop temporary models temp_model.drop MyTemporaryTable.drop # redefine columns no the temporary model temp_model.redefine_column :id=> ''id int default 0'', :field1=>''new_field varchar(255)'' # drop primary keys on primary tables temp_model.drop_primary_key # enable/disable foreign key checks on ActiveRecord::Base ActiveRecord::Base.foreign_key_checks :disable ActiveRecord::Base.foreign_key_checks :enable # support streamlined insert operations, using subarrays # which map the a column identifying array columns = [ :id, :column1, :column2 ], values = [ [1,''value''], [2, ''value''], [3,''value''] ] MyModel.insert columns, values # copy data from one table into another options = { :on_duplicate_key_update=>{ :column1 => :column2 } MyModel.insert_into MyTemporaryTable, options This increases performance 1000x when working with large datasets. Now granted, you aren''t going to use this for someone browsing your web site who fill out a webform and click submit, it isn''t intended for that audience. It is intended for people who are using (or want to use) RoR for large internet or even intranet driven corporate sites, which have to be able to handle mass updates of data several times a day. I currently use RoR on a few small web sites, and some internal webapps, but I spend 90% of my time with RoR with AR optimizing it''s speed. Any thoughts, comments or suggestions would be greatly appreciated. All of this was done with ActiveRecord 1.13.2, so some of this may have been addressed in AR 1.14.2, if so please let me down gently. ;) Thanks, Zach -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.1 (GNU/Linux) Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org iD8DBQFERWyAMyx0fW1d8G0RAvLpAJ41vFNDUvDlVH6QpJuZ9hby9ACQ/QCfcexY SG9z0x3TK+iBhpNBlYZof+s=dxFr -----END PGP SIGNATURE-----
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 zdennis wrote:> > I currently use RoR on a few small web sites, and some internal webapps, but I spend 90% of my time with RoR with AR optimizing > it''s speed. >I forgot to add, the 90% of my time with AR is because of a huge intranet driven corporate system, which includes web sites, java based thick clients and a kick ass backend using ruby, rails, ruby-libxml bindings. The company is Fortune 50... so you can imagine the # of people we are supporting and why speed and efficiency is a concern for me. Zach -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.1 (GNU/Linux) Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org iD8DBQFERXCNMyx0fW1d8G0RAun0AJ459OFHyHMqIV6ndhEgD4udF+oTugCeORhR vmzdzseMiB6mJpI2/Fn4ais=KvFX -----END PGP SIGNATURE-----
> > I currently use RoR on a few small web sites, and some internal webapps, but I spend 90% of my time with RoR with AR optimizing > > it''s speed.Are you only noticing this speed problem with big batch operations like you mentioned in the earlier email? When working with an ORM, it''s fairly common to do heavy data imports, batch loads and the like using the database''s built in tools, what makes you want to use AR instead of tools like mysqldump etc? -- Cheers Koz
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Michael Koziarski wrote:>>>I currently use RoR on a few small web sites, and some internal webapps, but I spend 90% of my time with RoR with AR optimizing >>>it''s speed. > > > Are you only noticing this speed problem with big batch operations > like you mentioned in the earlier email?The bigger the batch, the more time it takes, so it''s easier to notice big speed issues with 500,000 records, rather then 50 records. Alot of what rails (and AR) give the user in terms of flexibility and ease of use comes with a huge performance penalty. I am willing to sacrifice niceties for better performance.> When working with an ORM, it''s fairly common to do heavy data imports, > batch loads and the like using the database''s built in tools, what > makes you want to use AR instead of tools like mysqldump etc? >There is more to my story though, but I didn''t want to rant on to long in my last email. Myself and a colleague of mine, Mark Van Holstyn, came up with a idea for an IO Mode plugin for AR. It works beautifully. You can turn it off and on, on the fly. It is great for debugging rails SQL commands if you don''t wan''t to parse log files. I use my AR plugin in combination with this AR IO mode plugin to generate all of my SQL statements and I have them written to disk. I then batch load them via a mysql command line client. In one instance every few hours I receive an update from another system. Some updates are very small, a few kb in size, other updates it''s 15Mb to 40Mb of straight data. It doesn''t come formatted in SQL statements, so I have to parse the data myself. It is faster for me to deal with my AR plugin API, and let AR do all of the hard work for me, generating statements, quoting, etc. plus I get all of the already builtin AR functionality for free should I need to extend my work. It is also easy to maintain, since we use AR, ruby and the rest of rails for alot of this system on the backend, we worry about ruby coding and working withing ruby-based method APIs. We also get a huge code base to use, which is nice and easy to extend. I am working on limiting this performance gap for mysql though, to where things don''t need to be batch loaded outside of a ruby instance. This is limited though due to how ruby''s GC work, if you want to continuous run a daemon that does these huge data operations, but you never restart the daemon. Another problem I face is the max_allowed_packet size issue. One of my next todo''s is allow the user to configure whether or not they want it dynamically resized per that session, or if they want the AR plugin to split up large queries into the least # of queries to support the current max_allowed_packet size. And to answer your questions after 6 paragraphs, =) , I use mysqldump and other tools where it makes sense. But I need a way to get from data_format_a to data_format_sql, and I prefer ruby in that regard with AR at the moment. Zach -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.1 (GNU/Linux) Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org iD8DBQFERXuXMyx0fW1d8G0RAuKPAJ920A+gWxpX4+UhTZgKIZgw2RamWQCdFnMX qWMGmY9yxxs4QdqzIczCP6g=m1Bi -----END PGP SIGNATURE-----
Your IO Mode plugin sounds interesting too... care to share it? That and this bulk AR/virtual table both sound useful to me. Release them as plugins? : ) -jeff On 4/18/06, zdennis <zdennis@mktec.com> wrote:> -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > Michael Koziarski wrote: > >>>I currently use RoR on a few small web sites, and some internal webapps, but I spend 90% of my time with RoR with AR optimizing > >>>it''s speed. > > > > > > Are you only noticing this speed problem with big batch operations > > like you mentioned in the earlier email? > > The bigger the batch, the more time it takes, so it''s easier to notice big speed issues with 500,000 records, rather then 50 > records. Alot of what rails (and AR) give the user in terms of flexibility and ease of use comes with a huge performance penalty. > I am willing to sacrifice niceties for better performance. > > > When working with an ORM, it''s fairly common to do heavy data imports, > > batch loads and the like using the database''s built in tools, what > > makes you want to use AR instead of tools like mysqldump etc? > > > > There is more to my story though, but I didn''t want to rant on to long in my last email. Myself and a colleague of mine, Mark Van > Holstyn, came up with a idea for an IO Mode plugin for AR. It works beautifully. You can turn it off and on, on the fly. It is > great for debugging rails SQL commands if you don''t wan''t to parse log files. I use my AR plugin in combination with this AR IO > mode plugin to generate all of my SQL statements and I have them written to disk. I then batch load them via a mysql command line > client. > > In one instance every few hours I receive an update from another system. Some updates are very small, a few kb in size, other > updates it''s 15Mb to 40Mb of straight data. It doesn''t come formatted in SQL statements, so I have to parse the data myself. > > It is faster for me to deal with my AR plugin API, and let AR do all of the hard work for me, generating statements, quoting, etc. > plus I get all of the already builtin AR functionality for free should I need to extend my work. > > It is also easy to maintain, since we use AR, ruby and the rest of rails for alot of this system on the backend, we worry about > ruby coding and working withing ruby-based method APIs. We also get a huge code base to use, which is nice and easy to extend. > > I am working on limiting this performance gap for mysql though, to where things don''t need to be batch loaded outside of a ruby > instance. This is limited though due to how ruby''s GC work, if you want to continuous run a daemon that does these huge data > operations, but you never restart the daemon. > > Another problem I face is the max_allowed_packet size issue. One of my next todo''s is allow the user to configure whether or not > they want it dynamically resized per that session, or if they want the AR plugin to split up large queries into the least # of > queries to support the current max_allowed_packet size. > > And to answer your questions after 6 paragraphs, =) , I use mysqldump and other tools where it makes sense. But I need a way to > get from data_format_a to data_format_sql, and I prefer ruby in that regard with AR at the moment. > > Zach > -----BEGIN PGP SIGNATURE----- > Version: GnuPG v1.4.1 (GNU/Linux) > Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org > > iD8DBQFERXuXMyx0fW1d8G0RAuKPAJ920A+gWxpX4+UhTZgKIZgw2RamWQCdFnMX > qWMGmY9yxxs4QdqzIczCP6g> =m1Bi > -----END PGP SIGNATURE----- > _______________________________________________ > Rails-core mailing list > Rails-core@lists.rubyonrails.org > http://lists.rubyonrails.org/mailman/listinfo/rails-core >-- Jeff Lindsay http://blogrium.com/
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Jeff Lindsay wrote:> Your IO Mode plugin sounds interesting too... care to share it? That > and this bulk AR/virtual table both sound useful to me. Release them > as plugins? : )Here is a link to 0.0.1 of the ActiveRecord IO Mode plugin: http://www.mktec.com/oss/rails-plugins/active_record_io_mode.0.0.1.tgz It is more documentation then code, and only 201 lines. It works successfully with the MysqlAdapter. I believe it should work for Postgres, but I''m not a Postgres user so any feedback would be great. Any Oracle, MSSQL, SQLite, etc... users please let me know if this works or not for you. Once installed in the vendor/plugins directory you can test this by running "ruby script/console" from the RAILS_ROOT, and doing something similar to: # to redirect sql to an array >> arr = [] >> ActiveRecord::Base.send_sql_to arr => true # let''s test it >> Test.find :all, :limit => 1 => [] >> arr => ["SELECT * FROM tests LIMIT 1;\n"] # redirect sql back to our db connection >> ActiveRecord::Base.send_sql_to :db => true >> Test.find :all, :limit => 1 => [#<Test:0x306daef0 @attributes={"name"=>"test1", "id"=>"1"}>] I am working on packaging the other plugin sometime in the next day or two. Any feedback or suggestions are graciously accepted! Zach -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.1 (GNU/Linux) Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org iD8DBQFERvrhMyx0fW1d8G0RAsOeAJ9abtP2VPk4rzpLu1uC3w4kiwKCgACcD9do 9dI8dPKOvzrKOjsQ97CpQgI=OHQS -----END PGP SIGNATURE-----
Very cool. That''s just what I thought it was. : ) -jeff On 4/19/06, zdennis <zdennis@mktec.com> wrote:> -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > Jeff Lindsay wrote: > > Your IO Mode plugin sounds interesting too... care to share it? That > > and this bulk AR/virtual table both sound useful to me. Release them > > as plugins? : ) > > Here is a link to 0.0.1 of the ActiveRecord IO Mode plugin: > http://www.mktec.com/oss/rails-plugins/active_record_io_mode.0.0.1.tgz > > It is more documentation then code, and only 201 lines. It works > successfully with the MysqlAdapter. I believe it should work for > Postgres, but I''m not a Postgres user so any feedback would be great. > Any Oracle, MSSQL, SQLite, etc... users please let me know if this works > or not for you. > > Once installed in the vendor/plugins directory you can test this by > running "ruby script/console" from the RAILS_ROOT, and doing something > similar to: > > # to redirect sql to an array > >> arr = [] > >> ActiveRecord::Base.send_sql_to arr > => true > > # let''s test it > >> Test.find :all, :limit => 1 > => [] > >> arr > => ["SELECT * FROM tests LIMIT 1;\n"] > > # redirect sql back to our db connection > >> ActiveRecord::Base.send_sql_to :db > => true > >> Test.find :all, :limit => 1 > => [#<Test:0x306daef0 @attributes={"name"=>"test1", "id"=>"1"}>] > > I am working on packaging the other plugin sometime in the next day or two. > > Any feedback or suggestions are graciously accepted! > > Zach > > > -----BEGIN PGP SIGNATURE----- > Version: GnuPG v1.4.1 (GNU/Linux) > Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org > > iD8DBQFERvrhMyx0fW1d8G0RAsOeAJ9abtP2VPk4rzpLu1uC3w4kiwKCgACcD9do > 9dI8dPKOvzrKOjsQ97CpQgI> =OHQS > -----END PGP SIGNATURE----- > _______________________________________________ > Rails-core mailing list > Rails-core@lists.rubyonrails.org > http://lists.rubyonrails.org/mailman/listinfo/rails-core >-- Jeff Lindsay http://blogrium.com/