I've been working on improving the performance and functionality of to_xml. While putting together this patch, I discovered that the recent upgrade of Builder::XmlMarkup to 2.0 slows down to_xml by 80%, since it now escapes attributes and seems to do a better, but more expensive, job of escaping all values:

http://dev.rubyonrails.org/changeset/4260

Could somebody take a look at this patch and give some feedback?

http://dev.rubyonrails.org/ticket/4989

The major benefits of this patch are below (more improvements are listed in the ticket):

1) Allows ActiveRecord::Base subclasses to override to_xml. Currently, every AR instance is turned into a Hash and to_xml is then run on the hash, so a subclass's own to_xml is never used when the instance is serialized through an :include.

2) Speeds up to_xml generation by 22-30%, depending upon the model. This is done by using the fact that we know the SQL column type, which allows us to skip the XML escaping if the type doesn't need it, such as booleans, floats and integers. I've attached a small script you can use to time to_xml before and after the patch.

3) Allows the :include option to be arbitrarily deep; previously, only one level of :include worked.

4) Binary columns are Base64 encoded and have an encoding="base64" attribute added to them. It turns out that it is 80 times faster to do builder << Base64.encode(value) than builder.tag!(tag, value).

Outstanding issues:

1) The XML output is not as nice looking with the optimization of using

    builder.tag!(tag) do
      builder << value.to_s
    end

instead of builder.tag!(tag, value), but that seems like a small price to pay:

    <?xml version="1.0" encoding="UTF-8"?>
    <node>
      <node-id type="integer">
        1234
      </node-id>
    </node>

Regards,
Blair

--
Blair Zajac, Ph.D.
<blair@orcaware.com>
Subversion training, consulting and support
http://www.orcaware.com/svn/

    #!/usr/local/bin/ruby

    # This assumes that the script is run from the top of the Ruby on
    # Rails project.

    print "Loading Ruby on Rails..."
    $stdout.flush
    require 'config/environment'
    print "\n"

    require 'benchmark'
    include Benchmark

    N = 1000

    subclasses = ActiveRecord::Base.send(:subclasses)
    subclasses = subclasses.reject { |s| 'CGI::Session::ActiveRecordStore::Session' == s.to_s }
    subclasses = subclasses.sort_by { |s| s.to_s }
    max_length = subclasses.collect { |s| s.to_s.length }.max

    bm(max_length + 4) do |x|
      subclasses.each do |subclass|
        obj = subclass.find(:first, :order => subclass.primary_key)
        s = sprintf('%s %2d %s', subclass, subclass.column_names.length,
                    ' ' * (max_length - subclass.to_s.length))
        x.report(s) { N.times { obj.to_xml } }
      end
    end
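To illustrate points 2 and 4, here is a minimal, stdlib-only Ruby sketch of type-aware attribute serialization. It is not the code from ticket 4989: the helper names (attribute_xml, xml_escape, NO_ESCAPE_TYPES) are hypothetical, it builds strings directly instead of going through Builder, and it assumes the column type is available as a symbol. It skips escaping for integers, floats and booleans, escapes everything else, and Base64-encodes binary values with an encoding="base64" attribute.

```ruby
# Hypothetical sketch: serialize one attribute as an XML element, using the
# (assumed) column type to decide whether escaping is needed at all.
NO_ESCAPE_TYPES = [:integer, :float, :boolean].freeze

XML_ESCAPES = { '&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;' }.freeze

# Replace the markup-significant characters in text.
def xml_escape(text)
  text.gsub(/[&<>"]/, XML_ESCAPES)
end

def attribute_xml(name, value, type)
  tag = name.to_s.tr('_', '-')
  case type
  when :binary
    # pack('m0') is core Ruby's strict Base64 (same output as Base64.strict_encode64).
    encoded = [value].pack('m0')
    %(<#{tag} encoding="base64">#{encoded}</#{tag}>)
  when *NO_ESCAPE_TYPES
    # to_s of an Integer, Float, true or false never contains <, > or &,
    # so the escaping pass is skipped entirely.
    %(<#{tag} type="#{type}">#{value}</#{tag}>)
  else
    %(<#{tag}>#{xml_escape(value.to_s)}</#{tag}>)
  end
end

puts attribute_xml(:node_id, 1234, :integer)      # prints <node-id type="integer">1234</node-id>
puts attribute_xml(:title, 'A & B <draft>', :string) # prints <title>A &amp; B &lt;draft&gt;</title>
puts attribute_xml(:data, "\x00\x01", :binary)     # prints <data encoding="base64">AAE=</data>
```

The saving comes from never scanning numeric and boolean values for characters they cannot contain; only free-form column types pay for the gsub.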
Julian 'Julik' Tarkhanov
2006-Jun-02 23:47 UTC
Re: 22-30% faster and added to_xml functionality
On 2-jun-2006, at 23:44, Blair Zajac wrote:

> [...] to_xml generation by 22-30% depending upon the model. This is done
> by using the fact that we know the SQL column type and this allows us to
> skip the XML encoding if the type doesn't need encoding, such as booleans,
> floats, integers. I've attached a small script you can use to time the
> to_xml before and after the patch.
>
> 3) Allows the :include option to be arbitrarily deep, previously, only one
> level of :include worked.
>
> 4) Binary columns are Base64 encoded and have an encoding="base64"
> attribute added to it. It turns out that it is 80 times faster to do
> builder << Base64.encode(value) than builder.tag!(tag, value).
>
> Outstanding issues are
>
> 1) The XML output is not as nice looking with the optimization of using
>
>     builder.tag!(tag) do
>       builder << value.to_s
>     end

The new Builder "escape all" approach is a bit absurd, especially considering that it does packs/unpacks to escape every Unicode character outside of ASCII. I tried to devise a patch for that, but the stuff is buried too deep and Jim didn't seem keen on the idea.

--
Julian 'Julik' Tarkhanov
please send all personal mail to me at julik.nl
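The contrast Julian describes can be sketched in plain Ruby. The first helper below is a hypothetical illustration of the general escape-all technique, not Builder's actual implementation: it decodes the string into code points with unpack('U*') and emits a numeric character reference for everything outside printable ASCII. The second only escapes the markup characters, which is usually enough for element text in a UTF-8 document. Both helper names are made up for this sketch, and both assume valid UTF-8 input (unpack('U*') raises ArgumentError otherwise).

```ruby
# Hypothetical "escape all": walk every code point and rewrite anything that
# is not printable ASCII (or tab/newline/CR) as a numeric character reference.
def escape_all(text)
  text.unpack('U*').map do |cp|
    case cp
    when 38 then '&amp;'                              # &
    when 60 then '&lt;'                               # <
    when 62 then '&gt;'                               # >
    when 0x20..0x7E, 0x09, 0x0A, 0x0D then cp.chr     # keep printable ASCII
    else "&##{cp};"                                   # e.g. é -> &#233;
    end
  end.join
end

# Cheaper alternative: leave non-ASCII characters alone and only escape markup.
def escape_markup_only(text)
  text.gsub(/[&<>]/, '&' => '&amp;', '<' => '&lt;', '>' => '&gt;')
end

puts escape_all('café & <b>')         # prints caf&#233; &amp; &lt;b&gt;
puts escape_markup_only('café & <b>') # prints café &amp; &lt;b&gt;
```

The first version allocates one string per character and joins them, which is where the extra cost comes from; the second is a single gsub pass over the text.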
David Heinemeier Hansson
2006-Jun-05 03:38 UTC
Re: 22-30% faster and added to_xml functionality
> I've been working on improving the performance and functionality of to_xml.

Good stuff, Blair. I've taken the liberty to massively refactor the implementation, though. Having a method going on 100+ lines of code was a sure telltale sign that it needed some love. And it never was a good fit for base.rb anyway, so now it all sits in xml_serialization.rb and is mixed in.

--
David Heinemeier Hansson
http://www.loudthinking.com -- Broadcasting Brain
http://www.basecamphq.com -- Online project management
http://www.backpackit.com -- Personal information manager
http://www.rubyonrails.com -- Web-application framework
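A stripped-down sketch of the mixin approach, assuming plain Ruby objects that expose an attributes hash and association readers. XmlSerialization, Post, Comment and Author here are hypothetical stand-ins, not the contents of xml_serialization.rb. Each included record is serialized by calling its own to_xml with the nested options, which is what lets a model override to_xml (point 1 of Blair's patch) and lets :include recurse to any depth (point 3). Value escaping is left out to keep it short.

```ruby
# Hypothetical mixin: serialization lives in its own module instead of base.rb.
module XmlSerialization
  def to_xml(options = {})
    root = options[:root] || self.class.name.downcase
    xml = +"<#{root}>"
    # Escaping omitted for brevity.
    attributes.each { |name, value| xml << "<#{name}>#{value}</#{name}>" }
    normalize_includes(options[:include]).each do |assoc, nested|
      xml << "<#{assoc}>"
      # Each record serializes itself, so overrides are honored and nested
      # :include options recurse to any depth.
      Array(send(assoc)).each { |record| xml << record.to_xml(nested) }
      xml << "</#{assoc}>"
    end
    xml << "</#{root}>"
  end

  private

  # Accepts nil, :assoc, [:a, :b] or { a: { include: ... } }.
  def normalize_includes(includes)
    case includes
    when nil then {}
    when Hash then includes
    else Array(includes).each_with_object({}) { |assoc, h| h[assoc] = {} }
    end
  end
end

class Post
  include XmlSerialization
  attr_reader :comments
  def initialize(title, comments)
    @title, @comments = title, comments
  end
  def attributes
    { 'title' => @title }
  end
end

class Comment
  include XmlSerialization
  attr_reader :author
  def initialize(body, author)
    @body, @author = body, author
  end
  def attributes
    { 'body' => @body }
  end
end

class Author
  include XmlSerialization
  def initialize(name)
    @name = name
  end
  def attributes
    { 'name' => @name }
  end
  # Model-level override; it is still used when reached through :include.
  def to_xml(options = {})
    super(options.merge(root: 'writer'))
  end
end

post = Post.new('Hello', [Comment.new('Nice', Author.new('Blair'))])
puts post.to_xml(include: { comments: { include: :author } })
```

With the usage above, the Author override shows up two levels down as a writer element, which would not happen if included records were first flattened into a Hash.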
David Heinemeier Hansson wrote:
>> I've been working on improving the performance and functionality of
>> to_xml.
>
> Good stuff, Blair. I've taken the liberty to massively refactor the
> implementation, though. Having a method going on 100+ lines of code
> was a sure telltale sign that it needed some love. And it never was a good
> fit for base.rb anyway, so now it all sits in xml_serialization.rb and
> is mixed in.

Thanks David.

Regards,
Blair