I've been working on improving the performance and functionality of
to_xml.
While putting together this patch, I discovered that the recent upgrade of
Builder::XmlMarkup to 2.0 slows down to_xml by 80%, since it now encodes
attributes and does a more thorough, but more expensive, job of encoding
all values.
http://dev.rubyonrails.org/changeset/4260
Could somebody take a look at this patch and give some feedback?
http://dev.rubyonrails.org/ticket/4989
The major benefits of this patch are listed below (more improvements are
listed in the ticket):
1) Allows ActiveRecord::Base subclasses to override to_xml. Currently, all AR
instances are turned into a Hash and then have to_xml run on the hash, which
prevents a subclass's own to_xml from being used for AR instances reached
through an :include.
2) Speeds up to_xml generation by 22-30%, depending upon the model. Since we
know the SQL column type, we can skip the XML encoding for types that never
need it, such as booleans, floats and integers. I've attached a small script
you can use to time to_xml before and after the patch.
3) Allows the :include option to be arbitrarily deep. Previously, only one
level of :include worked.
4) Binary columns are Base64 encoded and have an encoding="base64" attribute
added to them. It turns out that it is 80 times faster to do builder <<
Base64.encode(value) than builder.tag!(tag, value).
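To make points 2 and 4 concrete, here is a minimal sketch of type-aware serialization. It is not the code from the patch: the helper name column_to_xml, the constants, and the column type symbols are assumptions made for illustration, the type attribute simply follows the node-id sample shown further down, and pack('m0') stands in for a Base64 library call.

```ruby
# Hypothetical helper, not taken from the patch: serialize one column value,
# picking the cheapest safe path based on the SQL column type.
XML_ESCAPES = { '&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;' }.freeze

# Types whose string form cannot contain XML metacharacters.
SAFE_TYPES = [:integer, :float, :boolean].freeze

def column_to_xml(name, type, value)
  tag = name.to_s.tr('_', '-')
  case type
  when *SAFE_TYPES
    # No escaping pass at all: this is where the time is saved.
    %(<#{tag} type="#{type}">#{value}</#{tag}>)
  when :binary
    # Base64 output is plain ASCII, so it can be appended raw as well.
    %(<#{tag} encoding="base64">#{[value].pack('m0')}</#{tag}>)
  else
    # Everything else still goes through escaping.
    "<#{tag}>#{value.to_s.gsub(/[&<>"]/, XML_ESCAPES)}</#{tag}>"
  end
end

puts column_to_xml(:node_id, :integer, 1234)
puts column_to_xml(:title, :string, 'a < b & c')
puts column_to_xml(:payload, :binary, 'hi')
```

The design point is only that the type check is cheap compared with running every value through an escaping routine.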
Outstanding issues are:
1) The XML output is not as nice looking with the optimization of using
  builder.tag!(tag) do
    builder << value.to_s
  end
instead of builder.tag!(tag, value), but that seems like a small price to pay.
<?xml version="1.0" encoding="UTF-8"?>
<node>
  <node-id type="integer">
1234  </node-id>
</node>
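The odd layout comes from mixing the block form of tag! with a raw append. The following toy writer is a sketch of that behaviour and not Builder's real implementation: the class name ToyMarkup is made up, and it assumes the block form puts a newline after the start tag while << appends text with no indentation or newline.

```ruby
# Toy model of an indenting XML writer. NOT Builder's real code; it only
# mimics the two behaviours assumed above.
class ToyMarkup
  attr_reader :target

  def initialize(indent = 2)
    @indent = indent
    @level = 0
    @target = String.new
  end

  # Append raw text: no escaping, no indentation, no trailing newline.
  def <<(text)
    @target << text
    self
  end

  def tag!(name, content = nil)
    pad = ' ' * (@indent * @level)
    if block_given?
      # Block form: start tag on its own line, children nested one level in.
      @target << "#{pad}<#{name}>\n"
      @level += 1
      yield
      @level -= 1
      @target << "#{pad}</#{name}>\n"
    else
      # Inline form: start tag, content and end tag share one line.
      @target << "#{pad}<#{name}>#{content}</#{name}>\n"
    end
    self
  end
end

inline = ToyMarkup.new
inline.tag!('node') { inline.tag!('node-id', 1234) }

raw = ToyMarkup.new
raw.tag!('node') { raw.tag!('node-id') { raw << 1234.to_s } }

puts inline.target
puts raw.target
```

In the raw variant the value starts at column zero and the end tag's indentation lands after it on the same line, which is the shape of the sample above.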
Regards,
Blair
--
Blair Zajac, Ph.D.
<blair@orcaware.com>
Subversion training, consulting and support
http://www.orcaware.com/svn/
#!/usr/local/bin/ruby
# This assumes that the script is run from the top of the Ruby on
# Rails project.
print "Loading Ruby on Rails..."
$stdout.flush
require 'config/environment'
print "\n"
require 'benchmark'
include Benchmark
N = 1000
subclasses = ActiveRecord::Base.send(:subclasses)
subclasses = subclasses.reject { |s|
  'CGI::Session::ActiveRecordStore::Session' == s.to_s
}
subclasses = subclasses.sort_by { |s| s.to_s }
max_length = subclasses.collect { |s| s.to_s.length }.max
bm(max_length + 4) do |x|
  subclasses.each do |subclass|
    obj = subclass.find(:first, :order => subclass.primary_key)
    s = sprintf('%s %2d %s',
                subclass,
                subclass.column_names.length,
                ' ' * (max_length - subclass.to_s.length))
    x.report(s) { N.times { obj.to_xml } }
  end
end
Julian ''Julik'' Tarkhanov
2006-Jun-02 23:47 UTC
Re: 22-30% faster and added to_xml functionality
On 2-jun-2006, at 23:44, Blair Zajac wrote:
> 2) Speeds up to_xml generation by 22-30% depending upon the model. This is done
> by using the fact that we know the SQL column type and this allows us to skip
> the XML encoding if the type doesn't need encoding, such as booleans, floats,
> integers. I've attached a small script you can use to time the to_xml before
> and after the patch.
>
> 3) Allows the :include option to be arbitrary deep, previously, only one level
> of :include worked.
>
> 4) Binary columns are Base64 encoded and have an encoding="base64" attribute
> added to it. It turns out that it is 80 times faster to do builder <<
> Base64.encode(value) then builder.tag!(tag, value).
>
> Outstanding issues are
>
> 1) The XML output is not at nice looking with the optimization of using
>
> builder.tag!(tag) do
>   builder << value.to_s
> end
The new Builder "escape all" approach is a bit absurd, especially considering
it's doing packs/unpacks to secure all Unicode values outside of ASCII. I tried
to devise a patch to that but the stuff is buried too deep and Jim didn't seem
keen on the idea.
--
Julian 'Julik' Tarkhanov
please send all personal mail to me at julik.nl
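For readers who have not looked at that code path, the unpack-based escaping Julik refers to can be approximated as below. This is a simplified sketch written from the description in this thread, not Builder's actual source: the method name escape_all and the PREDEFINED table are assumptions, and a real implementation may handle more cases.

```ruby
# Simplified sketch of "escape everything" text encoding. The names are
# hypothetical; this is not Builder's source.
PREDEFINED = { 38 => '&amp;', 60 => '&lt;', 62 => '&gt;' }.freeze

def escape_all(text)
  # Split the UTF-8 string into code points, then rebuild it one character
  # at a time: entities for &, < and >, numeric references for non-ASCII.
  text.unpack('U*').map { |cp|
    PREDEFINED[cp] || (cp < 128 ? cp.chr : format('&#%d;', cp))
  }.join
end

puts escape_all('café <b>')  # prints caf&#233; &lt;b&gt;
```

Every value takes a full unpack, map and join round trip, even when it contains nothing that needs escaping, which is the cost being skipped for integers, floats and booleans.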
David Heinemeier Hansson
2006-Jun-05 03:38 UTC
Re: 22-30% faster and added to_xml functionality
> I've been working on improving the performance and functionality of to_xml.
Good stuff, Blair. I've taken the liberty of massively refactoring the
implementation, though. Having a method going on 100+ lines of code was a sure
telltale sign that it needed some love. And it never was a good fit for base.rb
anyway, so now it all sits in xml_serialization.rb and is mixed in.
--
David Heinemeier Hansson
http://www.loudthinking.com -- Broadcasting Brain
http://www.basecamphq.com -- Online project management
http://www.backpackit.com -- Personal information manager
http://www.rubyonrails.com -- Web-application framework
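As a rough illustration of what "mixed in" buys, here is a minimal sketch in which to_xml lives in a module and a subclass overrides it. All names (XmlSerializationSketch, RecordBase, Node) are hypothetical and escaping is left out for brevity; this is not the contents of xml_serialization.rb.

```ruby
# Hypothetical stand-in for a serialization mixin; names are illustrative.
module XmlSerializationSketch
  def to_xml
    root = self.class.name.downcase
    body = attributes.map { |key, value| "<#{key}>#{value}</#{key}>" }.join
    "<#{root}>#{body}</#{root}>"
  end
end

# Stand-in for a base class that pulls the behaviour in from the module.
class RecordBase
  include XmlSerializationSketch
  attr_reader :attributes

  def initialize(attributes)
    @attributes = attributes
  end
end

class Node < RecordBase
  # Because to_xml comes from a module in the ancestor chain, a subclass can
  # override it and still reach the default through super.
  def to_xml
    super.sub('<node>', '<node kind="custom">')
  end
end

puts Node.new('id' => 1).to_xml  # prints <node kind="custom"><id>1</id></node>
```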
David Heinemeier Hansson wrote:
>> I've been working on improving the performance and functionality of
>> to_xml.
>
> Good stuff, Blair. I've taken the liberty to massively refactor the
> implementation, though. Having a method going on 100+ lines of code
> was a sure tell sign that it needed some love. And it never was a good
> fit for base.rb anyway, so now it all sits in xml_serialization.rb and
> is mixed in.
Thanks David.
Regards,
Blair