thr3ads.net - Facebooker talk - [Facebooker-talk] REXML vs libxml [Oct 2008]

Yu-Shan Fung

2008-Oct-15 21:44 UTC

[Facebooker-talk] REXML vs libxml

Hi all,

I've been looking at the performance of my fb app and one glaring issue
seems to be with the parsing speed of rexml in processing the results. Has
anyone looked into porting the facebooker parser from rexml to libxml? If
not, any reason I shouldn't try?

Thanks!
Yu-Shan.

Mike Mangino

2008-Oct-15 22:27 UTC

[Facebooker-talk] REXML vs libxml

On Oct 15, 2008, at 5:44 PM, Yu-Shan Fung wrote:
> Hi all,
>
> I've been looking at the performance of my fb app and one glaring
> issue seems to be with the parsing speed of rexml in processing the
> results. Has anyone looked into porting the facebooker parser from
> rexml to libxml? If not, any reason I shouldn't try?
Have you actually benchmarked this? If you have, and it is truly an  
issue and you can make it transparent, go for it. I would be shocked  
if this is a bottleneck for the typical web application.

Mike

--
Mike Mangino
http://www.elevatedrails.com
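
A minimal sketch of the kind of benchmark asked for above, using Ruby's standard Benchmark module. The sample XML, its element names, and the helpers sample_response and compare_parsers are assumptions made for illustration; they are not the real Facebook response schema or part of Facebooker. The regex entry is only a cheap baseline to compare against.

```ruby
require 'benchmark'

# Build a synthetic friends-style response. The element names are made up
# for illustration and are not the real Facebook API schema.
def sample_response(friend_count)
  users = (1..friend_count).map do |i|
    "<user><uid>#{i}</uid><name>Friend #{i}</name></user>"
  end
  "<?xml version=\"1.0\"?><friends_response>#{users.join}</friends_response>"
end

# Time each candidate on the same input; returns { label => seconds }.
def compare_parsers(xml, candidates, runs = 5)
  candidates.each_with_object({}) do |(label, parser), timings|
    timings[label] = Benchmark.realtime { runs.times { parser.call(xml) } }
  end
end

candidates = {
  'regex scan' => ->(xml) { xml.scan(/<uid>(\d+)<\/uid>/).flatten }
}

# REXML ships with Ruby (as a bundled gem on recent versions); skip it if
# it is not installed on this machine.
begin
  require 'rexml/document'
  candidates['REXML::Document.new'] = ->(xml) { REXML::Document.new(xml) }
rescue LoadError
  warn 'rexml not available, timing the regex candidate only'
end

xml = sample_response(900)
compare_parsers(xml, candidates).each do |label, seconds|
  puts format('%-22s %.4fs', label, seconds)
end
```

Another parser could be timed the same way by adding one more lambda to the candidates hash. The absolute numbers depend heavily on the machine and Ruby version, so only the ratio between candidates is meaningful.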

Yu-Shan Fung

2008-Oct-15 23:16 UTC

[Facebooker-talk] REXML vs libxml

Thanks Mike, that's what I did. Ours is probably not what you'd consider a
typical web application. We analyze the user's Facebook network
extensively. A single request for user.friends! for a user with 900 friends
takes almost 400s to run, almost 300 of which is spent doing this:
REXML::Document#initialize

To make matters worse, the current parser runs through
REXML::Document#initialize twice, once in Errors.process and again in the
regular Parser.process.
In fact, a simple fix to Errors.process already shaved off half of the 300s
(the original is commented out):

  class Errors < Parser#:nodoc:
    def self.process(data)
      data = data.body rescue data # either data or an HTTP response
      # Only error responses need inspecting; skip the full REXML parse.
      if data.include?('<error_response')
        error_code = error_msg = nil
        matches = /<error_code>(\d+)<\/error_code>/.match(data)
        if matches
          error_code = matches[1] # group 1; matches[0] would include the tags
        end
        matches = /<error_msg>(.*?)<\/error_msg>/.match(data)
        if matches
          error_msg = matches[1]
        end
        exception = EXCEPTIONS[error_code.to_i] || StandardError
        raise exception.new(error_msg)
#        response_element = element('error_response', data) rescue nil
#        if response_element
#          hash = hashinate(response_element)
#          exception = EXCEPTIONS[Integer(hash['error_code'])] || StandardError
#          raise exception.new(hash['error_msg'])
#        end
      else
        nil
      end
    end
  end


Not half as pretty, but a lot faster...
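
A small sketch of how MatchData indexing behaves for the error_code pattern used in the patch. The sample payload and its values are invented for illustration. Index 0 is the whole matched text including the tags, while index 1 is the first capture group, which is the part that can be looked up in EXCEPTIONS after to_i.

```ruby
# Sample error payload; the values are invented for illustration.
xml = '<error_response><error_code>102</error_code>' \
      '<error_msg>Session key invalid</error_msg></error_response>'

match = /<error_code>(\d+)<\/error_code>/.match(xml)

whole   = match[0] # entire matched text, tags included
capture = match[1] # only what the first parenthesised group matched

puts whole        # <error_code>102</error_code>
puts capture      # 102
puts whole.to_i   # 0, because the string does not start with a digit
puts capture.to_i # 102
```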




-- 
"Reality is that which, when you stop believing in it, doesn't go away."
- Philip K. Dick, American Writer

Sean Abrahams

2008-Oct-16 01:02 UTC

[Facebooker-talk] REXML vs libxml

REXML is notorious for its slowness. I would love to see libxml
replace it in facebooker.

Let me know if you would like any help.

Cheers.


Mike Mangino

2008-Oct-16 12:49 UTC

[Facebooker-talk] REXML vs libxml

On Oct 15, 2008, at 7:16 PM, Yu-Shan Fung wrote:
> Thanks Mike, that's what I did. Ours is probably not what you'd
> consider a typical web application. We analyze the user's Facebook
> network extensively. A single request for user.friends! for a user
> with 900 friends takes almost 400s to run, almost 300 of which is
> spent doing this:
> REXML::Document#initialize
>
Ah, okay. That sounds like something worth fixing.
> To make matters worse, the current parser runs through
> REXML::Document#initialize twice, once in Errors.process, and
> another in the regular Parser.process
> In fact, a simple fix to Errors.process already shaved half of the
> 300s (commented was the original):
>

Do you have this as a commit in a fork on GitHub? If so, I can pull
this in. I would like to see the error code handling pulled out into
functions like extract_error_code and extract_error_message, but other
than that, this looks good.

Mike
--
Mike Mangino
http://www.elevatedrails.com
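
One way the refactoring suggested above might look, with the regex work pulled into extract_error_code and extract_error_message. This is a sketch under assumptions, not the code that was committed to Facebooker: the ErrorExtraction module, the SessionExpired class, and the one-entry EXCEPTIONS table are hypothetical stand-ins so the example runs on its own.

```ruby
# Hypothetical stand-ins: the real Facebooker EXCEPTIONS table is larger,
# and SessionExpired here is only a placeholder class.
class SessionExpired < StandardError; end
EXCEPTIONS = { 102 => SessionExpired }.freeze

module ErrorExtraction
  module_function

  # Returns the numeric error code, or nil when the element is absent.
  def extract_error_code(data)
    match = /<error_code>(\d+)<\/error_code>/.match(data)
    match && match[1].to_i
  end

  # Returns the error message text, or nil when the element is absent.
  def extract_error_message(data)
    match = /<error_msg>(.*?)<\/error_msg>/.match(data)
    match && match[1]
  end

  # Raises the mapped exception for an error response, otherwise returns nil.
  def process(data)
    return nil unless data.include?('<error_response')

    exception = EXCEPTIONS[extract_error_code(data)] || StandardError
    raise exception.new(extract_error_message(data))
  end
end

sample = '<error_response><error_code>102</error_code>' \
         '<error_msg>Session key invalid</error_msg></error_response>'

begin
  ErrorExtraction.process(sample)
rescue SessionExpired => e
  puts "#{e.class}: #{e.message}"
end
```

Keeping the two extractors separate makes each pattern testable on its own, and process stays a thin wrapper that never builds a REXML document.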

Joe Van Dyk

2008-Oct-17 22:17 UTC

[Facebooker-talk] REXML vs libxml

On Wed, Oct 15, 2008 at 4:16 PM, Yu-Shan Fung <ambivalence at gmail.com> wrote:
> Thanks Mike, that's what I did. Ours is probably not what you'd consider a
> typical web application. We analyze the user's Facebook network
> extensively. A single request for user.friends! for a user with 900 friends
> takes almost 400s to run, almost 300 of which is spent doing this:
> REXML::Document#initialize
Really 400 seconds?!  Yeesh.

Joe
