charlie caroff
2007-Aug-15 20:24 UTC
net/http vs . . . curl? anything else? what's fastest
Hi,

I'm grabbing XML feeds with net/http, and I'm wondering if there is anything else out there that's faster. Any suggestions?

Charlie

--
You received this message because you are subscribed to the Google Groups "Ruby on Rails: Talk" group. For more options, visit this group at http://groups.google.com/group/rubyonrails-talk?hl=en
On Wednesday, 15 August 2007 17:24, charlie caroff wrote:
> I'm grabbing xml feeds with net/http, and I'm wondering if there is
> anything else out there that's faster. Any suggestions?

Someone suggested I use Hpricot.

HTH,
--
Davi Vidal
charlie caroff
2007-Aug-15 21:08 UTC
Re: net/http vs . . . curl? anything else? what's fastest
I think Hpricot uses open-uri to grab the XML. I believe open-uri is a wrapper around net/http, so I don't think it will be faster than net/http. I'm asking about the grabbing part: I wonder if there is anything out there faster than net/http.

Charlie

On Aug 15, 1:51 pm, Davi wrote:
> Someone suggested I use Hpricot.
Paul Hoehne
2007-Aug-15 21:09 UTC
Re: net/http vs . . . curl? anything else? what's fastest
When you say faster, what do you mean?

Is it a throughput issue (i.e., number of docs/sec)?
Is it a latency issue (i.e., the time from the start of retrieving the doc to when the doc comes back is too long)?
Is it an XML parsing issue (i.e., parsing that many documents is slow and loading the server)?

One way to fix the first kind of problem might be to spawn a process to download each document instead of downloading the documents in series. But that's not really a Net::HTTP issue. Ruby is a nicely extensible language into which you can plug C modules that may be faster than the equivalent Ruby code. Sometimes you can find modules that rely on C code instead of Ruby code to perform a given task, and they may be faster.

One problem with TCP performance, especially with SSL, may be paying for the connection handshake. If you retrieve all your docs from the same address(es), you might want to engineer a solution that avoids having to set up and tear down connections for each doc.
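The last point about handshake cost can be sketched with plain net/http: Net::HTTP.start keeps one TCP connection open for every request issued inside its block. The loopback server, port, and feed paths below are invented for the demo so the sketch is self-contained; against a real feed host you would only need the Net::HTTP.start block at the bottom.

```ruby
require 'net/http'
require 'socket'

# Tiny throwaway HTTP/1.1 server (for illustration only) that answers two
# requests on a single kept-alive connection before closing it.
xml    = '<feed><title>demo</title></feed>'
server = TCPServer.new('127.0.0.1', 0)
port   = server.addr[1]

server_thread = Thread.new do
  client = server.accept
  2.times do
    # Read and discard the request line and headers.
    while (line = client.gets) && line != "\r\n"; end
    client.write("HTTP/1.1 200 OK\r\n" \
                 "Content-Type: application/xml\r\n" \
                 "Content-Length: #{xml.bytesize}\r\n\r\n" + xml)
  end
  client.close
end

# Net::HTTP.start keeps one connection open for the whole block, so both
# GETs below reuse the same socket instead of paying two handshakes.
bodies = Net::HTTP.start('127.0.0.1', port) do |http|
  ['/feed1.xml', '/feed2.xml'].map { |path| http.get(path).body }
end
server_thread.join
```

With SSL the saving per avoided handshake is even larger, since the TLS negotiation is skipped as well.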
charlie caroff
2007-Aug-15 21:39 UTC
Re: net/http vs . . . curl? anything else? what's fastest
There are some good ideas here. Thanks. Here's what I know so far:

-- xmlparser (~0.2 sec) is many times faster than rexml (~1.9 sec) or hpricot (~1.3 sec), at least the way I'm kludging it so far
-- now that I've decided on xmlparser, for now, the biggest time lag is getting the content over the 'net via net/http (~1.0-1.8 sec); that is, from the beginning of the request to the time the content is completely retrieved
-- I don't know where the time lag is coming from
-- it would be terrific to reuse a connection that is grabbing many feeds from the same source; any hints on that?
-- I don't know whether throughput is an issue, and I don't know how to break that down with net/http

Is there some kind of tutorial or guide you know about where I can learn how to extend, say, curl to work with Ruby, so I can grab content with curl?

Charlie
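For what it's worth, parser timings like those are easy to reproduce with a self-contained benchmark. The sketch below measures only rexml (the stdlib parser) against a synthetic feed, so the absolute numbers won't match the ones above; the item count and element names are made up for the demo.

```ruby
require 'rexml/document'
require 'benchmark'

# Build a synthetic feed so the measurement needs no network access.
items = 500
xml   = '<feed>' +
        (1..items).map { |i| "<item><title>Item #{i}</title></item>" }.join +
        '</feed>'

titles  = nil
elapsed = Benchmark.realtime do
  doc    = REXML::Document.new(xml)
  titles = doc.elements.to_a('//title').size  # force a full traversal
end
puts format('REXML parsed %d titles in %.4f sec', titles, elapsed)
```

Swapping the timed block for an equivalent hpricot or xmlparser call gives a like-for-like comparison on the same input.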
Paul Hoehne
2007-Aug-15 21:43 UTC
Re: net/http vs . . . curl? anything else? what's fastest
http://curb.rubyforge.org/

On Aug 15, 2007, at 5:39 PM, charlie caroff wrote:
> Is there some kind of tutorial or guide you know about where I can
> learn how to extend, say, curl to work with ruby, so I can grab
> content with curl?
Paul Hoehne
2007-Aug-15 21:54 UTC
Re: net/http vs . . . curl? anything else? what's fastest
This is just a suggestion: if you're grabbing documents from the same place, you might want to cobble together your own server specific to serving XML documents (a one-trick pony that's very good and fast at its trick). On the "client" side you have a local service that establishes a connection to your remote service. A process on your client contacts the local server, which uses a connection from a pool of connections between the local server and the remote server. When the document is retrieved from the remote server, the connection is put back into the pool and the document is returned to the process that requested it from the local server. Because the local server (a proxy, if you will) pools its connections, you only pay for the SSL/TCP connection once. However, this may require more work than you're willing to do, depending on the degree of performance you need.

HTTP is also supposed to be able to reuse a connection, so just keeping your HTTP connections around, or pooling them, might help as well.

Local Process ----Get doc---> Local server ---Get doc---> Remote server.
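A rough sketch of the pooling idea in plain Ruby: a Queue of open Net::HTTP sessions to one host, checked out per request and returned afterwards, so each pooled connection pays the TCP (or SSL) handshake only once. The HttpPool class, the loopback server, and the path are all invented for the demo; a production pool would also need error handling and timeouts.

```ruby
require 'net/http'
require 'socket'

# Toy loopback server, thread per connection, answering every GET with 'ok'.
server = TCPServer.new('127.0.0.1', 0)
port   = server.addr[1]
Thread.new do
  loop do
    conn = server.accept
    Thread.new(conn) do |c|
      while c.gets                                # request line (nil = closed)
        while (h = c.gets) && h != "\r\n"; end    # discard headers
        c.write "HTTP/1.1 200 OK\r\nContent-Length: 2\r\n\r\nok"
      end
      c.close
    end
  end
end

# Minimal connection pool: a Queue of started Net::HTTP sessions.
class HttpPool
  def initialize(host, port, size)
    @sessions = Queue.new
    size.times { @sessions << Net::HTTP.start(host, port) }
  end

  def get(path)
    http = @sessions.pop              # block until a session is free
    begin
      http.get(path).body
    ensure
      @sessions << http               # return the connection to the pool
    end
  end

  def shutdown
    @sessions.size.times { @sessions.pop.finish }
  end
end

pool   = HttpPool.new('127.0.0.1', port, 2)
bodies = Array.new(4) { pool.get('/feed.xml') }   # 4 fetches, 2 connections
pool.shutdown
```

The Queue gives the checkout/return logic thread safety for free, so the same pool can be shared by several downloader threads.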
charlie caroff
2007-Aug-15 22:33 UTC
Re: net/http vs . . . curl? anything else? what's fastest
Thanks again. Your suggestions are a little beyond me, so it's going to take me some time to figure them out.

I tried curb vs. net/http, and the results are almost identical. From what I can tell, net/http uses HTTP 1.1 by default. I tried looping through and grabbing three different XML documents, with both curb and net/http, but the second and third tries were just as slow as the first. It would be nice if there were some "built-in" way to reuse connections with curb or net/http, and I'm going to investigate that.

Also, I would like to point out new times for the various parsers. I was including the time it took to print to STDOUT in my parsing times:

rexml: 1.6 sec
hpricot: .25 sec
xmlparser: .02 sec

Here are the config options I used to install curb, in case some other newbie using FreeBSD 6.2, or some other config-needing OS, runs into the installation problems I did:

1. after the gem install curb fails, chdir to ./ext
2. ruby extconf.rb --with-curl-lib=/usr/local/lib --with-curl-include=/usr/local/include/ (or your path/to/lib or path/to/include)

Charlie
Mohit Sindhwani
2007-Aug-16 03:27 UTC
Re: net/http vs . . . curl? anything else? what's fastest
charlie caroff wrote:
> I'm grabbing xml feeds with net/http, and I'm wondering if there is
> anything else out there that's faster. Any suggestions?

Hi Charlie

It appears that you are doing quite a few things similar to me! In my case, I'm using curl to grab XML files over HTTP. I'm not sure which is faster, but in my case I have to get a few megabytes of data every 5 minutes, so speed is not that critical. For what it's worth, Ruby + curl has served me well.

Cheers,
Mohit.
8/16/2007 | 11:27 AM.