charlie caroff
2007-Aug-15 20:24 UTC
net/http vs . . . curl? anything else? what's fastest
Hi,

I'm grabbing XML feeds with net/http, and I'm wondering if there is anything else out there that's faster. Any suggestions?

Charlie

--
You received this message because you are subscribed to the Google Groups "Ruby on Rails: Talk" group. For more options, visit this group at http://groups.google.com/group/rubyonrails-talk?hl=en
On Wednesday, 15 August 2007 17:24, charlie caroff wrote:
> I'm grabbing xml feeds with net/http, and I'm wondering if there is
> anything else out there that's faster. Any suggestions?

Someone suggested I use Hpricot.

HTH,
--
Davi Vidal
charlie caroff
2007-Aug-15 21:08 UTC
Re: net/http vs . . . curl? anything else? what's fastest
I think Hpricot uses open-uri to grab the XML. I believe open-uri is a wrapper around net/http, so I don't think it will be faster than net/http. I'm asking about the grabbing part: I wonder if there is anything out there faster than net/http.

Charlie

On Aug 15, 1:51 pm, Davi wrote:
> Someone suggested I use Hpricot.
Paul Hoehne
2007-Aug-15 21:09 UTC
Re: net/http vs . . . curl? anything else? what's fastest
When you say faster, what do you mean?

Is it a throughput issue (i.e., number of docs/sec)?
Is it a latency issue (i.e., the time from the start of retrieving the doc to when the doc comes back is too long)?
Is it an XML parsing issue (i.e., parsing that many documents is slow and loading the server)?

One way to fix the first kind of problem might be to spawn a process to download each document instead of downloading the documents in series. But that's not really a Net::HTTP issue. Ruby is a nicely extensible language into which you can plug C modules that may be faster than the equivalent Ruby code. Sometimes you can find modules that rely on C code instead of Ruby code to perform a given task, and they may be faster.

One problem with TCP performance, especially with SSL, may be paying for the connection handshake. If you retrieve all your docs from the same address(es), you might want to engineer a solution that avoids having to set up and tear down connections for each doc.
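The last point about handshake cost can be sketched with plain net/http: Net::HTTP.start keeps one TCP connection open for every request issued inside its block. The loopback server, port, and feed paths below are invented for the demo so the sketch is self-contained; against a real feed host you would only need the Net::HTTP.start block at the bottom.

```ruby
require 'net/http'
require 'socket'

# Tiny throwaway HTTP/1.1 server (for illustration only) that answers two
# requests on a single kept-alive connection before closing it.
xml    = '<feed><title>demo</title></feed>'
server = TCPServer.new('127.0.0.1', 0)
port   = server.addr[1]

server_thread = Thread.new do
  client = server.accept
  2.times do
    # Read and discard the request line and headers.
    while (line = client.gets) && line != "\r\n"; end
    client.write("HTTP/1.1 200 OK\r\n" \
                 "Content-Type: application/xml\r\n" \
                 "Content-Length: #{xml.bytesize}\r\n\r\n" + xml)
  end
  client.close
end

# Net::HTTP.start keeps one connection open for the whole block, so both
# GETs below reuse the same socket instead of paying two handshakes.
bodies = Net::HTTP.start('127.0.0.1', port) do |http|
  ['/feed1.xml', '/feed2.xml'].map { |path| http.get(path).body }
end
server_thread.join
```

With SSL the saving per avoided handshake is even larger, since the TLS negotiation is skipped as well.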
charlie caroff
2007-Aug-15 21:39 UTC
Re: net/http vs . . . curl? anything else? what's fastest
There are some good ideas here. Thanks. Here's what I know so far:

-- xmlparser (~0.2 sec) is many times faster than rexml (~1.9 sec) or hpricot (~1.3 sec), at least the way I'm kludging it so far
-- now that I've decided on xmlparser, for now, the biggest time lag is getting the content over the 'net via net/http (~1.0-1.8 sec); that is, from the beginning of the request to the time the content is completely retrieved
-- I don't know where the time lag is coming from
-- it would be terrific to reuse a connection that is grabbing many feeds from the same source; any hints on that?
-- I don't know whether throughput is an issue, and I don't know how to break that down with net/http

Is there some kind of tutorial or guide you know about where I can learn how to extend, say, curl to work with Ruby, so I can grab content with curl?

Charlie
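For what it's worth, parser timings like those are easy to reproduce with a self-contained benchmark. The sketch below measures only rexml (the stdlib parser) against a synthetic feed, so the absolute numbers won't match the ones above; the item count and element names are made up for the demo.

```ruby
require 'rexml/document'
require 'benchmark'

# Build a synthetic feed so the measurement needs no network access.
items = 500
xml   = '<feed>' +
        (1..items).map { |i| "<item><title>Item #{i}</title></item>" }.join +
        '</feed>'

titles  = nil
elapsed = Benchmark.realtime do
  doc    = REXML::Document.new(xml)
  titles = doc.elements.to_a('//title').size  # force a full traversal
end
puts format('REXML parsed %d titles in %.4f sec', titles, elapsed)
```

Swapping the timed block for an equivalent hpricot or xmlparser call gives a like-for-like comparison on the same input.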
Paul Hoehne
2007-Aug-15 21:43 UTC
Re: net/http vs . . . curl? anything else? what's fastest
http://curb.rubyforge.org/

On Aug 15, 2007, at 5:39 PM, charlie caroff wrote:
> Is there some kind of tutorial or guide you know about where I can
> learn how to extend, say, curl to work with ruby, so I can grab
> content with curl?
Paul Hoehne
2007-Aug-15 21:54 UTC
Re: net/http vs . . . curl? anything else? what's fastest
This is just a suggestion: if you're grabbing documents from the same place, you might want to cobble together your own server specific to serving XML documents (a one-trick pony that's very good and fast at its trick). On the "client" side you have a local service that establishes a connection to your remote service. A process on your client contacts the local server, which uses a connection from a pool of connections between the local server and the remote server. When the document is retrieved from the remote server, the connection is put back into the pool and the document is returned to the process that requested it from the local server. Because the local server (a proxy, if you will) pools its connections, you only pay for the SSL/TCP connection once. However, this may require more work than you're willing to do, depending on the degree of performance you need.

HTTP is also supposed to be able to reuse a connection, so just keeping your HTTP connections around, or pooling them, might help as well.

Local Process ----Get doc---> Local server ---Get doc---> Remote server.
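A rough sketch of the pooling idea in plain Ruby: a Queue of open Net::HTTP sessions to one host, checked out per request and returned afterwards, so each pooled connection pays the TCP (or SSL) handshake only once. The HttpPool class, the loopback server, and the path are all invented for the demo; a production pool would also need error handling and timeouts.

```ruby
require 'net/http'
require 'socket'

# Toy loopback server, thread per connection, answering every GET with 'ok'.
server = TCPServer.new('127.0.0.1', 0)
port   = server.addr[1]
Thread.new do
  loop do
    conn = server.accept
    Thread.new(conn) do |c|
      while c.gets                                # request line (nil = closed)
        while (h = c.gets) && h != "\r\n"; end    # discard headers
        c.write "HTTP/1.1 200 OK\r\nContent-Length: 2\r\n\r\nok"
      end
      c.close
    end
  end
end

# Minimal connection pool: a Queue of started Net::HTTP sessions.
class HttpPool
  def initialize(host, port, size)
    @sessions = Queue.new
    size.times { @sessions << Net::HTTP.start(host, port) }
  end

  def get(path)
    http = @sessions.pop              # block until a session is free
    begin
      http.get(path).body
    ensure
      @sessions << http               # return the connection to the pool
    end
  end

  def shutdown
    @sessions.size.times { @sessions.pop.finish }
  end
end

pool   = HttpPool.new('127.0.0.1', port, 2)
bodies = Array.new(4) { pool.get('/feed.xml') }   # 4 fetches, 2 connections
pool.shutdown
```

The Queue gives the checkout/return logic thread safety for free, so the same pool can be shared by several downloader threads.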
charlie caroff
2007-Aug-15 22:33 UTC
Re: net/http vs . . . curl? anything else? what's fastest
Thanks again. Your suggestions are a little beyond me, so it's going to take me some time to figure them out.

I tried curb vs. net/http, and the results are almost identical. From what I can tell, net/http uses HTTP 1.1 by default. I tried looping through and grabbing three different XML documents, with both curb and net/http, but the second and third tries were just as slow as the first. It would be nice if there were some "built-in" way to reuse connections with curb or net/http, and I'm going to investigate that.

Also, I would like to point out new times for the various parsers. I was including the time it took to print to STDOUT in my parsing times:

rexml: 1.6 sec
hpricot: .25 sec
xmlparser: .02 sec

Here are the config options I used to install curb, in case some other newbie using FreeBSD 6.2, or some other config-needing OS, runs into the installation problems I did:

1. after the gem install curb fails, chdir to ./ext
2. ruby extconf.rb --with-curl-lib=/usr/local/lib --with-curl-include=/usr/local/include/ (or your path/to/lib or path/to/include)

Charlie
Mohit Sindhwani
2007-Aug-16 03:27 UTC
Re: net/http vs . . . curl? anything else? what's fastest
charlie caroff wrote:
> I'm grabbing xml feeds with net/http, and I'm wondering if there is
> anything else out there that's faster. Any suggestions?

Hi Charlie

It appears that you are doing quite a few things similar to me! In my case, I'm using curl to grab XML files over HTTP. I'm not sure which is faster, but in my case I have to get a few megabytes of data every 5 minutes, so speed is not that critical. For what it's worth, Ruby + curl has served me well.

Cheers,
Mohit.
8/16/2007 | 11:27 AM.