On Jan 22, 2011, at 12:39 PM, Chris Armstrong wrote:
> Hi there,
>
> I need to connect to a URL to download and process an XML document,
> then run through the XML document and save its elements in the
> database.
>
> There are many how-tos on the internet about parsing XML files with
> SAX by opening a file on the filesystem and reading through it, but I
> could not find an example of how to read from a URL while processing
> the XML.
>
> SAX will be useless if the content from the URL has to be downloaded
> completely before processing it. The RAM will still fill up.
Not sure if this will help, but if you use the open-uri library (it
ships with Ruby's standard library), you can open a file from a URL. I
use it in a converter I'm working on right now:
require 'rubygems'
require 'nokogiri'
require 'open-uri'
# here I'm loading the XSD from the W3C directly
# (URI.open works from Ruby 2.5 on; plain open(url) no longer fetches URLs in Ruby 3)
xsd = Nokogiri::XML::Schema(URI.open('http://www.w3.org/2002/08/xhtml/xhtml1-strict.xsd'))
...etc...
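Since the question is about SAX, here is a minimal sketch (an editor's illustration, not Walter's converter) of SAX-style parsing with REXML's stream parser, which ships with Ruby (as a bundled gem since Ruby 3.0). REXML calls the `tag_start`, `text`, and `tag_end` methods of a `REXML::StreamListener` as it reads, so no tree is built. The `TitleListener` class, the `collect_titles` method, and the assumption that the interesting elements are `<title>` are all hypothetical.

```ruby
require 'rexml/document'
require 'rexml/streamlistener'
require 'stringio' # only needed to feed in-memory XML when testing

# Hypothetical SAX-style listener: REXML calls these methods while it
# reads, so only the collected titles are kept, never a whole tree.
class TitleListener
  include REXML::StreamListener

  attr_reader :titles

  def initialize
    @titles = []
    @in_title = false
  end

  def tag_start(name, _attrs)
    @in_title = (name == 'title')
    @titles << +'' if @in_title
  end

  def text(data)
    # REXML hands over text with entities such as &amp; already decoded
    @titles.last << data if @in_title
  end

  def tag_end(name)
    @in_title = false if name == 'title'
  end
end

# Parses any IO: a File, a StringIO, or the object URI.open yields.
def collect_titles(io)
  listener = TitleListener.new
  REXML::Document.parse_stream(io, listener)
  listener.titles
end
```

With open-uri the call would look like `URI.open('http://example.com/feed.xml') { |io| collect_titles(io) }`, where the URL is a placeholder; `collect_titles(StringIO.new(xml))` works the same way for testing.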
I'm not at all sure that this will save you on RAM. I'm loading temp
files in another part of this script (from the filesystem) and ripping
through them with regular expressions one line at a time, but after
all that's done, I open the partially transformed file with Nokogiri
in one large bite and do all sorts of things to it. Some of these
files are 10 to 20 MB of XML text, and it's currently working fine
inside a hard limit of 2 GB of RAM. I wouldn't be surprised if Nokogiri
does some very clever things to manage its memory footprint, because it
certainly works much more efficiently than the previous generation of
this system, which used XSLT and Saxon and crapped out on anything over
6 MB of input.
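open-uri reads the entire response into a StringIO (or a Tempfile once it grows past about 10 KB) before handing it back, so the XML ends up on disk rather than in RAM, but parsing cannot start until the download has finished. To parse while the bytes are still arriving, one possible approach (a sketch, not something proposed in this thread) is to pass Net::HTTP's `read_body` chunks through an `IO.pipe` to a REXML stream parser running in a second thread. `TitleYielder`, `stream_parse`, `stream_titles_from`, and the `<title>` element are hypothetical; the block given to `TitleYielder` is where a database save would go.

```ruby
require 'net/http'
require 'uri'
require 'rexml/document'
require 'rexml/streamlistener'

# Hypothetical listener: passes the text of each <title> element to a
# block as soon as its closing tag is parsed (e.g. to save a database row).
class TitleYielder
  include REXML::StreamListener

  def initialize(&on_title)
    @on_title = on_title
    @buffer = nil # non-nil only while inside a <title>
  end

  def tag_start(name, _attrs)
    @buffer = +'' if name == 'title'
  end

  def text(data)
    @buffer << data if @buffer
  end

  def tag_end(name)
    return unless name == 'title' && @buffer

    @on_title.call(@buffer)
    @buffer = nil
  end
end

# Runs REXML's stream parser on the read end of a pipe in a background
# thread and yields the write end, so whatever the block writes is parsed
# as it arrives instead of after the whole document has been received.
def stream_parse(listener)
  reader, writer = IO.pipe
  parser = Thread.new { REXML::Document.parse_stream(reader, listener) }
  begin
    yield writer
  ensure
    writer.close # EOF tells the parser the document is complete
  end
  parser.join # re-raises any parse error raised in the thread
ensure
  reader&.close
end

# Hypothetical producer: streams an HTTP response body into the parser.
def stream_titles_from(url, &on_title)
  uri = URI(url)
  stream_parse(TitleYielder.new(&on_title)) do |out|
    Net::HTTP.get_response(uri) do |res|
      res.value # raises unless the response is a 2xx
      res.read_body { |chunk| out.write(chunk) }
    end
  end
end
```

A call would look like `stream_titles_from('http://example.com/big.xml') { |title| puts title }`, with a placeholder URL. Because a pipe has a fixed-size buffer, `out.write` blocks whenever the parser falls behind, so memory use should stay roughly constant regardless of document size, apart from whatever the listener itself keeps.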
Walter
>
> Is there somebody who has a solution for this problem, or maybe a
> sample snippet showing how to deal with this in Ruby or Rails? I don't
> care if it is libxml, REXML or something else, as long as less RAM is
> used.
>
> Thanks for your help.
> Chris
>
> --
> Posted via http://www.ruby-forum.com/.
--
You received this message because you are subscribed to the Google Groups
"Ruby on Rails: Talk" group.
To post to this group, send email to
rubyonrails-talk-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To unsubscribe from this group, send email to
rubyonrails-talk+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
For more options, visit this group at
http://groups.google.com/group/rubyonrails-talk?hl=en.