Hello,

I'm grabbing a few XML documents, representing 100-300 database rows in total, parsing them, and inserting them into my MySQL database. I'm using rexml/document, parsing each entire XML document (some are 100k), iterating through them line by line, and putting the rows into my database one at a time.

It is painfully slow.

What is the best strategy for grabbing XML, parsing it, and getting it into my database?

Charlie

--
You received this message because you are subscribed to the Google Groups "Ruby on Rails: Talk" group. To post to this group, send email to rubyonrails-talk-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org To unsubscribe from this group, send email to rubyonrails-talk-unsubscribe-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org For more options, visit this group at http://groups.google.com/group/rubyonrails-talk?hl=en
Use a faster parser? Hpricot comes to mind; part of it is written in C. (It styles itself an HTML parser, but it should be able to do what you want.)

Vish

On 8/13/07, charlie caroff <charlie.caroff-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> What is the best strategy for grabbing XML, parsing it, and getting it into my database?
charlie caroff wrote:
> What is the best strategy for grabbing XML, parsing it, and getting it into my database?

How slow? In my opinion (I've used XML in Ruby rather than in Rails), you should do one of the following:

1. Switch to C for the parsing. You can get it to dump out something like a CSV (and then don't use the built-in CSV parser - use File and split to parse that).

2. Do not load the whole document into memory. Use the stream parser mode instead. I have a 5MB XML file with about 50,000 records; it takes about 6 minutes to load on my laptop before I can do anything with it, and my Ruby process's memory usage climbs to about 450MB. In stream parsing mode, it is much faster. This may or may not be fast enough for you, so try this first (it's easy to do) and then switch to #1 if it doesn't help enough.

Hope this helps.

Cheers,
Mohit.
8/13/2007 | 9:13 AM.
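Option #2 can be sketched with REXML's stream parser. A minimal listener follows; the `<rows>`/`<row>` feed structure and field names are hypothetical stand-ins, so adjust the tag names to match your actual documents:

```ruby
require 'rexml/document'
require 'rexml/streamlistener'

# Collects <row> elements from a feed shaped like:
#   <rows><row><name>a</name><qty>1</qty></row>...</rows>
# without ever building the full document tree in memory.
class RowListener
  include REXML::StreamListener
  attr_reader :rows

  def initialize
    @rows = []
    @current = nil  # hash for the <row> being parsed, if any
    @field = nil    # name of the field element we are inside, if any
  end

  def tag_start(name, _attrs)
    if name == 'row'
      @current = {}
    elsif @current
      @field = name
    end
  end

  def text(data)
    @current[@field] = data.strip if @current && @field
  end

  def tag_end(name)
    if name == 'row'
      @rows << @current
      @current = nil
    else
      @field = nil
    end
  end
end

xml = '<rows><row><name>widget</name><qty>3</qty></row>' \
      '<row><name>gadget</name><qty>7</qty></row></rows>'

listener = RowListener.new
REXML::Document.parse_stream(xml, listener)
listener.rows.each { |r| puts r.inspect }
```

Each completed row lands in `listener.rows` as a plain hash, ready to hand to whatever does the database insert.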
Thanks. What about getting the data into my database? Is it best to do it line by line as the stream comes in?

Charlie

On Aug 12, 6:14 pm, Mohit Sindhwani <mo_m...-RxrYI66vbj0AvxtiuMwx3w@public.gmane.org> wrote:
> 2. Do not load the whole document into memory. Use the stream parser mode instead.
charlie caroff wrote:
> Thanks. What about getting the data into my database? Is it best to do it line by line as the stream comes in?

Hi Charlie,

I'm not doing exactly what you are working on, so a few questions: is this a one-time task, something that happens every now and then, or a frequent occurrence? Is it done in the back end, or by the user uploading an XML file?

Basically, if you're doing it record by record, it's simple: read a record, break the data up into individual parameters, create an ActiveRecord object with those parameters, and save it to the database. Repeat! I'm not sure what kind of performance you will get; depending on how frequently you need to do this (and what the response time needs to be), this may meet your needs. If not, there are a number of ways to handle it, including using a module in some other language, if required.

Cheers,
Mohit.
8/13/2007 | 11:29 PM.
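The read/break-up/save loop above might look like this. The comma-separated record format and the `Product` model are hypothetical stand-ins for your own feed and schema, and the ActiveRecord call is shown as a comment:

```ruby
# Break one parsed record up into a parameters hash keyed by column name.
def record_to_attrs(line, columns)
  Hash[columns.zip(line.strip.split(','))]
end

columns = %w[name qty]
attrs = record_to_attrs("widget,3\n", columns)
puts attrs.inspect

# In the Rails app, each record then becomes one INSERT:
# Product.create!(attrs)
```

Note that saving record by record issues one INSERT per row, which is where the time goes if you have many rows.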
Using a single multi-row insert is obviously faster. ActiveRecord doesn't have any built-in support for this, so you might have to write custom SQL in the model. http://www.desilva.biz/mysql/insert.html

Vish

On 8/13/07, charlie caroff <charlie.caroff-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> Thanks. What about getting the data into my database? Is it best to do it line by line as the stream comes in?
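A minimal sketch of building one multi-row INSERT from the parsed records. The `products` table and its columns are hypothetical, and the quoting here is deliberately simplistic for illustration; in a real app, escape values through the driver (or ActiveRecord's sanitization helpers) instead:

```ruby
# Build a single multi-row INSERT statement instead of one statement
# per record. Assumes every row hash has the same keys.
def bulk_insert_sql(table, rows)
  columns = rows.first.keys
  values = rows.map do |row|
    tuple = columns.map do |c|
      v = row[c]
      # Naive quoting for the sketch: numbers pass through, strings get
      # single quotes with embedded quotes doubled. Do NOT ship this;
      # use your driver's escaping in production.
      v.is_a?(Numeric) ? v.to_s : "'#{v.to_s.gsub("'", "''")}'"
    end
    "(#{tuple.join(', ')})"
  end
  "INSERT INTO #{table} (#{columns.join(', ')}) VALUES #{values.join(', ')}"
end

rows = [
  { 'name' => 'widget', 'qty' => 3 },
  { 'name' => 'gadget', 'qty' => 7 }
]

puts bulk_insert_sql('products', rows)
# INSERT INTO products (name, qty) VALUES ('widget', 3), ('gadget', 7)
```

Feeding that one string to the MySQL connection replaces hundreds of round trips with a single statement, which is usually the biggest win for this kind of import.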