David Kahn
2011-Mar-06 14:36 UTC
Import via AR of large csv file/# of records (100's of thousands)
Hi all,

I have a project where I need to regularly import some very large files: 200-500 MB each, with 100k-1m records. All files are CSVs. Currently I am using CSV.parse (the Ruby 1.9 standard library CSV, formerly FasterCSV). I have things working, but the process takes a long time because right now I import record by record. I am also validating and normalizing data in the same pass (e.g. the account name comes in on the CSV, and I look it up in the AR model and normalize the field to the account id).

I would like any suggestions on how to make this process faster and more solid. My thoughts:

1) Separate the validation steps from the import step. First import all the data; then, once everything is imported and I have verified that the # of rows in my model matches the # in the file, proceed with validation. This modularizes the process, and if validation fails I do not need to reload all the data into the db to re-validate once corrections have been made elsewhere in the system.

2) Consider using tools to wholesale import CSV data into my db (Postgres):
a) I see a project out there called ActiveRecord-Import (https://github.com/zdennis/activerecord-import)
b) I have found the COPY FROM command, used from AR (http://www.misuse.org/science/2007/02/16/importing-to-postgresql-from-activerecord-amp-rails/)

Just want to see if anyone has dealt with such masses of data and has any other recommendations. This project does not really need to be db agnostic. Running Rails 3, Ruby 1.9.2, deployed on Ubuntu Server 10.04, PostgreSQL...

Best,
David

--
You received this message because you are subscribed to the Google Groups "Ruby on Rails: Talk" group.
To post to this group, send email to rubyonrails-talk-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To unsubscribe from this group, send email to rubyonrails-talk+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
For more options, visit this group at http://groups.google.com/group/rubyonrails-talk?hl=en.
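For illustration, idea 1) above (load first, validate afterwards) might look roughly like the sketch below. It is plain Ruby rather than ActiveRecord so that it runs on its own; StagedRow, stage_rows, validate and the account_name/amount columns are all made-up names, and a real version would stage into a table instead of an array.

```ruby
require "csv"

# Hypothetical staging record: raw CSV fields plus the source line number.
StagedRow = Struct.new(:line, :account_name, :amount)

# Step 1: load every row as-is, with no validation and no lookups.
# Line numbers assume one record per physical line (no embedded newlines).
def stage_rows(csv_text)
  rows = []
  CSV.parse(csv_text, headers: true).each_with_index do |row, i|
    rows << StagedRow.new(i + 2, row["account_name"], row["amount"])
  end
  rows
end

# Step 2: validate the staged rows in a separate pass. It can be re-run
# after corrections are made elsewhere, without reloading the file.
# known_accounts is an assumed name => id hash.
def validate(rows, known_accounts)
  rows.reject { |r| known_accounts.key?(r.account_name) }
      .map { |r| "line #{r.line}: unknown account #{r.account_name.inspect}" }
end
```

Comparing rows.size with the number of data lines in the file gives the row-count check mentioned in 1) before validation starts.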
Xavier Noria
2011-Mar-06 18:08 UTC
Re: Import via AR of large csv file/# of records (100's of thousands)
The fastest way to import data into a database from a file is to use the database's built-in commands for doing so, like COPY FROM/TO. We are talking orders of magnitude faster. Is that possible in your case, perhaps after some CSV normalization?
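As a rough sketch of what "some CSV normalization" before a bulk load could mean here: stream the raw file once, swap the account name for its id using a hash loaded up front (instead of one query per row), and write a clean file that the database can load directly. The column names, the account_ids hash and normalize_csv itself are assumptions for illustration, not anything from the thread.

```ruby
require "csv"

# Rewrites a raw CSV so it can be bulk-loaded without further lookups:
# the account_name column is replaced by account_id.
# account_ids is a hypothetical name => id hash built once beforehand.
# Returns the rows that could not be resolved, for later correction.
def normalize_csv(input, output, account_ids)
  rejected = []
  out = CSV.new(output)
  out << %w[account_id amount]
  CSV.new(input, headers: true).each do |row|
    id = account_ids[row["account_name"].to_s.strip]
    if id
      out << [id, row["amount"]]
    else
      rejected << row.to_h
    end
  end
  rejected
end
```

input and output can be any IO objects, e.g. File.open handles for the raw and cleaned files, so the whole file never has to sit in memory.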
David Kahn
2011-Mar-06 18:24 UTC
Re: Import via AR of large csv file/# of records (100's of thousands)
On Sun, Mar 6, 2011 at 12:08 PM, Xavier Noria <fxn-xlncskNFVEJBDgjK7y7TUQ@public.gmane.org> wrote:

> The fastest way to import data into a database from a file is to use the
> built-in commands to do so, like COPY FROM/TO. We are talking orders of
> magnitude. Is that possible in your case, perhaps after some CSV
> normalization?

Yeah, thanks. Actually this is the route I am investigating right now, but doing a gross import of the CSV first and then doing the normalization in Ruby. I am working on getting it running by calling the command via ActiveRecord: ActiveRecord::Base.connection.execute(sql).
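A minimal sketch of building the SQL for that execute call is below. The table name, column names and file path are invented for the example, and the two quoting helpers are deliberately simplistic stand-ins; in a Rails app the connection's own quoting methods would be the safer choice.

```ruby
# Simplistic identifier quoting: wrap in double quotes, double any embedded ones.
def quote_ident(name)
  '"' + name.to_s.gsub('"', '""') + '"'
end

# Simplistic string-literal quoting: doubles single quotes only.
# It does not handle backslashes, so treat it as illustration only.
def quote_literal(value)
  "'" + value.to_s.gsub("'", "''") + "'"
end

# Builds a PostgreSQL COPY statement for a CSV file with a header row.
def copy_from_sql(table, columns, path)
  cols = columns.map { |c| quote_ident(c) }.join(", ")
  "COPY #{quote_ident(table)} (#{cols}) FROM #{quote_literal(path)} WITH CSV HEADER"
end

# Hypothetical staging table and path.
sql = copy_from_sql("raw_imports", %w[account_name amount], "/tmp/import.csv")
# Inside the Rails app this would then be run as described above:
#   ActiveRecord::Base.connection.execute(sql)
```

One thing to keep in mind: COPY ... FROM 'file' is read by the PostgreSQL server process, so the file has to be on the database host and readable by it, and the database user normally needs superuser rights. If that does not fit the deployment, COPY ... FROM STDIN is the usual alternative.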