I'm working on an app that uses info from a government database. For a variety of reasons (not the least of which is that their database is in MS Access), I can't simply write a wrapper around the legacy database. Instead, we're looking to have an "import" script in the Rails environment (it's fine if it's run from the server and through the web) that will open up a connection to another MySQL schema and selectively import data into the Rails app's database.

What's a good way to go about this? From the message board, it looks like the best route is to make a script that instantiates an ActiveRecord::Base connection, then loads information out of the government database (exported to MySQL) into the appropriate ActiveRecord objects, and saves these objects to the new database. I'm unclear as to how to set this up, though. Will Rails magically load the environment so I can just create a new Foo object and save it, or will I need to require some files? Where is an appropriate place to put this script: the lib folder, the vendor folder, elsewhere?

Any other suggestions on how to go about this would be equally appreciated. Thanks!

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups "Ruby on Rails: Talk" group.
To post to this group, send email to rubyonrails-talk-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org
To unsubscribe from this group, send email to rubyonrails-talk-unsubscribe-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org
For more options, visit this group at http://groups.google.com/group/rubyonrails-talk?hl=en
-~----------~----~----~----~------~----~------~--~---
Hello,

On 29 Nov 2006, at 02:43, eDave wrote:
> What's a good way to go about this? [snip] Where is an appropriate place to put this script? The lib folder, vendor folder, elsewhere?

You have two main options, and your situation will determine which is better for you.

In both options below, the first CSV step assumes you just do a simple 'export to CSV' from Access rather than writing a script to pull out only the data you want. If you do the latter instead, you can skip the first CSV step.

Option 1
--------
Legacy DB (Access) -> CSV (legacy schema) -> CSV (new schema) -> MySQL -> validate with ActiveRecord[2]

You would write a script to convert between the first set of CSV files and the second set. You would then use MySQL's batch loading facility[1] to lever the data into the database. Finally you would use a script like [2] to validate all the data via ActiveRecord.
Option 2
--------
Legacy DB (Access) -> CSV (legacy schema) -> ActiveRecord

You would write a script to read the legacy CSV files and create ActiveRecord objects. The data are validated along the way. In order for your script to work with the AR objects, include your app/models directory on the load path. E.g.:

$ ruby -I lib:/path/to/your/app/models lib/data_migration.rb

Trade-Offs
----------
Option 1 is far more scalable as your data volume grows. Option 2 can be sped up with ActiveRecord::Extensions[3], but Option 1 is still faster.

Option 2 is DRY with respect to your new schema. In Option 1 you specify your new schema twice: once in the new CSV files and again in your ActiveRecord model objects. In Option 2 you only specify it in your model objects.

Option 1 allows/requires you to set primary and foreign keys explicitly, which enables you to preserve the original data's keys. Option 2 lets AR set the keys automatically. This is simpler, though it requires an extra field if you want to preserve the original primary key of each record.

Option 2 catches invalid records much earlier.

Conclusion
----------
Option 1 is the only realistic option for 'large' volumes of data. Option 2 is more elegant and easier to use, so I would prefer it where practical ('small' volumes of data). The boundary between small and large depends on the speed of your CPU, your patience, and the length of time you would spend writing your migration script: I have found in the past that Option 2 loads data more slowly, but that is outweighed by the time saved having fewer hoops to jump through.

Tip
---
Assuming you're cleaning your data along the way, write unit tests for every cleaning method you write. It'll save you time and give you confidence. And write some unit tests for your post-migration MySQL database: choose a few records before you start and assert they're present and correct (via ActiveRecord) in your new database.

Good luck!
Andy Stewart

[1] http://dev.mysql.com/doc/refman/4.1/en/loading-tables.html
[2] http://blog.hasmanythrough.com/articles/2006/08/27/validate-all-your-records
[3] http://www.continuousthinking.com/are/activerecord-extensions-0-0-5
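Option 2's migration script might look roughly like the sketch below. Everything here is an assumption for illustration: the CSV column names (FULL_NAME, PHONE_NO), the target attribute names, and the `clean` helper are all hypothetical, and a real script would load ActiveRecord and your models as described above and call something like `Person.create!(attrs)` instead of collecting plain hashes.

```ruby
require 'csv'

# Hypothetical cleaning helper: legacy Access exports often pad fields
# with whitespace and use empty strings where NULL is meant.
def clean(value)
  v = value.to_s.strip
  v.empty? ? nil : v
end

# Read the legacy CSV text and build one attribute hash per row,
# mapped onto the new schema's (assumed) attribute names. With
# ActiveRecord loaded you would create and save a model object here,
# letting its validations catch bad records as they arrive.
def import_rows(csv_text)
  rows = []
  CSV.parse(csv_text, headers: true) do |row|
    rows << {
      name:  clean(row['FULL_NAME']),
      phone: clean(row['PHONE_NO'])
    }
  end
  rows
end
```

Per the tip above, the `clean` helper is exactly the kind of method that deserves its own unit tests before you run the migration for real.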
On 4 Dec 2006, at 10:08, Andrew Stewart wrote:
[snip]
> You would write a script to read the legacy CSV files and create ActiveRecord objects.
[snip]

I forgot to mention that your script also needs to connect to your database via ActiveRecord. This does the trick:

ActiveRecord::Base.establish_connection(
  :adapter  => settings['adapter'],
  :host     => settings['host'],
  :username => settings['username'],
  :password => settings['password'],
  :database => settings['database'])

where settings is pulled out of my Rails app's config/database.yml for the appropriate environment.

Regards,
Andy Stewart
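To build the `settings` hash above, the relevant environment's section can be parsed out of config/database.yml directly. A minimal sketch, assuming the file contains plain YAML with no ERB tags (real Rails database.yml files sometimes embed ERB, which this does not evaluate):

```ruby
require 'yaml'

# Load the connection settings for one environment from a Rails-style
# config/database.yml. Returns the string-keyed hash for that
# environment; raises if the environment is missing from the file.
def db_settings(path, environment = 'development')
  YAML.load_file(path).fetch(environment)
end
```

Then `settings = db_settings('/path/to/your/app/config/database.yml', 'production')` yields exactly the string-keyed hash that the `establish_connection` call above expects.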
Andy,

Your email is a very good write-up - you should publish it somewhere permanently so it can be referenced in the future.

I just wanted to add that I have the beginnings of a Ruby ETL tool in the ActiveWarehouse Subversion repository. The description of the ETL tool is here:

http://activewarehouse.rubyforge.org/etl/

The code in there is relatively new, but it may be of use.

V/r
Anthony Eden

On 12/4/06, Andrew Stewart <boss-rAnrN1QAkGGttCpgsWEBFlaTQe2KTcn/@public.gmane.org> wrote:
[snip]
On 4 Dec 2006, at 11:41, Anthony Eden wrote:
> Your email is a very good write-up - you should publish it somewhere permanently so it can be referenced in the future.

Thank you for your kind words. I have followed your advice and written up my email here:

http://blog.airbladesoftware.com/2006/12/5/migrating-data-into-rails

> I just wanted to add that I have the beginnings of a Ruby ETL tool in the ActiveWarehouse Subversion repository. The description of the ETL tool is here:
>
> http://activewarehouse.rubyforge.org/etl/
>
> The code in there is relatively new, but it may be of use.

This does look useful. I have included ETL in a 'further reading' section in my article.

Regards,
Andy Stewart