I''m working on a new rails project that will have a very, very large database. Initial insert will be in the hundreds of thousands of records. The database we are getting the information from is Microsoft SQL based, and we have access to tab delimited update files daily. We''re trying to write import scripts and use a well designed database on our end. The data they are giving us is horribly designed. There are dozens of fields in the database tables that hold multiple data, there are several tables that are 90% similar and could be all consolidated into one table, etc. I just have a couple questions about normalization and the way we''re designing the database. First, as far as server resources go... would normalizing the following cause higher or lower server loads: The fields we are getting have multiple records inserted into each field. For instance, a single field for an option might have "A, E, F, X, Z" inserted which means that 5 different options are configured for that selection. We want to have a lookup table in addition to a table which explains what each option is, but I''m afriad it will add a lot of computing time to the queries. Second, the system is going to have a simple CMS, very similar to what Shopify uses. We''ll allow the creation of both pages and blogs. It appears that Shopify is using one database table for the pages [b]and[/b] blogs and it''s classifying a blog as a certain type of page, one which can have it''s own entries (other pages). So all 3, blogs, pages and blog posts, are all in the "pages" table. Is this the way to go? It seems simple and elegant to me, but my programmer thinks it''s confusing and that at the very least blog entries should be separated into their own table. I''m not sure what the best way to go is, so I''m open for advice. -- Posted via http://www.ruby-forum.com/.
On 6/12/06, Brandon Eley <beley@mac.com> wrote:> First, as far as server resources go... would normalizing the following > cause higher or lower server loads:The job of an RDBMS is to join tables based upon queries. If you intelligently craft your tables and indexes, you''ll be light years ahead of pulling back raw data. :: shrug :: I''m having a rough time defining what it would look like from the abstract description; but my rule of thumb is at least third normal form. I like to stick to 3.5, personally. Lookup tables cause very little performance hit in a schema; remember RDBMS packages are optimized for this sort of thing; it''s what they do. :)> Second, the system is going to have a simple CMS, very similar to what > Shopify uses. We''ll allow the creation of both pages and blogs. It > appears that Shopify is using one database table for the pages > [b]and[/b] blogs and it''s classifying a blog as a certain type of page, > one which can have it''s own entries (other pages). So all 3, blogs, > pages and blog posts, are all in the "pages" table.I would say this depends on how tightly integrated blog pages are to standard pages. :: shrug :: I haven''t studied shopify much other than to drool at the pretty front pages and a couple of others, so I can''t really comment on that. But again, I prefer to separate things out as much as possible unless they are logically related. You don''t want logic to have to parse and separate data that should have been in two separate tables. -Curtis
>>>>> "Brandon" == Brandon Eley <beley@mac.com> writes:> I''m working on a new rails project that will have a very, very large > database. Initial insert will be in the hundreds of thousands of > records.Generally speaking, that''s a pretty small database. You''re not really going to seriously exercise a good database engine until you reach at least several million new entries per day. So don''t worry about that part. Your database will deal just fine.> First, as far as server resources go... would normalizing the following > cause higher or lower server loads:Again speaking generally, normalizing your data will make it easier for your database to do the job it was designed for. That is, it will almost certainly improve your performance. At worst, you''ll have to spend some time pondering exactly how to set up your indexes to best match your queries. -- Calle Dybedahl <calle@cyberpomo.com> http://www.livejournal.com/users/cdybedahl/ "Last week was a nightmare, never to be repeated - until this week" -- Tom, a.s.r