i.v.r.
2005-Nov-01 23:15 UTC
Question about database design and normalization - Help with Rails app implementation
Hi,

After learning what I needed from Rails, I've decided to re-implement one of my sites (the main one) using Rails instead of PHP. It's kind of funny (how fate works), because when I was designing the new architecture of the site I was creating something very, very similar to Rails, although at the time I didn't know anything about the MVC philosophy, so upon finding Rails I felt quite comfortable with it.

Anyway, here I am, designing the new database structure, and I don't know what the best way of doing it would be, or what level of normalization would be best. In the original version, the main table has about 230,000 rows, and I have a separate table for storing the addresses of those rows. Now I'm going to put some of the address data in the same table (like the zip/postal code), and other data in separate tables.

I'm wondering, performance-wise, if it would be OK to have separate tables for states, counties, colonies, and even streets. The database will be hit about 20 times every minute (potentially 5 times more), each time retrieving data from *all* those tables. I wonder what you think of this (I'm no DB expert).

Usually the app will be doing searches on all those tables, but not always using all of them. For example, I may want to retrieve all rows in the main table that belong to a specific row in the counties table, so I don't really care about the street or colony, but I'll have to include those in the query anyway, since the main table will only point to the streets table (having it point to all the tables would be unnecessary, methinks).

You know, upon writing this, I realize I'm on the right track, because when I do searches using more than one table, e.g. a street in a specific colony, I can narrow the street search to only the streets that belong to the specified colony, instead of searching all the rows of one table that has all the address fields. However, searches that deep are not a common part of the website.
Usually searches are only made at the state/county level, so I wonder if I'm indeed on the right track. Also, the current PHP application is doing quite well in a shared hosting environment; I wonder if that will also be the case with Rails, or if a VPS is in order. Any thoughts are appreciated.

Regards,
Ivan V.
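To make the trade-off concrete, here is a pure-Ruby sketch of the normalized hierarchy described above (main table → streets → colonies → counties → states). All names (`State`, `County`, `Colony`, `Street`, `Listing`) are hypothetical stand-ins, and plain arrays of structs stand in for the database tables; the point is just to show the join chain that a county-level search has to walk when the main table only references streets.

```ruby
# Hypothetical rows for each normalized table; the "main" table (Listing)
# keeps the zip inline and points only at the streets table.
State   = Struct.new(:id, :name)
County  = Struct.new(:id, :state_id, :name)
Colony  = Struct.new(:id, :county_id, :name)
Street  = Struct.new(:id, :colony_id, :name)
Listing = Struct.new(:id, :street_id, :zip)

states   = [State.new(1, "Jalisco")]
counties = [County.new(1, 1, "Guadalajara")]
colonies = [Colony.new(1, 1, "Centro"), Colony.new(2, 1, "Chapalita")]
streets  = [Street.new(1, 1, "Av. Juarez"), Street.new(2, 2, "Av. Guadalupe")]
listings = [Listing.new(1, 1, "44100"), Listing.new(2, 2, "45030")]

# "All main-table rows in a given county": walk county -> colonies ->
# streets -> listings, i.e. the extra joins the normalized design requires.
def listings_in_county(county_id, colonies, streets, listings)
  colony_ids = colonies.select { |c| c.county_id == county_id }.map(&:id)
  street_ids = streets.select { |s| colony_ids.include?(s.colony_id) }.map(&:id)
  listings.select { |l| street_ids.include?(l.street_id) }
end

puts listings_in_county(1, colonies, streets, listings).map(&:id).inspect
# => [1, 2]
```

In a real database those three `select` steps would be one SQL query with three joins; with indexes on each foreign-key column the extra hops are cheap, which is why the normalized layout can still perform well at the query rates mentioned.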
Tom Mornini
2005-Nov-03 04:22 UTC
Re: Question about database design and normalization - Help with Rails app implementation
On Nov 1, 2005, at 6:15 PM, i.v.r. wrote:

> Anyway, so here I am, designing the new database structure, and I
> don't know what would be the best way of doing it, what level of
> normalization would be best. In the original version, the main
> table has about 230,000 rows, and I have a separate table for
> storing addresses of those rows. Now I'm going to put some of the
> address data on the same table (like zip/postal code), and other
> data in separate tables.

It should only be in a separate table if there can be more than one address per main record.

> I'm thinking, performance-wise, if it would be ok to have separate
> tables for states, counties, colonies, and even streets.
>
> The database will be hit about 20 times every minute (potentially 5
> times more), each time retrieving data from *all* those tables.

20 times a minute? That is absolutely dead asleep. 20 times a *second* would probably be fine with the correct indexes in place on reasonably current hardware.

> I wonder what do you think of this (I'm no db expert). Usually, the
> app will be doing searches on all those tables, but not always
> using all of them. For example, I may want to retrieve all rows in
> the main table that belong to a specific row in the counties table,
> so I don't really care about the street or colony, but I'll have to
> include those in the query anyway, since the main table will only
> point to the streets table (having it point to all tables would be
> unnecessary me thinks).

Just make sure the foreign key columns are indexed, and you'll be fine. 230,000 rows will only be a problem if you write SELECT statements against columns that aren't indexed. And, at 20 times a minute, even that is likely not a problem. :-)

--
-- Tom Mornini
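A toy illustration of the indexing point above, in plain Ruby: a `Hash` stands in for a B-tree index on a hypothetical `street_id` foreign-key column of the 230,000-row main table. Without the index, every query scans all the rows; with it, each query is a single keyed lookup. Table and column names are made up for the sketch.

```ruby
# 230,000 hypothetical main-table rows spread across 500 streets.
Row = Struct.new(:id, :street_id)
rows = (1..230_000).map { |i| Row.new(i, i % 500) }

# Unindexed: a SELECT on street_id scans every row, every time.
scan = rows.select { |r| r.street_id == 42 }

# "Indexed": build the lookup structure once, then each query is a
# direct fetch instead of a full scan.
index = rows.group_by(&:street_id)
indexed = index.fetch(42, [])

puts scan.size == indexed.size
# => true (same result set, far less work per query)
```

The real analogue is simply `ADD INDEX` on each foreign-key column; at 20 queries a minute even the unindexed scan would likely cope, as noted above, but the index makes the 20-per-second case comfortable too.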