Ashley Moran
2006-Jun-30 23:26 UTC
[Rails] Three ActiveRecord features we want at work - any comments/suggestions?
Hi I thought I would run this past the Rails list before I start doing any work because I don''t want to re-invent any wheels. Basically the situation I''m in is that I work for a company that is currently working on two big non-Rails projects used for financial purposes (one sells vehicles and vehicle finance, the other is an internal broker management system used by lenders and brokers). After about two months of badgering we''re starting to use Rails for little projects. Assuming these trials (there are at least 3 going ahead) are successful we may use Rails on the bigger projects. Now, both of our large projects have grown organically, and it shows in the database as inconsistencies and redundancy. In future I want to do things RIGHT. There as three significant ways I want to work in future to limit the chance of problems: - Eliminate ALL nulls in the database. IE, every column in every table will be constrained NOT NULL. - Easily store temporal (historical) data, and not rely on ad-hoc schemes for recording previous information - Use class-table inheritance What I''m interested in is how feasible the possible solutions to these sound, and whether they could be implemented as acts_as plugins or would have to patch ActiveRecord directly. I''ve found that developers coming from C# or Java to Ruby and Rails get Rails-spoilt, in that Rails does so much for them they expect it to do EVERYTHING. So I want my solutions to be as straightforward and self-explanatory as possible. I''ll only look at eliminating NULLs here because it turns out explaining even one of these things is a task in itself!!! I''m sure my first wish will raise a few eyebrows :) The sad truth is that NULLs invalidate pretty much all of the relational database theory that SQL is (supposedly) based on. I won''t dwell on this, if you aren''t convinced read pretty much anything by Chris Dates, Fabian Pascal or Hugh Darwen (eg http://www.comp.leeds.ac.uk/pd31/Notes/ Darwen.pdf) All three authors have suggested different solutions to the problem, but the one I like best is proposed in the PDF by Darwen, referenced above. Basically instead of using a flag (ie NULL) in each table to indicate a missing value, you use a separate table to record why the information is not available. For example consider the following table: SurveyResult: { id, name, address, age, sex, survey_date, feedback_comments, follow_up_comments } assuming the following business rules: A survey result must contain name, address, feedback_comments and survey_date A survey result must have name and address deleted two years after survey_date (to comply with some data law) A survey respondent may demand that any of name, address, age, sex are deleted one year after survey_date (to comply with some other law) and the following procedure: Researchers attempt to collect follow_up_comments one month after survey_date Now here are "valid" SurveyResults for today''s date (2006-06-30): { 1, ''Billy'', ''1 My St'', 30, ''M'', ''2006-06-15'', ''Not bad'', NULL } { 2, ''Bobby'', ''2 Your Ave'', NULL, NULL, ''2006-01-01'', ''Very good'', ''Okay'' } { 3, ''Bert'', NULL, NULL, NULL, ''2005-01-01'', ''Excellent'', NULL } { 4, NULL, NULL, NULL, NULL, ''2004-01-01'', ''Very good'', ''Got worse'' } { 5, NULL, NULL, NULL, NULL, ''2004-01-02'', ''So so'', ''Much improved'' } In the context of the business rules, the first two record are unambiguous, but the rest lead you to ask the following (unanswerable) questions: - Has Bert requested his personal information deleted, or was it never recorded? - Why does Bert not have follow-up comments, as it is more than 1 month since he was surveyed? - How many applicants have requested their age deleted? However, in the absence of the business rules (which you don''t want to code into every query you write!), given any particular record you can''t say why a certain bit of information is missing. The solution I like is to store the optional information in two satellite tables per attribute, eg: SurveyResults_Name: { survey_result_id, name } SurveyResults_Name_Missing: { survey_result_id, reason } reason could be a id or a string, I haven''t decided which one I prefer yet. (Reason could be pretty much anything - ''unknown'', ''not_provided'', ''deleted_data_protection_act'', ''requested_deleted'', ''pending'', ''not applicable'' etc. Fabian Pascal thinks that ''not applicable'' actually means you have two distinct types of entity, but that leads us on to CTI. I didn''t give an example of N/A here because I''m not convinced it is fundamentally different when you are dealing with ORM). Obviously, there will be an entry in exactly one of these two tables for each survey result. Now on to the ActiveRecord bit (finally). What I want to do is merge in the attributes so I can access them in the Main class like this: sr = SurveyResult.find(:first) puts sr.name sr.name = ''William'' and what effectively happens is: sr = SurveyResult.find(:first) puts sr.survey_results_name.name sr.survey_results_name.name = ''William'' Now I''ve sort of dreamed up syntax for this, along the lines of: class SurveyResult < ActiveRecord::Base #has_one :survey_result_name # get rid of this acts_as_optional :name end sr = SurveyResult.find(:first) sr.name = ''William'' sr.name_status => :present sr.name_missing :requested_deleted sr.name_status => :requested_deleted The *_status and *_missing methods I suppose you can create easy enough in the acts_as_optional call or via an extension to method_missing. I''m not sure what to do in the case of a call to "sr.name" if the value is actually missing. I couldn''t return a missing symbol because that could be confused as a value. The best solution I can think of is to raise an AttributeMissingException or some such. Now I don''t know how much thought has been given to problems like this. Maybe merging tables like this has already been solved somewhere? I haven''t dug very far into the ActiveRecord code to know if this could be implemented without hacking the core. (A plugin would be a better solution initially.) I won''t go into it now because this e-mail is already coma-inducingly long, but a solution to my second problem (storing temporal data) can be achieved by timestamping the satellite tables used to record missing information. (I''ve only read about missing information and temporal data separately, but I can''t see a reason why they can''t be combined.) And finally, class-table inheritance seems like an extension of the general table-merging problem, but obviously more complex than including just a single field. I know this has been discussed before but it doesn''t look like much work has gone into it yet. Feedback very much appreciated on this. Hopefully I will get enough time at work to add one, two, or all three of these features into Rails but I''d like some guidance first. Cheers Ashley