Ok, this is a hard one and I just thought I''d see if people on the list had any suggestions on how they would approach this, I''ve not started to implement the rails to handle this yet as I''m still at the paper planning phase. I''m building an app that has to handle availability for travel packages. The complexity comes in that a person can choose a start date for a package and how long they wish to stay. The packages themselves have availability for each start date. The availability needs to be calculated such that if person A chooses date 1 which has an availability of 2 places and wishes to stay for 4 weeks, but the 4th week doesn''t have any availability, then the user is only offered a choice of 1-3 weeks for that particular date. If there is availability from the 5th week to the 10th week then the user could choose the 5th week as a start date and book for up to 5 weeks from then, but if they took up availability on the 7th week then no-one else could book the same package (5 weeks from the 5th week) and would only be offered a choice of up to 2 weeks if they wanted to choose the 5th week as a start date. The way this has been approached in the current system is to create a separate table that stores availability day by day for each package updated by a stored procedure after a booking is taken. However this completely violates data normalisation rules and potentially creates the update/delete errors that can arise from duplicating data and not keeping it atomic. Hope thats clear, I can try to clarify if not. David David Smalley w: http://davidsmalley.com/blog
Hi David.> The way this has been approached in the current system is to create > a separate table that stores availability day by day for each > package updated by a stored procedure after a booking is taken. > However this completely violates data normalisation rules and > potentially creates the update/delete errors that can arise from > duplicating data and not keeping it atomic.Could you sketch out this part in pseudo-SQL? *m
On 20 Feb 2006, at 09:49, Manuel Holtgrewe wrote:> Hi David. > >> The way this has been approached in the current system is to >> create a separate table that stores availability day by day for >> each package updated by a stored procedure after a booking is >> taken. However this completely violates data normalisation rules >> and potentially creates the update/delete errors that can arise >> from duplicating data and not keeping it atomic. > Could you sketch out this part in pseudo-SQL?It uses the following tables and attributes: Booking: (id,........, package_id, status) [The booking only takes up availability if the status is set to active] Package: (id,......) Availabilities: (id, package_id, datefrom, dateto, capacity) DailyAvailabilties: (id, package_id, date, available, allocated, daysremaining) PackageDailyAvailability: (id, availabilities_id, date, available, allocated, daysremaining) The current database is horrendously complicated and very overworked in my opinion. There are countless tables I just can''t figure out how they link due to an inconsistent naming scheme and lack of documentation. The previous developers are unwilling to revel how they implemented it (as they are being replaced!) and the stored procedures that update availability are encrypted in SQL server. I know there must be a better approach to what they have done! It''s just very hard to suss out, especially once you read what they''ve done and started to get thoroughly confused. David ---------------------- David Smalley w: http://davidsmalley.com/blog
Begin forwarded message:> On 20 Feb 2006, at 09:49, Manuel Holtgrewe wrote: > >> Hi David. >> >>> The way this has been approached in the current system is to >>> create a separate table that stores availability day by day for >>> each package updated by a stored procedure after a booking is >>> taken. However this completely violates data normalisation rules >>> and potentially creates the update/delete errors that can arise >>> from duplicating data and not keeping it atomic. >> Could you sketch out this part in pseudo-SQL? > > It uses the following tables and attributes: > > Booking: (id,........, package_id, status) [The booking only takes > up availability if the status is set to active] > Package: (id,......) > Availabilities: (id, package_id, datefrom, dateto, capacity) > DailyAvailabilties: (id, package_id, date, available, allocated, > daysremaining) > PackageDailyAvailability: (id, availabilities_id, date, available, > allocated, daysremaining)There also appears to be a table which links a booking to an availability. BookingDetails(id, booking_id, availabilities_id) I can''t really make head-nor-tails of the SQL dump I have been given so apologies if this is inconsistent. --------------------- David Smalley w: http://davidsmalley.com/blog
Am 20.02.2006 um 11:16 schrieb David Smalley:> It uses the following tables and attributes: > > Booking: (id,........, package_id, status) [The booking only takes > up availability if the status is set to active] > Package: (id,......) > Availabilities: (id, package_id, datefrom, dateto, capacity) > DailyAvailabilties: (id, package_id, date, available, allocated, > daysremaining) > PackageDailyAvailability: (id, availabilities_id, date, available, > allocated, daysremaining) > > The current database is horrendously complicated and very > overworked in my opinion. There are countless tables I just can''t > figure out how they link due to an inconsistent naming scheme and > lack of documentation. The previous developers are unwilling to > revel how they implemented it (as they are being replaced!) and the > stored procedures that update availability are encrypted in SQL > server. > > I know there must be a better approach to what they have done! It''s > just very hard to suss out, especially once you read what they''ve > done and started to get thoroughly confused.The first thing I see is that you are dealing with very complicated data. In math-speak you are dealing with "bags" of intervals (a bag being a "multiset" which can contain the same element more than once). Intervals are complicated but that you might have multiple, identical ones does not make things simpler. Let''s only consider a single package. Your data looks like this when abstracted [2006-01-01, 2006-02-23] [2006-02-12, 2006-03-23] [2006-03-23, 2006-04-23] [2006-01-13, 2006-06-23] Let''s say these are four rooms or slots for this package in your booking system. Now, if you get a booking - say of the second interval, you might end up with data like the following (assuming someone books from 2006-02-14 to 2006-03-04): [2006-01-01, 2006-02-23] [2006-02-12, 2006-02-13], [2006-03-05, 2006-03-23] [2006-03-23, 2006-04-23] [2006-01-13, 2006-06-23] As we can see, you are getting more intervals - d''oh. You can imagine how much more complicated all this will get with multiple packages and of course with more slots (I assume there are more than hundred concurrent bookings for a package for medium sized hotels). As far as I can see, you can approach storing the general data in four ways. (I) First, you can store intervals, but since there might be more than one identical interval, this might end up ugly (storing the same thing twice is a nono in relational databases). You''d get something like this: interval: (from, to, count) Now, if person A books an interval in the middle of the other interval, you might end up with the following intervals: 1.) the interval before the booking with count = old_count 2.) the interval after the booking with count = old_count 3.) the interval with the same interval of the boking with count = old_count-1 Whee, and if person B books the interval in 1., you will have to merge both intervals back together to keep consistency. (II) So storing the free slots seems not to be the best way to do it. What about storing the booked intervals? We might end up with the following: package := (id, maximum_bookings) booking := (id, booking, interval) So when checking if we have free slots at a given time T, we do something like (in Pseudo-SQL): SELECT COUNT(id) FROM booking WHERE booking.interval CONTAINS %T% This seems pretty clean to me. You might want to extend this with a "slots_booked" column in booking and replace the COUNT(id) with a SUM (slots_booked). (III, IV) Since the intervals have a finite count of elements you could also store the days (or weeks if you can only book full weeks in your system) and make them have a "availability" count as in (I) or store only the booked days as in (II). Since your database will kind of explode (356 days * n slots per package * m packages), I''d recommend (II). I don''t know how fast such "date in interval" queries are in your database system. I''d implement all that stuff and if it turns out that this query is a bottle neck, you might want to have a "cache" of sorts rebuilt by stored procedures (or in your code) on storing elements. Of course, this is kind of a hack and maybe your RDBMS handles those interval questions really well with an index. Regards, Manuel
On 20 Feb 2006, at 11:36, Manuel Holtgrewe wrote:> > The first thing I see is that you are dealing with very complicated > data. In math-speak you are dealing with "bags" of intervals (a bag > being a "multiset" which can contain the same element more than > once). Intervals are complicated but that you might have multiple, > identical ones does not make things simpler. ><snip>> Regards, > > Manuel > _______________________________________________ > Rails mailing list > Rails@lists.rubyonrails.org > http://lists.rubyonrails.org/mailman/listinfo/railsThanks, I''ll digest this David Smalley w: http://davidsmalley.com/blog
On 20 Feb 2006, at 11:36, Manuel Holtgrewe wrote:> >> It uses the following tables and attributes: >> >> Booking: (id,........, package_id, status) [The booking only takes >> up availability if the status is set to active] >> Package: (id,......) >> Availabilities: (id, package_id, datefrom, dateto, capacity) >> DailyAvailabilties: (id, package_id, date, available, allocated, >> daysremaining) >> PackageDailyAvailability: (id, availabilities_id, date, available, >> allocated, daysremaining) >> >> The current database is horrendously complicated and very >> overworked in my opinion. There are countless tables I just can''t >> figure out how they link due to an inconsistent naming schemeI just had a thought. Surely this kind of problem lends itself to temporal database theory. That is to say that if the available booking dates were stored as interval datatypes along with capacity, I could perform an SQL count to see how many rows intersect with a particular interval. This would return the number of bookings there were on a particular date, minus this from the total capacity for a date and we''d have the availability for a particular date. Anyone had any experience of temporal support in Rails? I see that Mysql 5 has improved interval datatype support. Would I have to write a patch for activerecord that dealt with interval datatypes and could perform the pack/unpack operations necessary for testing an intersection? David Smalley w: http://davidsmalley.com/blog -------------- next part -------------- An HTML attachment was scrubbed... URL: http://wrath.rubyonrails.org/pipermail/rails/attachments/20060221/6f1fd6b2/attachment.html
On Feb 21, 2006, at 22:56 , David Smalley wrote:> Anyone had any experience of temporal support in Rails? I see that > Mysql 5 has improved interval datatype support.Little late to the party. Temporal representations are an interest of mine, but I''m unfamiliar with MySQL. I don''t see intervals in the MySQL documentation and didn''t find anything when I googled. Could you provide some links for where I might find some more information for this wrt MySQL?> Would I have to write a patch for activerecord that dealt with > interval datatypes and could perform the pack/unpack operations > necessary for testing an intersection?With just a start and end point demarcating an interval its possible to do interval comparisons, so you wouldn''t necessarily have to write pack/unpack code. You may be interested in a book by Snodgrass that''s available as a PDF download on temporal databases in SQL: http://www.cs.arizona.edu/people/rts/tdbbook.pdf Michael Glaesemann grzm myrealbox com
I just want to point something out: if you google Lisp or certain topics in artificial intelligence, you get job ads for a company in Massachusetts which solves a similar class of problems. Some kind of travel-booking company. They are literally working on the travelling salesman problem. The travelling salesman problem is a graph theory problem in math/computer science, and all I can tell you beyond that is that it''s a bit over my head and some very clever people consider it to be quite difficult. I''m not sure if what you''re dealing with here is on the same level, but it might be something to look into. -- Giles Goat Boy http://gilesmakesmusic.blogspot.com http://gileswritescode.blogspot.com
Hi, not sure if it helps, but I remember a friend writing his master''s thesis in CS on optimizing/estimating duration of processes in a workflow system. I think he used something called "critical path analysis". Maybe it''s appliable to you problem. Otherwise it sounds like a travelling salesman problem (graph theory), which belongs to a class of problems which are hard to compute in a way that throwing faster computers on them doesn''t really help. Your best bet is to use heuristics (intelligent guessing) to find your way through the problem space. The "A*" algorithm comes to my mind, but I only remember fragments of that university stuff today... http://en.wikipedia.org/wiki/Critical_path http://en.wikipedia.org/wiki/Travelling_salesman http://en.wikipedia.org/wiki/A%2A Various projects which might be related by looking at the description (search for "graph"): http://rubyforge.org/projects/rgraph/ http://rubyforge.org/projects/wildberry/ http://rubyforge.org/projects/horai/ http://rubyforge.org/projects/rgl/ Maybe it also helps re-asking you question on the ruby-talk mailing list. There are quite some people who are good in solving mathematical complex problems. HTH, Timo Am 20.02.2006 um 10:25 schrieb David Smalley:> Ok, this is a hard one and I just thought I''d see if people on the > list had any suggestions on how they would approach this, I''ve not > started to implement the rails to handle this yet as I''m still at > the paper planning phase. > > I''m building an app that has to handle availability for travel > packages. > > The complexity comes in that a person can choose a start date for a > package and how long they wish to stay. The packages themselves > have availability for each start date. The availability needs to be > calculated such that if person A chooses date 1 which has an > availability of 2 places and wishes to stay for 4 weeks, but the > 4th week doesn''t have any availability, then the user is only > offered a choice of 1-3 weeks for that particular date. If there is > availability from the 5th week to the 10th week then the user could > choose the 5th week as a start date and book for up to 5 weeks from > then, but if they took up availability on the 7th week then no-one > else could book the same package (5 weeks from the 5th week) and > would only be offered a choice of up to 2 weeks if they wanted to > choose the 5th week as a start date. > > The way this has been approached in the current system is to create > a separate table that stores availability day by day for each > package updated by a stored procedure after a booking is taken. > However this completely violates data normalisation rules and > potentially creates the update/delete errors that can arise from > duplicating data and not keeping it atomic. > > Hope thats clear, I can try to clarify if not. > > David > > > > David Smalley > w: http://davidsmalley.com/blog > > > > _______________________________________________ > Rails mailing list > Rails@lists.rubyonrails.org > http://lists.rubyonrails.org/mailman/listinfo/rails