This question may be so obvious that even Marnen won''t answer it... :) I need to store a set of relatively complex objects in the db -- in my case, these are discrete cumulative distribution functions: they take a long time to construct, but I want the lookup to be fast. Since the lookup is essentially a binary search, a B-Tree or some variant would be a sensible data structure. I have choices: (a) I store entire YAML''d B-Trees, (b) I store simple data structure (e.g. arrays of numeric pairs) and spend time constructing a B-Tree from it, or (c) choose another approach that someone on this list points out. Can this forum offer any advice on size / speed tradeoffs for YAML''d objects? If unpacking a YAML''d object is fast, then (a), storing the entire B-Tree is probably the best approach. If slow, then perhaps I''m better off with (b), storing a minimal data structure and reconstructing the B-Tree when I need it. Thoughts? - ff -- Posted via http://www.ruby-forum.com/. -- You received this message because you are subscribed to the Google Groups "Ruby on Rails: Talk" group. To post to this group, send email to rubyonrails-talk-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org To unsubscribe from this group, send email to rubyonrails-talk+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org For more options, visit this group at http://groups.google.com/group/rubyonrails-talk?hl=en.
Marnen Laibow-Koser
2010-Dec-02 15:17 UTC
Re: persisting complex (and possibly large) objects
Fearless Fool wrote in post #965610:> This question may be so obvious that even Marnen won''t answer it... :)I can''t resist a challenge like that. :D> > I need to store a set of relatively complex objects in the db -- in my > case, these are discrete cumulative distribution functions: they take a > long time to construct, but I want the lookup to be fast. Since the > lookup is essentially a binary search, a B-Tree or some variant would be > a sensible data structure. > > I have choices: (a) I store entire YAML''d B-Trees, (b) I store simple > data structure (e.g. arrays of numeric pairs) and spend time > constructing a B-Tree from it, or (c) choose another approach that > someone on this list points out.How about C? Nested sets are generally a great way of storing arbitrary trees in SQL databases for easy retrieval. The awesome_nested_set plugin makes this easy to do in Rails. If you''re unfamiliar with nested sets, I recommend http://dev.mysql.com/tech-resources/articles/hierarchical-data.html as a good overview. Or do I misunderstand? What''s the nature of the tree you''d like to store?> > Can this forum offer any advice on size / speed tradeoffs for YAML''d > objects? If unpacking a YAML''d object is fast, then (a), storing the > entire B-Tree is probably the best approach. If slow, then perhaps I''m > better off with (b), storing a minimal data structure and reconstructing > the B-Tree when I need it.Speed may not be your immediate concern with YAML: the big problem with that approach is that it bypasses the structure of the database and makes the data very difficult to query. It''s a decent last resort, but it *is* a last resort.> > Thoughts? > > - ffBest, -- Marnen Laibow-Koser http://www.marnen.org marnen-sbuyVjPbboAdnm+yROfE0A@public.gmane.org -- Posted via http://www.ruby-forum.com/. -- You received this message because you are subscribed to the Google Groups "Ruby on Rails: Talk" group. To post to this group, send email to rubyonrails-talk-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org To unsubscribe from this group, send email to rubyonrails-talk+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org For more options, visit this group at http://groups.google.com/group/rubyonrails-talk?hl=en.
On 2 December 2010 06:28, Fearless Fool <lists-fsXkhYbjdPsEEoCn2XhGlw@public.gmane.org> wrote:> I need to store a set of relatively complex objects in the db -- in my > case, these are discrete cumulative distribution functions: they take a > long time to construct, but I want the lookup to be fast. Since the > lookup is essentially a binary search, a B-Tree or some variant would be > a sensible data structure. > > Thoughts?I''ve an inkling this is out of my comfort zone, but my initial thoughts (feel free to totally disregard...) are: a) serialize? or is that what you meant by storing a "YAML''d B-Tree"? b) if these are complex objects that need to be stored in the DB, can you not use a DB designed to store complex objects (like Mongo)? c) Marnen''s nested set sounds like a potential good solution too (given what I think I understand from your description...) -- You received this message because you are subscribed to the Google Groups "Ruby on Rails: Talk" group. To post to this group, send email to rubyonrails-talk-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org To unsubscribe from this group, send email to rubyonrails-talk+unsubscribe@googlegroups.com. For more options, visit this group at http://groups.google.com/group/rubyonrails-talk?hl=en.
Fearless Fool wrote in post #965610:> This question may be so obvious that even Marnen won''t answer it... :) > > I need to store a set of relatively complex objects in the db -- in my > case, these are discrete cumulative distribution functions: they take a > long time to construct, but I want the lookup to be fast. Since the > lookup is essentially a binary search, a B-Tree or some variant would be > a sensible data structure. > > I have choices: (a) I store entire YAML''d B-Trees, (b) I store simple > data structure (e.g. arrays of numeric pairs) and spend time > constructing a B-Tree from it, or (c) choose another approach that > someone on this list points out. > > - ffTough to call... What''s your intended usage scenario? Depending on the resolution required from the CDF, could you just store built distributions in a table and query against it? With normalized inputs and the right indices, the query should be fast, and trading off record count versus accuracy, you could always do a linear interpolation between any two stored points. Need higher interpolated accuracy? Then increase the resolution of stored points to minimize the interpolated gap. Ugh... I gave up the practice of sadistics years ago, and now you''re making me remember it... Curse you Red Baron! -- Posted via http://www.ruby-forum.com/. -- You received this message because you are subscribed to the Google Groups "Ruby on Rails: Talk" group. To post to this group, send email to rubyonrails-talk-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org To unsubscribe from this group, send email to rubyonrails-talk+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org For more options, visit this group at http://groups.google.com/group/rubyonrails-talk?hl=en.
This may surprise the forum, but Marnen is right :) :) -- it makes sense to simply structure the CDFs as SQL queries. I know how to write CDF functions in a procedural language but haven''t yet attempted them using SQL, but @railsdog''s post reminded me that it won''t be too hard (thank you). CDF functions are easily implemented as tables of linear segments -- I can store the vertices and the slopes, so linear interpolation between the breakpoints will be fast. And -- doh! -- a google search for "cumulative distribution function sql" gives me a real jump start. Why didn''t I think of that sooner? Thanks, y''all. - ff -- Posted via http://www.ruby-forum.com/. -- You received this message because you are subscribed to the Google Groups "Ruby on Rails: Talk" group. To post to this group, send email to rubyonrails-talk-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org To unsubscribe from this group, send email to rubyonrails-talk+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org For more options, visit this group at http://groups.google.com/group/rubyonrails-talk?hl=en.
@marnen, @railsdog: I owe you a big thank you. Once I realized that looking up the CDF was really just piecewise linear approximation, then casting it into pure SQL was easy. In fact, it gave me a chance to test out the new AREL-based query interface. If anyone is interested, it''s surprisingly simple: Assume you store x0, y0, and slope for each line segment like this: ActiveRecord::Schema.define do create_table(:lin_segs, :force => true) do |t| t.integer :tbl_id t.float :x0 t.float :y0 t.float :m end end The AREL-style query (replete with linear interpolation) is simply: def lookup(x, table_id) seg = LineSegment.where(:table_id => table_id).order(''x0 DESC'').where("x0 < #{x}").limit(1).first seg.y0 + (x - seg.x0) * seg.m end If you insert "bumpers" at each end of your table (an entry with x0 LARGEST_NEGATIVE_FLOAT and an entry with x0 = max X), then you don''t even need additional logic to handle out-of-bounds x values. Thanks for the nudge. I''m getting more comfortable thinking in SQL. - ff -- Posted via http://www.ruby-forum.com/. -- You received this message because you are subscribed to the Google Groups "Ruby on Rails: Talk" group. To post to this group, send email to rubyonrails-talk-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org To unsubscribe from this group, send email to rubyonrails-talk+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org For more options, visit this group at http://groups.google.com/group/rubyonrails-talk?hl=en.