Howdy railsters, I have a problem which at first seemed to be very simple, but now I''m quite stumped. I''d appreciate your help with this if possible. I''ve scoured the web and the mailing list for several months now, and I haven''t found an acceptable solution. Perhaps you''ve run into this? I have two models, associated with a HABTM (actually using has_many :through on both ends, along with a join table). I need to retrieve all ModelA''s that is associated with BOTH of two ModelB''s. I do NOT want all ModelA''s for ModelB_1 concatenated with all ModelA''s for ModelB_2. I literally want all ModelA''s that are associated with BOTH ModelB_1 and ModelB_2. It is not limited to only 2 ModelB''s, it may be up to 50 ModelB''s, so this must scale. I can describe the problem using a variety of analogies, that I think better describes my problem than the previous paragraph: * Find all books that were written by all 3 authors together. * Find all movies that had the following 4 actors in them. * Find all blog posts that belonged to BOTH the Rails and Ruby categories for each post. * Find all users that had all 5 of the following tags: funny, thirsty, smart, thoughtful, and quick. (silly example!) * Find all people that have worked in both San Francisco AND San Jose AND New York AND Paris in their lifetimes. I''ve thought of a variety of ways to accomplish this, but they''re grossly inefficient and very frowned upon. Taking an analogy above, say the last one, you could do something like query for all the people in each city, then find items in each array that exist across each array. That''s a minimum of 5 queries, all the data of those queries transfered back to the app, then the app has to intensively compare all 5 arrays to each other (loops galore!). That''s nasty, right? Another possible solution would be to chain the finds on top of each other, which would essentially do the same as above, but won''t eliminate the multiple queries and processing. Also, how would you dynamicize the chain if you had user submitted checkboxes or values that could be as high as 50 options? Seems dirty. You''d need a loop. And again, that would intensify the search duration. Obviously, if possible, we''d like to have the database perform this for us, so, people have suggested to me that I simply put multiple conditions in. Unfortunately, you can only do an OR with HABTM typically. Another solution I''ve run across is to use a search engine, like sphinx or UltraSphinx. For my particular situation, I feel this is overkill, and I''d rather avoid it. I still feel there should be a solution that will let a user craft a query for an arbitrary number of ModelB''s and find all ModelA''s. How would *you* solve this problem? Thanks a bunch, Kevin (I''ve subsequently cross-posted this to StackOverflow''s website since they have a target audience that encompasses more than Rails, and is still valid in those areas --- http://stackoverflow.com/questions/1095571/habtm-finds-with-and-joins-not-or )
I''ll give you what advice I do know. I''m not sure it will fully help you with your situation but it may help you to rethink your strategies. First, make sure your tables are normalized before assigning associations to them. If you are going to work with HABTM then 3NF or greater.. The larger the query the better. Smaller queries are worse than one enormously large query because rails caches that query for use and doesn''t have to go out and do another.. and another.. It will be less of a problem to process the data once you have it so I wouldn''t worry about data processing at this point. It''s better to just get the design and associations going. Without seeing your models, it''s more difficult to guess what may be right or wrong from a design point. You might want to state exactly how many models you have, what tables and relationships you currently have associated which will help tie into your original topic. -- Posted via http://www.ruby-forum.com/.
Thanks for the pointers. I''ve been designing databases for close to 15 years, and have been working with Rails for over 4 years. I''ve certainly used HABTM and has_many :through for many projects without issue. I already have the data constructed correctly, but I do appreciate the hand holding effort anyway. This is a very particular processing need. Should there be a wildly useful way to restructure the database without over denormalizing/normalizing what I already have, I''m open to it, but I doubt very much that''s where the problem lies. For sake of the conversation here are some fictitious models that match my scenario identically: TABLES --------------------------------------------------------------- features id:integer name:string schools id:integer name:string features_schools feature_id:integer school_id:integer MODELS --------------------------------------------------------------- class School < ActiveRecord::Base has_and_belongs_to_many :features end class Feature < ActiveRecord::Base has_and_belongs_to_many :schools end How would I pull out all the schools that have ALL of the following fictitious features: ''Wheelchair Access'', ''Playground'', ''Sandbark'', ''Library'', ''Computer Lab'', and ''Testing Center''? Note that it would be unacceptable to return schools that do not have one of those associations. -Kevin On Jul 7, 2009, at 8:18 PM, Älphä Blüë wrote:> > I''ll give you what advice I do know. I''m not sure it will fully help > you with your situation but it may help you to rethink your > strategies. > > First, make sure your tables are normalized before assigning > associations to them. If you are going to work with HABTM then 3NF or > greater.. > > The larger the query the better. Smaller queries are worse than one > enormously large query because rails caches that query for use and > doesn''t have to go out and do another.. and another.. > > It will be less of a problem to process the data once you have it so I > wouldn''t worry about data processing at this point. It''s better to > just > get the design and associations going. > > Without seeing your models, it''s more difficult to guess what may be > right or wrong from a design point. You might want to state exactly > how > many models you have, what tables and relationships you currently have > associated which will help tie into your original topic. > > -- > Posted via http://www.ruby-forum.com/. > > >
Hi Try this Feature.find(:all,:conditions => [''name in (?,?,?,?,?,?)'',''Wheelchair Access'',''Playground'',''Sandbark'',''Library'',''Computer Lab'',''Testing Center'']).each {|f| arr = arr && f.schools} Sijo -- Posted via http://www.ruby-forum.com/.
Hi arr = [] Feature.find(:all,:conditions => [''name in (?,?,?,?,?,?)'',''Wheelchair Access'',''Playground'',''Sandbark'',''Library'',''Computer Lab'',''Testing Center'']).each {|f| arr = arr && f.schools} Now arr have the required value.Is that you want? Sijo -- Posted via http://www.ruby-forum.com/.
On Wed, Jul 8, 2009 at 1:21 AM, Sijo Kg<rails-mailing-list-ARtvInVfO7ksV2N9l4h3zg@public.gmane.org> wrote:> > Hi > > arr = [] > Feature.find(:all,:conditions => [''name in (?,?,?,?,?,?)'',''Wheelchair > Access'',''Playground'',''Sandbark'',''Library'',''Computer Lab'',''Testing > Center'']).each {|f| arr = arr && f.schools} > > Now arr have the required value.Is that you want?First off, I don''t think that works. arr = arr && f.schools will end up with arr being set only to the set of schools associated with the last feature instance found. I think you meant to use Array#& which performs an array intersection, instead of the && logical operator. But just changing that will end up with an empty arr since [] & [1] => [] i.e the intersection of an empty set with anything is the empty set. I believe that something like this is closer to the right meaning: arr = Schools.find(:all) Feature.find(:all, :conditions => [''name in (?,?,?,?,?,?)'',''Wheelchair Access'',''Playground'',''Sandbark'',''Library'',''Computer Lab'',''Testing Center'']).each {|f| arr = arr & f.schools} That''s going to do 2+n queries though, 1 to find the schools, 1 to find the n features with one of the names, and then a query to find each of those features schools. Now I think that adding :include => :schools to the Feature.find will get this down to 2 queries. But I''m not sure that this will turn out to be the most efficient way to do it, since it potentially instantiates lots of duplicate School objects and doing the Set intersection in Ruby won''t be as efficient as letting SQL do it. Not that I can come up with a way to let SQL do it, without thinking harder than I want to this morning. <G> -- Rick DeNatale Blog: http://talklikeaduck.denhaven2.com/ Twitter: http://twitter.com/RickDeNatale WWR: http://www.workingwithrails.com/person/9021-rick-denatale LinkedIn: http://www.linkedin.com/in/rickdenatale
I''m not sure that doing it in SQL is always going to be faster - there''s got to be some performance penalty for doing a join with the same table that many times. One favorite trick that can sometimes make things like this more efficient is to specifically request only the ids of the relevant objects, do the intersection, and then pull in full objects only for the resulting set. So something like this (things is an array of feature names): record_ids = Feature.find_by_name(things[0]).school_ids things[1..-1].each do |thing| record_ids &= Feature.find_by_name(thing).school_ids end @schools = School.find(record_ids) Note that if you want eliminate some queries, you could pre-collect the Feature objects with a call to Feature.find(:all, :conditions => { :name => things }), which will find them all in one go. You''ll find other example of this kind of code in the Rails association preload logic, and several other places from what I recall. --Matt Jones On Jul 8, 7:33 am, Rick DeNatale <rick.denat...-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:> Now I think that adding :include => :schools to the Feature.find will > get this down to 2 queries. But I''m not sure that this will turn out > to be the most efficient way to do it, since it potentially > instantiates lots of duplicate School objects and doing the Set > intersection in Ruby won''t be as efficient as letting SQL do it. > > Not that I can come up with a way to let SQL do it, without thinking > harder than I want to this morning. <G> >
On Jul 8, 2009, at 11:24 AM, Matt Jones wrote:> > I''m not sure that doing it in SQL is always going to be faster - > there''s got to be some performance penalty for doing a join with the > same table that many times. > > One favorite trick that can sometimes make things like this more > efficient is to specifically request only the ids of the relevant > objects, do the intersection, and then pull in full objects only for > the resulting set. So something like this (things is an array of > feature names): > > record_ids = Feature.find_by_name(things[0]).school_ids > things[1..-1].each do |thing| > record_ids &= Feature.find_by_name(thing).school_ids > end > @schools = School.find(record_ids) > > Note that if you want eliminate some queries, you could pre-collect > the Feature objects with a call to Feature.find(:all, :conditions => > { :name => things }), which will find them all in one go. >However, the problem with this solution is that you still need several queries and a post-retrieval processing of whether or not the many features share a list of schools in common. Across 40 features, this would be too heavy and unusable in a production environment. -Kevin
To further show an example of trying to do this, here is a ''real world'' example. However, in this particular case, I''m only showing two features, but in real usage, a user might select as many as 50 features or so. mysql> SELECT c.park_id, a.name, c.feature_id, b.name FROM parks a, features b, features_parks c WHERE (c.park_id = a.id) AND (c.feature_id = b.id) AND (c.feature_id = 21); +---------+-------------------------+------------+--------+ | park_id | name | feature_id | name | +---------+-------------------------+------------+--------+ | 3120 | Mitchell Park | 21 | Swings | | 21385 | Boulware Park | 21 | Swings | | 2351 | Bowden Park | 21 | Swings | | 21561 | Cameron Park | 21 | Swings | | 24791 | Ramos Park | 21 | Swings | | 2585 | Eleanor Park | 21 | Swings | | 2627 | Flood County Park | 21 | Swings | | 2374 | Burgess Park | 21 | Swings | | 22986 | Holbrook-Palmer Park | 21 | Swings | | 2799 | Huddart Park | 21 | Swings | | 3354 | Rinconada Park | 21 | Swings | | 3693 | Werry Park | 21 | Swings | | 26152 | Willow Oaks Park | 21 | Swings | | 23559 | Live Oak Manor Park | 21 | Swings | | 24295 | Oak Meadow Park | 21 | Swings | | 25562 | Sunnyvale Baylands Park | 21 | Swings | | 21354 | Cornelis Bol Park | 21 | Swings | | 21459 | Bubb Park | 21 | Swings | | 21910 | Cooper Park | 21 | Swings | | 24859 | Rengstorff Park | 21 | Swings | | 24043 | Monroe Mini Park | 21 | Swings | | 24867 | Rex Manor Playground | 21 | Swings | +---------+-------------------------+------------+--------+ 22 rows in set (0.00 sec) mysql> SELECT c.park_id, a.name, c.feature_id, b.name FROM parks a, features b, features_parks c WHERE (c.park_id = a.id) AND (c.feature_id = b.id) AND (c.feature_id = 51); +---------+--------------------------+------------+---------+ | park_id | name | feature_id | name | +---------+--------------------------+------------+---------+ | 3120 | Mitchell Park | 51 | Sandpit | | 2351 | Bowden Park | 51 | Sandpit | | 2374 | Burgess Park | 51 | Sandpit | | 2474 | Coyote Point County Park | 51 | Sandpit | | 24867 | Rex Manor Playground | 51 | Sandpit | +---------+--------------------------+------------+---------+ 5 rows in set (0.00 sec) So if I was trying to fetch all parks that have BOTH ''Swings'' and a ''Sandpit'', I''d expect to see: Mitchell Park Bowden Park Burgess Park Rex Manor Playground If this was limited to only looking for 2 or 3 features at a time, even on a hefty DB of 300,000 records, this wouldn''t be too difficult. You would fetch all parks for each of the 2 or 3 features, then eliminate all but the ones they share in common (exclusionary merging of some sort). However, with 50 features, this becomes unusable, since you''d have tons of redundancies when retrieving the data, and you''d have to recursively iterate over 50 features. Any way to combine the queries into one, and subquery it to eliminate ones that don''t have all the features requested in the query? -Kevin On Jul 7, 2009, at 5:44 PM, Kevin Elliott wrote:> I have two models, associated with a HABTM (actually using > has_many :through on both ends, along with a join table). I need to > retrieve all ModelA''s that is associated with BOTH of two ModelB''s. I > do NOT want all ModelA''s for ModelB_1 concatenated with all ModelA''s > for ModelB_2. I literally want all ModelA''s that are associated with > BOTH ModelB_1 and ModelB_2. It is not limited to only 2 ModelB''s, it > may be up to 50 ModelB''s, so this must scale. > > I can describe the problem using a variety of analogies, that I think > better describes my problem than the previous paragraph: > > * Find all books that were written by all 3 authors together. > * Find all movies that had the following 4 actors in them. > * Find all blog posts that belonged to BOTH the Rails and Ruby > categories for each post. > * Find all users that had all 5 of the following tags: funny, > thirsty, smart, thoughtful, and quick. (silly example!) > * Find all people that have worked in both San Francisco AND San Jose > AND New York AND Paris in their lifetimes. > > I''ve thought of a variety of ways to accomplish this, but they''re > grossly inefficient and very frowned upon. > > Taking an analogy above, say the last one, you could do something like > query for all the people in each city, then find items in each array > that exist across each array. That''s a minimum of 5 queries, all the > data of those queries transfered back to the app, then the app has to > intensively compare all 5 arrays to each other (loops galore!). That''s > nasty, right? > > Another possible solution would be to chain the finds on top of each > other, which would essentially do the same as above, but won''t > eliminate the multiple queries and processing. Also, how would you > dynamicize the chain if you had user submitted checkboxes or values > that could be as high as 50 options? Seems dirty. You''d need a loop. > And again, that would intensify the search duration. > > Obviously, if possible, we''d like to have the database perform this > for us, so, people have suggested to me that I simply put multiple > conditions in. Unfortunately, you can only do an OR with HABTM > typically. > > Another solution I''ve run across is to use a search engine, like > sphinx or UltraSphinx. For my particular situation, I feel this is > overkill, and I''d rather avoid it. I still feel there should be a > solution that will let a user craft a query for an arbitrary number of > ModelB''s and find all ModelA''s. > > > How would *you* solve this problem? > > > Thanks a bunch, > Kevin > > (I''ve subsequently cross-posted this to StackOverflow''s website since > they have a target audience that encompasses more than Rails, and is > still valid in those areas --- http://stackoverflow.com/questions/1095571/habtm-finds-with-and-joins-not-or > )--~--~---------~--~----~------------~-------~--~----~ You received this message because you are subscribed to the Google Groups "Ruby on Rails: Talk" group. To post to this group, send email to rubyonrails-talk-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org To unsubscribe from this group, send email to rubyonrails-talk+unsubscribe-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org For more options, visit this group at http://groups.google.com/group/rubyonrails-talk?hl=en -~----------~----~----~----~------~----~------~--~---
Thanks to the guidance of an answer submitted by LucaM on the StackOverflow post, I was able to come up with a SQL query and ActiveRecord find query that does it. SQL: SELECT c.park_id, a.name FROM parks a, features b, features_parks c WHERE (c.park_id = a.id) AND (c.feature_id = b.id) AND (c.feature_id IN (21, 51, 15, 24)) GROUP BY a.id HAVING count(*)=4; Rails: feature_ids = [21, 51, 15, 24] @parks = Park.all( :joins => :features, :group => "parks.id", :select => "parks.*", :conditions => ["features.id in (?)", feature_ids], :having => "count(*)=#{feature_ids.size}" ) The key was grouping and using having to ensure that the items each had the max number of features that we''re searching on. I hope this helps someone. -Kevin On Jul 8, 2009, at 12:22 PM, Kevin Elliott wrote:> To further show an example of trying to do this, here is a ''real > world'' example. However, in this particular case, I''m only showing > two features, but in real usage, a user might select as many as 50 > features or so. > > mysql> SELECT c.park_id, a.name, c.feature_id, b.name FROM parks a, > features b, features_parks c WHERE (c.park_id = a.id) AND > (c.feature_id = b.id) AND (c.feature_id = 21); > +---------+-------------------------+------------+--------+ > | park_id | name | feature_id | name | > +---------+-------------------------+------------+--------+ > | 3120 | Mitchell Park | 21 | Swings | > | 21385 | Boulware Park | 21 | Swings | > | 2351 | Bowden Park | 21 | Swings | > | 21561 | Cameron Park | 21 | Swings | > | 24791 | Ramos Park | 21 | Swings | > | 2585 | Eleanor Park | 21 | Swings | > | 2627 | Flood County Park | 21 | Swings | > | 2374 | Burgess Park | 21 | Swings | > | 22986 | Holbrook-Palmer Park | 21 | Swings | > | 2799 | Huddart Park | 21 | Swings | > | 3354 | Rinconada Park | 21 | Swings | > | 3693 | Werry Park | 21 | Swings | > | 26152 | Willow Oaks Park | 21 | Swings | > | 23559 | Live Oak Manor Park | 21 | Swings | > | 24295 | Oak Meadow Park | 21 | Swings | > | 25562 | Sunnyvale Baylands Park | 21 | Swings | > | 21354 | Cornelis Bol Park | 21 | Swings | > | 21459 | Bubb Park | 21 | Swings | > | 21910 | Cooper Park | 21 | Swings | > | 24859 | Rengstorff Park | 21 | Swings | > | 24043 | Monroe Mini Park | 21 | Swings | > | 24867 | Rex Manor Playground | 21 | Swings | > +---------+-------------------------+------------+--------+ > 22 rows in set (0.00 sec) > > > mysql> SELECT c.park_id, a.name, c.feature_id, b.name FROM parks a, > features b, features_parks c WHERE (c.park_id = a.id) AND > (c.feature_id = b.id) AND (c.feature_id = 51); > +---------+--------------------------+------------+---------+ > | park_id | name | feature_id | name | > +---------+--------------------------+------------+---------+ > | 3120 | Mitchell Park | 51 | Sandpit | > | 2351 | Bowden Park | 51 | Sandpit | > | 2374 | Burgess Park | 51 | Sandpit | > | 2474 | Coyote Point County Park | 51 | Sandpit | > | 24867 | Rex Manor Playground | 51 | Sandpit | > +---------+--------------------------+------------+---------+ > 5 rows in set (0.00 sec) > > > So if I was trying to fetch all parks that have BOTH ''Swings'' and a > ''Sandpit'', I''d expect to see: > > Mitchell Park > Bowden Park > Burgess Park > Rex Manor Playground > > If this was limited to only looking for 2 or 3 features at a time, > even on a hefty DB of 300,000 records, this wouldn''t be too > difficult. You would fetch all parks for each of the 2 or 3 > features, then eliminate all but the ones they share in common > (exclusionary merging of some sort). However, with 50 features, this > becomes unusable, since you''d have tons of redundancies when > retrieving the data, and you''d have to recursively iterate over 50 > features. > > Any way to combine the queries into one, and subquery it to > eliminate ones that don''t have all the features requested in the > query? > > -Kevin > > On Jul 7, 2009, at 5:44 PM, Kevin Elliott wrote: > >> I have two models, associated with a HABTM (actually using >> has_many :through on both ends, along with a join table). I need to >> retrieve all ModelA''s that is associated with BOTH of two ModelB''s. I >> do NOT want all ModelA''s for ModelB_1 concatenated with all ModelA''s >> for ModelB_2. I literally want all ModelA''s that are associated with >> BOTH ModelB_1 and ModelB_2. It is not limited to only 2 ModelB''s, it >> may be up to 50 ModelB''s, so this must scale. >> >> I can describe the problem using a variety of analogies, that I think >> better describes my problem than the previous paragraph: >> >> * Find all books that were written by all 3 authors together. >> * Find all movies that had the following 4 actors in them. >> * Find all blog posts that belonged to BOTH the Rails and Ruby >> categories for each post. >> * Find all users that had all 5 of the following tags: funny, >> thirsty, smart, thoughtful, and quick. (silly example!) >> * Find all people that have worked in both San Francisco AND San >> Jose >> AND New York AND Paris in their lifetimes. >> >> I''ve thought of a variety of ways to accomplish this, but they''re >> grossly inefficient and very frowned upon. >> >> Taking an analogy above, say the last one, you could do something >> like >> query for all the people in each city, then find items in each array >> that exist across each array. That''s a minimum of 5 queries, all the >> data of those queries transfered back to the app, then the app has to >> intensively compare all 5 arrays to each other (loops galore!). >> That''s >> nasty, right? >> >> Another possible solution would be to chain the finds on top of each >> other, which would essentially do the same as above, but won''t >> eliminate the multiple queries and processing. Also, how would you >> dynamicize the chain if you had user submitted checkboxes or values >> that could be as high as 50 options? Seems dirty. You''d need a loop. >> And again, that would intensify the search duration. >> >> Obviously, if possible, we''d like to have the database perform this >> for us, so, people have suggested to me that I simply put multiple >> conditions in. Unfortunately, you can only do an OR with HABTM >> typically. >> >> Another solution I''ve run across is to use a search engine, like >> sphinx or UltraSphinx. For my particular situation, I feel this is >> overkill, and I''d rather avoid it. I still feel there should be a >> solution that will let a user craft a query for an arbitrary number >> of >> ModelB''s and find all ModelA''s. >> >> >> How would *you* solve this problem? >> >> >> Thanks a bunch, >> Kevin >> >> (I''ve subsequently cross-posted this to StackOverflow''s website since >> they have a target audience that encompasses more than Rails, and is >> still valid in those areas --- http://stackoverflow.com/questions/1095571/habtm-finds-with-and-joins-not-or >> ) > > >--~--~---------~--~----~------------~-------~--~----~ You received this message because you are subscribed to the Google Groups "Ruby on Rails: Talk" group. To post to this group, send email to rubyonrails-talk-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org To unsubscribe from this group, send email to rubyonrails-talk+unsubscribe-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org For more options, visit this group at http://groups.google.com/group/rubyonrails-talk?hl=en -~----------~----~----~----~------~----~------~--~---