Patrick Doyle
2009-Sep-28 21:32 UTC
What is the most efficient way to split a table into 2 groups?
I have the following: @lot = Lot.find(params[:id]) part_nums = Part.all(:conditions => ["id <> ?", @lot.part.id]) I guess I should mention that Lot :belongs_to => :part I was looking at the log following the execution of these two statements and I saw something like this: Lot Load (0.4ms) SELECT * FROM "lots" WHERE ("lots"."id" = 13) Part Load (0.3ms) SELECT * FROM "parts" WHERE ("parts"."id" = 2) Part Load (0.9ms) SELECT * FROM "parts" WHERE (id <> 2) It looked a bit silly to me -- first I grab a record from the "parts" table with an ID of 2, then I grab all the records from the parts table whose ID is not 2. I played around a little with :include clauses, thinking that I should, at least, be able to fetch the record from the "lots" table and the "parts" table simultaneously (with something like a joins clause and a "lots.part_id = parts.id" WHERE clause), but didn''t get anywhere with that. I will end up leaving this the way it is (most likely), especially since this isn''t the right phase of development to be worrying about optimization, but I am curious how one might do this most efficiently. Is it most efficient to grab 1 record where record.id = blah and then all the (rest of the) records where record.id <> blah? Is it more efficient to grab all the records at once and write some ruby code to select the one record from the rest? If so, what would that code look like? I don''t like this: everything = Part.all the_one = everything.select {|x| x.id == 2} the_rest = everything.reject {|x| x.id != 2} That''s going to iterate over all of the records. twice! in interpreted code! Any thoughts, ideas, or snide remarks? --wpd
Frederick Cheung
2009-Sep-28 21:42 UTC
Re: What is the most efficient way to split a table into 2 groups?
On Sep 28, 10:32 pm, Patrick Doyle <wpds...-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:> I will end up leaving this the way it is (most likely), especially > since this isn''t the right phase of development to be worrying about > optimization, but I am curious how one might do this most efficiently. > > Is it most efficient to grab 1 record where record.id = blah and then > all the (rest of the) records where record.id <> blah? > > Is it more efficient to grab all the records at once and write some > ruby code to select the one record from the rest? If so, what would > that code look like? I don''t like this:Try both, benchmark them - don''t take some random person on the internet''s word for it!> > everything = Part.all > the_one = everything.select {|x| x.id == 2} > the_rest = everything.reject {|x| x.id != 2} > > That''s going to iterate over all of the records. twice! in interpreted code!partition will do it in one Fred> > Any thoughts, ideas, or snide remarks? > > --wpd
Marnen Laibow-Koser
2009-Sep-28 22:07 UTC
Re: What is the most efficient way to split a table into 2 g
Patrick Doyle wrote:> I have the following: > > @lot = Lot.find(params[:id]) > > part_nums = Part.all(:conditions => ["id <> ?", @lot.part.id]) > > I guess I should mention that > > Lot :belongs_to => :part > > I was looking at the log following the execution of these two > statements and I saw something like this: > > Lot Load (0.4ms) SELECT * FROM "lots" WHERE ("lots"."id" = 13) > Part Load (0.3ms) SELECT * FROM "parts" WHERE ("parts"."id" = 2) > Part Load (0.9ms) SELECT * FROM "parts" WHERE (id <> 2) > > It looked a bit silly to me -- first I grab a record from the "parts" > table with an ID of 2, then I grab all the records from the parts > table whose ID is not 2.I think what''s happening here is this: you''re calling @lot.part.id, so Rails needs to load @lot.part, which accounts for the extra query. To fix, use the :joins option on Lot.find, or simply call @lot.part_id.> > I played around a little with :include clauses, thinking that I > should, at least, be able to fetch the record from the "lots" table > and the "parts" table simultaneously (with something like a joins > clause and a "lots.part_id = parts.id" WHERE clause), but didn''t get > anywhere with that.:includes is useless here. :joins will do the trick.> > I will end up leaving this the way it is (most likely), especially > since this isn''t the right phase of development to be worrying about > optimization, but I am curious how one might do this most efficiently. > > Is it most efficient to grab 1 record where record.id = blah and then > all the (rest of the) records where record.id <> blah? > > Is it more efficient to grab all the records at once and write some > ruby code to select the one record from the rest?That''s what I think I''d do.> If so, what would > that code look like? I don''t like this: > > everything = Part.all > the_one = everything.select {|x| x.id == 2} > the_rest = everything.reject {|x| x.id != 2} > > That''s going to iterate over all of the records. twice! in interpreted > code!So just iterate once, and test for the special value as you go. If you can''t do that, then make 2 DB queries.> > Any thoughts, ideas, or snide remarks? > > --wpdBest, -- Marnen Laibow-Koser http://www.marnen.org marnen-sbuyVjPbboAdnm+yROfE0A@public.gmane.org -- Posted via http://www.ruby-forum.com/.
Marnen Laibow-Koser
2009-Sep-28 22:18 UTC
Re: What is the most efficient way to split a table into 2 g
Frederick Cheung wrote: [...]>> That''s going to iterate over all of the records. �twice! �in interpreted code! > > partition will do it in oneHey, that''s good to know! If you need to do the partitioning in the DB, you could use ORDER BY abs(id - :magic_id). That way the first record will be the magic one.> > FredBest, -- Marnen Laibow-Koser http://www.marnen.org marnen-sbuyVjPbboAdnm+yROfE0A@public.gmane.org -- Posted via http://www.ruby-forum.com/.
Patrick Doyle
2009-Sep-29 12:27 UTC
Re: What is the most efficient way to split a table into 2 groups?
On Mon, Sep 28, 2009 at 5:42 PM, Frederick Cheung <frederick.cheung-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:>> On Sep 28, 10:32 pm, Patrick Doyle <wpds...-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote: > > partition will do it in one >Thanks Fred, that''s exactly what I was looking for.> Try both, benchmark them - don''t take some random person on the > internet''s word for it!Ahh, but you are not "some random person on the internet". You are an active member of this community who regularly answers questions for newbies and unfortunates who can be too lazy to look things up themselves. As such, I have come to appreciate and respect your answers, especially the ones that say, "the method for which you are looking is #partition, but it''s up to you to determine if it makes things go faster or not" :-) Thanks again for all the help. --wpd
Patrick Doyle
2009-Sep-29 12:30 UTC
Re: What is the most efficient way to split a table into 2 g
On Mon, Sep 28, 2009 at 6:18 PM, Marnen Laibow-Koser <rails-mailing-list-ARtvInVfO7ksV2N9l4h3zg@public.gmane.org> wrote:> Frederick Cheung wrote: > [...] >>> That''s going to iterate over all of the records. �twice! �in interpreted code! >> >> partition will do it in one > > Hey, that''s good to know! > > If you need to do the partitioning in the DB, you could use ORDER BY > abs(id - :magic_id). That way the first record will be the magic one.Oh that''s clever! I like that! Thanks. --wpd
Patrick Doyle
2009-Sep-29 16:29 UTC
Re: What is the most efficient way to split a table into 2 g
On Mon, Sep 28, 2009 at 6:07 PM, Marnen Laibow-Koser <rails-mailing-list-ARtvInVfO7ksV2N9l4h3zg@public.gmane.org> wrote:> Patrick Doyle wrote: >> I have the following: >> >> @lot = Lot.find(params[:id]) >> >> part_nums = Part.all(:conditions => ["id <> ?", @lot.part.id]) >> >> I guess I should mention that >> >> Lot :belongs_to => :part >> >> I was looking at the log following the execution of these two >> statements and I saw something like this: >> >> Lot Load (0.4ms) SELECT * FROM "lots" WHERE ("lots"."id" = 13) >> Part Load (0.3ms) SELECT * FROM "parts" WHERE ("parts"."id" = 2) >> Part Load (0.9ms) SELECT * FROM "parts" WHERE (id <> 2) >> >> It looked a bit silly to me -- first I grab a record from the "parts" >> table with an ID of 2, then I grab all the records from the parts >> table whose ID is not 2. > > I think what''s happening here is this: you''re calling @lot.part.id, so > Rails needs to load @lot.part, which accounts for the extra query. To > fix, use the :joins option on Lot.find, or simply call @lot.part_id. >I should let this be, but now I''m curious... what does :joins actually do for me? (in script/console)>> l=Lot.find(13, :joins => :part)produces (in the logfile) SELECT "lots".* FROM "lots" INNER JOIN "parts" ON "parts".id "lots".part_id WHERE ("lots"."id" = 13) and that looks like what I wanted. But when I follow that up with>> l.partI see in the logfile: SELECT * from "parts" WHERE ("parts"."id" = 2) so it seems that the inner join was wasted. Is this a development vs. production thing? (I know, I''m not supposed to worry about optimization now, but I figure if I learn how to write generally more optimal code by default, it will only help). If I try:>> l=Lot.find(13, :include => :part)I see in the logfile: SELECT * FROM "lots" WHERE ("lots"."id" = 13) SELECT * FROM "parts" WHERE ("parts".id" = 2) so neither the :joins nor the :include options seem to help, although the :include option prevents the subsequent database access when I finally access l.part Again, this is not stopping me from anything, and I really should leave the optimization phase for later, but I am curious about the intended behavior of :include and :joins, and puzzled by the apparent behavior. --wpd