bluescreen303-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org
2008-Jan-03 19:50 UTC
Activerecord include queries - performance
Hi,

I noticed that ActiveRecord uses join queries to implement :include.
While this certainly fixes the N+1 problem, I'm wondering about the
amount of redundant data it generates.

Example: Task has many TodoItems, and both Task and TodoItem have only
a name, to keep things simple.

Now, when I do:

    Task.find(:all, :include => :todo_items)

it executes:

    SELECT tasks."id" AS t0_r0, tasks."name" AS t0_r1,
           todo_items."id" AS t1_r0, todo_items."name" AS t1_r1,
           todo_items."task_id" AS t1_r2
    FROM tasks
    LEFT OUTER JOIN todo_items ON todo_items.task_id = tasks.id

which transfers back:

    1|task 1|1|item 1|1
    1|task 1|2|item 2|1
    1|task 1|3|item 4|1
    1|task 1|4|item 3|1
    2|task 2|5|item 1|2
    2|task 2|6|item 2|2
    2|task 2|7|item 3|2
    2|task 2|8|item 4|2
    3|task 3|9|item 1|3
    3|task 3|10|item 2|3
    3|task 3|11|item 3|3
    3|task 3|12|item 4|3

As you can see, every field of a task is repeated for each of its todo
items. In this small example that hardly matters, but my guess is that
it _will_ matter once a task has more fields, more items, or deeper
nested joins: join data from, say, 5 tables and all of the task data
gets repeated many times over. When talking to MySQL through a local
socket I think the overhead is negligible (queries that large will be
quite slow anyway), but when connecting to an external SQL server all
of this redundant data has to travel over the network.

Of course it's still a lot faster than not using :include at all,
since looping over the tasks would then generate:

    SELECT * FROM tasks
    SELECT * FROM todo_items WHERE (todo_items.task_id = 1)
    SELECT * FROM todo_items WHERE (todo_items.task_id = 2)
    SELECT * FROM todo_items WHERE (todo_items.task_id = 3)

that is, N+1 queries, which is worse.

But how about this:

    tasks = Task.find(:all)
    todos = TodoItem.find(:all,
              :conditions => { :task_id => tasks.map { |x| x.id } })

which generates:

    SELECT * FROM tasks
    SELECT * FROM todo_items WHERE (todo_items."task_id" IN (1,2,3))

that is, one query per table. Of course you'd then have to find a way
to link the retrieved todos back to their tasks, but I'd guess
ActiveRecord could handle that, since it already does so for the
current left-outer-join behaviour.

Now, I'm no database expert, so I might be overlooking something,
and I haven't done any real benchmarks. But as I said, the current
ActiveRecord behaviour looks quite wasteful in transferring/parsing
data as soon as you join 3 or 4 tables with long rows and many
"children". Am I overlooking something?

Any thoughts on this matter would be nice.

Thanks,
Mathijs
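P.S. For what it's worth, the linking step itself is just plain Ruby.
An untested sketch (the grouping hash is my own invention; pushing the
grouped children back into each task's todo_items association is the
part ActiveRecord would have to handle internally):

    # Fetch parents and children in one query per table, as above.
    tasks = Task.find(:all)
    todos = TodoItem.find(:all,
              :conditions => { :task_id => tasks.map { |t| t.id } })

    # Group the children by their foreign key.
    todos_by_task = Hash.new { |hash, key| hash[key] = [] }
    todos.each { |todo| todos_by_task[todo.task_id] << todo }

    # Pair each task with its children; a real implementation would
    # assign this array as the preloaded association target instead.
    tasks.each do |task|
      children = todos_by_task[task.id]
      # ... use task and children here ...
    end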
On 3 Jan 2008, at 19:50, bluescreen303-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org wrote:

> I noticed that ActiveRecord uses join queries to implement :include.
> While this certainly fixes the N+1 problem, I'm wondering about the
> amount of redundant data it generates.

It's not ideal at all (and compounded by some inefficiencies in the way
the returned data is parsed). There is a ticket/plan to reimplement
this as separate queries.

Fred
On Thu, Jan 03, 2008 at 11:50:46AM -0800, bluescreen303-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org wrote:

> I noticed that ActiveRecord uses join queries to implement :include.
> While this certainly fixes the N+1 problem, I'm wondering about the
> amount of redundant data it generates.

[...]

> Now, I'm no database expert, so I might be overlooking something,
> and I haven't done any real benchmarks. But as I said, the current
> ActiveRecord behaviour looks quite wasteful in transferring/parsing
> data as soon as you join 3 or 4 tables with long rows and many
> "children". Am I overlooking something?
>
> Any thoughts on this matter would be nice.

You are unconsciously assuming that the cost of sending the extra data
is usually greater than the cost of associating records outside the
database, not to mention the cost of multiple optimize-and-execute
cycles in the database. A properly indexed database will perform the
association of records very efficiently for you, since that's one of
the things RDBMS software is designed and optimized to do. Note that
YMMV with poorly written RDBMSs; I've had MySQL's query optimizer (not
the execution engine, the query optimizer) hang completely when trying
to deal with joins across a lot of tables (18, in that case).

The real cost of sending lots of data is, of course, more than just
bandwidth. There is the cost of materializing the result rows in the
RDBMS and serializing them, which can cause thrashing, at even greater
cost, if the result set is really big. There is the parsing cost for
the database driver to bring the received data into memory. And there
is the cost of hydrating the models from the result set in memory.
These costs must be balanced against the smaller costs of doing the
same things several times over, including per-query overhead, to
produce the multiple result sets, plus the cost of associating the
records in memory afterwards. Incidentally, if you have the right
indices for a WHERE ... IN clause to be efficient, you have the right
indices for the joins to be efficient, so that part is a wash.

The real question is where the line is: at what point does the cost of
handling the M rows produced by a multi-table join exceed the cost of
multiple queries plus associating the records in code? I suspect you
need a whole lot of superfluous rows before multiple queries win, but
I could be wrong. This is where benchmarking comes in. Please do
investigate.

--Greg
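P.S. A possible starting point for that benchmark, runnable from
script/console (a rough sketch: Task/TodoItem are the models from your
example, the iteration count is arbitrary, and the two-query case omits
the in-memory linking step, which slightly flatters it):

    require 'benchmark'

    n = 100
    Benchmark.bm(15) do |bench|
      # The joined form: one LEFT OUTER JOIN, wide redundant rows.
      bench.report('joined include') do
        n.times { Task.find(:all, :include => :todo_items) }
      end

      # The two-query form: narrow rows, one query per table.
      bench.report('two queries') do
        n.times do
          tasks = Task.find(:all)
          TodoItem.find(:all,
            :conditions => { :task_id => tasks.map { |t| t.id } })
        end
      end
    end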