Santos, Jose Renato G
2006-Dec-06 01:35 UTC
[Xen-devel] [PATCH] Reduce overhead in find_domain_by_id() [0/2]
This is a set of patches to improve the performance of find_domain_by_id(), which shows up high in profiles for network-I/O-intensive workloads.

Most of the cost of this function comes from 3 main functions (of approximately equal cost): 1) read_lock(), 2) read_unlock() and 3) get_domain(). These patches replace the lock used for accessing domain_list and domain_hash with a lock-free RCU scheme, which removes the read_lock()/read_unlock() pair from the lookup path. Experiments confirm that the cost of find_domain_by_id() is in fact reduced by 2/3.

The patches apply cleanly to changeset 12732.

Renato

Patches:
1/2 - Import Linux RCU code into Xen
2/2 - Replace domlist_lock operations by RCU operations

Signed-off-by: Jose Renato Santos <jsantos@hpl.hp.com>

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
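A minimal sketch of the lock-free read side described above, assuming C11 atomics: acquire loads stand in for rcu_dereference() and a release store stands in for rcu_assign_pointer(). This is not the code from the patches; the layout of struct domain, DOMAIN_HASH_SIZE, domlist_update_lock, domain_hash_insert() and lookup_domain() are all hypothetical. Readers take no lock and write no shared cache line, which is where the read_lock()/read_unlock() saving comes from. Writers still serialize among themselves, and deleting a domain (not shown) would have to defer the free until a grace period has elapsed.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define DOMAIN_HASH_SIZE 256                     /* hypothetical bucket count */
#define DOMAIN_HASH(id)  ((id) % DOMAIN_HASH_SIZE)

typedef unsigned short domid_t;

struct domain {
    domid_t domain_id;
    struct domain *_Atomic next_in_hash;         /* published with release stores */
};

/* Buckets are read without a lock; writers serialize on a small spinlock. */
static struct domain *_Atomic domain_hash[DOMAIN_HASH_SIZE];
static atomic_flag domlist_update_lock = ATOMIC_FLAG_INIT;

/* Writer: initialise the node fully, then publish it with a release store
 * (the role rcu_assign_pointer() plays in kernel code). */
static void domain_hash_insert(struct domain *d)
{
    assert(d != NULL);
    while (atomic_flag_test_and_set_explicit(&domlist_update_lock,
                                             memory_order_acquire))
        ;                                        /* writers are rare: just spin */

    struct domain *_Atomic *bucket = &domain_hash[DOMAIN_HASH(d->domain_id)];
    atomic_store_explicit(&d->next_in_hash,
                          atomic_load_explicit(bucket, memory_order_relaxed),
                          memory_order_relaxed);
    atomic_store_explicit(bucket, d, memory_order_release);

    atomic_flag_clear_explicit(&domlist_update_lock, memory_order_release);
}

/* Reader: no lock taken and no shared cache line written; the acquire loads
 * play the role of rcu_dereference(). */
static struct domain *lookup_domain(domid_t id)
{
    struct domain *d = atomic_load_explicit(&domain_hash[DOMAIN_HASH(id)],
                                            memory_order_acquire);
    while (d != NULL && d->domain_id != id)
        d = atomic_load_explicit(&d->next_in_hash, memory_order_acquire);
    return d;
}
```

For example, after inserting domains 7 and 263 (which share bucket 7 when DOMAIN_HASH_SIZE is 256), lookup_domain(7) walks that two-entry chain without touching any lock.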
Keir Fraser
2006-Dec-06 08:43 UTC
Re: [Xen-devel] [PATCH] Reduce overhead in find_domain_by_id() [0/2]
On 6/12/06 1:35 am, "Santos, Jose Renato G" <joserenato.santos@hp.com> wrote:

> This is a set of patches to improve performance of find_domain_by_id().
> find_domain_by_id shows up high in profiles for network I/O intensive
> workloads.
> Most of the cost for this function comes from 3 main functions (of
> aproximate equal costs): 1)read_lock(), 2)read_unlock() and
> 3)get_domain().
> These patches replace the lock used for accessing domain_list and
> domain_hash with a lock free RCU scheme. Experiments confirm that the
> cost of find_domain_by_id() is in fact reduced by 2/3.
> The patches apply cleanly to changeset 12732.

Do you have numbers for the performance improvement on a macro benchmark?

-- Keir
Emmanuel Ackaouy
2006-Dec-06 10:41 UTC
Re: [Xen-devel] [PATCH] Reduce overhead in find_domain_by_id() [0/2]
I also spotted find_domain_by_id() showing up rather high in network-intensive workloads. The CPU overhead of our network I/O path is pretty large, so it's worth trying to address, and if I remember correctly, that one was oddly high on the list of low-hanging fruit.

find_domain_by_id() is called from __gnttab_map_grant_ref(), which is typically called N times on an array of grant ops from gnttab_map_grant_ref(). Perhaps we could optimize the common case here and only look up and hold the domain once per op array, instead of once per op in the multi-op?

We could also clean up some code while there:

        if ( unlikely((rd = find_domain_by_id(op->dom)) == NULL) )
        {
            vvvvvvvvvvvvvvvvvvvvvvvv
            if ( rd != NULL )
                put_domain(rd);
            ^^^^^^^^^^^^^^^^^^^^^^^^ WTF???
            DPRINTK("Could not find domain %d\n", op->dom);
            op->status = GNTST_bad_domain;
            return;
        }

It's a bit puzzling to me that grabbing the lock adds such an overhead. Is this purely lock-operation overhead, or is there contention on the lock's cache line (we could find this out by profiling for data cache line misses)?

Cheers,
Emmanuel.
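Both of Emmanuel's suggestions can be sketched together. Inside the quoted error branch, rd is already known to be NULL, so the highlighted put_domain() call can never run and can simply be deleted. The sketch below also holds one domain reference across a run of consecutive ops that target the same domain, instead of doing one lookup per op. Everything here is a simplified, hypothetical stand-in rather than Xen's grant-table code: struct map_op, map_grant_ops(), the MAP_OK/MAP_BAD_DOMAIN codes (standing in for GNTST_okay/GNTST_bad_domain), and the stub find_domain_by_id()/put_domain(), which are built on a fixed two-entry table with a lookup counter.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned short domid_t;

#define MAP_OK          0     /* stand-in for GNTST_okay */
#define MAP_BAD_DOMAIN (-1)   /* stand-in for GNTST_bad_domain */

struct domain { domid_t domain_id; int refcnt; };
struct map_op { domid_t dom; int status; };

/* Stub domain table and lookup counter (hypothetical, for illustration). */
static struct domain dom_table[] = { { 1, 0 }, { 2, 0 } };
static int lookups;

static struct domain *find_domain_by_id(domid_t id)
{
    lookups++;
    for (size_t i = 0; i < sizeof(dom_table) / sizeof(dom_table[0]); i++)
        if (dom_table[i].domain_id == id) {
            dom_table[i].refcnt++;               /* the get_domain() part */
            return &dom_table[i];
        }
    return NULL;
}

static void put_domain(struct domain *d)
{
    assert(d->refcnt > 0);
    d->refcnt--;
}

/* Process a batch of ops, looking the domain up again only when op->dom changes. */
static void map_grant_ops(struct map_op *ops, size_t count)
{
    struct domain *rd = NULL;

    for (size_t i = 0; i < count; i++) {
        struct map_op *op = &ops[i];

        if (rd == NULL || rd->domain_id != op->dom) {
            if (rd != NULL)
                put_domain(rd);                  /* drop the previous domain */
            if ((rd = find_domain_by_id(op->dom)) == NULL) {
                /* rd is NULL here, so there is nothing to put. */
                op->status = MAP_BAD_DOMAIN;
                continue;
            }
        }
        op->status = MAP_OK;                     /* ... the actual mapping work ... */
    }

    if (rd != NULL)
        put_domain(rd);                          /* one put for the whole run */
}
```

With ops targeting domains 1, 1, 1, 2, 2 and an unknown domain 9, the stub performs three lookups instead of six, and every reference has been dropped by the end of the batch.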
Apparao, Padmashree K
2006-Dec-06 11:52 UTC
RE: [Xen-devel] [PATCH] Reduce overhead in find_domain_by_id() [0/2]
Keir,

Earlier this summer I did some experiments with Iperf and found that find_domain_by_id() was consuming 7% of the CPU, while do_grant_table_op was consuming 10.5%. As a quick way to estimate the upper bound on the performance gain, we removed the hash table lookups in find_domain_by_id() and saw a 12% improvement in Iperf receive performance.

The results of these findings were published in our paper at VTDC. In that paper we also discuss how inter-VM communication via grant tables was consuming a considerable amount of time, and what could possibly be done to optimize it.

I will send the paper in a separate note to you and Renato.

Thanks
- Padma
Keir Fraser
2006-Dec-06 12:12 UTC
Re: [Xen-devel] [PATCH] Reduce overhead in find_domain_by_id() [0/2]
Okay, sounds good. I don't want the whole Linux RCU mechanism in Xen particularly. Linux may well need the complexity because it leans on RCU quite heavily these days, but Xen will do so a whole lot less, I'm sure, and something much simpler should suffice. I'll look into this -- the patches you sent provide a good starting point, thanks!

-- Keir
Santos, Jose Renato G
2006-Dec-07 03:49 UTC
RE: [Xen-devel] [PATCH] Reduce overhead in find_domain_by_id() [0/2]
Keir,

I have already removed the following from the Linux RCU code:
1) the "bh" versions of the RCU functions, which are not needed in Xen since softirqs cannot preempt the hypervisor;
2) synchronization calls that block waiting for readers to complete (we just need the asynchronous callback);
3) support for hotplug CPUs (Xen does not support it).

Looking at the code again, the following could also be removed:
1) the logic and functions associated with limiting the RCU batch size. This is probably not needed, since we do not expect a lot of RCU activity;
2) functions that are not being used:
   2.1) rcu_batch_after
   2.2) rcu_needs_cpu
   2.3) rcu_batches_completed

I can't think of anything else. If you want, I can remove these and submit a revised patch. It seems that there is not much more we could remove. Not sure if you have something else in mind...

Regards
Renato
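Keeping only the asynchronous interface (item 2 in the first list above) means an updater never waits for readers: it unlinks the object, hands an embedded callback head to call_rcu(), and the object is freed later. The following is a deliberately tiny, single-threaded model of that contract using Linux-style names; it is not the Linux or Xen implementation. rcu_grace_period_elapsed() is a hypothetical hook standing in for the grace-period detection the real code performs, and destroy_domain()/free_domain_rcu() are illustrative.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Simplified Linux-style callback head, embedded in the protected object. */
struct rcu_head {
    struct rcu_head *next;
    void (*func)(struct rcu_head *head);
};

#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

static struct rcu_head *rcu_pending;             /* callbacks awaiting a grace period */

/* Asynchronous interface only: queue the callback and return at once. */
static void call_rcu(struct rcu_head *head, void (*func)(struct rcu_head *))
{
    assert(head != NULL && func != NULL);
    head->func = func;
    head->next = rcu_pending;
    rcu_pending = head;
}

/* Hypothetical hook: the RCU core would call this once every CPU has passed
 * through a quiescent state since all pending callbacks were queued. */
static void rcu_grace_period_elapsed(void)
{
    struct rcu_head *list = rcu_pending;

    rcu_pending = NULL;
    while (list != NULL) {
        struct rcu_head *next = list->next;      /* read before func() frees it */
        list->func(list);
        list = next;
    }
}

struct domain {
    unsigned short domain_id;
    struct rcu_head rcu;
};

static int domains_freed;

static void free_domain_rcu(struct rcu_head *head)
{
    struct domain *d = container_of(head, struct domain, rcu);

    domains_freed++;
    free(d);
}

/* Destruction path: after unlinking d from domain_list/domain_hash (not
 * shown), defer the free so lock-free readers still holding d stay safe. */
static void destroy_domain(struct domain *d)
{
    call_rcu(&d->rcu, free_domain_rcu);
}
```

Between destroy_domain() and the end of the grace period, a reader that found the domain earlier can still dereference it safely.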
Santos, Jose Renato G
2006-Dec-07 04:07 UTC
RE: [Xen-devel] [PATCH] Reduce overhead in find_domain_by_id() [0/2]
> -----Original Message-----
> From: Emmanuel Ackaouy [mailto:ack@xensource.com]
> Sent: Wednesday, December 06, 2006 2:42 AM
> To: Santos, Jose Renato G
> Cc: xen-devel@lists.xensource.com; Turner, Yoshio; Jose Renato Santos; G John Janakiraman
> Subject: Re: [Xen-devel] [PATCH] Reduce overhead in find_domain_by_id() [0/2]
>
> It's a bit puzzling to me that grabbing the lock adds such an
> overhead. Is this purely a lock operation overhead or is
> there contention on the lock cache line (could find this out
> by profiling for data cache line misses)?

Yes, this is due to cache contention on the lock. There is also cache contention on the domain refcnt used by get_domain(). I just implemented a per-CPU version of the reference count that avoids this cache contention, and the cost of find_domain_by_id() is reduced even further.

Currently, find_domain_by_id() consumes approximately 3.05% of the total CPU cycles for a TCP TX microbenchmark. With the RCU scheme this is reduced to 1.16%, and with a per-CPU reference count mechanism it is reduced to 0.31%.

I can submit a patch for the per-CPU reference count after I clean up the code a bit.

Regards
Renato
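The per-CPU reference count described above might look roughly like this sketch. It illustrates the general technique, not the patch Renato mentions: NR_CPUS, the 64-byte CACHE_LINE_SIZE and the pcpu_ref_* names are assumptions, and the CPU number is passed in explicitly instead of being taken from the running CPU. Each CPU updates only its own cache-line-sized slot, so gets and puts on different CPUs stop bouncing one shared line. The cost moves to whoever reads the count: a slot can go negative when a reference is dropped on a different CPU than the one that took it, so only the sum is meaningful, and only once no new references can be taken.

```c
#include <assert.h>
#include <stdalign.h>

#define NR_CPUS         4    /* hypothetical CPU count */
#define CACHE_LINE_SIZE 64   /* assumed cache line size in bytes */

struct pcpu_ref {
    /* One counter per CPU, each on its own cache line to avoid false sharing. */
    struct {
        alignas(CACHE_LINE_SIZE) long count;
    } slot[NR_CPUS];
};

/* Fast path: touches only the calling CPU's line. In a real hypervisor the
 * update must not race with other code on the same CPU (e.g. interrupts);
 * this single-threaded model ignores that. */
static void pcpu_ref_get(struct pcpu_ref *r, unsigned int cpu)
{
    assert(cpu < NR_CPUS);
    r->slot[cpu].count++;
}

/* May drive this CPU's slot negative if the get happened elsewhere. */
static void pcpu_ref_put(struct pcpu_ref *r, unsigned int cpu)
{
    assert(cpu < NR_CPUS);
    r->slot[cpu].count--;
}

/* Slow path: the true count is the sum over all CPUs. */
static long pcpu_ref_sum(const struct pcpu_ref *r)
{
    long total = 0;

    for (unsigned int cpu = 0; cpu < NR_CPUS; cpu++)
        total += r->slot[cpu].count;
    return total;
}
```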
Keir Fraser
2006-Dec-07 12:01 UTC
Re: [Xen-devel] [PATCH] Reduce overhead in find_domain_by_id() [0/2]
On 7/12/06 03:49, "Santos, Jose Renato G" <joserenato.santos@hp.com> wrote:

> Can't think of anything else. If you want I can remove these and submit
> a revised patch.
> It seems that there is not much more we could remove. Not sure if you
> have something else in mind...

I suppose the patch isn't actually as big as I first imagined. I'll review it and probably apply it pretty much as is. It's just that RCU has always seemed rather overcomplicated to me (lots of different queues, for example) for what should be a rather simple concept to implement. This is possibly just ignorance on my part. :-)

As for per-CPU refcounts, I suspect a better scheme would be find_domain_by_id_noref(). The idea is that we often take a reference for only a short period of time (in particular, within the scope of one function), and with delayed destruction we can now safely use a found domain pointer with no refcnt increment.

-- Keir
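Keir's find_domain_by_id_noref() idea, sketched under stated assumptions: because a removed domain is freed only after a grace period, a pointer found inside an RCU read-side critical section stays valid until that section ends, so a short-lived caller never needs to touch the shared refcnt. The rcu_read_lock()/rcu_read_unlock() stubs here only track nesting depth so the rule can be asserted. A linear walk of domain_list replaces the hash lookup for brevity, and domain_is_dying() is a hypothetical example caller.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned short domid_t;

struct domain {
    domid_t domain_id;
    int is_dying;
    int refcnt;                        /* the shared, contended counter */
    struct domain *next_in_list;
};

static struct domain *domain_list;
static int rcu_read_depth;             /* stub bookkeeping, sketch only */

static void rcu_read_lock(void)   { rcu_read_depth++; }
static void rcu_read_unlock(void) { assert(rcu_read_depth > 0); rcu_read_depth--; }

/* Caller must be inside rcu_read_lock()/rcu_read_unlock(). No reference is
 * taken: the result is valid only until the matching rcu_read_unlock().
 * (Real code would read the links with rcu_dereference().) */
static struct domain *find_domain_by_id_noref(domid_t id)
{
    assert(rcu_read_depth > 0);
    for (struct domain *d = domain_list; d != NULL; d = d->next_in_list)
        if (d->domain_id == id)
            return d;
    return NULL;
}

/* Hypothetical short-lived caller: look up, read one field, done.
 * Returns -1 if the domain does not exist; d->refcnt is never touched. */
static int domain_is_dying(domid_t id)
{
    int result = -1;

    rcu_read_lock();
    struct domain *d = find_domain_by_id_noref(id);
    if (d != NULL)
        result = d->is_dying;
    rcu_read_unlock();                 /* d must not be used past this point */

    return result;
}
```

Callers that keep the pointer beyond one function (for example, across a blocking operation) would still need the refcounted find_domain_by_id().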
Santos, Jose Renato G
2006-Dec-08 04:17 UTC
RE: [Xen-devel] [PATCH] Reduce overhead in find_domain_by_id() [0/2]
> -----Original Message-----
> From: Keir Fraser [mailto:keir@xensource.com]
> Sent: Thursday, December 07, 2006 4:02 AM
> To: Santos, Jose Renato G; Keir Fraser; Apparao, Padmashree K; xen-devel@lists.xensource.com
> Cc: Turner, Yoshio; Jose Renato Santos; G John Janakiraman
> Subject: Re: [Xen-devel] [PATCH] Reduce overhead in find_domain_by_id() [0/2]
>
> On 7/12/06 03:49, "Santos, Jose Renato G" <joserenato.santos@hp.com> wrote:
>
> > Can't think of anything else. If you want I can remove these and
> > submit a revised patch.
> > It seems that there is not much more we could remove. Not sure if you
> > have something else in mind...
>
> I suppose the patch isn't actually as big as I first
> imagined. I'll review it and probably apply it pretty much as
> is. It's just that RCU has always seemed rather over
> complicated to me (lots of different queues, for example) for
> what should be a rather simple concept to implement. This is
> possibly just ignorance on my part. :-)

Yes, the multiple queues seem somewhat complicated, but I think we need them to be able to handle multiple callbacks. Maybe we can get rid of one of them ("done")...

> As for per-cpu refcounts, I suspect a better scheme would be
> find_domain_by_id_noref(). Idea being that often we take a
> reference for a short period of time (in particular, scope of
> one function) and with delayed destruction we can now safely
> use a found domain pointer with no refcnt increment.

Good point. This is indeed a much better scheme. I will work out a patch for find_domain_by_id_noref().

Thanks
Renato
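Why more than one queue is useful can be seen in a toy model loosely based on the per-CPU nxtlist/curlist/donelist lists of classic Linux RCU. This global, single-threaded version is a simplification, and rcu_cpu_quiet(), rcu_process_done() and count_callback() are hypothetical. A callback registered while a grace period is already in progress must not run when that grace period ends, because a reader that started after the grace period began may still hold the object. So new callbacks wait in nxt, the batch being waited on sits in cur, and callbacks whose grace period is over move to done until they are invoked.

```c
#include <assert.h>
#include <stddef.h>

#define NR_CPUS  2                             /* hypothetical CPU count */
#define ALL_CPUS ((1u << NR_CPUS) - 1)

struct rcu_head {
    struct rcu_head *next;
    void (*func)(struct rcu_head *head);
};

/* The three stages (one global set standing in for the per-CPU lists). */
static struct rcu_head *nxt;      /* registered; no grace period started for them */
static struct rcu_head *cur;      /* waiting on the grace period in progress */
static struct rcu_head *done;     /* grace period over; safe to invoke */
static unsigned int cpus_pending; /* CPUs that still owe a quiescent state */

/* Move every entry of *from onto *to (order is not preserved in this toy). */
static void splice(struct rcu_head **from, struct rcu_head **to)
{
    while (*from != NULL) {
        struct rcu_head *h = *from;
        *from = h->next;
        h->next = *to;
        *to = h;
    }
}

/* Start a grace period covering everything in nxt, if none is running. */
static void start_batch(void)
{
    if (cur == NULL && nxt != NULL) {
        splice(&nxt, &cur);
        cpus_pending = ALL_CPUS;
    }
}

static void call_rcu(struct rcu_head *head, void (*func)(struct rcu_head *))
{
    assert(func != NULL);
    head->func = func;
    head->next = nxt;
    nxt = head;
    start_batch();
}

/* Report that @cpu passed a quiescent state (e.g. a context switch). */
static void rcu_cpu_quiet(unsigned int cpu)
{
    assert(cpu < NR_CPUS);
    cpus_pending &= ~(1u << cpu);
    if (cpus_pending == 0 && cur != NULL) {
        splice(&cur, &done);      /* this batch's grace period is complete */
        start_batch();            /* later registrations get a fresh one */
    }
}

/* Invoke completed callbacks; returns how many ran. */
static int rcu_process_done(void)
{
    int n = 0;

    while (done != NULL) {
        struct rcu_head *h = done;
        done = h->next;
        h->func(h);
        n++;
    }
    return n;
}

/* Demo callback: counts invocations. */
static int callbacks_run;
static void count_callback(struct rcu_head *head)
{
    (void)head;
    callbacks_run++;
}
```

Folding done away would mean invoking callbacks directly from rcu_cpu_quiet(); whether that is acceptable depends on the context in which quiescent states are reported, which this model does not capture.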