Hi all I''m implementing a TX zero-copy prototype for Xen netback. It is very common for several guests to connect through a bridge and communicate with each other. So in the RX path there is something like: if (page is from another domU) retrieve this page''s src_gref and owner src_dom grant copy this (src_dom,src_gref) to dest domU (dst_dom,dst_gref) Actually the code is doing grant copy from one gref to another gref, only that the src_gref has been already mapped in Dom0. Then we go down to hypervisor: __gnttab_copy { __acquire_grant_for_copy(src_gref) __acquire_grant_for_copy(dst_gref) ...copy... __release_grant_for_copy(src_gref) __release_grant_for_cooy(dst_gref) } __acquire_grant_for_copy(rd,gref) { act <- get active entry for gref if (!act->pin) { check stuff for transitive grant if (!act->pin) { set fields in act } } else { set owning_domain } } __release_grant_for_copy(rd,gref) { act <- get active entry for gref if (grant table version is 1) { use v1 stuff } else { td = act->trans_domain trans_gref = act->trans_gref } if (td != rd) { recursively release grant rcu_unlock_domain(td) } } Because src_gref is already mapped in Dom0, so its act->pin is not 0. When we come to __release_grant_for_copy, since we''re using version 2, so td = act->trans_domain, in which case it is NULL(?!). rd is not NULL, so (td != rd), we do a rcu_unlock_domain(NULL), which messes up the preemption count. Finally it triggers ASSERT(!in_atomic()) in do_softirq. I haven''t modified netfront to use transitive grant. I don''t know whether I found a bug or I did things in a wrong way. However rcu_unlocking NULL looks quite buggy to me, shouldn''t we at least guard against this case and fail earlier (in grant release code path)? Any advice is welcomed. Wei.
Paul Durrant
2012-Feb-29 10:39 UTC
Re: Question on grant copying a previous grant mapped page
That sounds like a basic bug in gnttab2 and it also sounds very similar to something I thought I already fixed. Paul> -----Original Message----- > From: Wei Liu (Intern) > Sent: 29 February 2012 10:36 > To: xen-devel > Cc: Wei Liu (Intern); Keir (Xen.org); Paul Durrant > Subject: Question on grant copying a previous grant mapped page > > Hi all > > I''m implementing a TX zero-copy prototype for Xen netback. It is very > common for several guests to connect through a bridge and communicate > with each other. So in the RX path there is something like: > > if (page is from another domU) > retrieve this page''s src_gref and owner src_dom > grant copy this (src_dom,src_gref) to dest domU (dst_dom,dst_gref) > > Actually the code is doing grant copy from one gref to another gref, only > that the src_gref has been already mapped in Dom0. > > Then we go down to hypervisor: > > __gnttab_copy > { > __acquire_grant_for_copy(src_gref) > __acquire_grant_for_copy(dst_gref) > ...copy... > __release_grant_for_copy(src_gref) > __release_grant_for_cooy(dst_gref) > } > > __acquire_grant_for_copy(rd,gref) > { > act <- get active entry for gref > if (!act->pin) { > check stuff for transitive grant > if (!act->pin) { > set fields in act > } > } else { > set owning_domain > } > } > > __release_grant_for_copy(rd,gref) > { > act <- get active entry for gref > if (grant table version is 1) { > use v1 stuff > } else { > td = act->trans_domain > trans_gref = act->trans_gref > } > if (td != rd) { > recursively release grant > rcu_unlock_domain(td) > } > } > > Because src_gref is already mapped in Dom0, so its act->pin is not 0. > When we come to __release_grant_for_copy, since we''re using version 2, so > td = act->trans_domain, in which case it is NULL(?!). rd is not NULL, so (td !> rd), we do a rcu_unlock_domain(NULL), which messes up the preemption > count. Finally it triggers ASSERT(!in_atomic()) in do_softirq. > > I haven''t modified netfront to use transitive grant. I don''t know whether I > found a bug or I did things in a wrong way. However rcu_unlocking NULL > looks quite buggy to me, shouldn''t we at least guard against this case and > fail earlier (in grant release code path)? > > Any advice is welcomed. > > > Wei.
On Wed, 2012-02-29 at 10:39 +0000, Paul Durrant wrote:> That sounds like a basic bug in gnttab2 and it also sounds very similar to something I thought I already fixed. > > PaulYou mean this one? changeset: 22994:299ed79acecf user: Keir Fraser <keir@xen.org> date: Tue Mar 08 16:30:30 2011 +0000 files: xen/common/grant_table.c xen/include/xen/grant_table.h description: Fix rcu domain locking for transitive grants When acquiring a transitive grant for copy then the owning domain needs to be locked down as well as the granting domain. This was being done, but the unlocking was not. The acquire code now stores the struct domain * of the owning domain (rather than the domid) in the active entry in the granting domain. The release code then does the unlock on the owning domain. Note that I believe I also fixed a bug where, for non-transitive grants the active entry contained a reference to the acquiring domain rather than the granting domain. From my reading of the code this would stop the release code for transitive grants from terminating its recursion correctly. Signed-off-by: Paul Durrant <paul.durrant@citrix.com> This is the non-transitive case, but active entry contains no reference to any domain. Wei.