NISHIGUCHI Naoki
2008-Apr-22 02:42 UTC
[Xen-devel] [PATCH] x86: fix NULL function call in timer_softirq_action()
Hi, In VT-d enabled and SMP machine, when start HVM guests that was assigned device such as "pci = [''01:00.0'']", sometimes panic happens! This panic occurs because of NULL function call in timer_softirq_action(). Attached patch fixes this problem. This panic''s cause was find_first_bit() in vmx_dirq_assist(). In find_first_bit(__find_first_bit) function, "repe; scas" instruction and "bsf" instruction refer addresses of a bitmap. If clear_bit() is called to clear a bit of the bitmap between above instructions, eax register''s value is zero after execution of "bsf" instruction. As a result, the return value of find_first_bit() will be 0, 64, 128 or 192(on x86_64 arch). In this case, vmx_dirq_assist() calls set_timer() about the bit not to be set. If hvm_timer(timer structure) about the bit is not initialized, timer_softirq_action() will call zero address. Only in VT-d enabled and SMP machine, clear_bit() is called in pt_irq_time_out() on another cpu. Signed-off-by: Naoki Nishiguchi <nisiguti@jp.fujitsu.com> Regards, Naoki Nishiguchi _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Keir Fraser
2008-Apr-22 10:29 UTC
Re: [Xen-devel] [PATCH] x86: fix NULL function call in timer_softirq_action()
On 22/4/08 03:42, "NISHIGUCHI Naoki" <nisiguti@jp.fujitsu.com> wrote:> This panic''s cause was find_first_bit() in vmx_dirq_assist(). > In find_first_bit(__find_first_bit) function, "repe; scas" instruction > and "bsf" instruction refer addresses of a bitmap. If clear_bit() is > called to clear a bit of the bitmap between above instructions, eax > register''s value is zero after execution of "bsf" instruction. As a > result, the return value of find_first_bit() will be 0, 64, 128 or > 192(on x86_64 arch). > In this case, vmx_dirq_assist() calls set_timer() about the bit not to > be set. If hvm_timer(timer structure) about the bit is not initialized, > timer_softirq_action() will call zero address.Good catch. Actually our usage of BSF is generally bad because Intel does not guarantee the contents of the destination register when the source is zero (we are currently assuming the destination is left intact in that case). I will fix that up too. Also I think vmx_dirq_assist() is broken because it assumes that it ''owns'' the bit it finds. But actually two VCPUs can race to clear_bit() and both increment, for example, the mirq[irq].pending field. I think the loop body should start with if ( !test_and_clear_bit(...) ) continue. I will make that change also. -- Keir _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel