thr3ads.net - Xen devel - [Xen-devel] Problem with PV disk and iSCSI [Feb 2008]

Gary Grebus

2008-Feb-08 19:54 UTC

[Xen-devel] Problem with PV disk and iSCSI

I've run into a problem on 3.1.2 with an HVM guest using PV disks.  In
dom0, the physical disk is accessed using iSCSI.  The symptom is that
applications in dom0 which are monitoring the iSCSI network interface
(e.g. tcpdump) die with EFAULT errors.

When the block I/O completes, it looks like blkback is doing a
GNTTABOP_unmap_grant_ref on a guest page, even though the dom0 kernel
has done get_page() on it and still holds references.  

The page had been passed through iSCSI into the network stack, so it
ends up referenced by one or more skb's.  Because there was an AF_PACKET
socket open, a clone of the skb ends up queued for an indeterminate
amount on that socket queue.  When the application finally gets around
to reading the data, the page is no longer mapped, and the read fails
trying to copy the data out of the kernel.
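
A minimal sketch of the lifetime problem described above, written as a toy user-space model in C. Everything here is an assumption made for illustration: `toy_page`, `toy_copy_out`, and `toy_complete_io` are hypothetical names, not Xen or Linux kernel interfaces, and the `mapped` flag merely stands in for the grant mapping that GNTTABOP_unmap_grant_ref tears down.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a guest page that is grant-mapped into dom0. */
struct toy_page {
    int refcount;   /* references held in dom0 (e.g. by queued skb clones) */
    int mapped;     /* 1 while the grant mapping is still in place */
    char data[16];
};

/* Models a late reader (such as a read on an AF_PACKET socket) copying
 * data out of the page. Fails with -EFAULT once the mapping is gone. */
static int toy_copy_out(const struct toy_page *pg, char *dst, size_t len)
{
    if (!pg->mapped)
        return -EFAULT;
    if (len > sizeof pg->data)
        len = sizeof pg->data;
    memcpy(dst, pg->data, len);
    return (int)len;
}

/* Models the behaviour reported above: the mapping is dropped at I/O
 * completion without consulting the reference count. */
static void toy_complete_io(struct toy_page *pg)
{
    pg->mapped = 0;
}
```

In this model a page with `refcount == 2` can still be read before `toy_complete_io()` runs, and the same read returns `-EFAULT` afterwards even though the count never changed, which mirrors the EFAULT symptom seen by tcpdump.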

Has anyone else seen anything similar?  I mentioned tcpdump, but the
problem also shows up with dhcpcd, which needs to process packets at the
ethernet layer.  

I'm thinking blkback will have to make a dom0 copy of the page before
doing the unmap if there are still extra references?

	Gary
 
-- 
Gary Grebus
Virtual Iron Software, Inc.



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

Stefan de Konink

2008-Feb-08 20:00 UTC

Re: [Xen-devel] Problem with PV disk and iSCSI

Gary Grebus wrote:
> I've run into a problem on 3.1.2 with an HVM guest using PV disks.  In
> dom0, the physical disk is accessed using iSCSI.  The symptom is that
> applications in dom0 which are monitoring the iSCSI network interface
> (e.g. tcpdump) die with EFAULT errors.
> 
> When the block I/O completes, it looks like blkback is doing a
> GNTTABOP_unmap_grant_ref on a guest page, even though the dom0 kernel
> has done get_page() on it and still holds references.  
> 
> The page had been passed through iSCSI into the network stack, so it
> ends up referenced by one or more skb's.  Because there was an AF_PACKET
> socket open, a clone of the skb ends up queued for an indeterminate
> amount on that socket queue.  When the application finally gets around
> to reading the data, the page is no longer mapped, and the read fails
> trying to copy the data out of the kernel.
> 
> Has anyone else seen anything similar?  I mentioned tcpdump, but the
> problem also shows up with dhcpcd, which needs to process packets at the
> ethernet layer.  
> 
> I'm thinking blkback will have to make a dom0 copy of the page before
> doing the unmap if there are still extra references?

I'm running the same setup. Are you using iSCSI over the same interface
as your Xen bridge?


Stefan


Kurt Hackel

2008-Feb-09 06:15 UTC

Re: [Xen-devel] Problem with PV disk and iSCSI

Hi Gary,

On Fri, Feb 08, 2008 at 02:54:14PM -0500, Gary Grebus wrote:
> I've run into a problem on 3.1.2 with an HVM guest using PV disks.  In
> dom0, the physical disk is accessed using iSCSI.  The symptom is that
> applications in dom0 which are monitoring the iSCSI network interface
> (e.g. tcpdump) die with EFAULT errors.
> 
> When the block I/O completes, it looks like blkback is doing a
> GNTTABOP_unmap_grant_ref on a guest page, even though the dom0 kernel
> has done get_page() on it and still holds references.  
> 
> The page had been passed through iSCSI into the network stack, so it
> ends up referenced by one or more skb's.  Because there was an AF_PACKET
> socket open, a clone of the skb ends up queued for an indeterminate
> amount on that socket queue.  When the application finally gets around
> to reading the data, the page is no longer mapped, and the read fails
> trying to copy the data out of the kernel.
> 
> Has anyone else seen anything similar?  I mentioned tcpdump, but the
> problem also shows up with dhcpcd, which needs to process packets at the
> ethernet layer.  
> 
We're seeing the same thing with 3.1.3.  When running iSCSI in dom0
(over a Xen bridge) and presenting the disks via blkfront to the guest, we
see the same crash (below) while performing failover tests on the storage
controller.

Just as you said, the error occurs in skb_remove_foreign_references from
loopback_start_xmit.  It's running through all the foreign pages, attempting
to copy each locally, when it dies on the source address (esi) of the
following memcpy:

        vaddr = kmap_skb_frag(&skb_shinfo(skb)->frags[i]);
        off = skb_shinfo(skb)->frags[i].page_offset;
        memcpy(page_address(page) + off,
               vaddr + off,
               skb_shinfo(skb)->frags[i].size);

c053f2f7:       0f b7 74 c8 18          movzwl 0x18(%eax,%ecx,8),%esi
c053f2fc:       0f b7 5c c8 1a          movzwl 0x1a(%eax,%ecx,8),%ebx
c053f301:       8b 44 24 0c             mov    0xc(%esp),%eax
c053f305:       e8 ba 09 f1 ff          call   0xc044fcc4  page_address
c053f30a:       89 d9                   mov    %ebx,%ecx
c053f30c:       c1 e9 02                shr    $0x2,%ecx
c053f30f:       8d 3c 30                lea    (%eax,%esi,1),%edi
c053f312:       03 74 24 04             add    0x4(%esp),%esi
c053f316:       f3 a5                   rep movsl %ds:(%esi),%es:(%edi)   <<<<< memcpy
ds: 007b esi: c0df7000 es: 007b edi: ebffb000

It seems one of the skb->frags has been unmapped.
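
The register values above can be cross-checked with a little arithmetic; the following is a small sketch in C of that check, not part of the kernel code. It assumes the listing is read as follows: `movzwl 0x18(...)` loads `page_offset` into esi, `movzwl 0x1a(...)` loads `size` into ebx, and eax still holds the return value of `page_address()`. The helper names are hypothetical.

```c
#include <stdint.h>

/* Register values copied from the dump in this message. */
#define REG_EAX    0xebffb000u  /* page_address(page), destination page */
#define REG_EBX    0x00000578u  /* fragment size in bytes */
#define REG_ECX    0x0000015eu  /* dwords still to copy in rep movsl */
#define REG_ESI    0xc0df7000u  /* source pointer: vaddr + off */
#define REG_EDI    0xebffb000u  /* destination pointer: page_address + off */
#define FAULT_ADDR 0xc0df7000u  /* address reported by the oops */

/* shr $0x2,%ecx: rep movsl copies the fragment in 4-byte units. */
static uint32_t dwords_for(uint32_t bytes)
{
    return bytes >> 2;
}

/* lea (%eax,%esi,1),%edi computes edi = eax + off, so off = edi - eax. */
static uint32_t frag_offset(uint32_t edi, uint32_t eax)
{
    return edi - eax;
}

/* If ecx still equals the full dword count, nothing was copied yet. */
static int copied_nothing(uint32_t ecx, uint32_t size_bytes)
{
    return ecx == dwords_for(size_bytes);
}
```

Under that reading, ebx = 0x578 is a 1400-byte fragment, 1400 >> 2 = 350 = 0x15e matches ecx, and edi equals eax so the offset is 0. The copy therefore faulted on its very first read, at esi = c0df7000, which is the address in the BUG line.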

> I'm thinking blkback will have to make a dom0 copy of the page before
> doing the unmap if there are still extra references?
>
Can the unmap be deferred, handled by the last reference holder?  Or
does this open up a potential security hole?


Thanks
kurt


Kurt Hackel
Oracle Corp.


==========================================
BUG: unable to handle kernel paging request at virtual address c0df7000
 printing eip:
c053f316
36d4c000 -> *pde = 00000000:c4237027
36c37000 -> *pme = 00000001:1bd14067
00d14000 -> *pte = 00000000:00000000
Oops: 0000 [#1]
SMP 
Modules linked in: xt_physdev bridge autofs4 sunrpc dm_round_robin
ip_conntrack_netbios_ns ipt_REJECT xt_tcpudp xt_state ip_conntrack nfnetlink
iptable_filter ip_tables x_tables ib_iser rdma_cm ib_addr ib_cm ib_sa ib_mad
ib_core iscsi_tcp libiscsi scsi_transport_iscsi ocfs2(U) ocfs2_dlm(U)
ocfs2_nodemanager(U) configfs dm_mirror dm_multipath dm_mod video sbs i2c_ec
button battery asus_acpi ac parport_pc lp parport joydev sg i2c_piix4 i2c_core
pcspkr k8_edac edac_mc tg3 ide_cd serio_raw serial_core cdrom qla2xxx
scsi_transport_fc sata_svw libata mptspi mptscsih mptbase scsi_transport_spi
sd_mod scsi_mod ext3 jbd ehci_hcd ohci_hcd uhci_hcd
CPU:    3
EIP:    0061:[<c053f316>]    Not tainted VLI
EFLAGS: 00010286   (2.6.18-8.1.6.0.18.el5xen #1) 
EIP is at loopback_start_xmit+0x107/0x2bd
eax: ebffb000   ebx: 00000578   ecx: 0000015e   edx: c065c800
esi: c0df7000   edi: ebffb000   ebp: f1134ea8   esp: c0701e6c
ds: 007b   es: 007b   ss: 0069
Process swapper (pid: 0, ti=c0701000 task=f77c05a0 task.ti=c0d2f000)
Stack: c9a13c00 c0df7000 00000001 c157ff60 c9a13800 f1134ea8 c9a13980 c9a13800
       c059fc02 c9a13800 f1134ea8 c9a13980 0000000e c05a1768 c0dcf824 c0dcf800
       f1134ea8 c05a5cfc c9a13800 ed20e040 00001fc2 00000000 f48703d4 f48703e8
Call Trace:
 [<c059fc02>] dev_hard_start_xmit+0x198/0x1ee
 [<c05a1768>] dev_queue_xmit+0x24c/0x2e8
 [<c05a5cfc>] neigh_resolve_output+0x1b7/0x1e1
 [<c05bea8b>] ip_output+0x1c0/0x1f9
 [<c05be309>] ip_queue_xmit+0x390/0x3cf
 [<c059fc02>] dev_hard_start_xmit+0x198/0x1ee
 [<c05adbe6>] __qdisc_run+0x30/0x19a
 [<c05a17e6>] dev_queue_xmit+0x2ca/0x2e8
 [<f8640d48>] br_dev_queue_push_xmit+0x15b/0x17e [bridge]
 [<c05cbc6f>] tcp_transmit_skb+0x5e4/0x612
 [<f8641945>] br_handle_frame+0x146/0x15d [bridge]
 [<c05cc9ad>] tcp_retransmit_skb+0x4b7/0x595
 [<c05c5baf>] tcp_enter_loss+0x1a2/0x1ff
 [<c05cee58>] tcp_write_timer+0x3ff/0x5d3
 [<c05cea59>] tcp_write_timer+0x0/0x5d3
 [<c0427146>] run_timer_softirq+0x120/0x19b
 [<c0423162>] __do_softirq+0x73/0xe8
 [<c0406dda>] do_softirq+0x6e/0x102
 ======================
 [<c0406d63>] do_IRQ+0xa5/0xae
 [<c052f040>] evtchn_do_upcall+0x85/0xde
 [<c04056a1>] hypervisor_callback+0x3d/0x45
 [<c040800e>] raw_safe_halt+0xc2/0xe8
 [<c040442a>] xen_idle+0x43/0x4f
 [<c04033b0>] cpu_idle+0xa1/0xbb
Code: 24 08 89 44 24 04 8b 85 a4 00 00 00 0f b7 74 c8 18 0f b7 5c c8 1a 8b 44
24 0c e8 ba 09 f1 ff 89 d9 c1 e9 02 8d 3c 30 03 74 24 04 <f3> a5 89 d9 83 e1 03
74 02 f3 a4 8b 44 24 04 ba 05 00 00 00 e8


Keir Fraser

2008-Feb-09 08:07 UTC

Re: [Xen-devel] Problem with PV disk and iSCSI

On 9/2/08 06:15, "Kurt Hackel" <kurt.hackel@oracle.com> wrote:
>> I'm thinking blkback will have to make a dom0 copy of the page before
>> doing the unmap if there are still extra references?
> 
> Can the unmap be deferred, handled by the last reference holder?  Or
> does this open up a potential security hole?

netback already does this kind of reference counting. It oughtn't to be hard
to check the page reference count in the blkback I/O completion handler and,
if non-zero, set up a callback for when the count does fall to zero. And
defer responding to the frontend until that time. Netback is even more
sophisticated in that it also sets a timeout and if the page languishes for
too long with non-zero count, it's able to forcibly copy-and-release the
page. I don't think we need to go that far for blkback however.
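
The decision described above can be sketched as a small policy function. This is a toy user-space model in C under stated assumptions, not code taken from blkback or netback: the names `toy_action` and `toy_on_completion` are hypothetical, `extra_refs` counts only references held beyond the backend's own, and the timeout is passed in by the caller rather than derived from any real netback setting.

```c
#include <stdbool.h>

/* Possible outcomes when an I/O on a grant-mapped page completes. */
enum toy_action {
    TOY_UNMAP_AND_RESPOND,  /* no extra references: unmap and answer frontend */
    TOY_DEFER,              /* wait for a callback when the count reaches zero */
    TOY_COPY_AND_RELEASE    /* netback-style fallback once the timeout expires */
};

/*
 * extra_refs:  references still held in dom0 beyond the backend's own
 *              (a convention of this model).
 * waited_ms:   how long the completion has already been deferred.
 * timeout_ms:  limit after which the page is copied and released.
 * use_timeout: false models the simpler scheme suggested for blkback,
 *              true models the netback-style timeout.
 */
static enum toy_action toy_on_completion(int extra_refs, unsigned waited_ms,
                                         unsigned timeout_ms, bool use_timeout)
{
    if (extra_refs == 0)
        return TOY_UNMAP_AND_RESPOND;
    if (use_timeout && waited_ms >= timeout_ms)
        return TOY_COPY_AND_RELEASE;
    return TOY_DEFER;
}
```

With `use_timeout` set to false the function keeps returning `TOY_DEFER` for as long as any extra reference remains, so the response to the frontend is held back for an unbounded time; with it set to true the wait is capped at `timeout_ms`.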

 -- Keir




Gary Grebus

2008-Feb-11 15:13 UTC

Re: [Xen-devel] Problem with PV disk and iSCSI

On Fri, 2008-02-08 at 22:15 -0800, Kurt Hackel wrote:
> Hi Gary,
> 
> On Fri, Feb 08, 2008 at 02:54:14PM -0500, Gary Grebus wrote:
> > I've run into a problem on 3.1.2 with an HVM guest using PV disks.  In
> > dom0, the physical disk is accessed using iSCSI.  The symptom is that
> > applications in dom0 which are monitoring the iSCSI network interface
> > (e.g. tcpdump) die with EFAULT errors.
> > 
...
> > 
> 
> We're seeing the same thing with 3.1.3.  When running iSCSI in dom0
> (over a Xen bridge) and presenting the disks via blkfront to the guest, we
> see the same crash (below) while performing failover tests on the storage
> controller.
>
> Just as you said, the error occurs in skb_remove_foreign_references from
> loopback_start_xmit.  It's running through all the foreign pages, attempting
> to copy each locally, when it dies on the source address (esi) of the
> following memcpy:

That's a different failure from the one I see, but it looks like the same
underlying cause.  Our test used a dedicated iSCSI NIC, so netback
wasn't involved.  I haven't looked at how netback handles the mapped
pages.

> 
>         vaddr = kmap_skb_frag(&skb_shinfo(skb)->frags[i]);
>         off = skb_shinfo(skb)->frags[i].page_offset;
>         memcpy(page_address(page) + off,
>                vaddr + off,
>                skb_shinfo(skb)->frags[i].size);
> 
> c053f2f7:       0f b7 74 c8 18          movzwl 0x18(%eax,%ecx,8),%esi
> c053f2fc:       0f b7 5c c8 1a          movzwl 0x1a(%eax,%ecx,8),%ebx
> c053f301:       8b 44 24 0c             mov    0xc(%esp),%eax
> c053f305:       e8 ba 09 f1 ff          call   0xc044fcc4  page_address
> c053f30a:       89 d9                   mov    %ebx,%ecx
> c053f30c:       c1 e9 02                shr    $0x2,%ecx
> c053f30f:       8d 3c 30                lea    (%eax,%esi,1),%edi
> c053f312:       03 74 24 04             add    0x4(%esp),%esi
> c053f316:       f3 a5                   rep movsl %ds:(%esi),%es:(%edi)   <<<<< memcpy
> ds: 007b esi: c0df7000 es: 007b edi: ebffb000
> 
> It seems one of the skb->frags has been unmapped.
> 
> 
> > I'm thinking blkback will have to make a dom0 copy of the page before
> > doing the unmap if there are still extra references?
> >
> 
> Can the unmap be deferred, handled by the last reference holder?  Or
> does this open up a potential security hole?

When the initial block I/O completes, blkfront is going to remove the
grant, so I think you would have to defer notifying blkfront as well.
That doesn't seem workable, since the guest could see the I/O take an
extremely long time, and trigger some timeout.  I think there has to be
a copy made at some point.

	/gary





Gary Grebus

2008-Feb-11 15:26 UTC

Re: [Xen-devel] Problem with PV disk and iSCSI

On Sat, 2008-02-09 at 08:07 +0000, Keir Fraser wrote:
> On 9/2/08 06:15, "Kurt Hackel" <kurt.hackel@oracle.com> wrote:
> 
> >> I'm thinking blkback will have to make a dom0 copy of the page before
> >> doing the unmap if there are still extra references?
> > 
> > Can the unmap be deferred, handled by the last reference holder?  Or
> > does this open up a potential security hole?
> 
> netback already does this kind of reference counting. It oughtn't to be hard
> to check the page reference count in the blkback I/O completion handler and,
> if non-zero, set up a callback for when the count does fall to zero. And
> defer responding to the frontend until that time. Netback is even more
> sophisticated in that it also sets a timeout and if the page languishes for
> too long with non-zero count, it's able to forcibly copy-and-release the
> page. I don't think we need to go that far for blkback however.

In the failure I'm seeing, the skb could sit on a socket queue
indefinitely.  The application reading the socket could be blocked for
some other reason.  blkback can't defer responding to blkfront
(completing the guest I/O).

I think blkback needs to assume that a completion with a non-zero page
reference count means it needs to make a copy, or implement a timeout
like netback.

	Gary

-- 
Gary Grebus
Virtual Iron Software, Inc.




Keir Fraser

2008-Feb-11 15:56 UTC

Re: [Xen-devel] Problem with PV disk and iSCSI

On 11/2/08 15:26, "Gary Grebus" <ggrebus@virtualiron.com> wrote:
>> netback already does this kind of reference counting. It oughtn't to be hard
>> to check the page reference count in the blkback I/O completion handler and,
>> if non-zero, set up a callback for when the count does fall to zero. And
>> defer responding to the frontend until that time. Netback is even more
>> sophisticated in that it also sets a timeout and if the page languishes for
>> too long with non-zero count, it's able to forcibly copy-and-release the
>> page. I don't think we need to go that far for blkback however.
> 
> In the failure I'm seeing, the skb could sit on a socket queue
> indefinitely.  The application reading the socket could be blocked for
> some other reason.  blkback can't defer responding to blkfront
> (completing the guest I/O).
> 
> I think blkback needs to assume that a completion with a non-zero page
> reference count means it needs to make a copy, or implement a timeout
> like netback.

Either way, most of the infrastructure you need should be there, and you can
crib from netback to work out how to use it.

 -- Keir


