--On 28 June 2013 12:17:43 +0800 Joe Jin <joe.jin@oracle.com> wrote:

> Find a similar issue
> http://www.gossamer-threads.com/lists/xen/devel/265611 So copied to Xen
> developer as well.

I thought this sounded familiar. I haven't got the start of this thread,
but what version of Xen are you running and what device model? If before
4.3, there is a page lifetime bug in the kernel (not the Xen code) which
can affect anything where the guest accesses the host's block stack and
that in turn accesses the networking stack (it may in fact be wider than
that). So, e.g., a domU on iSCSI will do it. It tends to get triggered by
a TCP retransmit or (on NFS) the RPC equivalent. Essentially the block
operation is considered complete, returning through Xen and freeing the
grant table entry, and yet something in the kernel (e.g. a TCP retransmit)
can still access the data. The nature of the bug is extensively discussed
in that thread - you'll also find a reference to a thread on linux-nfs
which concludes it isn't an NFS problem, and even some patches to fix it
in the kernel by adding reference counting.

A workaround is to turn off O_DIRECT use by Xen, as that ensures the
pages are copied. Xen 4.3 does this by default.

I believe fixes for this are in 4.3 and 4.2.2 if using the qemu upstream
DM. Note these aren't real fixes, just a workaround of a kernel bug.

To fix on a local build of Xen you will need something like this:
https://github.com/abligh/qemu-upstream-4.2-testing/commit/9a97c011e1a682eed9bc7195a25349eaf23ff3f9
and something like this (NB: obviously insert your own git repo and
commit numbers):
https://github.com/abligh/xen/commit/f5c344afac96ced8b980b9659fb3e81c4a0db5ca

Also note those fixes are (technically) unsafe for live migration unless
there is an ordering change made in qemu's block open call.

Of course this might be something completely different.

-- 
Alex Bligh
--On 30 June 2013 10:13:35 +0100 Alex Bligh <alex@alex.org.uk> wrote:

> The nature of the bug is extensively discussed in that thread - you'll
> also find a reference to a thread on linux-nfs which concludes it isn't
> an NFS problem, and even some patches to fix it in the kernel by adding
> reference counting.

Some more links for anyone interested in fixing the kernel bug:

http://lists.xen.org/archives/html/xen-devel/2013-01/msg01618.html
http://www.spinics.net/lists/linux-nfs/msg34913.html
http://www.spinics.net/lists/netdev/msg224106.html

-- 
Alex Bligh
On 06/30/13 17:13, Alex Bligh wrote:
> --On 28 June 2013 12:17:43 +0800 Joe Jin <joe.jin@oracle.com> wrote:
>
>> Find a similar issue
>> http://www.gossamer-threads.com/lists/xen/devel/265611 So copied to Xen
>> developer as well.
>
> I thought this sounded familiar. I haven't got the start of this
> thread, but what version of Xen are you running and what device
> model? If before 4.3, there is a page lifetime bug in the kernel
> (not the Xen code) which can affect anything where the guest accesses
> the host's block stack and that in turn accesses the networking
> stack (it may in fact be wider than that). So, e.g., a domU on
> iSCSI will do it. It tends to get triggered by a TCP retransmit
> or (on NFS) the RPC equivalent. Essentially the block operation
> is considered complete, returning through Xen and freeing the
> grant table entry, and yet something in the kernel (e.g. a TCP
> retransmit) can still access the data. The nature of the bug
> is extensively discussed in that thread - you'll also find
> a reference to a thread on linux-nfs which concludes it
> isn't an NFS problem, and even some patches to fix it in the
> kernel by adding reference counting.

Do you know if there is a fix for the above? So far we also suspect the
grant page is being unmapped too early; we are using 4.1 stable during
our test.

> A workaround is to turn off O_DIRECT use by Xen, as that ensures
> the pages are copied. Xen 4.3 does this by default.
>
> I believe fixes for this are in 4.3 and 4.2.2 if using the
> qemu upstream DM. Note these aren't real fixes, just a workaround
> of a kernel bug.

The guest is PVM, and the disk model is xvbd; the guest config file is
as below:

vif = ['mac=00:21:f6:00:00:01,bridge=c0a80b00']
OVM_simple_name = 'Guest#1'
disk = ['file:/OVS/Repositories/0004fb000003000091e9eae94d1e907c/VirtualDisks/0004fb0000120000f78799dad800ef47.img,xvda,w', 'phy:/dev/mapper/360060e8010141870058b415700000002,xvdb,w', 'phy:/dev/mapper/360060e8010141870058b415700000003,xvdc,w']
bootargs = ''
uuid = '0004fb00-0006-0000-2b00-77a4766001ed'
on_reboot = 'restart'
cpu_weight = 27500
OVM_os_type = 'Oracle Linux 5'
cpu_cap = 0
maxvcpus = 8
OVM_high_availability = False
memory = 4096
OVM_description = ''
on_poweroff = 'destroy'
on_crash = 'restart'
bootloader = '/usr/bin/pygrub'
guest_os_type = 'linux'
name = '0004fb00000600002b0077a4766001ed'
vfb = ['type=vnc,vncunused=1,vnclisten=127.0.0.1,keymap=en-us']
vcpus = 8
OVM_cpu_compat_group = ''
OVM_domain_type = 'xen_pvm'

> To fix on a local build of Xen you will need something like this:
> https://github.com/abligh/qemu-upstream-4.2-testing/commit/9a97c011e1a682eed9bc7195a25349eaf23ff3f9
> and something like this (NB: obviously insert your own git
> repo and commit numbers):
> https://github.com/abligh/xen/commit/f5c344afac96ced8b980b9659fb3e81c4a0db5ca

I think this is only for PVHVM/HVM?

Thanks,
Joe

> Also note those fixes are (technically) unsafe for live migration
> unless there is an ordering change made in qemu's block open
> call.
>
> Of course this might be something completely different.
On Mon, 2013-07-01 at 11:18 +0800, Joe Jin wrote:
> > A workaround is to turn off O_DIRECT use by Xen, as that ensures
> > the pages are copied. Xen 4.3 does this by default.
> >
> > I believe fixes for this are in 4.3 and 4.2.2 if using the
> > qemu upstream DM. Note these aren't real fixes, just a workaround
> > of a kernel bug.
>
> The guest is PVM, and the disk model is xvbd; the guest config file is
> as below:

Do you know which disk backend? The workaround Alex refers to went into
qdisk, but I think blkback could still suffer from a variant of the
retransmit issue if you run it over iSCSI.

> > To fix on a local build of Xen you will need something like this:
> > https://github.com/abligh/qemu-upstream-4.2-testing/commit/9a97c011e1a682eed9bc7195a25349eaf23ff3f9
> > and something like this (NB: obviously insert your own git
> > repo and commit numbers):
> > https://github.com/abligh/xen/commit/f5c344afac96ced8b980b9659fb3e81c4a0db5ca
>
> I think this is only for PVHVM/HVM?

No, the underlying issue affects any PV device which is run over a
network protocol (NFS, iSCSI etc). In effect a delayed retransmit can
cross over the delayed ack and cause I/O to be completed while
retransmits are pending, such as is described in
http://www.spinics.net/lists/linux-nfs/msg34913.html (the original NFS
variant). The problem is that because Xen PV drivers often unmap the
page on I/O completion, you get a crash (page fault) on the retransmit.

The issue also affects native, but in that case the symptom is "just" a
corrupt packet on the wire. I tried to address this with my "skb
destructor" series but unfortunately I got bogged down in the details,
then I had to take time out to look into some other stuff and never
managed to get back to it. I'd be very grateful if someone could pick up
that work (Alex gave some useful references in another reply to this
thread).

Some PV disk backends (e.g. blktap2) have worked around this by using
grant copy instead of grant map; others (e.g. qdisk) have disabled
O_DIRECT so that the pages are copied into the dom0 page cache and
transmitted from there.

We were discussing recently the possibility of mapping all ballooned-out
pages to a single read-only scratch page instead of leaving them empty
in the page tables; this would cause the Xen case to revert to the
native case. I think Thanos was going to take a look into this.

Ian.
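The grant-map versus grant-copy trade-off described above can be shown in
a minimal sketch. This is illustrative only: issue_block_io() and the two
submit_* helpers are assumptions standing in for the real blkback/blktap2
request paths, not their actual code.

/*
 * Hedged sketch, not real backend code. With grant *mapping*, the block
 * I/O and any later TCP retransmit both reference the guest's own page,
 * so tearing the grant down at I/O completion is unsafe. With grant
 * *copying*, the backend works on its own page and the guest page's
 * lifetime is decoupled from the network stack.
 */
#include <string.h>

#define PAGE_SIZE 4096

/* Stand-in for submitting a request to the block layer (assumption). */
static void issue_block_io(void *data, size_t len) { (void)data; (void)len; }

/* Grant-map style: zero copy, but the guest page must outlive retransmits. */
static void submit_mapped(void *mapped_guest_page)
{
    issue_block_io(mapped_guest_page, PAGE_SIZE);
    /* Unmapping here, at "completion", is exactly the bug under
     * discussion: a pending TCP retransmit may still touch this page. */
}

/* Grant-copy style: one memcpy per request, but no page-lifetime problem. */
static void submit_copied(void *mapped_guest_page, void *backend_page)
{
    memcpy(backend_page, mapped_guest_page, PAGE_SIZE);
    issue_block_io(backend_page, PAGE_SIZE);
    /* The grant can be released as soon as the copy has been taken. */
}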
Joe,

> Do you know if there is a fix for the above? So far we also suspect the
> grant page is being unmapped too early; we are using 4.1 stable during
> our test.

A true fix? No, but I posted a patch set (see a later email for a link)
that you could forward port. The workaround is:

>> A workaround is to turn off O_DIRECT use by Xen, as that ensures
>> the pages are copied. Xen 4.3 does this by default.
>>
>> I believe fixes for this are in 4.3 and 4.2.2 if using the
>> qemu upstream DM. Note these aren't real fixes, just a workaround
>> of a kernel bug.
>
> The guest is PVM, and the disk model is xvbd; the guest config file is
> as below:

...

> I think this is only for PVHVM/HVM?

I don't have much experience outside pvhvm/hvm, but I believe it should
work for any device. Testing was simple - just find all (*) the
references to O_DIRECT in your device model and remove them!

(*) = you could be less lazy than me and find the right ones. I am
guessing it will be the same ones though.

-- 
Alex Bligh
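What "remove the references to O_DIRECT" amounts to can be sketched as
follows. This is illustrative only, not the actual qemu patch;
open_backing_file() is a hypothetical helper name.

/*
 * Hedged sketch: wherever the device model opens its backing image with
 * O_DIRECT, dropping the flag makes writes go through the dom0 page
 * cache, so the granted guest page is copied rather than handed directly
 * to the iSCSI/NFS transmit path.
 */
#define _GNU_SOURCE            /* for O_DIRECT */
#include <fcntl.h>

int open_backing_file(const char *path)    /* hypothetical helper */
{
    /* Before (zero copy, exposed to the page-lifetime bug):
     *     return open(path, O_RDWR | O_DIRECT);
     */

    /* Workaround: buffered I/O; data is copied into the page cache first. */
    return open(path, O_RDWR);
}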
On 07/01/13 16:11, Ian Campbell wrote:
> On Mon, 2013-07-01 at 11:18 +0800, Joe Jin wrote:
>>> A workaround is to turn off O_DIRECT use by Xen, as that ensures
>>> the pages are copied. Xen 4.3 does this by default.
>>>
>>> I believe fixes for this are in 4.3 and 4.2.2 if using the
>>> qemu upstream DM. Note these aren't real fixes, just a workaround
>>> of a kernel bug.
>>
>> The guest is PVM, and the disk model is xvbd; the guest config file is
>> as below:
>
> Do you know which disk backend? The workaround Alex refers to went into
> qdisk, but I think blkback could still suffer from a variant of the
> retransmit issue if you run it over iSCSI.

The backend is xen-blkback on iSCSI storage.

>>> To fix on a local build of Xen you will need something like this:
>>> https://github.com/abligh/qemu-upstream-4.2-testing/commit/9a97c011e1a682eed9bc7195a25349eaf23ff3f9
>>> and something like this (NB: obviously insert your own git
>>> repo and commit numbers):
>>> https://github.com/abligh/xen/commit/f5c344afac96ced8b980b9659fb3e81c4a0db5ca
>>
>> I think this is only for PVHVM/HVM?
>
> No, the underlying issue affects any PV device which is run over a
> network protocol (NFS, iSCSI etc). In effect a delayed retransmit can
> cross over the delayed ack and cause I/O to be completed while
> retransmits are pending, such as is described in
> http://www.spinics.net/lists/linux-nfs/msg34913.html (the original NFS
> variant). The problem is that because Xen PV drivers often unmap the
> page on I/O completion, you get a crash (page fault) on the retransmit.

To prevent iSCSI's sendpage() from reusing the page we disabled
scatter-gather (sg) on the NIC, and in our tests the panic went away.
This also confirms the page is being unmapped by the grant system; the
symptom is the same as the NFS panic.

> The issue also affects native, but in that case the symptom is "just" a
> corrupt packet on the wire. I tried to address this with my "skb
> destructor" series but unfortunately I got bogged down in the details,
> then I had to take time out to look into some other stuff and never
> managed to get back to it. I'd be very grateful if someone could pick up
> that work (Alex gave some useful references in another reply to this
> thread).
>
> Some PV disk backends (e.g. blktap2) have worked around this by using
> grant copy instead of grant map; others (e.g. qdisk) have disabled
> O_DIRECT so that the pages are copied into the dom0 page cache and
> transmitted from there.

That workaround is much the same as disabling sg on the NIC: with sg
disabled, sendpage() makes its own copy of the data rather than reusing
the page.

Thanks,
Joe

> We were discussing recently the possibility of mapping all ballooned-out
> pages to a single read-only scratch page instead of leaving them empty
> in the page tables; this would cause the Xen case to revert to the
> native case. I think Thanos was going to take a look into this.
>
> Ian.
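For reference, disabling scatter-gather as described above is what
"ethtool -K eth0 sg off" does. A rough userspace equivalent via the
legacy ETHTOOL_SSG ioctl is sketched below; the interface name "eth0" is
an assumption.

/*
 * With scatter-gather disabled, the TCP stack copies the data into the
 * skb (roughly, sendpage() falls back to an ordinary copying send)
 * instead of keeping references to the caller's pages, which is why the
 * panic described above disappears.
 */
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <unistd.h>
#include <net/if.h>
#include <linux/ethtool.h>
#include <linux/sockios.h>

int main(void)
{
    struct ethtool_value ev = { .cmd = ETHTOOL_SSG, .data = 0 }; /* 0 = off */
    struct ifreq ifr;

    memset(&ifr, 0, sizeof(ifr));
    strncpy(ifr.ifr_name, "eth0", IFNAMSIZ - 1);  /* assumed NIC name */
    ifr.ifr_data = (char *)&ev;

    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0 || ioctl(fd, SIOCETHTOOL, &ifr) < 0) {
        perror("ETHTOOL_SSG");
        return 1;
    }
    close(fd);
    return 0;
}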
On 07/01/13 16:11, Ian Campbell wrote:
> On Mon, 2013-07-01 at 11:18 +0800, Joe Jin wrote:
>>> A workaround is to turn off O_DIRECT use by Xen, as that ensures
>>> the pages are copied. Xen 4.3 does this by default.
>>>
>>> I believe fixes for this are in 4.3 and 4.2.2 if using the
>>> qemu upstream DM. Note these aren't real fixes, just a workaround
>>> of a kernel bug.
>>
>> The guest is PVM, and the disk model is xvbd; the guest config file is
>> as below:
>
> Do you know which disk backend? The workaround Alex refers to went into
> qdisk, but I think blkback could still suffer from a variant of the
> retransmit issue if you run it over iSCSI.
>
>>> To fix on a local build of Xen you will need something like this:
>>> https://github.com/abligh/qemu-upstream-4.2-testing/commit/9a97c011e1a682eed9bc7195a25349eaf23ff3f9
>>> and something like this (NB: obviously insert your own git
>>> repo and commit numbers):
>>> https://github.com/abligh/xen/commit/f5c344afac96ced8b980b9659fb3e81c4a0db5ca
>>
>> I think this is only for PVHVM/HVM?
>
> No, the underlying issue affects any PV device which is run over a
> network protocol (NFS, iSCSI etc). In effect a delayed retransmit can
> cross over the delayed ack and cause I/O to be completed while
> retransmits are pending, such as is described in
> http://www.spinics.net/lists/linux-nfs/msg34913.html (the original NFS
> variant). The problem is that because Xen PV drivers often unmap the
> page on I/O completion, you get a crash (page fault) on the retransmit.

Could we handle it by remembering the grant page's refcount at map time,
and at unmap time checking whether the refcount is still the same as it
was at mapping? This change would be limited to xen-blkback.

Another way would be to add a new page flag such as PG_send: when
sendpage() is called, set the bit, and when the page is put, clear the
bit. Then xen-blkback could wait on the page queue.

Thanks,
Joe

> The issue also affects native, but in that case the symptom is "just" a
> corrupt packet on the wire. I tried to address this with my "skb
> destructor" series but unfortunately I got bogged down in the details,
> then I had to take time out to look into some other stuff and never
> managed to get back to it. I'd be very grateful if someone could pick up
> that work (Alex gave some useful references in another reply to this
> thread).
>
> Some PV disk backends (e.g. blktap2) have worked around this by using
> grant copy instead of grant map; others (e.g. qdisk) have disabled
> O_DIRECT so that the pages are copied into the dom0 page cache and
> transmitted from there.
>
> We were discussing recently the possibility of mapping all ballooned-out
> pages to a single read-only scratch page instead of leaving them empty
> in the page tables; this would cause the Xen case to revert to the
> native case. I think Thanos was going to take a look into this.
>
> Ian.

-- 
Oracle <http://www.oracle.com>
Joe Jin | Software Development Senior Manager | +8610.6106.5624
ORACLE | Linux and Virtualization
No. 24 Zhongguancun Software Park, Haidian District | 100193 Beijing
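A minimal userspace model of the first proposal, to make the idea
concrete; the names and structure are assumptions, not xen-blkback code,
and Ian's reply below points out where the scheme breaks down.

/*
 * Model only: snapshot the page's reference count when the grant is
 * mapped, and treat the grant as safe to unmap only once the count has
 * dropped back to that snapshot, i.e. once no other user (such as a
 * pending TCP retransmit) still holds a reference.
 */
#include <stdatomic.h>
#include <stdbool.h>

struct granted_page {
    atomic_int refcount;      /* models the struct page refcount */
    int        refs_at_map;   /* snapshot taken at grant-map time */
};

static void grant_map(struct granted_page *p)
{
    p->refs_at_map = atomic_load(&p->refcount);
}

static bool safe_to_unmap(const struct granted_page *p)
{
    /* Defer the unmap while anything beyond the original mapping still
     * holds a reference to the page. */
    return atomic_load(&p->refcount) <= p->refs_at_map;
}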
On Thu, 2013-07-04 at 16:55 +0800, Joe Jin wrote:
> On 07/01/13 16:11, Ian Campbell wrote:
> > On Mon, 2013-07-01 at 11:18 +0800, Joe Jin wrote:
> > > > A workaround is to turn off O_DIRECT use by Xen, as that ensures
> > > > the pages are copied. Xen 4.3 does this by default.
> > > >
> > > > I believe fixes for this are in 4.3 and 4.2.2 if using the
> > > > qemu upstream DM. Note these aren't real fixes, just a workaround
> > > > of a kernel bug.
> > >
> > > The guest is PVM, and the disk model is xvbd; the guest config file
> > > is as below:
> >
> > Do you know which disk backend? The workaround Alex refers to went
> > into qdisk, but I think blkback could still suffer from a variant of
> > the retransmit issue if you run it over iSCSI.
> >
> > > > To fix on a local build of Xen you will need something like this:
> > > > https://github.com/abligh/qemu-upstream-4.2-testing/commit/9a97c011e1a682eed9bc7195a25349eaf23ff3f9
> > > > and something like this (NB: obviously insert your own git
> > > > repo and commit numbers):
> > > > https://github.com/abligh/xen/commit/f5c344afac96ced8b980b9659fb3e81c4a0db5ca
> > >
> > > I think this is only for PVHVM/HVM?
> >
> > No, the underlying issue affects any PV device which is run over a
> > network protocol (NFS, iSCSI etc). In effect a delayed retransmit can
> > cross over the delayed ack and cause I/O to be completed while
> > retransmits are pending, such as is described in
> > http://www.spinics.net/lists/linux-nfs/msg34913.html (the original NFS
> > variant). The problem is that because Xen PV drivers often unmap the
> > page on I/O completion, you get a crash (page fault) on the retransmit.
>
> Could we handle it by remembering the grant page's refcount at map time,
> and at unmap time checking whether the refcount is still the same as it
> was at mapping? This change would be limited to xen-blkback.
>
> Another way would be to add a new page flag such as PG_send: when
> sendpage() is called, set the bit, and when the page is put, clear the
> bit. Then xen-blkback could wait on the page queue.

These schemes don't work when you have multiple simultaneous I/Os
referencing the same underlying page.

> Thanks,
> Joe
>
> > The issue also affects native, but in that case the symptom is "just"
> > a corrupt packet on the wire. I tried to address this with my "skb
> > destructor" series but unfortunately I got bogged down in the details,
> > then I had to take time out to look into some other stuff and never
> > managed to get back to it. I'd be very grateful if someone could pick
> > up that work (Alex gave some useful references in another reply to
> > this thread).
> >
> > Some PV disk backends (e.g. blktap2) have worked around this by using
> > grant copy instead of grant map; others (e.g. qdisk) have disabled
> > O_DIRECT so that the pages are copied into the dom0 page cache and
> > transmitted from there.
> >
> > We were discussing recently the possibility of mapping all
> > ballooned-out pages to a single read-only scratch page instead of
> > leaving them empty in the page tables; this would cause the Xen case
> > to revert to the native case. I think Thanos was going to take a look
> > into this.
> >
> > Ian.
On Thu, 2013-07-04 at 09:59 +0100, Ian Campbell wrote:
> On Thu, 2013-07-04 at 16:55 +0800, Joe Jin wrote:
> >
> > Another way would be to add a new page flag such as PG_send: when
> > sendpage() is called, set the bit, and when the page is put, clear the
> > bit. Then xen-blkback could wait on the page queue.
>
> These schemes don't work when you have multiple simultaneous I/Os
> referencing the same underlying page.

So this is a page property, yet the patches I saw tried to address this
problem by adding networking stuff (destructors) to the skbs.

Given that a page refcount can be transferred between entities, say using
the splice() system call, I do not really understand why the fix would
involve networking only.

Let's try to fix it properly, or else we must disable zero copies because
they are not reliable.

Why doesn't sendfile() have this problem, while vmsplice()+splice() does?

As soon as a page fragment reference is taken somewhere, the only way to
properly reuse the page is to rely on put_page() and the page being
freed.

Adding workarounds in the TCP stack to always copy the page fragments in
case of a retransmit is a partial solution, as the remote peer could be
malicious and send the ACK _before_ the page content is actually read by
the NIC.

So if we rely on networking stacks to give the signal for page reuse, we
can have a major security issue.
On Thu, 2013-07-04 at 02:34 -0700, Eric Dumazet wrote:
> On Thu, 2013-07-04 at 09:59 +0100, Ian Campbell wrote:
> > On Thu, 2013-07-04 at 16:55 +0800, Joe Jin wrote:
> > >
> > > Another way would be to add a new page flag such as PG_send: when
> > > sendpage() is called, set the bit, and when the page is put, clear
> > > the bit. Then xen-blkback could wait on the page queue.
> >
> > These schemes don't work when you have multiple simultaneous I/Os
> > referencing the same underlying page.
>
> So this is a page property, yet the patches I saw tried to address this
> problem by adding networking stuff (destructors) to the skbs.
>
> Given that a page refcount can be transferred between entities, say
> using the splice() system call, I do not really understand why the fix
> would involve networking only.
>
> Let's try to fix it properly, or else we must disable zero copies
> because they are not reliable.
>
> Why doesn't sendfile() have this problem, while vmsplice()+splice()
> does?

Might just be that no one has observed it with vmsplice()+splice()? Most
of the time this happens silently and you'll probably never notice; it's
just the behaviour of Xen which escalates the issue into one you can see.

> As soon as a page fragment reference is taken somewhere, the only way
> to properly reuse the page is to rely on put_page() and the page being
> freed.

Xen's out-of-tree netback used to fix this with a destructor callback on
page free, but that was a core mm patch in the hot memory-free path which
wasn't popular, and it doesn't solve anything for the non-Xen instances
of this issue.

> Adding workarounds in the TCP stack to always copy the page fragments
> in case of a retransmit is a partial solution, as the remote peer could
> be malicious and send the ACK _before_ the page content is actually
> read by the NIC.
>
> So if we rely on networking stacks to give the signal for page reuse,
> we can have a major security issue.

If you ignore the Xen case and consider just the native case, then the
issue isn't page reuse in the sense of the page getting mapped into
another process; it's the same page in the same process, but the process
has written something new to the buffer, e.g.

	memset(buf, 0xaa, 4096);
	write(fd, buf, 4096);
	memset(buf, 0x55, 4096);

(where fd is O_DIRECT on NFS) can result in 0x55 being seen on the wire
in the TCP retransmit.

If the retransmit is at the RPC layer then you get a resend of the NFS
write RPC, but the XDR sequence stuff catches that case (I think, memory
is fuzzy). If the retransmit is at the TCP level then the TCP
sequence/ack will cause the receiver to ignore the corrupt version, but
if you replace the second memset with write_critical_secret_key(buf),
then you have an information leak.

Ian.
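Spelled out, the example above expands to something like the program
below. The mount path is an assumption, and the buffer must be aligned
for O_DIRECT; on an affected O_DIRECT-over-NFS (or iSCSI) setup a TCP
retransmit that fires after the second memset can put the new buffer
contents on the wire even though write() has already returned.

#define _GNU_SOURCE            /* for O_DIRECT */
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    void *buf;
    if (posix_memalign(&buf, 4096, 4096))      /* O_DIRECT needs alignment */
        return 1;

    int fd = open("/mnt/nfs/testfile", O_WRONLY | O_CREAT | O_DIRECT, 0600);
    if (fd < 0)
        return 1;

    memset(buf, 0xaa, 4096);
    if (write(fd, buf, 4096) != 4096)   /* "completes" here... */
        return 1;
    memset(buf, 0x55, 4096);            /* ...but a retransmit may send this */

    close(fd);
    free(buf);
    return 0;
}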
On Thu, 2013-07-04 at 10:52 +0100, Ian Campbell wrote:
> Might just be that no one has observed it with vmsplice()+splice()? Most
> of the time this happens silently and you'll probably never notice; it's
> just the behaviour of Xen which escalates the issue into one you can
> see.

The point I wanted to make is that nobody can seriously use vmsplice()
unless the memory is never reused by the application, or the application
doesn't care about the security implications, because an application has
no way to know when it's safe to reuse the area for another purpose.

[ Unless it uses the obscure and complex pagemap stuff
  (Documentation/vm/pagemap.txt), but that is not asynchronous signalling
  and not pluggable into epoll()/poll()/select(). ]

> Xen's out-of-tree netback used to fix this with a destructor callback on
> page free, but that was a core mm patch in the hot memory-free path
> which wasn't popular, and it doesn't solve anything for the non-Xen
> instances of this issue.

It _is_ a core mm patch which is needed, if we ever want to fix this
problem.

It looks like a typical COW issue to me.

If the page content is written while there is still a reference on this
page, we should allocate a new page and copy the previous content.

And this has little to do with networking.
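A minimal userspace model of the copy-on-write behaviour being suggested
here; the names and structure are assumptions, not a real mm patch.

/*
 * Model only: before the owner modifies a page that something else (a
 * pending transmit, a splice() reader) still references, hand the owner
 * a fresh copy and leave the original untouched for the remaining users.
 */
#include <stdatomic.h>
#include <stdlib.h>
#include <string.h>

struct buf_page {
    atomic_int refs;          /* models the struct page refcount */
    char       data[4096];
};

static struct buf_page *prepare_for_write(struct buf_page *p)
{
    if (atomic_load(&p->refs) == 1)
        return p;                         /* sole owner: write in place */

    struct buf_page *fresh = malloc(sizeof(*fresh));
    if (!fresh)
        return NULL;
    memcpy(fresh->data, p->data, sizeof(p->data));
    atomic_init(&fresh->refs, 1);
    atomic_fetch_sub(&p->refs, 1);        /* old page lives on for the NIC */
    return fresh;
}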
--On 4 July 2013 03:12:10 -0700 Eric Dumazet <eric.dumazet@gmail.com> wrote:

> It looks like a typical COW issue to me.
>
> If the page content is written while there is still a reference on this
> page, we should allocate a new page and copy the previous content.
>
> And this has little to do with networking.

I suspect this would get more attention if we could make Ian's case below
trigger (a) outside Xen, (b) outside networking.

> memset(buf, 0xaa, 4096);
> write(fd, buf, 4096);
> memset(buf, 0x55, 4096);
>
> (where fd is O_DIRECT on NFS) can result in 0x55 being seen on the wire
> in the TCP retransmit.

We know this should fail using O_DIRECT+NFS. We've had reports suggesting
it fails with O_DIRECT+iSCSI; however, that has been with a kernel panic
(under Xen) rather than data corruption as per the above. Historical
trawling suggests this is an issue with DRBD as well (see Ian's original
thread from the mists of time). I don't quite understand why we aren't
seeing corruption with standard ATA devices + O_DIRECT and no Xen
involved at all.

My memory is a bit misty on this, but I had thought the reason why this
would NOT be solved simply by O_DIRECT taking a reference to the page was
that the O_DIRECT I/O completes (and thus the reference would be freed
up) before the networking stack has actually finished with the page. If
the O_DIRECT I/O did not complete until the page was actually finished
with, we wouldn't see the problem in the first place. I may be completely
off base here.

-- 
Alex Bligh