thr3ads.net - Xen devel - [Xen-devel] ext3 directory corruption under Xen [Jun 2008]

Christopher S. Aker

2008-Jun-23 16:15 UTC

[Xen-devel] ext3 directory corruption under Xen

We've been seeing a rash of ext3 directory corruption occurring under
Xen.  All but one of the reports have been with filesystems formatted
with 1024 blocksize.  We have one report, potentially of the same bug,
on a filesystem with 4096 blocksize (either way, it was some type of
corruption in that case).  In all cases, the filesystems were mounted
with ext3's default journaling mode.  No quotas or anything else other
than the default ext3 mount options.

It's happened on a number of different hosts, all with the same hardware
and software configuration (Xen 3.2 64bit, 32bit pae dom0, 32bit pae
domUs; LVM backend with 3ware hardware RAID-1).  Some of those hosts
were previously running non-virtualized Linux and UML, using the
identical guest images, and under that configuration never experienced
this problem.

This has occurred under both 2.6.18-xenbits and the more recent pv_ops 
based kernels (2.6.24, 2.6.25), which I presume are all using the same 
blkfront driver code.

The common workloads in the reports seem to be active maildirs and rsync.

The initial errors reported back are all from fs/ext3/dir.c, in
ext3_check_dir_entry().  Most commonly hit is the "rec_len % 4 != 0"
check.  We've seen other checks trigger, but my assumption is that those
happen after more stuff gets whacked out.

Eventually the fs will go read-only.  In extreme cases, the fs is chewed 
through enough that data is lost.

It's tricky to track down the trigger because you can only detect the
corruption after it's happened.  Our attempts to reproduce this using
various filesystem thrashing scripts haven't yielded a reliable way to
trigger it, although we have succeeded in triggering it twice (in two
weeks) :( .

My hope is that this triggers an "a-hah" from someone in LKML or Xen
land who has experience with this code, or that this is a known issue
and a fix already exists.

We're scared.  Please help.

Thanks,
-Chris


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

Kurt Hackel

2008-Jun-23 18:16 UTC

Re: [Xen-devel] ext3 directory corruption under Xen

Hi,

Is your 32bit pae domU paravirt or hvm?  We have seen similar ext3
corruptions on rhel3 and rhel4 32pae hvm guests, one of which appeared
to be triggered by a shadow optimization for pae.

thanks
kurt

Christopher S. Aker

2008-Jun-23 19:53 UTC

[Xen-devel] Re: ext3 directory corruption under Xen

adam radford wrote:
> On Mon, Jun 23, 2008 at 9:15 AM, Christopher S. Aker <caker@theshore.net> wrote:
>
>> It's happened on a number of different hosts, all of the same hardware and
>> software configuration (Xen 3.2 64bit, 32bit pae dom0, 32bit pae domUs.  LVM
>> backend with 3ware hardware RAID-1).
>
> A driver patch for older kernels including XenServer-4.1 is available here:
>
> http://www.3ware.com/KB/article.aspx?id=15243&cNode=6I1C6S

Thanks, but unfortunately, from that link:

"3ware 9000 series controllers are not affected by this issue."

... which is what we're using.  This problem only appeared after
rebooting these machines into Xen.  Some of the affected boxes even ran
2.6.18 (non-Xen) for a while without any problems.

-Chris

