Hi:

I've met an ext4 bug in dom0 kernel 2.6.32.36. (See kernel stack below.)
32.36 kernel commit: http://git.kernel.org/?p=linux/kernel/git/jeremy/xen.git;a=commit;h=ae333e97552c81ab10395ad1ffc6d6daaadb144a

The bug only shows up in our cluster environment, which includes 300 physical machines; about one server per day runs into it. On top of each server run about 30 VMs, each with a heavy IO workload inside. (We are doing some kinds of stress tests.)

We have our own distributed file system as the cluster storage: every VM image file is split into several equal-sized files on the physical disk, and every file creation uses ext4 fallocate (fallocate size 1 MB). So I believe quite a lot of uninitialized extents get initialized during the test.

After going through the source code, the call path is:

ext4_da_writepages->mpage_da_map_blocks->ext4_get_blocks->ext4_ext_get_blocks->
ext4_ext_handle_uninitialized_extents->ext4_ext_convert_to_initialized->ext4_ext_insert_extent

If ext4_ext_handle_uninitialized_extents is called, then the condition at line 3306 must have been satisfied; that is, we have in_range(iblock, ee_block, ee_len) == true,
so iblock >= ee_block.

fs/ext4/extents.c:

3306            if (in_range(iblock, ee_block, ee_len)) {
3307                    newblock = iblock - ee_block + ee_start;
3308                    /* number of remaining blocks in the extent */
3309                    allocated = ee_len - (iblock - ee_block);
3310                    ext_debug("%u fit into %u:%d -> %llu\n", iblock,
3311                                    ee_block, ee_len, newblock);
3312
3313                    /* Do not put uninitialized extent in the cache */
3314                    if (!ext4_ext_is_uninitialized(ex)) {
3315                            ext4_ext_put_in_cache(inode, ee_block,
3316                                                    ee_len, ee_start,
3317                                                    EXT4_EXT_CACHE_EXTENT);
3318                            goto out;
3319                    }
3320                    ret = ext4_ext_handle_uninitialized_extents(handle,
3321                                    inode, iblock, max_blocks, path,
3322                                    flags, allocated, bh_result, newblock);
3323                    return ret;
3324            }

The newext comes from line 2678; its ee_block is iblock + max_blocks.
The nearex is path[depth].p_ext (line 1683).

The BUG_ON at line 1716 firing means iblock + max_blocks == nearex->ee_block.
So maybe that means we have iblock == ee_block and max_blocks == 0.

1716            BUG_ON(newext->ee_block == nearex->ee_block);
1717            len = (EXT_MAX_EXTENT(eh) - nearex) * sizeof(struct ext4_extent);
1718            len = len < 0 ? 0 : len;
1719            ext_debug("insert %d:%llu:[%d]%d before: nearest 0x%p, "
1720                            "move %d from 0x%p to 0x%p\n",
1721                            le32_to_cpu(newext->ee_block),
1722                            ext_pblock(newext),
1723                            ext4_ext_is_uninitialized(newext),
1724                            ext4_ext_get_actual_len(newext),
1725                            nearex, len, nearex + 1, nearex + 2);
1726            memmove(nearex + 1, nearex, len);
1727            path[depth].p_ext = nearex;
1728    }

2678            ex3 = &newex;
2679            ex3->ee_block = cpu_to_le32(iblock + max_blocks);
2680            ext4_ext_store_pblock(ex3, newblock + max_blocks);
2681            ex3->ee_len = cpu_to_le16(allocated - max_blocks);
2682            ext4_ext_mark_uninitialized(ex3);
2683            err = ext4_ext_insert_extent(handle, inode, path, ex3, 0);
2684            if (err == -ENOSPC && may_zeroout) {
2685                    err = ext4_ext_zeroout(inode, &orig_ex);

If max_blocks == 0, that means mpd->b_size >> mpd->inode->i_blkbits at line 2225 is 0.

fs/ext4/inode.c:

2220 static int mpage_da_map_blocks(struct mpage_da_data *mpd)
2221 {
2222    int err, blks, get_blocks_flags;
2223    struct buffer_head new;
2224    sector_t next = mpd->b_blocknr;
2225    unsigned max_blocks = mpd->b_size >> mpd->inode->i_blkbits;
2226    loff_t disksize = EXT4_I(mpd->inode)->i_disksize;
2227    handle_t *handle = NULL;
2228

Could it be possible? Right now I am trying to reproduce this problem in a much easier way; any suggestions?

Many thanks.

------------[ cut here ]------------
kernel BUG at fs/ext4/extents.c:1716!
invalid opcode: 0000 [#1] SMP
last sysfs file: /sys/block/tapdevk/stat
CPU 3
Modules linked in: xt_iprange xt_mac arptable_filter arp_tables xt_physdev nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack
 iptable_filter ip_tables bridge autofs4 ipmi_devintf ipmi_si ipmi_msghandler lockd sunrpc bonding ipv6 8021q garp stp llc xenfs
 dm_multipath fuse xen_netback xen_blkback blktap blkback_pagemap loop nbd video output sbs sbshc parport_pc lp parport joydev ses
 enclosure snd_seq_dummy snd_seq_oss bnx2 snd_seq_midi_event snd_seq snd_seq_device dcdbas snd_pcm_oss snd_mixer_oss serio_raw snd_pcm
 snd_timer snd soundcore snd_page_alloc iTCO_wdt iTCO_vendor_support pcspkr shpchp [last unloaded: freq_table]
Pid: 9073, comm: flush-8:16 Not tainted 2.6.32.36xen #1 PowerEdge R710
RIP: e030:[<ffffffff811a6184>]  [<ffffffff811a6184>] ext4_ext_insert_extent+0xac1/0xbe0
RSP: e02b:ffff8801499cd580  EFLAGS: 00010246
RAX: 0000000000002948 RBX: 0000000000000000 RCX: ffff8801499cd780
RDX: ffff8801499cd360 RSI: ffff88007dedb310 RDI: 0000000000000017
RBP: ffff8801499cd650 R08: ffff8801499cd340 R09: ffff880063488930
R10: 000000018100f8bf R11: dead000000200200 R12: ffff88005a29700c
R13: ffff88005a297000 R14: ffff8801158198c0 R15: ffff88003e9ea1b0
FS:  00007fd3cc4bf6e0(0000) GS:ffff88002808f000(0000) knlGS:0000000000000000
CS:  e033 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 000000000042a09e CR3: 00000000bf3bd000 CR4: 0000000000002660
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process flush-8:16 (pid: 9073, threadinfo ffff8801499cc000, task ffff880149ad5b40)
Stack:
 ffff8801499cd780 ffff88003e9ea180 ffff8801c5b47300 01ffffff81103c0c
<0> ffff88003e9ea180 000000017dedb2a0 ffff880115819800 ffff88007dedb2a0
<0> ffff8801499cd5d0 ffffffff811c12ea ffff8801499cd5f0 ffffffff811c16ea
Call Trace:
 [<ffffffff811c12ea>] ? jbd_unlock_bh_journal_head+0x16/0x18
 [<ffffffff811c16ea>] ? jbd2_journal_put_journal_head+0x4d/0x52
 [<ffffffff811bb7d6>] ? jbd2_journal_get_write_access+0x31/0x38
 [<ffffffff811a88e9>] ? __ext4_journal_get_write_access+0x4c/0x5f
 [<ffffffff811a6ce3>] ext4_ext_handle_uninitialized_extents+0xa40/0xef5
 [<ffffffff8100f175>] ? xen_force_evtchn_callback+0xd/0xf
 [<ffffffff8100f8d2>] ? check_events+0x12/0x20
 [<ffffffff81042fcf>] ? need_resched+0x23/0x2d
 [<ffffffff811a74e1>] ext4_ext_get_blocks+0x265/0x6eb
 [<ffffffff81042fcf>] ? need_resched+0x23/0x2d
 [<ffffffff81188b55>] ext4_get_blocks+0x140/0x204
 [<ffffffff81188d2f>] mpage_da_map_blocks+0xb7/0x681
 [<ffffffff810d3b29>] ? find_get_pages_tag+0x48/0xcc
 [<ffffffff8100f8d2>] ? check_events+0x12/0x20
 [<ffffffff810da8df>] ? pagevec_lookup_tag+0x27/0x30
 [<ffffffff810d87cc>] ? write_cache_pages+0x175/0x35e
 [<ffffffff811893f0>] ? __mpage_da_writepage+0x0/0x164
 [<ffffffff81103c0c>] ? kmem_cache_alloc+0x94/0xf6
 [<ffffffff811bbc40>] ? jbd2_journal_start+0xa1/0xcd
 [<ffffffff8119957f>] ? ext4_journal_start_sb+0xdc/0x111
 [<ffffffff81186852>] ? ext4_meta_trans_blocks+0x74/0xce
 [<ffffffff8118bc42>] ext4_da_writepages+0x47a/0x6a7
 [<ffffffff810d8a00>] do_writepages+0x21/0x2a
 [<ffffffff8112cdb8>] writeback_single_inode+0xc8/0x1e3
 [<ffffffff8112d5e4>] writeback_inodes_wb+0x30b/0x37e
 [<ffffffff8102f82d>] ? paravirt_end_context_switch+0x17/0x31
 [<ffffffff8100b459>] ? xen_end_context_switch+0x1e/0x22
 [<ffffffff8112d788>] wb_writeback+0x131/0x1bb
 [<ffffffff81064029>] ? try_to_del_timer_sync+0x73/0x81
 [<ffffffff8112d9ef>] wb_do_writeback+0x13c/0x153
 [<ffffffff8106425b>] ? process_timeout+0x0/0x10
 [<ffffffff810e78d1>] ? bdi_start_fn+0x0/0xd0
 [<ffffffff8112da32>] bdi_writeback_task+0x2c/0xb3
 [<ffffffff810e793b>] bdi_start_fn+0x6a/0xd0
 [<ffffffff810754b7>] kthread+0x6e/0x76
 [<ffffffff81013daa>] child_rip+0xa/0x20
 [<ffffffff81012f91>] ? int_ret_from_sys_call+0x7/0x1b
 [<ffffffff8101371d>] ? retint_restore_args+0x5/0x6
 [<ffffffff81013da0>] ? child_rip+0x0/0x20
Code: 8d 04 85 f4 ff ff ff 85 c0 0f 49 d8 48 63 d3 e8 47 c7 07 00 49 8d 44 24 0c 49 89 47 10 eb 3a bb f4 ff ff ff e9 c2 00 00 00 75 04 <0f> 0b eb fe 41 0f b7 45 04 49 8d 7c 24 0c 48 6b c0 0c 4c 89 e6
RIP  [<ffffffff811a6184>] ext4_ext_insert_extent+0xac1/0xbe0
 RSP <ffff8801499cd580>
---[ end trace 035c7d09ed95fb32 ]---
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Running fsck, some of the hard disks have multiply-claimed blocks. And it looks like I need this patch to fix the "should not have EOFBLOCKS_FL set" error:
http://git390.marist.edu/cgi-bin/gitweb.cgi?p=linux-2.6.git;a=commitdiff;h=58590b06d79f7ce5ab64ff3b6d537180fa50dc84

Inode 50343178 should not have EOFBLOCKS_FL set (size 67108864, lblk 16383) Clear? yes
Inode 50345362 should not have EOFBLOCKS_FL set (size 67108864, lblk 16383) Clear? yes
Inode 50345386 should not have EOFBLOCKS_FL set (size 63963136, lblk 15615) Clear? yes
Inode 50345648 should not have EOFBLOCKS_FL set (size 3145728, lblk 767) Clear? yes
Inode 50345690 should not have EOFBLOCKS_FL set (size 67108864, lblk 16383) Clear? yes
Inode 50346361, i_blocks is 133136, should be 133256.  Fix? yes
Running additional passes to resolve blocks claimed by more than one inode...
Pass 1B: Rescanning for multiply-claimed blocks
Multiply-claimed block(s) in inode 50346361: 226854591 226854592 226854593 226854594 226854595 226854596 226854597 226854598 226854599 226854600 226854601 226854602 226854603 226854604 226854605 226854591 226854592 226854593 226854594 226854595 226854596 226854597 226854598 226854599 226854600 226854601 226854602 226854603 226854604 226854605
Pass 1C: Scanning directories for inodes with multiply-claimed blocks
Pass 1D: Reconciling multiply-claimed blocks
(There are 1 inodes containing multiply-claimed blocks.)
File /chunks/2410339941482498_637 (inode #50346361, mod time Tue Sep  6 16:25:33 2011)
  has 30 multiply-claimed block(s), shared with 0 file(s):
Clone multiply-claimed blocks? yes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
Free blocks count wrong for group #0 (78, counted=63).  Fix? yes
Free blocks count wrong (7028646, counted=7028631).  Fix?
yes

----------------------------------------
> From: tinnycloud@hotmail.com
> To: linux-ext4@vger.kernel.org; xen-devel@lists.xensource.com
> CC: jeremy@goop.org; konrad.wilk@oracle.com
> Subject: ext4 BUG in dom0 Kernel 2.6.32.36
> Date: Tue, 6 Sep 2011 15:24:14 +0800
> [snip: original report quoted in full above]
Konrad Rzeszutek Wilk
2011-Sep-06 14:53 UTC
[Xen-devel] Re: ext4 BUG in dom0 Kernel 2.6.32.36
On Tue, Sep 06, 2011 at 03:24:14PM +0800, MaoXiaoyun wrote:
> Hi:
>
> I've met an ext4 bug in dom0 kernel 2.6.32.36. (See kernel stack below)

Did you try the 3.0 kernel?
> Date: Tue, 6 Sep 2011 10:53:47 -0400
> From: konrad.wilk@oracle.com
> To: tinnycloud@hotmail.com
> CC: linux-ext4@vger.kernel.org; xen-devel@lists.xensource.com; jeremy@goop.org
> Subject: Re: ext4 BUG in dom0 Kernel 2.6.32.36
>
> Did you try the 3.0 kernel?

No, I am afraid the change would be too much for our current environment, and might result in other stability issues. So I want to dig out what really happened, hopefully.

Thanks.
Jeremy Fitzhardinge
2011-Sep-06 18:55 UTC
Re: [Xen-devel] RE: ext4 BUG in dom0 Kernel 2.6.32.36
On 09/06/2011 08:11 AM, MaoXiaoyun wrote:
> > Did you try the 3.0 kernel?
> No, I am afraid the change would be too much for our current env.
> May result in other stability issues.
> So, I want to dig out what really happened.

Another question is whether this is a regression compared to earlier versions of 2.6.32. Do you know if this problem exists in a non-Xen environment?

Thanks,
    J
----------------------------------------
> Date: Tue, 6 Sep 2011 11:55:02 -0700
> From: jeremy@goop.org
> To: tinnycloud@hotmail.com
> CC: konrad.wilk@oracle.com; linux-ext4@vger.kernel.org; xen-devel@lists.xensource.com
> Subject: Re: [Xen-devel] RE: ext4 BUG in dom0 Kernel 2.6.32.36
>
> Another question is whether this is a regression compared to earlier
> versions of 2.6.32? Do you know if this problem exists in a non-Xen
> environment?

There are some other reports of this issue in non-Xen environments:
http://markmail.org/message/ywr4nfgiuvgdcr7y
http://www.spinics.net/lists/linux-ext4/msg21066.html

The difficulty is that I haven't found an efficient way to reproduce it. (Currently it only shows up in our cluster, and redeploying our cluster may cost 3 days or more.)
Hi:

I finally captured overlapped extents in ext4, but I am still wondering how it happens. I check for overlap of the last extent in the tree at the very beginning of ext4_ext_convert_to_initialized. The attached messages.12 shows the overlap found; in lines 8-10, 3467:[1]15:57921642 and 3468:[0]14:57921643 have overlapped.

8 Sep 15 08:27:39 xmao kernel: 3331:[0]7:53750025 3338:[0]8:53750033 3346:[0]1:53848953 3347:[0]7:53848955 3354:[0]1:53848969 3355:[0]7:53848971 3362:[0]1:53848985 3363:[0]7:56996848 3370:[0]1:57606144 3371:[0]7:57795290 3378:[0]1:57814407 3379:[0]7:57858606 3386:[0]8:57858620 3394:[0]1:57858629 3395:[0]8:57858637 3403:[0]7:57858646 3410:[0]1:57858661 3411:[0]8:57858669 3419:[0]7:57858678 3426:[0]8:57858692 3434:[0]1:57858701 3435:[0]7:57858709 3442:[0]1:57858717 3443:[0]7:57858725 3450:[0]1:57858733 3451:[0]7:57858741 3458:[0]1:57858749 3459:[0]7:57858757 3466:[0]1:57921634 3467:[1]15:57921642
9 Sep 15 08:27:39 xmao kernel: Displaying leaf extents for inode 12339004
10 Sep 15 08:27:39 xmao kernel: 3468:[0]14:57921643 3482:[0]1:57921664 3483:[0]7:57921666 3490:[0]1:57921680 3491:[0]8:57921682 3499:[0]7:57921691 3506:[0]8:57921705 3514:[0]1:57921714 3515:[0]7:57921722 3522:[0]41:57916683 3563:[0]7:58159767 3570:[0]1:58159781 3571:[0]7:58238992 3578:[0]1:58288144 3579:[0]7:58327750 3586:[0]1:58579969 3587:[0]7:58954838 3594:[0]1:59006641 3595:[0]7:59006643 3602:[0]1:59006657 3603:[0]7:59006659 3610:[0]8:59006673 3618:[0]8:59006688 3626:[0]470:58982658 4096:[0]3:58987732 4099:[0]1:58992655 4100:[0]7:59143253 4107:[0]1:59171840 4108:[0]7:59183878 4115:[0]1:59192886 4116:[0]8:59593463 4124:[0]8:59669484 4132:[0]7:73086538 4139:[0]1:73352801 4140:[0]7:73339273 4147:[0]1:73526280 4148:[0]8:78229012 4156:[0]1:78229021 4157:[0]7:78818388 4164:[0]1:79069383 4165:[0]7:79428616 4172:[0]1:80490925 4173:[0]7:81439488 4180:[0]1:82854062 4181:[0]7:83462272 4188:[0]1:83656904 4189:[0]7:89127381 4196:[0]1:89584313 4197:[0]8:91592930 4205:[0]7:91592945 4212:[0]1:91592953
4213:[0]7:91592961 422

I also dumped the file on disk using filefrag, which shows no overlap and no extent 3468:[0]14:57921643:

 ext  logical  physical  expected  length  flags
 ....
 337  3459     57858757  57858749  7
 338  3466     57921634  57858763  1       unwritten
 339  3467     57921642  57921634  15      unwritten
 340  3482     57921664  57921656  1
 341  3483     57921666  57921664  7
 .....

There is one assumption: after 3468:[0]14:57921643 is successfully inserted, some error happens. At the bottom of ext4_ext_convert_to_initialized, fix_extent_len will restore the original ex's ee_len. (Later I will do the error check.)

3403 fix_extent_len:
3404         ex->ee_block = orig_ex.ee_block;
3405         ex->ee_len = orig_ex.ee_len;
3406         ext4_ext_store_pblock(ex, ext_pblock(&orig_ex));
3407         ext4_ext_mark_uninitialized(ex);
3408         ext4_ext_dirty(handle, inode, path + depth);

Any comments?

But there is something strange in messages.12. messages.12 is from another machine; its log is printed right before BUG_ON(newext->ee_block == nearex->ee_block). The strange thing is that 14412:[1]16:9927's pblock is very different from 14411:[0]1:222332613's.

1993         if (newext->ee_block == nearex->ee_block) {
1994                 len = (EXT_MAX_EXTENT(eh) - nearex) * sizeof(struct ext4_extent);
1995                 len = len < 0 ? 0 : len;
1996                 printk("old_depth %d depth %d old_path %p path %p next_has_free %d next %llu\n",
1997                         old_depth, depth, old_path, path, next_has_free, (unsigned long long)next);
2004
2005                 printk("insert %d:%llu:[%d]%d before: nearest 0x%p, "
2006                         "move %d from 0x%p to 0x%p\n",
2007                         le32_to_cpu(newext->ee_block),
2008                         ext_pblock(newext),
2009                         ext4_ext_is_uninitialized(newext),
2010                         ext4_ext_get_actual_len(newext),
2011                         nearex, len, nearex + 1, nearex + 2);
2012                 ext4_ext_show_leaf_xmao(inode, old_path);
2013                 ext4_ext_show_leaf_xmao(inode, path);
2014         };
2015         BUG_ON(newext->ee_block == nearex->ee_block);

Sep 13 16:16:35 xmao kernel: 57:[0]31:157254721 12288:[0]54:157503830 12342:[0]10:157503884 12352:[0]5:157534763 12357:[0]1:157534768 12358:[0]58:157534769 12416:[0]64:157567168 12480:[0]13:158051261 12493:[0]73:172263095 12566:[0]24:172265399 12590:[0]71:172521859 12661:[0]71:172627897 12732:[0]71:172733735 12803:[0]69:172722619 12872:[0]9:172764859 12881:[0]42:110500028 12923:[0]86:143030061 13009:[0]86:143119859 13095:[0]48:143173376 13143:[0]16:195333586 13159:[0]32:197526105 13191:[0]40:198875861 13231:[0]39:198872300 13270:[0]5:199663576 13275:[0]26:200964192 13301:[0]36:202015708 13337:[0]47:202221682 13384:[0]9:202221729 13393:[0]58:202624966 13451:[0]12:202606535 13463:[0]35:212117725 13498:[0]35:212135811 13533:[0]34:212115513 13567:[0]32:212108608 13599:[0]29:212144185 13628:[0]50:231280420 13678:[0]38:231645389 13716:[0]13:231645427 13729:[0]51:231650765 13780:[0]50:231647658 13830:[0]54:231985340 13884:[0]24:231981259 13908:[0]64:105098731 13972:[0]87:136696745 14059:[0]45:136700237 14104:[0]61:2
Sep 13 16:16:35 xmao kernel: 3651 14165:[0]69:222042299 14234:[0]68:222044092 14302:[0]34:222091761 14336:[0]68:222172860 14404:[0]7:222332606 14411:[0]1:222332613
Sep 13 16:16:35 xmao kernel: Displaying leaf extents for inode 30685060
Sep 13 16:16:35 xmao kernel: 14412:[1]16:9927 14428:[1]41:13213 14469:[1]1:13254 14470:[0]67:222673085

Also, filefrag shows the extents are OK:

 336  14302  222091761  222044159  34
 337  14336  222172860  222091794  68
 338  14404  222332606  222172927  7
 339  14411  222332613             59  unwritten
 340  14470  222673085  222332671  67
 341  14537  222848155  222673151  43
 342  14580  165617358  222848197  56
 343  14636  165777353  165617413  55
 344  14691  165961927  165777407  57

It seems 14412:[1]16:9927 14428:[1]41:13213 14469:[1]1:13254 are unexpected.

Many thanks.

----------------------------------------
> From: tinnycloud@hotmail.com
> To: jeremy@goop.org
> CC: konrad.wilk@oracle.com; linux-ext4@vger.kernel.org; xen-devel@lists.xensource.com
> Subject: RE: [Xen-devel] RE: ext4 BUG in dom0 Kernel 2.6.32.36
> Date: Wed, 7 Sep 2011 10:35:21 +0800
> [snip: earlier messages quoted in full above]
MaoXiaoyun
2011-Sep-25 08:45 UTC
[Xen-devel] [patch 1/1] ext4-fix-dirty-extent-when-origin-leaf-extent-reac.patch
Hi:

We met an ext4 BUG_ON at extents.c:1716 which crashes the kernel flush thread and leaves the disk unavailable.

BUG details: http://www.gossamer-threads.com/lists/xen/devel/217091?do=post_view_threaded

Attached is the fix, verified in our environment. Without this patch, more than 3 of our hundreds of servers hit the BUG_ON every day.

Many thanks.
Konrad Rzeszutek Wilk
2011-Sep-26 14:28 UTC
[Xen-devel] Re: [patch 1/1] ext4-fix-dirty-extent-when-origin-leaf-extent-reac.patch
On Sun, Sep 25, 2011 at 04:45:39PM +0800, MaoXiaoyun wrote:
> Hi:
>
> We met an ext4 BUG_ON in extents.c:1716 which crash kernel flush thread, and result in disk unavailable.
>
> BUG details refer to: http://www.gossamer-threads.com/lists/xen/devel/217091?do=post_view_threaded
>
> Attached is the fix, verified in our env.

So.. you are asking for this upstream git commit to be back-ported to 2.6.32, right?

> Without this patch, more than 3 servers hit BUG_ON in our hundreds of servers every day.
>
> many thanks.
MaoXiaoyun
2011-Sep-27 02:22 UTC
[Xen-devel] RE: [patch 1/1] ext4-fix-dirty-extent-when-origin-leaf-extent-reac.patch
----------------------------------------
> Date: Mon, 26 Sep 2011 10:28:08 -0400
> From: konrad.wilk@oracle.com
> To: tinnycloud@hotmail.com
> CC: linux-ext4@vger.kernel.org; xen-devel@lists.xensource.com; tytso@mit.edu; jack@suse.cz
> Subject: Re: [patch 1/1] ext4-fix-dirty-extent-when-origin-leaf-extent-reac.patch
>
> On Sun, Sep 25, 2011 at 04:45:39PM +0800, MaoXiaoyun wrote:
> > Hi:
> >
> > We met an ext4 BUG_ON in extents.c:1716 which crash kernel flush thread, and result in disk unavailable.
> >
> > BUG details refer to: http://www.gossamer-threads.com/lists/xen/devel/217091?do=post_view_threaded
> >
> > Attached is the fix, verified in our env.
>
> So.. you are asking for this upstream git commit to be back-ported to 2.6.32, right?

The patch is for 2.6.39. It can be applied to 2.6.32 too.

Thanks.

> > Without this patch, more than 3 servers hit BUG_ON in our hundreds of servers every day.
> >
> > many thanks.
Jan Beulich
2011-Sep-27 09:09 UTC
[Xen-devel] RE: [patch 1/1] ext4-fix-dirty-extent-when-origin-leaf-extent-reac.patch
>>> On 27.09.11 at 04:22, MaoXiaoyun <tinnycloud@hotmail.com> wrote:
> ----------------------------------------
>> Date: Mon, 26 Sep 2011 10:28:08 -0400
>> From: konrad.wilk@oracle.com
>> To: tinnycloud@hotmail.com
>> CC: linux-ext4@vger.kernel.org; xen-devel@lists.xensource.com; tytso@mit.edu; jack@suse.cz
>> Subject: Re: [patch 1/1] ext4-fix-dirty-extent-when-origin-leaf-extent-reac.patch
>>
>> On Sun, Sep 25, 2011 at 04:45:39PM +0800, MaoXiaoyun wrote:
>> > Hi:
>> >
>> > We met an ext4 BUG_ON in extents.c:1716 which crash kernel flush thread, and result in disk unavailable.
>> >
>> > BUG details refer to: http://www.gossamer-threads.com/lists/xen/devel/217091?do=post_view_threaded
>> >
>> > Attached is the fix, verified in our env.
>>
>> So.. you are asking for this upstream git commit to be back-ported to 2.6.32, right?
>
> The patch is for 2.6.39. It can be applied to 2.6.32 too.
> Thanks.

So why don't you suggest applying this to the stable tree maintainers instead? xen-devel really isn't the right forum for this sort of bug fix, particularly when the underlying kernel.org tree is still being maintained.

Jan

>> > Without this patch, more than 3 servers hit BUG_ON in our hundreds of servers every day.
>> >
>> > many thanks.
Tao Ma
2011-Sep-27 09:54 UTC
Re: [Xen-devel] RE: [patch 1/1] ext4-fix-dirty-extent-when-origin-leaf-extent-reac.patch
On 09/27/2011 05:09 PM, Jan Beulich wrote:
>>>> On 27.09.11 at 04:22, MaoXiaoyun <tinnycloud@hotmail.com> wrote:
>> ----------------------------------------
>>> Date: Mon, 26 Sep 2011 10:28:08 -0400
>>> From: konrad.wilk@oracle.com
>>> Subject: Re: [patch 1/1] ext4-fix-dirty-extent-when-origin-leaf-extent-reac.patch
>>>
>>> On Sun, Sep 25, 2011 at 04:45:39PM +0800, MaoXiaoyun wrote:
>>>> We met an ext4 BUG_ON in extents.c:1716 which crash kernel flush thread, and result in disk unavailable.
>>>>
>>>> BUG details refer to: http://www.gossamer-threads.com/lists/xen/devel/217091?do=post_view_threaded
>>>>
>>>> Attached is the fix, verified in our env.
>>>
>>> So.. you are asking for this upstream git commit to be back-ported to 2.6.32, right?
>>
>> The patch is for 2.6.39. It can be applied to 2.6.32 too.
>> Thanks.
>
> So why don't you suggest applying this to the stable tree maintainers
> instead? xen-devel really isn't the right forum for this sort of bug fix,
> particularly when the underlying kernel.org tree is still being maintained.

AFAIK, the upstream Linux kernel doesn't have this problem because this part of the code has been refactored, so I am not sure whether Greg KH will accept the patch or not.

Btw, I don't think the fix is appropriate. One of my colleagues is working on another patch to resolve this (I will ask him to post it when it is ready), and we will contact Red Hat about merging it into the enterprise kernel.

Thanks,
Tao
Ted Ts'o
2011-Sep-27 19:35 UTC
[Xen-devel] Re: [patch 1/1] ext4-fix-dirty-extent-when-origin-leaf-extent-reac.patch
On Mon, Sep 26, 2011 at 10:28:08AM -0400, Konrad Rzeszutek Wilk wrote:
> > Attached is the fix, verified in our env.
>
> So.. you are asking for this upstream git commit to be back-ported
> to 2.6.32, right?

I'm curious --- is there a good reason why Xen users are using an upstream 2.6.32 kernel? If they are using a distro kernel, fine, but then the distro kernel should be providing the support. But at this point, 2.6.32 is so positively *ancient* that I'm personally not interested in providing free, unpaid distro support for users who aren't willing to either (a) pay $$$ and get a supported distro kernel, or (b) use a much more modern kernel. At this point, guest and host Xen support is available in 3.0 kernels, so there's really no excuse, right?

- Ted
Olivier B.
2011-Sep-27 23:41 UTC
Re: [Xen-devel] Re: [patch 1/1] ext4-fix-dirty-extent-when-origin-leaf-extent-reac.patch
On 27/09/2011 21:35, Ted Ts'o wrote:
> On Mon, Sep 26, 2011 at 10:28:08AM -0400, Konrad Rzeszutek Wilk wrote:
>>> Attached is the fix, verified in our env.
>>
>> So.. you are asking for this upstream git commit to be back-ported
>> to 2.6.32, right?
>
> I'm curious --- is there a good reason why Xen users are using an
> upstream 2.6.32 kernel? If they are using a distro kernel, fine, but
> then the distro kernel should be providing the support. But at this
> point, 2.6.32 is so positively *ancient* that I'm personally not
> interested in providing free, unpaid distro support for users who
> aren't willing to either (a) pay $$$ and get a supported distro
> kernel, or (b) use a much more modern kernel. At this point, guest
> and host Xen support is available in 3.0 kernels, so there's really no
> excuse, right?
>
> - Ted

In my case, for dom0 I use the 2.6.32 kernel from my distro, because it's stable. I have stability problems with kernel 3.0 on dom0, and as I have neither physical access nor KVM or a serial port, I don't know what to report... It just hangs randomly, without any line in the logs.

Should Xen support in 3.0 kernels be considered stable?
MaoXiaoyun
2011-Sep-28 04:09 UTC
[Xen-devel] RE: [patch 1/1] ext4-fix-dirty-extent-when-origin-leaf-extent-reac.patch
----------------------------------------
> Date: Tue, 27 Sep 2011 15:35:23 -0400
> From: tytso@mit.edu
> To: konrad.wilk@oracle.com
> CC: tinnycloud@hotmail.com; linux-ext4@vger.kernel.org; xen-devel@lists.xensource.com; jack@suse.cz
> Subject: Re: [patch 1/1] ext4-fix-dirty-extent-when-origin-leaf-extent-reac.patch
>
> On Mon, Sep 26, 2011 at 10:28:08AM -0400, Konrad Rzeszutek Wilk wrote:
> > > Attached is the fix, verified in our env.
> >
> > So.. you are asking for this upstream git commit to be back-ported
> > to 2.6.32, right?
>
> I'm curious --- is there a good reason why Xen users are using an
> upstream 2.6.32 kernel? If they are using a distro kernel, fine, but
> then the distro kernel should be providing the support. But at this
> point, 2.6.32 is so positively *ancient* that I'm personally not
> interested in providing free, unpaid distro support for users who
> aren't willing to either (a) pay $$$ and get a supported distro
> kernel, or (b) use a much more modern kernel. At this point, guest
> and host Xen support is available in 3.0 kernels, so there's really no
> excuse, right?

Mmm... We first met this bug on a pvops kernel (Jeremy's tree, 2.6.32.36). We failed to find any related fix via Google, so we debugged it ourselves. Fortunately, we located the root cause, and we thought some other Xen users might have this problem as well; that's why we sent the fix to xen-devel.

We went through the code from 2.6.32 to 2.6.39, and this bug exists throughout. People who use *ancient* kernels need this.

Thanks.

> - Ted
Konrad Rzeszutek Wilk
2011-Sep-28 12:47 UTC
Re: [Xen-devel] Re: [patch 1/1] ext4-fix-dirty-extent-when-origin-leaf-extent-reac.patch
On Wed, Sep 28, 2011 at 01:41:30AM +0200, Olivier B. wrote:
> On 27/09/2011 21:35, Ted Ts'o wrote:
> > On Mon, Sep 26, 2011 at 10:28:08AM -0400, Konrad Rzeszutek Wilk wrote:
> >>> Attached is the fix, verified in our env.
> >>
> >> So.. you are asking for this upstream git commit to be back-ported
> >> to 2.6.32, right?
> >
> > I'm curious --- is there a good reason why Xen users are using an
> > upstream 2.6.32 kernel? If they are using a distro kernel, fine, but
> > then the distro kernel should be providing the support. But at this
> > point, 2.6.32 is so positively *ancient* that I'm personally not
> > interested in providing free, unpaid distro support for users who
> > aren't willing to either (a) pay $$$ and get a supported distro
> > kernel, or (b) use a much more modern kernel. At this point, guest
> > and host Xen support is available in 3.0 kernels, so there's really no
> > excuse, right?
> >
> > - Ted
>
> In my case, for dom0 I use the 2.6.32 kernel from my distro, because it's stable.
> I have stability problems with kernel 3.0 on dom0, and as I have neither physical
> access nor KVM or a serial port, I don't know what to report... It just hangs
> randomly, without any line in the logs.

No physical access? No IPMI? No SOL?

> Should Xen support in 3.0 kernels be considered stable?

Yes - it should be considered stable (3.0.4 at least, with one particular patch that is going in 3.0.5: https://lkml.org/lkml/2011/9/2/114). Please help me find whatever is causing your crash.
Jeremy Fitzhardinge
2011-Sep-28 18:41 UTC
Re: [Xen-devel] Re: [patch 1/1] ext4-fix-dirty-extent-when-origin-leaf-extent-reac.patch
On 09/27/2011 12:35 PM, Ted Ts'o wrote:
> On Mon, Sep 26, 2011 at 10:28:08AM -0400, Konrad Rzeszutek Wilk wrote:
>>> Attached is the fix, verified in our env.
>>
>> So.. you are asking for this upstream git commit to be back-ported
>> to 2.6.32, right?
>
> I'm curious --- is there a good reason why Xen users are using an
> upstream 2.6.32 kernel? If they are using a distro kernel, fine, but
> then the distro kernel should be providing the support. But at this
> point, 2.6.32 is so positively *ancient* that I'm personally not
> interested in providing free, unpaid distro support for users who
> aren't willing to either (a) pay $$$ and get a supported distro
> kernel, or (b) use a much more modern kernel. At this point, guest
> and host Xen support is available in 3.0 kernels, so there's really no
> excuse, right?

The 2.6.32.x-based kernel has been the preferred "stable" kernel for Xen users for a while, and it is still considered to be more stable and functional than what's upstream (obviously we're trying to fix that). Also, because many current distros don't support Xen dom0, it has been an ad-hoc distro kernel.

Since kernel.org 2.6.32 is still considered to be a maintained long-term-stable kernel, I keep the xen.git version up-to-date with stable-2.6.32 bugfixes and occasional separate Xen-specific fixes. But I'd really prefer to avoid having any non-Xen private changes in that tree, in favour of getting everything from upstream stable.

Do you not consider it worth continuing support of the 2.6.32 stable tree with respect to ext4?

J
Ted Ts'o
2011-Sep-28 19:46 UTC
Re: [Xen-devel] Re: [patch 1/1] ext4-fix-dirty-extent-when-origin-leaf-extent-reac.patch
On Wed, Sep 28, 2011 at 11:41:11AM -0700, Jeremy Fitzhardinge wrote:
> Since kernel.org 2.6.32 is still considered to be a maintained
> long-term-stable kernel, I keep the xen.git version up-to-date with
> stable-2.6.32 bugfixes and occasional separate Xen-specific fixes. But
> I'd really prefer to avoid having any non-Xen private changes in that
> tree, in favour of getting everything from upstream stable.
>
> Do you not consider it worth continuing support of the 2.6.32 stable
> tree with respect to ext4?

I just don't have the *time* to maintain backports of ext4 fixes to 2.6.32. There have been so many bug fixes to ext4, and some of them depend on changes in the quota subsystem, so trying to backport them all would be hellish, and not something I'm willing to do on a volunteer basis. I'm busy enough with silly things like trying to help get kernel.org back on-line; channelling my stay-really-late hours into supporting users who are too cheap to pay distro support fees is not really how I would choose to spend my personal time.

If someone would like to volunteer to be unpaid distro support, that's great. It's worth it as long as I get to volunteer somebody else's time. :-)

- Ted