Hi:

I've met an ext4 bug in dom0 kernel 2.6.32.36. (See kernel stack below.)
32.36 kernel commit: http://git.kernel.org/?p=linux/kernel/git/jeremy/xen.git;a=commit;h=ae333e97552c81ab10395ad1ffc6d6daaadb144a

The bug only shows up in our cluster environment, which includes 300 physical machines; about one server per day runs into it. On top of each server run about 30 VMs, each with a heavy IO workload inside. (We are doing some kinds of stress tests.)

We have our own distributed file system as the cluster storage: every VM image file is split into several equal-sized files on the physical disk, and every file creation uses ext4 fallocate (fallocate size 1 MB). So I believe quite a lot of uninitialized extents get initialized during the test.

After going through the source code, the call path is:

ext4_da_writepages->mpage_da_map_blocks->ext4_get_blocks->ext4_ext_get_blocks->
ext4_ext_handle_uninitialized_extents->ext4_ext_convert_to_initialized->ext4_ext_insert_extent

If ext4_ext_handle_uninitialized_extents is called, then the condition at line 3306 must have been satisfied; that is, we have in_range(iblock, ee_block, ee_len) == true,
so iblock >= ee_block.

fs/ext4/extents.c:

3306            if (in_range(iblock, ee_block, ee_len)) {
3307                    newblock = iblock - ee_block + ee_start;
3308                    /* number of remaining blocks in the extent */
3309                    allocated = ee_len - (iblock - ee_block);
3310                    ext_debug("%u fit into %u:%d -> %llu\n", iblock,
3311                                    ee_block, ee_len, newblock);
3312
3313                    /* Do not put uninitialized extent in the cache */
3314                    if (!ext4_ext_is_uninitialized(ex)) {
3315                            ext4_ext_put_in_cache(inode, ee_block,
3316                                                    ee_len, ee_start,
3317                                                    EXT4_EXT_CACHE_EXTENT);
3318                            goto out;
3319                    }
3320                    ret = ext4_ext_handle_uninitialized_extents(handle,
3321                                    inode, iblock, max_blocks, path,
3322                                    flags, allocated, bh_result, newblock);
3323                    return ret;
3324            }

The newext comes from line 2678; its ee_block is iblock + max_blocks.
The nearex is path[depth].p_ext (line 1683).

The BUG_ON at line 1716 firing means iblock + max_blocks == nearex->ee_block.
So maybe that means we have iblock == ee_block and max_blocks == 0.

1716            BUG_ON(newext->ee_block == nearex->ee_block);
1717            len = (EXT_MAX_EXTENT(eh) - nearex) * sizeof(struct ext4_extent);
1718            len = len < 0 ? 0 : len;
1719            ext_debug("insert %d:%llu:[%d]%d before: nearest 0x%p, "
1720                            "move %d from 0x%p to 0x%p\n",
1721                            le32_to_cpu(newext->ee_block),
1722                            ext_pblock(newext),
1723                            ext4_ext_is_uninitialized(newext),
1724                            ext4_ext_get_actual_len(newext),
1725                            nearex, len, nearex + 1, nearex + 2);
1726            memmove(nearex + 1, nearex, len);
1727            path[depth].p_ext = nearex;
1728    }

2678            ex3 = &newex;
2679            ex3->ee_block = cpu_to_le32(iblock + max_blocks);
2680            ext4_ext_store_pblock(ex3, newblock + max_blocks);
2681            ex3->ee_len = cpu_to_le16(allocated - max_blocks);
2682            ext4_ext_mark_uninitialized(ex3);
2683            err = ext4_ext_insert_extent(handle, inode, path, ex3, 0);
2684            if (err == -ENOSPC && may_zeroout) {
2685                    err = ext4_ext_zeroout(inode, &orig_ex);

If max_blocks == 0, that means mpd->b_size >> mpd->inode->i_blkbits at line 2225 is 0.

fs/ext4/inode.c:

2220 static int mpage_da_map_blocks(struct mpage_da_data *mpd)
2221 {
2222    int err, blks, get_blocks_flags;
2223    struct buffer_head new;
2224    sector_t next = mpd->b_blocknr;
2225    unsigned max_blocks = mpd->b_size >> mpd->inode->i_blkbits;
2226    loff_t disksize = EXT4_I(mpd->inode)->i_disksize;
2227    handle_t *handle = NULL;
2228

Could it be possible? Right now I am trying to reproduce this problem in a much easier way; any suggestions?

Many thanks.

------------[ cut here ]------------
kernel BUG at fs/ext4/extents.c:1716!
invalid opcode: 0000 [#1] SMP
last sysfs file: /sys/block/tapdevk/stat
CPU 3
Modules linked in: xt_iprange xt_mac arptable_filter arp_tables xt_physdev nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack
 iptable_filter ip_tables bridge autofs4 ipmi_devintf ipmi_si ipmi_msghandler lockd sunrpc bonding ipv6 8021q garp stp llc xenfs
 dm_multipath fuse xen_netback xen_blkback blktap blkback_pagemap loop nbd video output sbs sbshc parport_pc lp parport joydev ses
 enclosure snd_seq_dummy snd_seq_oss bnx2 snd_seq_midi_event snd_seq snd_seq_device dcdbas snd_pcm_oss snd_mixer_oss serio_raw snd_pcm
 snd_timer snd soundcore snd_page_alloc iTCO_wdt iTCO_vendor_support pcspkr shpchp [last unloaded: freq_table]
Pid: 9073, comm: flush-8:16 Not tainted 2.6.32.36xen #1 PowerEdge R710
RIP: e030:[<ffffffff811a6184>]  [<ffffffff811a6184>] ext4_ext_insert_extent+0xac1/0xbe0
RSP: e02b:ffff8801499cd580  EFLAGS: 00010246
RAX: 0000000000002948 RBX: 0000000000000000 RCX: ffff8801499cd780
RDX: ffff8801499cd360 RSI: ffff88007dedb310 RDI: 0000000000000017
RBP: ffff8801499cd650 R08: ffff8801499cd340 R09: ffff880063488930
R10: 000000018100f8bf R11: dead000000200200 R12: ffff88005a29700c
R13: ffff88005a297000 R14: ffff8801158198c0 R15: ffff88003e9ea1b0
FS:  00007fd3cc4bf6e0(0000) GS:ffff88002808f000(0000) knlGS:0000000000000000
CS:  e033 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 000000000042a09e CR3: 00000000bf3bd000 CR4: 0000000000002660
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process flush-8:16 (pid: 9073, threadinfo ffff8801499cc000, task ffff880149ad5b40)
Stack:
 ffff8801499cd780 ffff88003e9ea180 ffff8801c5b47300 01ffffff81103c0c
<0> ffff88003e9ea180 000000017dedb2a0 ffff880115819800 ffff88007dedb2a0
<0> ffff8801499cd5d0 ffffffff811c12ea ffff8801499cd5f0 ffffffff811c16ea
Call Trace:
 [<ffffffff811c12ea>] ? jbd_unlock_bh_journal_head+0x16/0x18
 [<ffffffff811c16ea>] ? jbd2_journal_put_journal_head+0x4d/0x52
 [<ffffffff811bb7d6>] ? jbd2_journal_get_write_access+0x31/0x38
 [<ffffffff811a88e9>] ? __ext4_journal_get_write_access+0x4c/0x5f
 [<ffffffff811a6ce3>] ext4_ext_handle_uninitialized_extents+0xa40/0xef5
 [<ffffffff8100f175>] ? xen_force_evtchn_callback+0xd/0xf
 [<ffffffff8100f8d2>] ? check_events+0x12/0x20
 [<ffffffff81042fcf>] ? need_resched+0x23/0x2d
 [<ffffffff811a74e1>] ext4_ext_get_blocks+0x265/0x6eb
 [<ffffffff81042fcf>] ? need_resched+0x23/0x2d
 [<ffffffff81188b55>] ext4_get_blocks+0x140/0x204
 [<ffffffff81188d2f>] mpage_da_map_blocks+0xb7/0x681
 [<ffffffff810d3b29>] ? find_get_pages_tag+0x48/0xcc
 [<ffffffff8100f8d2>] ? check_events+0x12/0x20
 [<ffffffff810da8df>] ? pagevec_lookup_tag+0x27/0x30
 [<ffffffff810d87cc>] ? write_cache_pages+0x175/0x35e
 [<ffffffff811893f0>] ? __mpage_da_writepage+0x0/0x164
 [<ffffffff81103c0c>] ? kmem_cache_alloc+0x94/0xf6
 [<ffffffff811bbc40>] ? jbd2_journal_start+0xa1/0xcd
 [<ffffffff8119957f>] ? ext4_journal_start_sb+0xdc/0x111
 [<ffffffff81186852>] ? ext4_meta_trans_blocks+0x74/0xce
 [<ffffffff8118bc42>] ext4_da_writepages+0x47a/0x6a7
 [<ffffffff810d8a00>] do_writepages+0x21/0x2a
 [<ffffffff8112cdb8>] writeback_single_inode+0xc8/0x1e3
 [<ffffffff8112d5e4>] writeback_inodes_wb+0x30b/0x37e
 [<ffffffff8102f82d>] ? paravirt_end_context_switch+0x17/0x31
 [<ffffffff8100b459>] ? xen_end_context_switch+0x1e/0x22
 [<ffffffff8112d788>] wb_writeback+0x131/0x1bb
 [<ffffffff81064029>] ? try_to_del_timer_sync+0x73/0x81
 [<ffffffff8112d9ef>] wb_do_writeback+0x13c/0x153
 [<ffffffff8106425b>] ? process_timeout+0x0/0x10
 [<ffffffff810e78d1>] ? bdi_start_fn+0x0/0xd0
 [<ffffffff8112da32>] bdi_writeback_task+0x2c/0xb3
 [<ffffffff810e793b>] bdi_start_fn+0x6a/0xd0
 [<ffffffff810754b7>] kthread+0x6e/0x76
 [<ffffffff81013daa>] child_rip+0xa/0x20
 [<ffffffff81012f91>] ? int_ret_from_sys_call+0x7/0x1b
 [<ffffffff8101371d>] ? retint_restore_args+0x5/0x6
 [<ffffffff81013da0>] ? child_rip+0x0/0x20
Code: 8d 04 85 f4 ff ff ff 85 c0 0f 49 d8 48 63 d3 e8 47 c7 07 00 49 8d 44 24 0c 49 89 47 10 eb 3a bb f4 ff ff ff e9 c2 00 00 00 75 04 <0f> 0b eb fe 41 0f b7 45 04 49 8d 7c 24 0c 48 6b c0 0c 4c 89 e6
RIP  [<ffffffff811a6184>] ext4_ext_insert_extent+0xac1/0xbe0
 RSP <ffff8801499cd580>
---[ end trace 035c7d09ed95fb32 ]---
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Running fsck, some of the hard disks have multiply-claimed blocks. And it looks like I need this patch to fix the "should not have EOFBLOCKS_FL set" error:
http://git390.marist.edu/cgi-bin/gitweb.cgi?p=linux-2.6.git;a=commitdiff;h=58590b06d79f7ce5ab64ff3b6d537180fa50dc84

Inode 50343178 should not have EOFBLOCKS_FL set (size 67108864, lblk 16383) Clear? yes
Inode 50345362 should not have EOFBLOCKS_FL set (size 67108864, lblk 16383) Clear? yes
Inode 50345386 should not have EOFBLOCKS_FL set (size 63963136, lblk 15615) Clear? yes
Inode 50345648 should not have EOFBLOCKS_FL set (size 3145728, lblk 767) Clear? yes
Inode 50345690 should not have EOFBLOCKS_FL set (size 67108864, lblk 16383) Clear? yes
Inode 50346361, i_blocks is 133136, should be 133256.  Fix? yes
Running additional passes to resolve blocks claimed by more than one inode...
Pass 1B: Rescanning for multiply-claimed blocks
Multiply-claimed block(s) in inode 50346361: 226854591 226854592 226854593 226854594 226854595 226854596 226854597 226854598 226854599 226854600 226854601 226854602 226854603 226854604 226854605 226854591 226854592 226854593 226854594 226854595 226854596 226854597 226854598 226854599 226854600 226854601 226854602 226854603 226854604 226854605
Pass 1C: Scanning directories for inodes with multiply-claimed blocks
Pass 1D: Reconciling multiply-claimed blocks
(There are 1 inodes containing multiply-claimed blocks.)
File /chunks/2410339941482498_637 (inode #50346361, mod time Tue Sep  6 16:25:33 2011)
  has 30 multiply-claimed block(s), shared with 0 file(s):
Clone multiply-claimed blocks? yes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
Free blocks count wrong for group #0 (78, counted=63).  Fix? yes
Free blocks count wrong (7028646, counted=7028631).  Fix?
yes

----------------------------------------
> From: tinnycloud@hotmail.com
> To: linux-ext4@vger.kernel.org; xen-devel@lists.xensource.com
> CC: jeremy@goop.org; konrad.wilk@oracle.com
> Subject: ext4 BUG in dom0 Kernel 2.6.32.36
> Date: Tue, 6 Sep 2011 15:24:14 +0800
> [snip: original report quoted in full above]
Konrad Rzeszutek Wilk
2011-Sep-06 14:53 UTC
[Xen-devel] Re: ext4 BUG in dom0 Kernel 2.6.32.36
On Tue, Sep 06, 2011 at 03:24:14PM +0800, MaoXiaoyun wrote:
> Hi:
>
> I've met an ext4 bug in dom0 kernel 2.6.32.36. (See kernel stack below)

Did you try the 3.0 kernel?
> Date: Tue, 6 Sep 2011 10:53:47 -0400
> From: konrad.wilk@oracle.com
> To: tinnycloud@hotmail.com
> CC: linux-ext4@vger.kernel.org; xen-devel@lists.xensource.com; jeremy@goop.org
> Subject: Re: ext4 BUG in dom0 Kernel 2.6.32.36
>
> Did you try the 3.0 kernel?

No, I am afraid the change would be too much for our current environment, and might result in other stability issues. So I want to dig out what really happened, hopefully.

Thanks.
Jeremy Fitzhardinge
2011-Sep-06 18:55 UTC
Re: [Xen-devel] RE: ext4 BUG in dom0 Kernel 2.6.32.36
On 09/06/2011 08:11 AM, MaoXiaoyun wrote:
> > Did you try the 3.0 kernel?
> No, I am afraid the change would be too much for our current env.
> May result in other stability issues.
> So, I want to dig out what really happened.

Another question is whether this is a regression compared to earlier versions of 2.6.32. Do you know if this problem exists in a non-Xen environment?

Thanks,
    J
----------------------------------------
> Date: Tue, 6 Sep 2011 11:55:02 -0700
> From: jeremy@goop.org
> To: tinnycloud@hotmail.com
> CC: konrad.wilk@oracle.com; linux-ext4@vger.kernel.org; xen-devel@lists.xensource.com
> Subject: Re: [Xen-devel] RE: ext4 BUG in dom0 Kernel 2.6.32.36
>
> Another question is whether this is a regression compared to earlier
> versions of 2.6.32? Do you know if this problem exists in a non-Xen
> environment?

There are some other reports of this issue in non-Xen environments:
http://markmail.org/message/ywr4nfgiuvgdcr7y
http://www.spinics.net/lists/linux-ext4/msg21066.html

The difficulty is that I haven't found an efficient way to reproduce it. (Currently it only shows up in our cluster, and redeploying our cluster may cost 3 days or more.)
Hi:

I finally captured overlapped extents in ext4, but I am still wondering how it happens. I check for overlap of the last extent in the tree at the very beginning of ext4_ext_convert_to_initialized. The attached messages.12 shows the overlap found; in lines 8-10, 3467:[1]15:57921642 and 3468:[0]14:57921643 have overlapped.

8 Sep 15 08:27:39 xmao kernel: 3331:[0]7:53750025 3338:[0]8:53750033 3346:[0]1:53848953 3347:[0]7:53848955 3354:[0]1:53848969 3355:[0]7:53848971 3362:[0]1:53848985 3363:[0]7:56996848 3370:[0]1:57606144 3371:[0]7:57795290 3378:[0]1:57814407 3379:[0]7:57858606 3386:[0]8:57858620 3394:[0]1:57858629 3395:[0]8:57858637 3403:[0]7:57858646 3410:[0]1:57858661 3411:[0]8:57858669 3419:[0]7:57858678 3426:[0]8:57858692 3434:[0]1:57858701 3435:[0]7:57858709 3442:[0]1:57858717 3443:[0]7:57858725 3450:[0]1:57858733 3451:[0]7:57858741 3458:[0]1:57858749 3459:[0]7:57858757 3466:[0]1:57921634 3467:[1]15:57921642
9 Sep 15 08:27:39 xmao kernel: Displaying leaf extents for inode 12339004
10 Sep 15 08:27:39 xmao kernel: 3468:[0]14:57921643 3482:[0]1:57921664 3483:[0]7:57921666 3490:[0]1:57921680 3491:[0]8:57921682 3499:[0]7:57921691 3506:[0]8:57921705 3514:[0]1:57921714 3515:[0]7:57921722 3522:[0]41:57916683 3563:[0]7:58159767 3570:[0]1:58159781 3571:[0]7:58238992 3578:[0]1:58288144 3579:[0]7:58327750 3586:[0]1:58579969 3587:[0]7:58954838 3594:[0]1:59006641 3595:[0]7:59006643 3602:[0]1:59006657 3603:[0]7:59006659 3610:[0]8:59006673 3618:[0]8:59006688 3626:[0]470:58982658 4096:[0]3:58987732 4099:[0]1:58992655 4100:[0]7:59143253 4107:[0]1:59171840 4108:[0]7:59183878 4115:[0]1:59192886 4116:[0]8:59593463 4124:[0]8:59669484 4132:[0]7:73086538 4139:[0]1:73352801 4140:[0]7:73339273 4147:[0]1:73526280 4148:[0]8:78229012 4156:[0]1:78229021 4157:[0]7:78818388 4164:[0]1:79069383 4165:[0]7:79428616 4172:[0]1:80490925 4173:[0]7:81439488 4180:[0]1:82854062 4181:[0]7:83462272 4188:[0]1:83656904 4189:[0]7:89127381 4196:[0]1:89584313 4197:[0]8:91592930 4205:[0]7:91592945 4212:[0]1:91592953
4213:[0]7:91592961 422

I also dumped the file on disk using filefrag, which shows no overlap and no extent 3468:[0]14:57921643:

 ext  logical  physical  expected  length  flags
 ....
 337  3459     57858757  57858749  7
 338  3466     57921634  57858763  1       unwritten
 339  3467     57921642  57921634  15      unwritten
 340  3482     57921664  57921656  1
 341  3483     57921666  57921664  7
 .....

There is one assumption: after 3468:[0]14:57921643 is successfully inserted, some error happens. At the bottom of ext4_ext_convert_to_initialized, fix_extent_len will restore the original ex's ee_len. (Later I will do the error check.)

3403 fix_extent_len:
3404         ex->ee_block = orig_ex.ee_block;
3405         ex->ee_len = orig_ex.ee_len;
3406         ext4_ext_store_pblock(ex, ext_pblock(&orig_ex));
3407         ext4_ext_mark_uninitialized(ex);
3408         ext4_ext_dirty(handle, inode, path + depth);

Any comments?

But there is something strange in messages.12. messages.12 is from another machine; its log is printed right before BUG_ON(newext->ee_block == nearex->ee_block). The strange thing is that 14412:[1]16:9927's pblock is very different from 14411:[0]1:222332613's.

1993         if (newext->ee_block == nearex->ee_block) {
1994                 len = (EXT_MAX_EXTENT(eh) - nearex) * sizeof(struct ext4_extent);
1995                 len = len < 0 ? 0 : len;
1996                 printk("old_depth %d depth %d old_path %p path %p next_has_free %d next %llu\n",
1997                         old_depth, depth, old_path, path, next_has_free, (unsigned long long)next);
2004
2005                 printk("insert %d:%llu:[%d]%d before: nearest 0x%p, "
2006                         "move %d from 0x%p to 0x%p\n",
2007                         le32_to_cpu(newext->ee_block),
2008                         ext_pblock(newext),
2009                         ext4_ext_is_uninitialized(newext),
2010                         ext4_ext_get_actual_len(newext),
2011                         nearex, len, nearex + 1, nearex + 2);
2012                 ext4_ext_show_leaf_xmao(inode, old_path);
2013                 ext4_ext_show_leaf_xmao(inode, path);
2014         };
2015         BUG_ON(newext->ee_block == nearex->ee_block);

Sep 13 16:16:35 xmao kernel: 57:[0]31:157254721 12288:[0]54:157503830 12342:[0]10:157503884 12352:[0]5:157534763 12357:[0]1:157534768 12358:[0]58:157534769 12416:[0]64:157567168 12480:[0]13:158051261 12493:[0]73:172263095 12566:[0]24:172265399 12590:[0]71:172521859 12661:[0]71:172627897 12732:[0]71:172733735 12803:[0]69:172722619 12872:[0]9:172764859 12881:[0]42:110500028 12923:[0]86:143030061 13009:[0]86:143119859 13095:[0]48:143173376 13143:[0]16:195333586 13159:[0]32:197526105 13191:[0]40:198875861 13231:[0]39:198872300 13270:[0]5:199663576 13275:[0]26:200964192 13301:[0]36:202015708 13337:[0]47:202221682 13384:[0]9:202221729 13393:[0]58:202624966 13451:[0]12:202606535 13463:[0]35:212117725 13498:[0]35:212135811 13533:[0]34:212115513 13567:[0]32:212108608 13599:[0]29:212144185 13628:[0]50:231280420 13678:[0]38:231645389 13716:[0]13:231645427 13729:[0]51:231650765 13780:[0]50:231647658 13830:[0]54:231985340 13884:[0]24:231981259 13908:[0]64:105098731 13972:[0]87:136696745 14059:[0]45:136700237 14104:[0]61:2
Sep 13 16:16:35 xmao kernel: 3651 14165:[0]69:222042299 14234:[0]68:222044092 14302:[0]34:222091761 14336:[0]68:222172860 14404:[0]7:222332606 14411:[0]1:222332613
Sep 13 16:16:35 xmao kernel: Displaying leaf extents for inode 30685060
Sep 13 16:16:35 xmao kernel: 14412:[1]16:9927 14428:[1]41:13213 14469:[1]1:13254 14470:[0]67:222673085

Also, filefrag shows the extents are OK:

 336  14302  222091761  222044159  34
 337  14336  222172860  222091794  68
 338  14404  222332606  222172927  7
 339  14411  222332613             59  unwritten
 340  14470  222673085  222332671  67
 341  14537  222848155  222673151  43
 342  14580  165617358  222848197  56
 343  14636  165777353  165617413  55
 344  14691  165961927  165777407  57

It seems 14412:[1]16:9927 14428:[1]41:13213 14469:[1]1:13254 are unexpected.

Many thanks.

----------------------------------------
> From: tinnycloud@hotmail.com
> To: jeremy@goop.org
> CC: konrad.wilk@oracle.com; linux-ext4@vger.kernel.org; xen-devel@lists.xensource.com
> Subject: RE: [Xen-devel] RE: ext4 BUG in dom0 Kernel 2.6.32.36
> Date: Wed, 7 Sep 2011 10:35:21 +0800
> [snip: earlier messages quoted in full above]
MaoXiaoyun
2011-Sep-25 08:45 UTC
[Xen-devel] [patch 1/1] ext4-fix-dirty-extent-when-origin-leaf-extent-reac.patch
Hi:

We met an ext4 BUG_ON at extents.c:1716 which crashes the kernel flush thread and leaves the disk unavailable.

BUG details: http://www.gossamer-threads.com/lists/xen/devel/217091?do=post_view_threaded

Attached is the fix, verified in our environment. Without this patch, more than 3 of our hundreds of servers hit the BUG_ON every day.

Many thanks.
Konrad Rzeszutek Wilk
2011-Sep-26 14:28 UTC
[Xen-devel] Re: [patch 1/1] ext4-fix-dirty-extent-when-origin-leaf-extent-reac.patch
On Sun, Sep 25, 2011 at 04:45:39PM +0800, MaoXiaoyun wrote:
> Hi:
>
> We met an ext4 BUG_ON in extents.c:1716 which crash kernel flush thread, and result in disk unavailable.
>
> BUG details refer to: http://www.gossamer-threads.com/lists/xen/devel/217091?do=post_view_threaded
>
> Attached is the fix, verified in our env.

So.. you are asking for this upstream git commit to be back-ported to 2.6.32, right?

> Without this patch, more than 3 servers hit BUG_ON in our hundreds of servers every day.
>
> many thanks.
MaoXiaoyun
2011-Sep-27 02:22 UTC
[Xen-devel] RE: [patch 1/1] ext4-fix-dirty-extent-when-origin-leaf-extent-reac.patch
----------------------------------------
> Date: Mon, 26 Sep 2011 10:28:08 -0400
> From: konrad.wilk@oracle.com
> To: tinnycloud@hotmail.com
> CC: linux-ext4@vger.kernel.org; xen-devel@lists.xensource.com; tytso@mit.edu; jack@suse.cz
> Subject: Re: [patch 1/1] ext4-fix-dirty-extent-when-origin-leaf-extent-reac.patch
>
> On Sun, Sep 25, 2011 at 04:45:39PM +0800, MaoXiaoyun wrote:
> > Hi:
> >
> > We met an ext4 BUG_ON in extents.c:1716 which crash kernel flush thread, and result in disk unavailable.
> >
> > BUG details refer to: http://www.gossamer-threads.com/lists/xen/devel/217091?do=post_view_threaded
> >
> > Attached is the fix, verified in our env.
>
> So.. you are asking for this upstream git commit to be back-ported to 2.6.32, right?

The patch is for 2.6.39. It can be applied to 2.6.32 too.

Thanks.

> > Without this patch, more than 3 servers hit BUG_ON in our hundreds of servers every day.
> >
> > many thanks.
Jan Beulich
2011-Sep-27 09:09 UTC
[Xen-devel] RE: [patch 1/1] ext4-fix-dirty-extent-when-origin-leaf-extent-reac.patch
>>> On 27.09.11 at 04:22, MaoXiaoyun <tinnycloud@hotmail.com> wrote:
> ----------------------------------------
>> Date: Mon, 26 Sep 2011 10:28:08 -0400
>> From: konrad.wilk@oracle.com
>> To: tinnycloud@hotmail.com
>> CC: linux-ext4@vger.kernel.org; xen-devel@lists.xensource.com; tytso@mit.edu; jack@suse.cz
>> Subject: Re: [patch 1/1] ext4-fix-dirty-extent-when-origin-leaf-extent-reac.patch
>>
>> On Sun, Sep 25, 2011 at 04:45:39PM +0800, MaoXiaoyun wrote:
>> > Hi:
>> >
>> > We met an ext4 BUG_ON in extents.c:1716 which crash kernel flush thread, and result in disk unavailable.
>> >
>> > BUG details refer to: http://www.gossamer-threads.com/lists/xen/devel/217091?do=post_view_threaded
>> >
>> > Attached is the fix, verified in our env.
>>
>> So.. you are asking for this upstream git commit to be back-ported to 2.6.32, right?
>
> The patch is for 2.6.39. It can be applied to 2.6.32 too.
> Thanks.

So why don't you suggest applying this to the stable tree maintainers instead? xen-devel really isn't the right forum for this sort of bug fix, particularly when the underlying kernel.org tree is still being maintained.

Jan

>> > Without this patch, more than 3 servers hit BUG_ON in our hundreds of servers every day.
>> >
>> > many thanks.
Tao Ma
2011-Sep-27 09:54 UTC
Re: [Xen-devel] RE: [patch 1/1] ext4-fix-dirty-extent-when-origin-leaf-extent-reac.patch
On 09/27/2011 05:09 PM, Jan Beulich wrote:
>>>> On 27.09.11 at 04:22, MaoXiaoyun <tinnycloud@hotmail.com> wrote:
>> ----------------------------------------
>>> Date: Mon, 26 Sep 2011 10:28:08 -0400
>>> From: konrad.wilk@oracle.com
>>> Subject: Re: [patch 1/1] ext4-fix-dirty-extent-when-origin-leaf-extent-reac.patch
>>>
>>> On Sun, Sep 25, 2011 at 04:45:39PM +0800, MaoXiaoyun wrote:
>>>> We met an ext4 BUG_ON in extents.c:1716 which crash kernel flush thread, and result in disk unavailable.
>>>>
>>>> BUG details refer to: http://www.gossamer-threads.com/lists/xen/devel/217091?do=post_view_threaded
>>>>
>>>> Attached is the fix, verified in our env.
>>>
>>> So.. you are asking for this upstream git commit to be back-ported to 2.6.32, right?
>>
>> The patch is for 2.6.39. It can be applied to 2.6.32 too.
>> Thanks.
>
> So why don't you suggest applying this to the stable tree maintainers
> instead? xen-devel really isn't the right forum for this sort of bug fix,
> particularly when the underlying kernel.org tree is still being maintained.

AFAIK, the upstream Linux kernel doesn't have this problem because this part of the code has been refactored, so I am not sure whether Greg KH will accept the patch or not.

Btw, I don't think the fix is appropriate. One of my colleagues is working on another patch to resolve this (I will ask him to post it when it is ready), and we will contact Red Hat about merging it into the enterprise kernel.

Thanks,
Tao
Ted Ts'o
2011-Sep-27 19:35 UTC
[Xen-devel] Re: [patch 1/1] ext4-fix-dirty-extent-when-origin-leaf-extent-reac.patch
On Mon, Sep 26, 2011 at 10:28:08AM -0400, Konrad Rzeszutek Wilk wrote:
> > Attached is the fix, verified in our env.
>
> So.. you are asking for this upstream git commit to be back-ported
> to 2.6.32, right?

I'm curious --- is there a good reason why Xen users are using an upstream 2.6.32 kernel? If they are using a distro kernel, fine, but then the distro kernel should be providing the support. But at this point, 2.6.32 is so positively *ancient* that I'm personally not interested in providing free, unpaid distro support for users who aren't willing to either (a) pay $$$ and get a supported distro kernel, or (b) use a much more modern kernel. At this point, guest and host Xen support is available in 3.0 kernels, so there's really no excuse, right?

- Ted
Olivier B.
2011-Sep-27 23:41 UTC
Re: [Xen-devel] Re: [patch 1/1] ext4-fix-dirty-extent-when-origin-leaf-extent-reac.patch
On 27/09/2011 21:35, Ted Ts'o wrote:
> On Mon, Sep 26, 2011 at 10:28:08AM -0400, Konrad Rzeszutek Wilk wrote:
>>> Attached is the fix, verified in our env.
>>
>> So.. you are asking for this upstream git commit to be back-ported
>> to 2.6.32, right?
>
> I'm curious --- is there a good reason why Xen users are using an
> upstream 2.6.32 kernel? If they are using a distro kernel, fine, but
> then the distro kernel should be providing the support. But at this
> point, 2.6.32 is so positively *ancient* that I'm personally not
> interested in providing free, unpaid distro support for users who
> aren't willing to either (a) pay $$$ and get a supported distro
> kernel, or (b) use a much more modern kernel. At this point, guest
> and host Xen support is available in 3.0 kernels, so there's really no
> excuse, right?
>
> - Ted

In my case, for dom0 I use the 2.6.32 kernel from my distro, because it's stable. I have stability problems with kernel 3.0 on dom0, and as I have neither physical access nor KVM or a serial port, I don't know what to report... It just hangs randomly, without any line in the logs.

Should Xen support in 3.0 kernels be considered stable?
MaoXiaoyun
2011-Sep-28 04:09 UTC
[Xen-devel] RE: [patch 1/1] ext4-fix-dirty-extent-when-origin-leaf-extent-reac.patch
----------------------------------------
> Date: Tue, 27 Sep 2011 15:35:23 -0400
> From: tytso@mit.edu
> To: konrad.wilk@oracle.com
> CC: tinnycloud@hotmail.com; linux-ext4@vger.kernel.org; xen-devel@lists.xensource.com; jack@suse.cz
> Subject: Re: [patch 1/1] ext4-fix-dirty-extent-when-origin-leaf-extent-reac.patch
>
> On Mon, Sep 26, 2011 at 10:28:08AM -0400, Konrad Rzeszutek Wilk wrote:
> > > Attached is the fix, verified in our env.
> >
> > So.. you are asking for this upstream git commit to be back-ported
> > to 2.6.32, right?
>
> I'm curious --- is there a good reason why Xen users are using an
> upstream 2.6.32 kernel? If they are using a distro kernel, fine, but
> then the distro kernel should be providing the support. But at this
> point, 2.6.32 is so positively *ancient* that I'm personally not
> interested in providing free, unpaid distro support for users who
> aren't willing to either (a) pay $$$ and get a supported distro
> kernel, or (b) use a much more modern kernel. At this point, guest
> and host Xen support is available in 3.0 kernels, so there's really no
> excuse, right?

Mmm... We first met this bug on a pvops kernel (Jeremy's tree, 2.6.32.36). We failed to find any related fix via Google, so we debugged it ourselves. Fortunately, we located the root cause, and we thought some other Xen users might have this problem as well; that's why we sent the fix to xen-devel.

We went through the code from 2.6.32 to 2.6.39, and this bug exists throughout. People who use *ancient* kernels need this.

Thanks.

> - Ted
Konrad Rzeszutek Wilk
2011-Sep-28 12:47 UTC
Re: [Xen-devel] Re: [patch 1/1] ext4-fix-dirty-extent-when-origin-leaf-extent-reac.patch
On Wed, Sep 28, 2011 at 01:41:30AM +0200, Olivier B. wrote:
> On 27/09/2011 21:35, Ted Ts'o wrote:
> > On Mon, Sep 26, 2011 at 10:28:08AM -0400, Konrad Rzeszutek Wilk wrote:
> >>> Attached is the fix, verified in our env.
> >>
> >> So.. you are asking for this upstream git commit to be back-ported
> >> to 2.6.32, right?
> >
> > I'm curious --- is there a good reason why Xen users are using an
> > upstream 2.6.32 kernel? If they are using a distro kernel, fine, but
> > then the distro kernel should be providing the support. But at this
> > point, 2.6.32 is so positively *ancient* that I'm personally not
> > interested in providing free, unpaid distro support for users who
> > aren't willing to either (a) pay $$$ and get a supported distro
> > kernel, or (b) use a much more modern kernel. At this point, guest
> > and host Xen support is available in 3.0 kernels, so there's really no
> > excuse, right?
> >
> > - Ted
>
> In my case, for dom0 I use the 2.6.32 kernel from my distro, because it's stable.
> I have stability problems with kernel 3.0 on dom0, and as I have neither physical
> access nor KVM or a serial port, I don't know what to report... It just hangs
> randomly, without any line in the logs.

No physical access? No IPMI? No SOL?

> Should Xen support in 3.0 kernels be considered stable?

Yes - it should be considered stable (3.0.4 at least, with one particular patch that is going in 3.0.5: https://lkml.org/lkml/2011/9/2/114). Please help me find whatever is causing your crash.
Jeremy Fitzhardinge
2011-Sep-28 18:41 UTC
Re: [Xen-devel] Re: [patch 1/1] ext4-fix-dirty-extent-when-origin-leaf-extent-reac.patch
On 09/27/2011 12:35 PM, Ted Ts'o wrote:
> On Mon, Sep 26, 2011 at 10:28:08AM -0400, Konrad Rzeszutek Wilk wrote:
>>> Attached is the fix, verified in our env.
>>
>> So.. you are asking for this upstream git commit to be back-ported
>> to 2.6.32, right?
>
> I'm curious --- is there a good reason why Xen users are using an
> upstream 2.6.32 kernel? If they are using a distro kernel, fine, but
> then the distro kernel should be providing the support. But at this
> point, 2.6.32 is so positively *ancient* that I'm personally not
> interested in providing free, unpaid distro support for users who
> aren't willing to either (a) pay $$$ and get a supported distro
> kernel, or (b) use a much more modern kernel. At this point, guest
> and host Xen support is available in 3.0 kernels, so there's really no
> excuse, right?

The 2.6.32.x-based kernel has been the preferred "stable" kernel for Xen users for a while, and it is still considered to be more stable and functional than what's upstream (obviously we're trying to fix that). Also, because many current distros don't support Xen dom0, it has been an ad-hoc distro kernel.

Since kernel.org 2.6.32 is still considered to be a maintained long-term-stable kernel, I keep the xen.git version up-to-date with stable-2.6.32 bugfixes and occasional separate Xen-specific fixes. But I'd really prefer to avoid having any non-Xen private changes in that tree, in favour of getting everything from upstream stable.

Do you not consider it worth continuing support of the 2.6.32 stable tree with respect to ext4?

J
Ted Ts'o
2011-Sep-28 19:46 UTC
Re: [Xen-devel] Re: [patch 1/1] ext4-fix-dirty-extent-when-origin-leaf-extent-reac.patch
On Wed, Sep 28, 2011 at 11:41:11AM -0700, Jeremy Fitzhardinge wrote:
> Since kernel.org 2.6.32 is still considered to be a maintained
> long-term-stable kernel, I keep the xen.git version up-to-date with
> stable-2.6.32 bugfixes and occasional separate Xen-specific fixes. But
> I'd really prefer to avoid having any non-Xen private changes in that
> tree, in favour of getting everything from upstream stable.
>
> Do you not consider it worth continuing support of the 2.6.32 stable
> tree with respect to ext4?

I just don't have the *time* to maintain backports of ext4 fixes to 2.6.32. There have been so many bug fixes to ext4, and some of them depend on changes in the quota subsystem, so trying to backport them all would be hellish, and not something I'm willing to do on a volunteer basis. I'm busy enough with silly things like trying to help get kernel.org back on-line; channelling my stay-really-late hours into supporting users who are too cheap to pay distro support fees is not really how I would choose to spend my personal time.

If someone would like to volunteer to be unpaid distro support, that's great. It's worth it as long as I get to volunteer somebody else's time. :-)

- Ted