Fengguang Wu
2013-Oct-10 03:33 UTC
[Ocfs2-devel] [XFS on bad superblock] BUG: unable to handle kernel NULL pointer dereference at 00000003
On Thu, Oct 10, 2013 at 11:26:37AM +0800, Fengguang Wu wrote:
> Dave,
>
> > I note that you have CONFIG_SLUB=y, which means that the cache slabs
> > are shared with objects of other types. That means that the memory
> > corruption problem is likely to be caused by one of the other
> > filesystems that is probing the block device(s), not XFS.
>
> Good to know that, it would easy to test then: just turn off every
> other filesystems. I'll try it right away.

Seems that we don't even need to do that. A dig through the oops
database and I find stack dumps from other FS.

This happens in the kernel with same kconfig and commit 3.12-rc1.

[ 51.205369] block nbd1: Attempted send on closed socket
[ 51.214126] BUG: unable to handle kernel NULL pointer dereference at 00000004
[ 51.215640] IP: [<c10343fb>] pool_mayday_timeout+0x5f/0x9c
[ 51.216262] *pdpt = 000000000ca90001 *pde = 0000000000000000
[ 51.216262] Oops: 0000 [#1]
[ 51.216262] CPU: 0 PID: 644 Comm: mount Not tainted 3.12.0-rc1 #2
[ 51.216262] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[ 51.216262] task: ccffd7a0 ti: cca54000 task.ti: cca54000
[ 51.216262] EIP: 0060:[<c10343fb>] EFLAGS: 00000046 CPU: 0
[ 51.216262] EIP is at pool_mayday_timeout+0x5f/0x9c
[ 51.216262] EAX: 00000000 EBX: c1a81d50 ECX: 00000000 EDX: 00000000
[ 51.216262] ESI: cd0d303c EDI: cfff7054 EBP: cca55d2c ESP: cca55d18
[ 51.216262] DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 0068
[ 51.216262] CR0: 8005003b CR2: 00000004 CR3: 0ca0b000 CR4: 000006b0
[ 51.216262] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
[ 51.216262] DR6: 00000000 DR7: 00000000
[ 51.216262] Stack:
[ 51.216262] c1a81d60 cd0d303c 00000100 c103439c cca55d58 cca55d3c c102cd96 c1ba4700
[ 51.216262] cca55d58 cca55d6c c102cf7e c1a81d50 c1ba5110 c1ba4f10 cca55d58 c103439c
[ 51.216262] cca55d58 cca55d58 00000001 c1ba4588 00000100 cca55d90 c1028f61 00000001
[ 51.216262] Call Trace:
[ 51.216262] [<c103439c>] ? need_to_create_worker+0x32/0x32
[ 51.216262] [<c102cd96>] call_timer_fn.isra.39+0x16/0x60
[ 51.216262] [<c102cf7e>] run_timer_softirq+0x144/0x15e
[ 51.216262] [<c103439c>] ? need_to_create_worker+0x32/0x32
[ 51.216262] [<c1028f61>] __do_softirq+0x87/0x12b
[ 51.216262] [<c10290c4>] irq_exit+0x3a/0x48
[ 51.216262] [<c1002918>] do_IRQ+0x64/0x77
[ 51.216262] [<c175fbac>] common_interrupt+0x2c/0x31
[ 51.216262] [<c12188ee>] ? ocfs2_get_sector+0x14/0x1cd
[ 51.216262] [<c1218b72>] ocfs2_sb_probe+0xcb/0x7ca
[ 51.216262] [<c107bb1c>] ? bdi_lock_two+0x8/0x14
[ 51.216262] [<c12cfc11>] ? string.isra.4+0x26/0x89
[ 51.216262] [<c121a7ba>] ocfs2_fill_super+0x39/0xe84
[ 51.216262] [<c12d1000>] ? pointer.isra.15+0x23f/0x25b
[ 51.216262] [<c12c3660>] ? disk_name+0x20/0x65
[ 51.216262] [<c109d8f6>] mount_bdev+0x105/0x14d
[ 51.216262] [<c1092aaa>] ? slab_pre_alloc_hook.isra.66+0x1e/0x25
[ 51.216262] [<c1095353>] ? __kmalloc_track_caller+0xb8/0xe4
[ 51.216262] [<c10ae5da>] ? alloc_vfsmnt+0xdc/0xff
[ 51.216262] [<c1217173>] ocfs2_mount+0x10/0x12
[ 51.216262] [<c121a781>] ? ocfs2_handle_error+0xa2/0xa2
[ 51.216262] [<c109dad1>] mount_fs+0x55/0x123
[ 51.216262] [<c10aef24>] vfs_kern_mount+0x44/0xac
[ 51.216262] [<c10b030a>] do_mount+0x647/0x768
[ 51.216262] [<c107b043>] ? strndup_user+0x2c/0x3d
[ 51.216262] [<c10b049c>] SyS_mount+0x71/0xa0
[ 51.216262] [<c175f074>] syscall_call+0x7/0xb
[ 51.216262] Code: 43 44 e8 7a 8c ff ff 58 5a 5b 5e 5f 5d c3 8b 43 10 8d 78 fc 8d 43 10 89 45 ec 8d 47 04 3b 45 ec 74 ca 89 f8 e8 44 f0 ff ff 89 c1 <8b> 50 04 83 7a 44 00 74 2c 8b 40 68 8d 71 68 39 f0 75 22 8b 72
[ 51.216262] EIP: [<c10343fb>] pool_mayday_timeout+0x5f/0x9c SS:ESP 0068:cca55d18
[ 51.216262] CR2: 0000000000000004
[ 51.216262] ---[ end trace 267272283b2d7610 ]---
[ 51.216262] Kernel panic - not syncing: Fatal exception in interrupt

[ 3.244964] block nbd1: Attempted send on closed socket
[ 3.246243] block nbd1: Attempted send on closed socket
[ 3.247508] (mount,661,0):ocfs2_get_sector:1861 ERROR: status = -5
[ 3.248906] (mount,661,0):ocfs2_sb_probe:770 ERROR: status = -5
[ 3.250269] (mount,661,0):ocfs2_fill_super:1038 ERROR: superblock probe failed!
[ 3.252100] (mount,661,0):ocfs2_fill_super:1229 ERROR: status = -5
[ 3.253569] BUG: unable to handle kernel NULL pointer dereference at 00000004
[ 3.255322] IP: [<c1034850>] process_one_work+0x1a/0x1cc
[ 3.256681] *pdpt = 000000000c950001 *pde = 0000000000000000
[ 3.256833] Oops: 0000 [#1]
[ 3.256833] CPU: 0 PID: 5 Comm: kworker/0:0H Not tainted 3.12.0-rc1 #2
[ 3.256833] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[ 3.256833] task: cec44d80 ti: cec54000 task.ti: cec54000
[ 3.256833] EIP: 0060:[<c1034850>] EFLAGS: 00010046 CPU: 0
[ 3.256833] EIP is at process_one_work+0x1a/0x1cc
[ 3.256833] EAX: 00000000 EBX: cec1b900 ECX: ccdf0700 EDX: ccdf0700
[ 3.256833] ESI: ccdf0754 EDI: c1a81d50 EBP: cec55f44 ESP: cec55f2c
[ 3.256833] DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068
[ 3.256833] CR0: 8005003b CR2: 0000005c CR3: 0cfc5000 CR4: 000006b0
[ 3.256833] Stack:
[ 3.256833] c1a81d50 00000000 c10345b0 cec1b900 cec1b918 cec1b918 cec55f54 c1034a1d
[ 3.256833] cec1b900 c1a81d50 cec55f70 c1034d3b cec44d80 c1a81d60 cec47eac cec1b900
[ 3.256833] c1034c02 cec55fac c10388f7 cec55f94 00000000 00000000 cec1b900 00000000
[ 3.256833] Call Trace:
[ 3.256833] [<c10345b0>] ? manage_workers.isra.33+0x178/0x182
[ 3.256833] [<c1034a1d>] process_scheduled_works+0x1b/0x21
[ 3.256833] [<c1034d3b>] worker_thread+0x139/0x1bd
[ 3.256833] [<c1034c02>] ? rescuer_thread+0x1df/0x1df
[ 3.256833] [<c10388f7>] kthread+0x6d/0x72
[ 3.256833] [<c175f637>] ret_from_kernel_thread+0x1b/0x28
[ 3.256833] [<c103888a>] ? init_completion+0x1d/0x1d
[ 3.256833] Code: 83 f8 10 74 04 f3 90 b2 f5 89 d0 59 5b 5e 5f 5d c3 55 89 e5 57 56 53 83 ec 0c 89 c3 89 d6 89 d0 e8 f3 eb ff ff 89 45 ec 8b 7b 24 <8b> 40 04 8b 80 80 00 00 00 c1 e8 05 83 e0 01 88 45 e8 f6 43 2c
[ 3.256833] EIP: [<c1034850>] process_one_work+0x1a/0x1cc SS:ESP 0068:cec55f2c
[ 3.256833] CR2: 0000000000000004
[ 3.256833] ---[ end trace a45beaff7f786118 ]---
[ 3.256833] BUG: sleeping function called from invalid context at kernel/rwsem.c:20
[ 3.256833] in_atomic(): 1, irqs_disabled(): 1, pid: 5, name: kworker/0:0H
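
A note on reading the two oopses above: in each "Code:" line the byte in angle brackets is the first byte of the faulting instruction. Both dumps fault on a load of the form "mov 0x4(%eax),<reg>" with EAX = 00000000, which matches the reported "NULL pointer dereference at 00000004". The following is only a minimal sketch of that decoding, assuming 32-bit x86, opcode 0x8b (MOV r32, r/m32) and the simple [reg + disp8] ModRM form without a SIB byte. The function names are hypothetical and this is not a general disassembler.

```python
# Register numbering used by the 3-bit reg and r/m fields of a ModRM byte.
REGS32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def decode_mov_load(code):
    """Decode '8b /r disp8' (mov disp8(%base),%dst) into (dst, base, disp).

    Assumption: only the mod=01, no-SIB encoding is handled, which is
    the form seen at the <8b> marker in both oopses.
    """
    opcode, modrm, disp = code[0], code[1], code[2]
    if opcode != 0x8B:
        raise ValueError("not a MOV r32, r/m32 instruction")
    mod = modrm >> 6          # addressing mode
    reg = (modrm >> 3) & 7    # destination register
    rm = modrm & 7            # base register
    if mod != 1 or rm == 4:
        raise ValueError("only the [reg + disp8] form is handled")
    if disp >= 0x80:          # disp8 is a signed byte
        disp -= 0x100
    return REGS32[reg], REGS32[rm], disp


def fault_address(code_hex, regs):
    """Compute the address the load touches, given the register dump."""
    dst, base, disp = decode_mov_load(bytes.fromhex(code_hex))
    return (regs[base] + disp) & 0xFFFFFFFF


# First oops: "<8b> 50 04" with EAX: 00000000
print(decode_mov_load(bytes.fromhex("8b 50 04")))
print(hex(fault_address("8b 50 04", {"eax": 0x00000000})))
```

The same call with "8b 40 04" covers the second oops, where the destination register is EAX itself.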
Fengguang Wu
2013-Oct-10 03:38 UTC
[Ocfs2-devel] [XFS on bad superblock] BUG: unable to handle kernel NULL pointer dereference at 00000003
On Thu, Oct 10, 2013 at 11:33:00AM +0800, Fengguang Wu wrote:
> On Thu, Oct 10, 2013 at 11:26:37AM +0800, Fengguang Wu wrote:
> > Dave,
> >
> > > I note that you have CONFIG_SLUB=y, which means that the cache slabs
> > > are shared with objects of other types. That means that the memory
> > > corruption problem is likely to be caused by one of the other
> > > filesystems that is probing the block device(s), not XFS.
> >
> > Good to know that, it would easy to test then: just turn off every
> > other filesystems. I'll try it right away.
>
> Seems that we don't even need to do that. A dig through the oops
> database and I find stack dumps from other FS.
>
> This happens in the kernel with same kconfig and commit 3.12-rc1.

Here is a summary of all FS with oops:

    411 ocfs2_fill_super
    189 xfs_fs_fill_super
     86 jfs_fill_super
     50 isofs_fill_super
     33 fat_fill_super
     18 vfat_fill_super
     15 msdos_fill_super
     11 ext2_fill_super
     10 ext3_fill_super
      3 reiserfs_fill_super

Thanks,
Fengguang
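
A summary like the one above can be produced by scanning each saved oops for *_fill_super frames in its call trace. The message does not say how the counts were actually generated, so the following is only a minimal sketch under stated assumptions: each oops is available as one text blob, a filesystem is counted at most once per oops, and the function name count_fill_super is hypothetical.

```python
import re
from collections import Counter

# Matches call-trace frames such as "ocfs2_fill_super+0x39/0xe84".
FILL_SUPER = re.compile(r"\b(\w+_fill_super)\+0x[0-9a-f]+/0x[0-9a-f]+")


def count_fill_super(traces):
    """Count, per *_fill_super function, how many oops texts mention it.

    Assumption: 'traces' is an iterable of strings, one per oops.
    Each function name is counted once per oops, even if it appears
    in several frames of the same trace.
    """
    counts = Counter()
    for trace in traces:
        counts.update(set(FILL_SUPER.findall(trace)))
    return counts


if __name__ == "__main__":
    sample = [
        "[<c1218b72>] ocfs2_sb_probe+0xcb/0x7ca\n"
        "[<c121a7ba>] ocfs2_fill_super+0x39/0xe84\n"
        "[<c109d8f6>] mount_bdev+0x105/0x14d\n",
    ]
    for name, n in count_fill_super(sample).most_common():
        print(f"{n:7d} {name}")
```

Counting once per oops is a design choice: a single trace can list the same function in both a reliable frame and a "?" frame, and counting frames rather than oopses would inflate the totals.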
Dave Chinner
2013-Oct-10 04:28 UTC
[Ocfs2-devel] [XFS on bad superblock] BUG: unable to handle kernel NULL pointer dereference at 00000003
On Thu, Oct 10, 2013 at 11:38:34AM +0800, Fengguang Wu wrote:
> On Thu, Oct 10, 2013 at 11:33:00AM +0800, Fengguang Wu wrote:
> > On Thu, Oct 10, 2013 at 11:26:37AM +0800, Fengguang Wu wrote:
> > > Dave,
> > >
> > > > I note that you have CONFIG_SLUB=y, which means that the cache slabs
> > > > are shared with objects of other types. That means that the memory
> > > > corruption problem is likely to be caused by one of the other
> > > > filesystems that is probing the block device(s), not XFS.
> > >
> > > Good to know that, it would easy to test then: just turn off every
> > > other filesystems. I'll try it right away.
> >
> > Seems that we don't even need to do that. A dig through the oops
> > database and I find stack dumps from other FS.
> >
> > This happens in the kernel with same kconfig and commit 3.12-rc1.
>
> Here is a summary of all FS with oops:
>
>     411 ocfs2_fill_super
>     189 xfs_fs_fill_super
>      86 jfs_fill_super
>      50 isofs_fill_super
>      33 fat_fill_super
>      18 vfat_fill_super
>      15 msdos_fill_super
>      11 ext2_fill_super
>      10 ext3_fill_super
>       3 reiserfs_fill_super

The order of probing on the original dmesg output you reported is:

    ext3
    ext2
    fatfs
    reiserfs
    gfs2
    isofs
    ocfs2

which means that no XFS filesystem was mounted in the original bug
report, and hence that further indicates that XFS is not responsible
for the problem and that perhaps the original bisect was not
reliable...

Cheers,

Dave.
--
Dave Chinner
david at fromorbit.com