thr3ads.net - Ocfs2 devel - [Ocfs2-devel] [XFS on bad superblock] BUG: unable to handle kernel NULL pointer dereference at 00000003 [Oct 2013]

Fengguang Wu

2013-Oct-10 03:33 UTC

[Ocfs2-devel] [XFS on bad superblock] BUG: unable to handle kernel NULL pointer dereference at 00000003

On Thu, Oct 10, 2013 at 11:26:37AM +0800, Fengguang Wu wrote:
> Dave,
> 
> > I note that you have CONFIG_SLUB=y, which means that the cache slabs
> > are shared with objects of other types. That means that the memory
> > corruption problem is likely to be caused by one of the other
> > filesystems that is probing the block device(s), not XFS.
> 
> Good to know that, it would be easy to test then: just turn off every
> other filesystem. I'll try it right away.

It seems we don't even need to do that. Digging through the oops
database, I found stack dumps from other filesystems.

This happens in a kernel with the same kconfig, at commit 3.12-rc1.

[   51.205369] block nbd1: Attempted send on closed socket
[   51.214126] BUG: unable to handle kernel NULL pointer dereference at 00000004
[   51.215640] IP: [<c10343fb>] pool_mayday_timeout+0x5f/0x9c
[   51.216262] *pdpt = 000000000ca90001 *pde = 0000000000000000 
[   51.216262] Oops: 0000 [#1] 
[   51.216262] CPU: 0 PID: 644 Comm: mount Not tainted 3.12.0-rc1 #2
[   51.216262] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[   51.216262] task: ccffd7a0 ti: cca54000 task.ti: cca54000
[   51.216262] EIP: 0060:[<c10343fb>] EFLAGS: 00000046 CPU: 0
[   51.216262] EIP is at pool_mayday_timeout+0x5f/0x9c
[   51.216262] EAX: 00000000 EBX: c1a81d50 ECX: 00000000 EDX: 00000000
[   51.216262] ESI: cd0d303c EDI: cfff7054 EBP: cca55d2c ESP: cca55d18
[   51.216262]  DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 0068
[   51.216262] CR0: 8005003b CR2: 00000004 CR3: 0ca0b000 CR4: 000006b0
[   51.216262] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
[   51.216262] DR6: 00000000 DR7: 00000000
[   51.216262] Stack:
[   51.216262]  c1a81d60 cd0d303c 00000100 c103439c cca55d58 cca55d3c c102cd96 c1ba4700
[   51.216262]  cca55d58 cca55d6c c102cf7e c1a81d50 c1ba5110 c1ba4f10 cca55d58 c103439c
[   51.216262]  cca55d58 cca55d58 00000001 c1ba4588 00000100 cca55d90 c1028f61 00000001
[   51.216262] Call Trace:
[   51.216262]  [<c103439c>] ? need_to_create_worker+0x32/0x32
[   51.216262]  [<c102cd96>] call_timer_fn.isra.39+0x16/0x60
[   51.216262]  [<c102cf7e>] run_timer_softirq+0x144/0x15e
[   51.216262]  [<c103439c>] ? need_to_create_worker+0x32/0x32
[   51.216262]  [<c1028f61>] __do_softirq+0x87/0x12b
[   51.216262]  [<c10290c4>] irq_exit+0x3a/0x48
[   51.216262]  [<c1002918>] do_IRQ+0x64/0x77
[   51.216262]  [<c175fbac>] common_interrupt+0x2c/0x31
[   51.216262]  [<c12188ee>] ? ocfs2_get_sector+0x14/0x1cd
[   51.216262]  [<c1218b72>] ocfs2_sb_probe+0xcb/0x7ca
[   51.216262]  [<c107bb1c>] ? bdi_lock_two+0x8/0x14
[   51.216262]  [<c12cfc11>] ? string.isra.4+0x26/0x89
[   51.216262]  [<c121a7ba>] ocfs2_fill_super+0x39/0xe84
[   51.216262]  [<c12d1000>] ? pointer.isra.15+0x23f/0x25b
[   51.216262]  [<c12c3660>] ? disk_name+0x20/0x65
[   51.216262]  [<c109d8f6>] mount_bdev+0x105/0x14d
[   51.216262]  [<c1092aaa>] ? slab_pre_alloc_hook.isra.66+0x1e/0x25
[   51.216262]  [<c1095353>] ? __kmalloc_track_caller+0xb8/0xe4
[   51.216262]  [<c10ae5da>] ? alloc_vfsmnt+0xdc/0xff
[   51.216262]  [<c1217173>] ocfs2_mount+0x10/0x12
[   51.216262]  [<c121a781>] ? ocfs2_handle_error+0xa2/0xa2
[   51.216262]  [<c109dad1>] mount_fs+0x55/0x123
[   51.216262]  [<c10aef24>] vfs_kern_mount+0x44/0xac
[   51.216262]  [<c10b030a>] do_mount+0x647/0x768
[   51.216262]  [<c107b043>] ? strndup_user+0x2c/0x3d
[   51.216262]  [<c10b049c>] SyS_mount+0x71/0xa0
[   51.216262]  [<c175f074>] syscall_call+0x7/0xb
[   51.216262] Code: 43 44 e8 7a 8c ff ff 58 5a 5b 5e 5f 5d c3 8b 43 10 8d 78 fc 8d 43 10 89 45 ec 8d 47 04 3b 45 ec 74 ca 89 f8 e8 44 f0 ff ff 89 c1 <8b> 50 04 83 7a 44 00 74 2c 8b 40 68 8d 71 68 39 f0 75 22 8b 72
[   51.216262] EIP: [<c10343fb>] pool_mayday_timeout+0x5f/0x9c SS:ESP 0068:cca55d18
[   51.216262] CR2: 0000000000000004
[   51.216262] ---[ end trace 267272283b2d7610 ]---
[   51.216262] Kernel panic - not syncing: Fatal exception in interrupt

[    3.244964] block nbd1: Attempted send on closed socket
[    3.246243] block nbd1: Attempted send on closed socket
[    3.247508] (mount,661,0):ocfs2_get_sector:1861 ERROR: status = -5
[    3.248906] (mount,661,0):ocfs2_sb_probe:770 ERROR: status = -5
[    3.250269] (mount,661,0):ocfs2_fill_super:1038 ERROR: superblock probe failed!
[    3.252100] (mount,661,0):ocfs2_fill_super:1229 ERROR: status = -5
[    3.253569] BUG: unable to handle kernel NULL pointer dereference at 00000004
[    3.255322] IP: [<c1034850>] process_one_work+0x1a/0x1cc
[    3.256681] *pdpt = 000000000c950001 *pde = 0000000000000000 
[    3.256833] Oops: 0000 [#1] 
[    3.256833] CPU: 0 PID: 5 Comm: kworker/0:0H Not tainted 3.12.0-rc1 #2
[    3.256833] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[    3.256833] task: cec44d80 ti: cec54000 task.ti: cec54000
[    3.256833] EIP: 0060:[<c1034850>] EFLAGS: 00010046 CPU: 0
[    3.256833] EIP is at process_one_work+0x1a/0x1cc
[    3.256833] EAX: 00000000 EBX: cec1b900 ECX: ccdf0700 EDX: ccdf0700
[    3.256833] ESI: ccdf0754 EDI: c1a81d50 EBP: cec55f44 ESP: cec55f2c
[    3.256833]  DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068
[    3.256833] CR0: 8005003b CR2: 0000005c CR3: 0cfc5000 CR4: 000006b0
[    3.256833] Stack:
[    3.256833]  c1a81d50 00000000 c10345b0 cec1b900 cec1b918 cec1b918 cec55f54 c1034a1d
[    3.256833]  cec1b900 c1a81d50 cec55f70 c1034d3b cec44d80 c1a81d60 cec47eac cec1b900
[    3.256833]  c1034c02 cec55fac c10388f7 cec55f94 00000000 00000000 cec1b900 00000000
[    3.256833] Call Trace:
[    3.256833]  [<c10345b0>] ? manage_workers.isra.33+0x178/0x182
[    3.256833]  [<c1034a1d>] process_scheduled_works+0x1b/0x21
[    3.256833]  [<c1034d3b>] worker_thread+0x139/0x1bd
[    3.256833]  [<c1034c02>] ? rescuer_thread+0x1df/0x1df
[    3.256833]  [<c10388f7>] kthread+0x6d/0x72
[    3.256833]  [<c175f637>] ret_from_kernel_thread+0x1b/0x28
[    3.256833]  [<c103888a>] ? init_completion+0x1d/0x1d
[    3.256833] Code: 83 f8 10 74 04 f3 90 b2 f5 89 d0 59 5b 5e 5f 5d c3 55 89 e5 57 56 53 83 ec 0c 89 c3 89 d6 89 d0 e8 f3 eb ff ff 89 45 ec 8b 7b 24 <8b> 40 04 8b 80 80 00 00 00 c1 e8 05 83 e0 01 88 45 e8 f6 43 2c
[    3.256833] EIP: [<c1034850>] process_one_work+0x1a/0x1cc SS:ESP 0068:cec55f2c
[    3.256833] CR2: 0000000000000004
[    3.256833] ---[ end trace a45beaff7f786118 ]---
[    3.256833] BUG: sleeping function called from invalid context at kernel/rwsem.c:20
[    3.256833] in_atomic(): 1, irqs_disabled(): 1, pid: 5, name: kworker/0:0H
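
The faulting instruction in both oopses above can be read straight from the Code: line, where the byte in angle brackets marks the opcode at EIP. The following Python sketch is illustrative only and not part of the original report: the helper names faulting_bytes and decode_mov_load are hypothetical, and the decoder handles just the single 32-bit form "mov r32, [r32 + disp8]" (opcode 0x8b, ModRM mod=01, no SIB byte).

```python
import re

# 32-bit general-purpose registers in x86 encoding order.
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def faulting_bytes(code_line):
    """Return the bytes starting at the <xx>-marked opcode of a Code: line.

    `code_line` is the hex dump only (without the timestamp and "Code:").
    """
    tokens = code_line.split()
    for i, tok in enumerate(tokens):
        m = re.fullmatch(r"<([0-9a-f]{2})>", tok)
        if m:
            return [int(m.group(1), 16)] + [int(t, 16) for t in tokens[i + 1:]]
    raise ValueError("no <xx> marker found")


def decode_mov_load(b):
    """Decode 'mov r32, [r32 + disp8]' only; anything else raises.

    Returns (destination register, base register, displacement).
    """
    opcode, modrm, disp = b[0], b[1], b[2]
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    # mod == 1 selects an 8-bit displacement; rm == 4 would mean a SIB byte.
    if opcode != 0x8B or mod != 1 or rm == 4:
        raise ValueError("instruction form not covered by this sketch")
    if disp >= 0x80:  # sign-extend the 8-bit displacement
        disp -= 0x100
    return REG32[reg], REG32[rm], disp


if __name__ == "__main__":
    # Tails of the two Code: lines from the oopses above.
    for tail in ("89 c1 <8b> 50 04 83 7a 44 00", "8b 7b 24 <8b> 40 04 8b 80"):
        dst, base, disp = decode_mov_load(faulting_bytes(tail))
        print(f"mov {dst}, [{base} + {disp:#x}]")
```

For the first oops, `<8b> 50 04` decodes as a load of EDX from [EAX + 4]; for the second, `<8b> 40 04` is a load of EAX from [EAX + 4]. With EAX: 00000000 in both register dumps, that is consistent with the reported NULL pointer dereference at 00000004.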

Fengguang Wu

2013-Oct-10 03:38 UTC

On Thu, Oct 10, 2013 at 11:33:00AM +0800, Fengguang Wu wrote:
> On Thu, Oct 10, 2013 at 11:26:37AM +0800, Fengguang Wu wrote:
> > Dave,
> > 
> > > I note that you have CONFIG_SLUB=y, which means that the cache slabs
> > > are shared with objects of other types. That means that the memory
> > > corruption problem is likely to be caused by one of the other
> > > filesystems that is probing the block device(s), not XFS.
> > 
> > Good to know that, it would be easy to test then: just turn off every
> > other filesystem. I'll try it right away.
> 
> It seems we don't even need to do that. Digging through the oops
> database, I found stack dumps from other filesystems.
> 
> This happens in a kernel with the same kconfig, at commit 3.12-rc1.

Here is a summary of all filesystems with oopses:

    411 ocfs2_fill_super
    189 xfs_fs_fill_super
     86 jfs_fill_super
     50 isofs_fill_super
     33 fat_fill_super
     18 vfat_fill_super
     15 msdos_fill_super
     11 ext2_fill_super
     10 ext3_fill_super
      3 reiserfs_fill_super

Thanks,
Fengguang
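
A per-filesystem summary like the one above could be produced by scanning saved oops logs for *_fill_super frames in the call traces. This is a minimal sketch, not the script actually used for the summary: the function name count_fill_super, the choice to skip unreliable "?" frames, and counting each symbol at most once per oops are all assumptions.

```python
import re
from collections import Counter

# Matches a reliable call-trace frame such as
#   [<c121a7ba>] ocfs2_fill_super+0x39/0xe84
# Frames prefixed with "? " do not match, because "?" is not a word character.
FILL_SUPER = re.compile(r"\[<[0-9a-f]+>\]\s+(\w+_fill_super)\+0x")


def count_fill_super(oops_texts):
    """Count, per *_fill_super symbol, how many oops texts contain it."""
    counts = Counter()
    for text in oops_texts:
        # set() so that a symbol repeated within one oops is counted once.
        counts.update(set(FILL_SUPER.findall(text)))
    return counts


def format_summary(counts):
    """Render the counts in descending order, one symbol per line."""
    return "\n".join(f"{n:7d} {name}" for name, n in counts.most_common())
```

Calling format_summary(count_fill_super(texts)) on a list of dmesg strings yields lines in the same "count symbol" layout as the summary.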

Dave Chinner

2013-Oct-10 04:28 UTC

On Thu, Oct 10, 2013 at 11:38:34AM +0800, Fengguang Wu wrote:
> On Thu, Oct 10, 2013 at 11:33:00AM +0800, Fengguang Wu wrote:
> > On Thu, Oct 10, 2013 at 11:26:37AM +0800, Fengguang Wu wrote:
> > > Dave,
> > > 
> > > > I note that you have CONFIG_SLUB=y, which means that the cache slabs
> > > > are shared with objects of other types. That means that the memory
> > > > corruption problem is likely to be caused by one of the other
> > > > filesystems that is probing the block device(s), not XFS.
> > > 
> > > Good to know that, it would be easy to test then: just turn off every
> > > other filesystem. I'll try it right away.
> > 
> > It seems we don't even need to do that. Digging through the oops
> > database, I found stack dumps from other filesystems.
> > 
> > This happens in a kernel with the same kconfig, at commit 3.12-rc1.
> 
> Here is a summary of all filesystems with oopses:
> 
>     411 ocfs2_fill_super
>     189 xfs_fs_fill_super
>      86 jfs_fill_super
>      50 isofs_fill_super
>      33 fat_fill_super
>      18 vfat_fill_super
>      15 msdos_fill_super
>      11 ext2_fill_super
>      10 ext3_fill_super
>       3 reiserfs_fill_super

The order of probing in the original dmesg output you reported is:

	ext3
	ext2
	fatfs
	reiserfs
	gfs2
	isofs
	ocfs2

which means that no XFS filesystem was mounted in the original bug
report. That further indicates that XFS is not responsible for the
problem, and that perhaps the original bisect was not reliable...

Cheers,

Dave.
-- 
Dave Chinner
david at fromorbit.com
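
Dave's point that CONFIG_SLUB shares slabs between caches can be checked on a running system. The sketch below is not from the thread. It assumes that merged caches show up under /sys/kernel/slab as symlinks pointing at a shared alias directory, and that the directory is readable; the function name merged_slab_caches is hypothetical.

```python
import os


def merged_slab_caches(slab_dir="/sys/kernel/slab"):
    """Group slab cache names by the alias directory their symlink targets.

    Returns {alias directory name: [cache names]} for aliases shared by
    more than one cache, i.e. caches whose objects live in the same slabs.
    """
    groups = {}
    for name in sorted(os.listdir(slab_dir)):
        path = os.path.join(slab_dir, name)
        if os.path.islink(path):
            target = os.path.basename(os.readlink(path))
            groups.setdefault(target, []).append(name)
    return {t: names for t, names in groups.items() if len(names) > 1}


if __name__ == "__main__":
    for alias, names in merged_slab_caches().items():
        print(alias, "<-", ", ".join(names))
```

Booting with the slub_nomerge kernel parameter disables this merging, which is one way to narrow down which cache's users are corrupting memory.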
