thr3ads.net - Ocfs2 users - [Ocfs2-users] Bug in OCFS2 1.3.3 [Aug 2008]

Paulo Rodrigues

2008-Aug-13 11:02 UTC

[Ocfs2-users] Bug in OCFS2 1.3.3

Hello,

I'm on 2.6.24 with OCFS2 1.3.3, and every couple of days this comes up in dmesg.
I have to reboot the cluster machines; there's nothing else I can do.
Stopping the services or unmounting volumes fails. Perhaps this is a
well-known bug, but I couldn't find it. On the other hand, do you think it
could be solved by upgrading to 1.5.0?

BUG: unable to handle kernel NULL pointer dereference at virtual address 00000934
printing eip: f8d6f442 *pde = cf03a067
Oops: 0002 [#1] SMP
Modules linked in: ocfs2 ocfs2_dlmfs ocfs2_dlm ocfs2_nodemanager configfs
fuse ipv6 sr_mod cdrom ata_generic pata_acpi ata_piix serio_raw bnx2 libata
pcspkr iTCO_wdt iTCO_vendor_support button i5000_edac edac_core dcdbas sg
dm_round_robin dm_emc dm_multipath dm_snapshot dm_zero dm_mirror dm_mod lpfc
scsi_transport_fc scsi_tgt megaraid_sas sd_mod scsi_mod ext3 jbd mbcache
uhci_hcd ohci_hcd ehci_hcd

Pid: 575, comm: pop3 Not tainted (2.6.24.7-92.fc8 #1)
EIP: 0060:[<f8d6f442>] EFLAGS: 00210246 CPU: 2
EIP is at ocfs2_free_suballoc_bits+0x41a/0x6d9 [ocfs2]
EAX: f77b6f00 EBX: 00000000 ECX: 00000000 EDX: 000047a8
ESI: 0000000b EDI: 00000000 EBP: f35bf000 ESP: da67bcec
 DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Process pop3 (pid: 575, ti=da67b000 task=f43b4000 task.ti=da67b000)
Stack: 00000002 da67bd2c 00000001 f77b7278 f75a4840 e0f019a0 f884e798 f7724f88
       f77b7278 f75a4840 f3bc9000 f3bc90c0 f772c968 f77b6f00 00000000 00000000
       f772c968 000047a8 00000000 f42b0800 010d81a8 f8d72da9 000047a8 010d3a00
Call Trace:
 [<f884e798>] do_get_write_access+0x329/0x362 [jbd]
 [<f8d72da9>] ocfs2_free_clusters+0x171/0x212 [ocfs2]
 [<f8d3ed5c>] __ocfs2_flush_truncate_log+0x596/0x702 [ocfs2]
 [<f884e463>] journal_stop+0x15d/0x169 [jbd]
 [<f8d483e9>] ocfs2_commit_truncate+0x30f/0x1240 [ocfs2]
 [<f8d4e3c0>] ocfs2_read_blocks+0x45c/0x46d [ocfs2]
 [<f8d6027f>] ocfs2_wipe_inode+0x4f3/0xcc2 [ocfs2]
 [<f8d628e5>] ocfs2_delete_inode+0x409/0x624 [ocfs2]
 [<c062b62b>] mutex_lock+0x1a/0x29
 [<c04ac011>] inotify_inode_is_dead+0x18/0x6c
 [<f8d624dc>] ocfs2_delete_inode+0x0/0x624 [ocfs2]
 [<c04994a7>] generic_delete_inode+0x91/0xf7
 [<c0498d95>] iput+0x60/0x62
 [<c04919aa>] do_unlinkat+0xae/0x119
 [<c04895f3>] vfs_read+0x111/0x14b
 [<c0489ac1>] sys_pread64+0x48/0x5f
 [<c04051da>] syscall_call+0x7/0xb
 ======================
Code: 9c 00 00 00 8b 80 6c 01 00 00 8b b8 c4 00 00 00 8b b0 c0 00 00 00 8b 44 24 34 8b 58 04 8b 08 39 df 75 0e 39 ce 75 0a 8b 4c 24 38 <0f> ab 51 40 19 c0 4a ff 4c 24 3c 83 7c 24 3c ff 75 b7 8b 5c 24
EIP: [<f8d6f442>] ocfs2_free_suballoc_bits+0x41a/0x6d9 [ocfs2] SS:ESP 0068:da67bcec
---[ end trace 2d0b75b98f26e1b8 ]---

Many thanks,
Paulo Rodrigues

Sunil Mushran

2008-Aug-13 18:05 UTC

This could suggest an on-disk problem. Have you run fsck.ocfs2 recently?

fsck.ocfs2 -f /dev/sdX1



Paulo Rodrigues

2008-Aug-13 21:29 UTC

Hello Sunil,

fsck says it's clean:

Checking OCFS2 filesystem in /dev/dm-1:
  label:              /var/lib/dovecot/spool
  uuid:               ab 1e ac 82 67 cb 47 58 81 07 2b 00 55 f6 09 36
  number of blocks:   246838717
  bytes per block:    4096
  number of clusters: 246838717
  bytes per cluster:  4096
  max slots:          4

o2fsck_should_replay_journals:564 | slot 0 JOURNAL_DIRTY_FL: 0
o2fsck_should_replay_journals:564 | slot 1 JOURNAL_DIRTY_FL: 0
o2fsck_should_replay_journals:564 | slot 2 JOURNAL_DIRTY_FL: 0
o2fsck_should_replay_journals:564 | slot 3 JOURNAL_DIRTY_FL: 0
/dev/dm-1 is clean.  It will be checked after 20 additional mounts.

I expected upgrading to 1.5.0 would fix it... What do you think?

Many thanks,
Paulo


Sunil Mushran

2008-Aug-13 21:44 UTC

It does not look like you used the force option, or you ran it with the
file system mounted.

Unmount the file system on all nodes and run:
$ fsck.ocfs2 -f /dev/dm-1
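
Because the check is meant to run with the volume unmounted, it can help to confirm that first. The following is a minimal sketch, assuming a Linux node where /proc/mounts lists the device; the helper name mount_points and the sample text are made up for illustration. It only inspects the text it is given, so it says nothing about the other cluster nodes, and a multipath device may appear under a different name (for example under /dev/mapper) than the one passed to fsck.ocfs2.

```python
import os


def mount_points(device, mounts_text):
    """Return the mount points of `device` found in /proc/mounts-style text."""
    # Resolve symlinks so that an alias of the same device node still matches.
    target = os.path.realpath(device)
    found = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        source, mount_point = fields[0], fields[1]
        # Only real device paths are compared; pseudo file systems are skipped.
        if source.startswith("/") and os.path.realpath(source) == target:
            found.append(mount_point)
    return found


# Hypothetical sample in /proc/mounts format; on a real node, read the file.
sample = (
    "/dev/sda1 / ext3 rw 0 0\n"
    "/dev/dm-1 /var/lib/dovecot/spool ocfs2 rw 0 0\n"
)
still_mounted = mount_points("/dev/dm-1", sample)
print(still_mounted or "not mounted on this node")
```

On a live node the text would come from open("/proc/mounts").read(), and the same check would have to be repeated on each node before running fsck.ocfs2 -f.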


Sunil Mushran

2008-Aug-15 19:19 UTC

Could you please file a Bugzilla report and attach this stack trace?
Also attach the output of the following:

$ objdump -DSl /lib/modules/`uname -r`/kernel/fs/ocfs2/ocfs2.ko >/tmp/ocfs2.out
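
To find the faulting instruction in that objdump listing, the symbol+offset string from the oops can be combined with the function's label address in the listing. The sketch below assumes objdump's usual label format of "address <name>:"; the helper names and the sample label address 0002f028 are hypothetical, since the real address depends on how the module was built.

```python
import re


def parse_oops_symbol(text):
    """Split 'func+0xOFF/0xSIZE' from an oops into (name, offset, size)."""
    m = re.fullmatch(r"(\w+)\+0x([0-9a-f]+)/0x([0-9a-f]+)", text.strip())
    if m is None:
        raise ValueError("not a symbol+offset string: %r" % text)
    return m.group(1), int(m.group(2), 16), int(m.group(3), 16)


def faulting_address(objdump_text, symbol):
    """Return the listing address of the instruction named by `symbol`."""
    name, offset, size = parse_oops_symbol(symbol)
    if offset >= size:
        raise ValueError("offset lies outside the function")
    # objdump prints each function as '<hex address> <name>:' on its own line.
    label = re.compile(r"^([0-9a-f]+) <" + re.escape(name) + r">:\s*$", re.M)
    m = label.search(objdump_text)
    if m is None:
        raise LookupError("function %s not found in listing" % name)
    return int(m.group(1), 16) + offset


# Hypothetical excerpt of /tmp/ocfs2.out; the label address is made up.
listing = "0002f028 <ocfs2_free_suballoc_bits>:\n   2f028:\t55\tpush   %ebp\n"
addr = faulting_address(listing, "ocfs2_free_suballoc_bits+0x41a/0x6d9")
print("look for address %x in the listing" % addr)
```

The instruction at the computed address should start with the byte marked <0f> in the oops "Code:" line, which is a quick way to confirm that the listing matches the running module.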

Paulo Rodrigues wrote:
> Got the same error again today.
>
> BUG: unable to handle kernel NULL pointer dereference at virtual address 000002d8
> printing eip: f8d8f442 *pde = cecd2067
> Oops: 0002 [#1] SMP
> Modules linked in: ocfs2 ocfs2_dlmfs ocfs2_dlm ocfs2_nodemanager 
> configfs fuse ipv6 sr_mod cdrom ata_generic pata_acpi bnx2 ata_piix 
> pcspkr iTCO_wdt libata button serio_raw iTCO_vendor_support i5000_edac 
> edac_core dcdbas sg dm_round_robin dm_emc dm_multipath dm_snapshot 
> dm_zero dm_mirror dm_mod lpfc scsi_transport_fc scsi_tgt megaraid_sas 
> sd_mod scsi_mod ext3 jbd mbcache uhci_hcd ohci_hcd ehci_hcd
>
> Pid: 30807, comm: pop3 Not tainted (2.6.24.7-92.fc8 #1)
> EIP: 0060:[<f8d8f442>] EFLAGS: 00210246 CPU: 4
> EIP is at ocfs2_free_suballoc_bits+0x41a/0x6d9 [ocfs2]
> EAX: cbe6ca00 EBX: 00000000 ECX: 00000000 EDX: 000014c2
> ESI: 0000000b EDI: 00000000 EBP: c8afd000 ESP: dbb1bcec
>  DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
> Process pop3 (pid: 30807, ti=dbb1b000 task=c3472d20 task.ti=dbb1b000)
> Stack: 00000002 dbb1bd2c 00000001 cbe6cd78 f76bb5e8 f10195e8 f884e798 d7709070
>        cbe6cd78 f76bb5e8 c668f000 c668f0c0 f19ee380 cbe6ca00 00000000 00000000
>        f19ee380 000014c2 00000000 e72a0000 0810cec2 f8d92da9 000014c2 0810ba00
> Call Trace:
>  [<f884e798>] do_get_write_access+0x329/0x362 [jbd]
>  [<f8d92da9>] ocfs2_free_clusters+0x171/0x212 [ocfs2]
>  [<f8d5ed5c>] __ocfs2_flush_truncate_log+0x596/0x702 [ocfs2]
>  [<f884e463>] journal_stop+0x15d/0x169 [jbd]
>  [<f8d683e9>] ocfs2_commit_truncate+0x30f/0x1240 [ocfs2]
>  [<f8d6e3c0>] ocfs2_read_blocks+0x45c/0x46d [ocfs2]
>  [<f8d8027f>] ocfs2_wipe_inode+0x4f3/0xcc2 [ocfs2]
>  [<f8d828e5>] ocfs2_delete_inode+0x409/0x624 [ocfs2]
>  [<c062b62b>] mutex_lock+0x1a/0x29
>  [<c04ac011>] inotify_inode_is_dead+0x18/0x6c
>  [<f8d824dc>] ocfs2_delete_inode+0x0/0x624 [ocfs2]
>  [<c04994a7>] generic_delete_inode+0x91/0xf7
>  [<c0498d95>] iput+0x60/0x62
>  [<c04919aa>] do_unlinkat+0xae/0x119
>  [<c04895f3>] vfs_read+0x111/0x14b
>  [<c0489ac1>] sys_pread64+0x48/0x5f
>  [<c04051da>] syscall_call+0x7/0xb
>  ======================
> Code: 9c 00 00 00 8b 80 6c 01 00 00 8b b8 c4 00 00 00 8b b0 c0 00 00 00 8b 44 24 34 8b 58 04 8b 08 39 df 75 0e 39 ce 75 0a 8b 4c 24 38 <0f> ab 51 40 19 c0 4a ff 4c 24 3c 83 7c 24 3c ff 75 b7 8b 5c 24
> EIP: [<f8d8f442>] ocfs2_free_suballoc_bits+0x41a/0x6d9 [ocfs2] SS:ESP 0068:dbb1bcec
> ---[ end trace 4bb65900c779e50c ]---
>
> Am I missing something?
>
> Thanks!
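
Both traces are consistent with the same failure. The bytes at the fault marker in the "Code:" line, 0f ab 51 40, decode on 32-bit x86 as bts %edx,0x40(%ecx), a bit-set at offset 0x40 from a base pointer in ECX, and ECX is 00000000 in both oopses. The sketch below is a worked check of that reading rather than a diagnosis; the function name is made up for illustration, and the register values are copied from the two traces.

```python
def bts_fault_address(base, disp, bit_index):
    """Address touched by a 32-bit 'bts %reg,disp(%base)' on x86."""
    # With a register bit offset the CPU addresses the 32-bit word that
    # holds the bit: word index = bit_index // 32, four bytes per word.
    return (base + disp + 4 * (bit_index >> 5)) & 0xFFFFFFFF


# First oops: ECX = 0 (base), EDX = 0x47a8 (bit index), displacement 0x40.
first = bts_fault_address(0x0, 0x40, 0x47A8)
# Second oops: same instruction, ECX = 0, EDX = 0x14c2.
second = bts_fault_address(0x0, 0x40, 0x14C2)
print(hex(first), hex(second))  # prints: 0x934 0x2d8
```

The results match the reported fault addresses 00000934 and 000002d8, which suggests the kernel was setting bit EDX in a bitmap whose containing structure pointer was NULL.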

Paulo Rodrigues

2008-Aug-15 19:38 UTC

Bugzilla #1011 filed! Thanks!
