Hi all,

I have hit a bug in OCFS2 through configfs. To illustrate it, run:

while true ; do ls -l /sys/kernel/config/cluster/ocfs2/heartbeat ; done&

while true ; do echo 31 > /sys/kernel/config/cluster/ocfs2/heartbeat/dead_threshold ; done&

This quickly produces a kernel crash:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000040
IP: [<ffffffffa01fd214>] configfs_readdir+0xf4/0x230 [configfs]
PGD 467bea067 PUD 46d4d9067 PMD 0
Oops: 0000 [#1] SMP
last sysfs file: /sys/fs/o2cb/interface_revision
CPU 36
Modules linked in: ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm nls_utf8 nfs lockd fscache nfs_acl auth_rpcgss ocfs2 ocfs2_nodemanager configfs ocfs2_stackglue ipmi_devintf ipmi_si ipmi_msghandler sunrpc ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT xt_tcpudp nf_conntrack_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables x_tables ipv6 i2c_i801 i2c_core sg iTCO_wdt iTCO_vendor_support ioatdma i7core_edac edac_core igb dca ext4 jbd2 sd_mod crc_t10dif usbhid hid ahci ehci_hcd uhci_hcd dm_mod [last unloaded: scsi_wait_scan]
Pid: 59850, comm: ls Tainted: G M ---------------- 2.6.32-71.24.1.el6.Bull.23.x86_64 #1 bullx super-node
RIP: 0010:[<ffffffffa01fd214>]  [<ffffffffa01fd214>] configfs_readdir+0xf4/0x230 [configfs]
RSP: 0018:ffff880c6c8b3e78  EFLAGS: 00010282
RAX: 0000000000000000 RBX: ffff88086c4b23a8 RCX: ffff88086c4b23a0
RDX: 000000000000000e RSI: ffff88086c4b2410 RDI: ffffffffa02946e1
RBP: ffff880c6c8b3ed8 R08: ffff88086c4b23a8 R09: 0000000000000004
R10: 00007fff59ce4cf0 R11: 0000000000000246 R12: ffff88046bfbe0c0
R13: ffffffffa02946e1 R14: ffff88046c687608 R15: ffff88046c687610
FS:  00007fdf806017a0(0000) GS:ffff880036840000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000040 CR3: 0000000467ffc000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process ls (pid: 59850, threadinfo ffff880c6c8b2000, task ffff880c6aeeeea0)
Stack:
 ffff880c6c8b3ee8 0000000002347078 ffff88086c4b23a0 ffffffff8116bea0
<0> ffff880c6c8b3f38 ffff88086c4b2410 ffff880c6c8b3ef8 ffff88046bfbe0c0
<0> ffff880c6c8b3f38 ffffffff8116bea0 ffff88086e109720 ffff88086e109668
Call Trace:
 [<ffffffff8116bea0>] ? filldir+0x0/0xe0
 [<ffffffff8116bea0>] ? filldir+0x0/0xe0
 [<ffffffff8116c120>] vfs_readdir+0xc0/0xe0
 [<ffffffff8116c2a9>] sys_getdents+0x89/0xf0
 [<ffffffff8100c172>] system_call_fastpath+0x16/0x1b
Code: 48 83 f8 02 4d 8d 7e 08 48 89 55 c8 0f 84 15 01 00 00 49 8b 5e 08 48 3b 5d c8 0f 85 7c 00 00 00 e9 da 00 00 00 66 90 48 8b 40 10 <4c> 8b 40 40 44 0f b7 49 44 4c 89 ee 49 8b 4c 24 40 48 8b 7d c0
RIP  [<ffffffffa01fd214>] configfs_readdir+0xf4/0x230 [configfs]
 RSP <ffff880c6c8b3e78>
CR2: 0000000000000040

crash> bt ffff880c6aeeeea0
PID: 59850  TASK: ffff880c6aeeeea0  CPU: 36  COMMAND: "ls"
 #0 [ffff880c6c8b3b40] machine_kexec at ffffffff8102e77b
 #1 [ffff880c6c8b3ba0] crash_kexec at ffffffff810a6cd8
 #2 [ffff880c6c8b3c70] oops_end at ffffffff8146aad0
 #3 [ffff880c6c8b3ca0] no_context at ffffffff8103789b
 #4 [ffff880c6c8b3cf0] __bad_area_nosemaphore at ffffffff81037b25
 #5 [ffff880c6c8b3d40] bad_area at ffffffff81037c4e
 #6 [ffff880c6c8b3d70] do_page_fault at ffffffff8146c648
 #7 [ffff880c6c8b3dc0] page_fault at ffffffff81469e45
    [exception RIP: configfs_readdir+244]
    RIP: ffffffffa01fd214  RSP: ffff880c6c8b3e78  RFLAGS: 00010282
    RAX: 0000000000000000  RBX: ffff88086c4b23a8  RCX: ffff88086c4b23a0
    RDX: 000000000000000e  RSI: ffff88086c4b2410  RDI: ffffffffa02946e1
    RBP: ffff880c6c8b3ed8  R8:  ffff88086c4b23a8  R9:  0000000000000004
    R10: 00007fff59ce4cf0  R11: 0000000000000246  R12: ffff88046bfbe0c0
    R13: ffffffffa02946e1  R14: ffff88046c687608  R15: ffff88046c687610
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #8 [ffff880c6c8b3ee0] vfs_readdir at ffffffff8116c120
 #9 [ffff880c6c8b3f30] sys_getdents at ffffffff8116c2a9
#10 [ffff880c6c8b3f80] system_call_fastpath at ffffffff8100c172
    RIP: 00007fdf7f8dcec5  RSP: 00007fff59ce4e70  RFLAGS: 00010202
    RAX: 000000000000004e  RBX: ffffffff8100c172  RCX: 0000000002347070
    RDX: 0000000000008000  RSI: 000000000233f078  RDI: 0000000000000003
    RBP: ffffffffffffff08  R8:  000000000233f078  R9:  0000000000800000
    R10: 00007fff59ce4cf0  R11: 0000000000000246  R12: 000000000233f010
    R13: 000000000233f078  R14: 0000000000000000  R15: 000000000233f050
    ORIG_RAX: 000000000000004e  CS: 0033  SS: 002b

I have a dump if you want more information.

I have looked into the source code and found a comment saying that no
locking is needed here:

/* Only sets a new threshold if there are no active regions.
 *
 * No locking or otherwise interesting code is required for reading
 * o2hb_dead_threshold as it can't change once regions are active and
 * it's not interesting to anyone until then anyway. */
static void o2hb_dead_threshold_set(unsigned int threshold)
{
	if (threshold > O2HB_MIN_DEAD_THRESHOLD) {
		spin_lock(&o2hb_live_lock);
		if (list_empty(&o2hb_all_regions))
			o2hb_dead_threshold = threshold;
		spin_unlock(&o2hb_live_lock);
	}
}

So, is this a configfs problem or an OCFS2 problem? Who is in charge of
locking configfs access?

Thanks!

Regards,
Benoit

--
Benoit Welterlen
Open Software R&D
Bull, Architect of an Open World TM
Tel : +33 4 76 29 73 90
http://www.bull-world.com/
www.bull.com
On Tue, Apr 19, 2011 at 04:54:32PM +0200, Welterlen Benoit wrote:
> I have hit a bug in OCFS2 through configfs. To illustrate it, run:
>
> while true ; do ls -l /sys/kernel/config/cluster/ocfs2/heartbeat ; done&
>
> while true ; do echo 31 > /sys/kernel/config/cluster/ocfs2/heartbeat/dead_threshold ; done&

	Interesting!

> RIP  [<ffffffffa01fd214>] configfs_readdir+0xf4/0x230 [configfs]

<snip>

> #8 [ffff880c6c8b3ee0] vfs_readdir at ffffffff8116c120
> #9 [ffff880c6c8b3f30] sys_getdents at ffffffff8116c2a9

	I presume this backtrace is from the process that is ls(1)ing
the directory?

> I have looked into the source code and found a comment saying that no
> locking is needed here:
>
> /* Only sets a new threshold if there are no active regions.
>  *
>  * No locking or otherwise interesting code is required for reading
>  * o2hb_dead_threshold as it can't change once regions are active and
>  * it's not interesting to anyone until then anyway. */
> static void o2hb_dead_threshold_set(unsigned int threshold)
> {
> 	if (threshold > O2HB_MIN_DEAD_THRESHOLD) {
> 		spin_lock(&o2hb_live_lock);
> 		if (list_empty(&o2hb_all_regions))
> 			o2hb_dead_threshold = threshold;
> 		spin_unlock(&o2hb_live_lock);
> 	}
> }

	You're looking too late in the chain here. This code runs in
the echo process (bash, really); getdents() never gets anywhere near it.
	The problem is almost certainly in configfs. It's a race
between setup and teardown of the virtual attribute files. If anyone
else has a cycle to look at it, great; otherwise I'll try to get to it
later this week.
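	To spell out the window I have in mind — this is a hypothetical
interleaving sketched from memory, not something I have traced yet:

/*
 *  ls: configfs_readdir()            writer side: attribute teardown
 *  ----------------------            -------------------------------
 *  next = list_entry(p, ...);
 *  sees next->s_dentry != NULL
 *                                    attribute dentry/inode being
 *                                    released; d_inode goes NULL
 *                                    while s_dentry is still set
 *  next->s_dentry->d_inode->i_ino
 *      => NULL dereference; a CR2 of 0x40 would line up with
 *         i_ino's offset in struct inode on a 2.6.32 x86_64 build
 */

Joel

--

"In a crisis, don't hide behind anything or anybody. They're going to
 find you anyway."
	- Paul "Bear" Bryant

http://www.jlbec.org/
jlbec at evilplan.org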
On 04/19/2011 12:48 PM, Joel Becker wrote:
> 	You're looking too late in the chain here. This code runs in
> the echo process (bash, really); getdents() never gets anywhere near it.
> 	The problem is almost certainly in configfs. It's a race
> between setup and teardown of the virtual attribute files. If anyone
> else has a cycle to look at it, great; otherwise I'll try to get to it
> later this week.

We ran into this internally as well. This is what I wrote in the bug:

The matching code in configfs_readdir() is:

	name = configfs_get_name(next);
	len = strlen(name);
	if (next->s_dentry)
		ino = next->s_dentry->d_inode->i_ino;   <===
	else
		ino = iunique(configfs_sb, 2);

	if (filldir(dirent, name, len, filp->f_pos, ino,
		    dt_type(next)) < 0)
		return 0;

The oops indicates that next->s_dentry->d_inode is NULL.

Joel, does this give you any clues?

BTW, thanks for the testcase. And yes, I can reproduce it easily.
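FWIW, the untested sketch below survives a few minutes of the testcase
here. I am assuming configfs_dirent_lock is the right lock to keep
s_dentry stable against teardown, and the inode local is something I
added; this is meant to illustrate a direction, not to be a patch:

/*
 * Untested sketch, not a patch.  Sample s_dentry/d_inode under
 * configfs_dirent_lock (assuming that lock is what keeps s_dentry
 * stable against attribute teardown) and fall back to iunique()
 * when the inode is already gone.  filldir() runs only after the
 * lock is dropped, since it can fault on user memory.
 */
struct inode *inode = NULL;

name = configfs_get_name(next);
len = strlen(name);

spin_lock(&configfs_dirent_lock);
if (next->s_dentry)
	inode = next->s_dentry->d_inode;
if (inode)
	ino = inode->i_ino;
spin_unlock(&configfs_dirent_lock);

if (!inode)
	ino = iunique(configfs_sb, 2);

if (filldir(dirent, name, len, filp->f_pos, ino,
	    dt_type(next)) < 0)
	return 0;

Of course this only helps if the teardown side clears s_dentry under
the same lock; if configfs_d_iput() does not take configfs_dirent_lock
today, that side would need the matching change. That is exactly the
part I would want Joel to confirm.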