zam@clusterfs.com
2007-Jan-18 05:04 UTC
[Lustre-devel] [Bug 11562] racer-correctness test fails on b1_4_sles10 kernel 2.6.16.21-08-sles10 on x86_64
Please don't reply to lustre-devel. Instead, comment in Bugzilla by using the following link:
https://bugzilla.lustre.org/show_bug.cgi?id=11562

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED

The problem is reproducible locally in a SL10.1 environment. The following hot fix (with debugging code) fixes the problem. However, it has not been tried on buffalo.

diff -u -p -r1.27.40.3.14.3.26.4 symlink.c
--- lustre/llite/symlink.c 16 Nov 2006 19:21:39 -0000 1.27.40.3.14.3.26.4
+++ lustre/llite/symlink.c 18 Jan 2007 10:41:09 -0000
@@ -130,7 +130,7 @@ static LL_FOLLOW_LINK_RETURN_TYPE ll_fol
         struct inode *inode = dentry->d_inode;
         struct ll_inode_info *lli = ll_i2info(inode);
         struct lookup_intent *it = ll_nd2it(nd);
-        struct ptlrpc_request *request;
+        struct ptlrpc_request *request = NULL;
         int rc;
         char *symname;
         ENTRY;
@@ -145,6 +145,17 @@ static LL_FOLLOW_LINK_RETURN_TYPE ll_fol
         }
         CDEBUG(D_VFSTRACE, "VFS Op\n");
+
+        {
+                int dummy = 1;
+                printk("SP x%p lelel = %d\n", &dummy, current->link_count);
+        }
+
+        if (current->link_count > 5) {
+                path_release(nd);
+                GOTO(out, rc = -ELOOP);
+        }
+

A simpler test was found which causes a stack overflow in a Lustre client without the hot fix:

$ ln -sf foo foo
$ ls foo

The debugging code above shows how stack usage grows with each ll_follow_link call:

SP 0xffff810001defb14 lelel = 1
SP 0xffff810001def8c4 lelel = 2
SP 0xffff810001def674 lelel = 3
SP 0xffff810001def424 lelel = 4
SP 0xffff810001def1d4 lelel = 5
SP 0xffff810001deef84 lelel = 6

This means that these functions together consume 592 bytes of stack per nesting level:

link_path_walk
__link_path_walk
do_follow_link
__do_follow_link
__vfs_follow_link
(link_path_walk again)

In particular, link_path_walk takes 200 bytes and __link_path_walk takes 280 (from the checkstack.pl report).

For comparison, the same functions in the same kernel for the i386 arch take:

__link_path_walk: 280
link_path_walk: 200

The stack usage report for the newer kernel 2.6.20-rc5 on x86_64:

link_path_walk [vmlinux]: 152
__link_path_walk [vmlinux]: 104

2.6.9-rhel4 on x86_64:

0xffffffff80184f6f link_path_walk: 192
0xffffffff80183f80 __link_path_walk: 136
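
The 592-byte figure can be re-derived from the printed addresses. The following is only a small user-space sketch of that arithmetic, not part of the hot fix. The sample addresses are copied from the debug output in this comment; the helper names frame_bytes and stack_for_depth are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Addresses of the local variable `dummy` printed at link_count 1..6,
 * copied from the debug output in the bug comment. */
static const uint64_t sp_samples[] = {
    0xffff810001defb14ULL, 0xffff810001def8c4ULL, 0xffff810001def674ULL,
    0xffff810001def424ULL, 0xffff810001def1d4ULL, 0xffff810001deef84ULL,
};
#define N_SAMPLES (sizeof(sp_samples) / sizeof(sp_samples[0]))

/* Stack consumed between two consecutive samples; i is a 0-based index
 * into sp_samples. Hypothetical helper, for illustration only. */
static uint64_t frame_bytes(size_t i)
{
    assert(i + 1 < N_SAMPLES);
    /* The stack grows downwards, so the deeper call has the lower address. */
    assert(sp_samples[i] > sp_samples[i + 1]);
    return sp_samples[i] - sp_samples[i + 1];
}

/* Hypothetical helper: stack used by `depth` nested symlink levels,
 * assuming every level costs the same as the first measured one. */
static uint64_t stack_for_depth(unsigned depth)
{
    return (uint64_t)depth * frame_bytes(0);
}
```

Every pair of consecutive samples differs by 0x250, that is 592 bytes. If the nesting limit is assumed to be 8 (an assumption; the real MAX_NESTED_LINKS value should be checked in the kernel source being used), stack_for_depth(8) comes to 4736 bytes for symlink recursion alone.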
zam@clusterfs.com
2007-Jan-18 11:12 UTC
[Lustre-devel] [Bug 11562] racer-correctness test fails on b1_4_sles10 kernel 2.6.16.21-08-sles10 on x86_64
Please don't reply to lustre-devel. Instead, comment in Bugzilla by using the following link:
https://bugzilla.lustre.org/show_bug.cgi?id=11562

Created an attachment (id=9371)
 --> (https://bugzilla.lustre.org/attachment.cgi?id=9371&action=view)
reduce stack usage in __vfs_follow_link

__vfs_follow_link mistakenly pushed the whole intent structure onto the stack, which increased stack usage by 72 bytes (on the x86_64 arch). As a result, recursive symlink lookup crashed before the MAX_NESTED_LINKS limit was reached. The fix gets rid of the temporary stack-allocated variable.
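
To illustrate the general pattern described here, the following user-space sketch contrasts copying a whole structure into a local variable with saving only what must survive the nested call. It is not the attached patch, and it does not claim to show what the real __vfs_follow_link does. struct fake_intent, nested_walk and both follow_* functions are invented names; only the 72-byte size is taken from the comment above.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for the real intent structure. Only its size (72 bytes) comes
 * from the bug comment; the field layout is invented. */
struct fake_intent {
    uint64_t words[9];
};
static_assert(sizeof(struct fake_intent) == 72, "size quoted in the bug comment");

/* Hypothetical stand-in for the nested path walk: it modifies the intent,
 * as a nested lookup that reuses the structure might. */
static void nested_walk(struct fake_intent *it)
{
    it->words[0] += 1;
}

/* Costly pattern: the whole structure is copied into a local, so every
 * recursion level carries sizeof(struct fake_intent) extra stack bytes. */
static void follow_with_copy(struct fake_intent *it)
{
    struct fake_intent saved = *it;      /* 72 bytes in this frame */
    nested_walk(it);
    *it = saved;
}

/* Cheaper pattern: keep only the field that has to survive the call. */
static void follow_without_copy(struct fake_intent *it)
{
    uint64_t saved_word = it->words[0];  /* 8 bytes in this frame */
    nested_walk(it);
    it->words[0] = saved_word;
}
```

Both variants leave the caller's structure unchanged in this toy example. The difference is only in how many bytes each recursion level adds to its stack frame.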