jean-marc.saffroy@ext.bull.net
2007-Feb-01 05:11 UTC
[Lustre-devel] [Bug 5841] Multiple simultaneous open(O_CREAT|O_RDWR) fail with ESTALE
Please don't reply to lustre-devel. Instead, comment in Bugzilla by using the following link:
https://bugzilla.lustre.org/show_bug.cgi?id=5841

A customer using our 2.6.12 kernel with Lustre patches recently reported hangs with autofs (for NFS mounts), and we traced the problem to the patch provided in this bug.

One user process tries to walk an autofs mount point, which triggers action from the automount daemon:

STACK TRACE FOR TASK: 0xe0000021c4030000 (sbatchd)
 0 schedule+0xc06 [0xa000000100543aa6]
 1 interruptible_sleep_on+0xcc [0xa000000100545eec]
 2 autofs_wait+0x2ac [0xa0000001002031ac]
 3 try_to_fill_dentry+0x30c [0xa000000100200e8c]
 4 autofs_revalidate+0x1fc [0xa00000010020123c]
 5 real_lookup+0xfc [0xa00000010014b53c]
 6 do_lookup+0x16c [0xa00000010014c00c]
 7 __link_path_walk+0x3cc [0xa00000010014ca4c]
 8 link_path_walk+0xbc [0xa00000010014f39c]
 9 path_lookup+0x176 [0xa00000010014f6f6]
 10 __user_walk_it+0x7c [0xa0000001001500bc]
 11 vfs_stat+0x6c [0xa00000010014178c]
 12 sys_newstat+0x2c [0xa000000100141d2c]
 13 ia64_ret_from_syscall [0xa00000010000b040]

The automount daemon then tries to traverse the same directory (autofs handles accesses from automount daemons differently) and blocks on dir->i_sem:

STACK TRACE FOR TASK: 0xe0000041fe8d0000 (automount)
 0 schedule+0xc06 [0xa000000100543aa6]
 1 __down+0x18c [0xa000000100542dcc]
 2 vfs_readdir+0x12c [0xa00000010015b30c]
 3 sys_getdents64+0xdc [0xa00000010015bbbc]
 4 ia64_ret_from_syscall [0xa00000010000b040]

At this point we are looking for a workaround in autofs, but suggestions for a better fix in Lustre are welcome.

The offending patch is in the vfs_intent Lustre kernel patches for 2.6.12 and SLES 10.
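The two traces are consistent with a lock-ordering deadlock, under one assumption that the report does not state outright: that the patched real_lookup() calls d_revalidate() (here autofs_revalidate()) while still holding dir->i_sem, whereas the stock 2.6.12 real_lookup() drops the semaphore before revalidating. The Python model below is only a sketch of that hypothesis, not kernel code: a threading.Lock stands in for dir->i_sem, and simulate(), its parameters and the event names are hypothetical. It reports whether the daemon's readdir could obtain the semaphore in each ordering.

```python
import threading


def simulate(revalidate_under_i_sem: bool, timeout: float = 0.2) -> bool:
    """Return True if the daemon's readdir obtained dir->i_sem, False if it got stuck.

    Hypothetical model: the Lock stands in for dir->i_sem, and the Events stand in
    for the wakeups exchanged between the user process and the automount daemon.
    """
    i_sem = threading.Lock()             # stands in for dir->i_sem
    lookup_started = threading.Event()   # user process has entered real_lookup()
    mounted = threading.Event()          # daemon has finished servicing the mount
    daemon_got_i_sem = threading.Event()

    def user_process() -> None:
        # sbatchd: stat() -> real_lookup() -> autofs_revalidate() -> autofs_wait()
        i_sem.acquire()
        lookup_started.set()
        if revalidate_under_i_sem:
            # Assumed patched ordering: revalidate (and sleep) with i_sem still held.
            # The real autofs_wait() sleeps indefinitely; the bounded wait only keeps
            # this model from hanging.
            mounted.wait(4 * timeout)
            i_sem.release()
        else:
            # Stock ordering: release i_sem before calling d_revalidate().
            i_sem.release()
            mounted.wait(4 * timeout)

    def automount_daemon() -> None:
        # automount: getdents64() -> vfs_readdir(), which needs dir->i_sem
        lookup_started.wait()
        if i_sem.acquire(timeout=timeout):
            daemon_got_i_sem.set()
            i_sem.release()
            mounted.set()  # completing the mount wakes the waiting lookup

    threads = [threading.Thread(target=user_process),
               threading.Thread(target=automount_daemon)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return daemon_got_i_sem.is_set()


if __name__ == "__main__":
    print("i_sem held across revalidate:", simulate(True))      # False: daemon stuck
    print("i_sem dropped before revalidate:", simulate(False))  # True: mount proceeds
```

In the first ordering the daemon cannot take i_sem until the lookup finishes, and the lookup is waiting on the daemon, which matches the two stack traces. This suggests that a fix on the Lustre side would need to avoid holding dir->i_sem across d_revalidate() for filesystems such as autofs whose revalidate may sleep on another process.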