Karen Liesenfeld
2010-Oct-31 22:30 UTC
[Lustre-discuss] Client lockup: Problems with bug 22935?
Hi all,

we are running Lustre 1.8.3 on Redhat 2.6.18-164.11.1.el5 kernels and on one client the Lustre module "crashed" twice in 24 hours:

The first error message looks like we are running into bug 22935. For the second error I could not find anything in bugzilla. There are no errors on the Lustre servers.

Could those two problems be related?

Does anybody know a workaround for this? Should we disable statahead for now?

Happy to provide more information.

Thanks for your help,
Karen

--- First error ----
Oct 31 18:01:44 server kernel: LustreError: 4887:0:(statahead.c:1164:ll_statahead_exit()) ASSERTION(lli->lli_opendir_pid == cfs_curproc_pid()) failed
Oct 31 18:01:44 server kernel: LustreError: 4887:0:(statahead.c:1164:ll_statahead_exit()) LBUG
Oct 31 18:01:44 server kernel: [<ffffffff887c0a69>] ll_statahead_exit+0xd9/0x510 [lustre]
Oct 31 18:01:44 server kernel: [<ffffffff887b63bd>] ll_i2gids+0x7d/0x150 [lustre]
Oct 31 18:01:44 server kernel: [<ffffffff887b6517>] ll_prepare_mdc_op_data+0x87/0x110 [lustre]
Oct 31 18:01:44 server kernel: [<ffffffff8876f9df>] ll_revalidate_it+0x86f/0x1050 [lustre]
Oct 31 18:01:44 server kernel: [<ffffffff887b9280>] ll_mdc_blocking_ast+0x0/0x520 [lustre]
Oct 31 18:01:44 server kernel: [<ffffffff887b5911>] ll_convert_intent+0xb1/0x170 [lustre]
Oct 31 18:01:44 server kernel: [<ffffffff88770d64>] ll_revalidate_nd+0x194/0x390 [lustre]

--- Second error ----
Nov 1 03:17:23 server kernel: LustreError: 5341:0:(file.c:435:ll_file_open()) ASSERTION(lli->lli_sai == NULL) failed
Nov 1 03:17:23 server kernel: LustreError: 5341:0:(file.c:435:ll_file_open()) LBUG
Nov 1 03:17:23 server kernel: [<ffffffff887860c6>] ll_file_open+0x226/0xd10 [lustre]
Nov 1 03:17:23 server kernel: [<ffffffff88785ea0>] ll_file_open+0x0/0xd10 [lustre]
Nov 1 03:17:23 server kernel: [<ffffffff8876fdb2>] ll_revalidate_nd+0x1e2/0x390 [lustre]
Nov 1 03:17:23 server kernel: LustreError: dumping log to /tmp/lustre-log.1288534643.5341
Andreas Dilger
2010-Nov-01 06:22 UTC
[Lustre-discuss] Client lockup: Problems with bug 22935?
On 2010-10-31, at 16:30, Karen Liesenfeld wrote:
> The first error message looks like we are running into bug 22935. For
> the second error I could not find anything in bugzilla.
> There are no errors on the Lustre servers.
>
> Could those two problems be related?

They both look like statahead problems.

> Does anybody know a workaround for this? Should we disable statahead for
> now?

I believe 1.8.4 has fixes for the statahead code, and 22935 shows that a fix was landed for the upcoming 1.8.5 release as well. In the meantime, you could disable statahead on the clients.

Cheers, Andreas
--
Andreas Dilger
Lustre Technical Lead
Oracle Corporation Canada Inc.
Karen Liesenfeld
2010-Nov-01 20:42 UTC
[Lustre-discuss] Client lockup: Problems with bug 22935?
On 01/11/10 19:22, Andreas Dilger wrote:
> On 2010-10-31, at 16:30, Karen Liesenfeld wrote:
>> The first error message looks like we are running into bug 22935. For
>> the second error I could not find anything in bugzilla.
>> There are no errors on the Lustre servers.
>>
>> Could those two problems be related?
>
> They both look like statahead problems.
>
>> Does anybody know a workaround for this? Should we disable statahead for
>> now?
>
> I believe 1.8.4 has fixes for the statahead code, and 22935 shows that a fix was landed for the upcoming 1.8.5 release as well. In the meantime, you could disable statahead on the clients.
>

Thanks for your help. I'll test disabling statahead until we can upgrade.

Thanks,
Karen
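
Editorial note: the thread does not spell out how to disable statahead on a client. On Lustre 1.8 clients the tunable is generally exposed as a `statahead_max` file per mounted filesystem, and writing 0 to it turns statahead off. The sketch below assumes the files live under `/proc/fs/lustre/llite/*/statahead_max`; that path, the function name `disable_statahead`, and the `llite_root` parameter are illustrative assumptions, not something stated by the posters. It must run as root on the client, and the setting most likely has to be reapplied after a remount or reboot.

```python
import glob
import os

# Assumed location of the per-mount llite tunables on a Lustre 1.8 client.
DEFAULT_LLITE_ROOT = "/proc/fs/lustre/llite"


def disable_statahead(llite_root=DEFAULT_LLITE_ROOT):
    """Write 0 to every statahead_max file found under llite_root.

    Returns a dict mapping each file path to the value it held before,
    so the old setting can be restored once the client is upgraded.
    """
    previous = {}
    pattern = os.path.join(llite_root, "*", "statahead_max")
    for path in sorted(glob.glob(pattern)):
        # Remember the current limit before overwriting it.
        with open(path) as f:
            previous[path] = f.read().strip()
        # A limit of 0 disables statahead for this mount.
        with open(path, "w") as f:
            f.write("0\n")
    return previous


if __name__ == "__main__":
    for path, old in disable_statahead().items():
        print(f"{path}: {old} -> 0")
```

Keeping the previous values makes it straightforward to write them back after moving to a release that carries the statahead fixes mentioned above.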