Hi,

We've had several OSSes kernel panic during the past week, and all but one occurred in ost_rw_prolong_locks in ost_handler.c. From what I can tell, this file hasn't changed since 1.8.4, which is what we're running in production. We have had no luck in tying these events to load on the file system or errors reported in the logs. Hardware wise, the machines are stable (until they crash and the RAID arrays need to rebuild).

I've attached a screen shot from the console after the panic; unfortunately, I don't know if the stack trace before the panic is associated with the kernel panic. For the most part, the kernel seems to manage cleaning up hung threads.

At this point, we would appreciate any insight into what may be causing this. If someone thinks it may be a bug, I would be glad to open a ticket.

Thanks,
Rick

Host info:
CentOS 5.4
Linux lustre-oss-0-2.local 2.6.18-194.3.1.el5_lustre.1.8.4 #1 SMP Fri Jul 9 21:55:24 MDT 2010 x86_64 x86_64 x86_64 GNU/Linux

Attachment: oss-0-2.png (image/png, 217332 bytes)
http://lists.lustre.org/pipermail/lustre-discuss/attachments/20110721/f8f9eb7a/attachment-0001.png
Johann Lombardi
2011-Jul-28 19:18 UTC
[Lustre-discuss] Kernel panic in ost_rw_prolong_locks
Hi,

On Thu, Jul 21, 2011 at 12:44:54PM -0700, Rick Wagner wrote:
> Host info:
> CentOS 5.4
> Linux lustre-oss-0-2.local 2.6.18-194.3.1.el5_lustre.1.8.4 #1 SMP Fri Jul 9 21:55:24 MDT 2010 x86_64 x86_64 x86_64 GNU/Linux

I think you hit bugzilla ticket 21804 which is fixed in both 1.8.6 & 1.8.6-wc1.

Cheers,
Johann

--
Johann Lombardi
Whamcloud, Inc.
www.whamcloud.com
Hi Johann,

On Jul 28, 2011, at 12:18 PM, Johann Lombardi wrote:
> Hi,
>
> On Thu, Jul 21, 2011 at 12:44:54PM -0700, Rick Wagner wrote:
>> Host info:
>> CentOS 5.4
>> Linux lustre-oss-0-2.local 2.6.18-194.3.1.el5_lustre.1.8.4 #1 SMP Fri Jul 9 21:55:24 MDT 2010 x86_64 x86_64 x86_64 GNU/Linux
>
> I think you hit bugzilla ticket 21804 which is fixed in both 1.8.6 & 1.8.6-wc1.

That's good news. We're testing new servers with 1.8.6-wc1.

Thanks,
Rick

>
> Cheers,
> Johann
>
> --
> Johann Lombardi
> Whamcloud, Inc.
> www.whamcloud.com
Repeated post. Please ignore.

--Rick