Denise Hummel
2008-Dec-04 14:40 UTC
[Lustre-discuss] Lustre-discuss Digest, Vol 35, Issue 5
Hi Brian;

Thanks for the advice. The messages you saw were immediately prior to the kernel panic - the console showed the kernel panic and the messages on the console were about brw_writes and OST timeouts.

I did do a baseline, so will try to determine the appropriate number of threads. You are right that we were probably oversubscribing the storage and just recently became overloaded with the number of Gaussian jobs running.

Is it typical for a kernel panic in this situation?

Thanks,
Denise
Brian J. Murrell
2008-Dec-04 14:51 UTC
[Lustre-discuss] Lustre-discuss Digest, Vol 35, Issue 5
On Thu, 2008-12-04 at 07:40 -0700, Denise Hummel wrote:
> Hi Brian;

Hi.

> Thanks for the advice.

NP.

> The messages you saw were immediately prior to
> the kernel panic

There was no kernel panic in the messages you sent. You need to understand that watchdog timeouts are not kernel panics, although they do show a stack trace similar to kernel panics. If you do have an actual kernel panic, it was not included in the messages you sent.

> I did do a baseline, so will try to determine the appropriate number of
> threads. You are right that we were probably oversubscribing the
> storage and just recently became overloaded with the number of Gaussian
> jobs running.
> Is it typical for a kernel panic in this situation?

As I have said before, there was no kernel panic.

b.