On Mar 10, 2006 16:48 -0600, Charles Wright wrote:
> I have a cray xd1 with an engenio 5884 hooked up to it.

I'm not sure what an engenio 5884 is... I take it this is relevant to the problem, or you wouldn't have mentioned it?

> I get lots of messages like this (20 per second or so)
> LDISKFS-fs warning (device sdad1): ldiskfs_mb_new_blocks: too long
> searching: got 94 want 158

This message can actually be disabled, as it is mostly for diagnostics. If you are getting 20 per second, this can be a noticeable hit to performance, since console writes are done with global IRQs disabled.

> And I just got the following kernel panic with one of those messages.

Likely if you are getting 20/s, any message that appears on the console would seem to be "with" one of them.

> Lustre on our system has gone to crap to the point that we are going to
> have to disable it soon if I can't figure out what is going on.

I'm sorry to hear this, but this is also the first time I've heard of any such problems. The lustre-discuss mailing list is mostly for the non-support community to keep in touch. I see Cray has also opened a support request with CFS for the oops (CFS bug 10296), so I'll respond to that there.

Regards, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.
I have a cray xd1 with an engenio 5884 hooked up to it.

I get lots of messages like this (20 per second or so)

LDISKFS-fs warning (device sdad1): ldiskfs_mb_new_blocks: too long searching: got 94 want 158

And I just got the following kernel panic with one of those messages.

Lustre on our system has gone to crap to the point that we are going to have to disable it soon if I can't figure out what is going on.

Please help...

Message:
Unable to handle kernel paging request at ffffffffa0580000 RIP:
<ffffffff8024f76b>{__memcpy+11}
PML4 103027 PGD 105027 PMD 6a741067 PTE 0
Oops: 0002 [1] SMP
CPU 0
Pid: 15626, policy: 0, comm: kjournald Tainted: G U (2.6.5_H_01_03 #38 SMP Fri Jan 27 16:05:43 PST 2006 )
RIP: 0010:[<ffffffff8024f76b>] <ffffffff8024f76b>{__memcpy+11}
RSP: 0018:000001005dbb3a60 EFLAGS: 00010246
RAX: ffffffffa057fff8 RBX: ffffffffa057f9b0 RCX: 0000000000000003
RDX: 0000000000000000 RSI: 0000010063105e28 RDI: ffffffffa0580000
RBP: 0000010063105e20 R08: 0000000000000000 R09: 00000101607dfa40
R10: 00000000ffffffff R11: 0000000000000000 R12: 00000101607dfa40
R13: 0000000000000001 R14: 00000101607dfa88 R15: 0000000000000000
FS: 0000002a95b294c0(0000) GS:ffffffff80613c40(0000) knlGS:000000005556c9c0
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: ffffffffa0580000 CR3: 0000000000101000 CR4: 00000000000006e0
Process kjournald (pid: 15626, task cpu: 0, policy: 0, threadinfo 000001005dbb2000, task 000001005dbb15f0)
Stack: ffffffffa0507146 000000000000128e ffffffff803f6880 000001005dbb1918
       0000000000000001 00000100593ba700 ffffffff8028158f 00000000593ba700
       00000101607dfa40 0000000000000000
Call Trace:<ffffffffa0507146>{:ptlrpc:llog_obd_repl_cancel+1750}
       <ffffffff8028158f>{elv_next_request+255}
       <ffffffffa01ca6b1>{:obdclass:llog_cancel+1649}
       <ffffffff8013b3c4>{io_schedule+84}
       <ffffffffa062785d>{:obdfilter:filter_cancel_cookies_cb+269}
       <ffffffffa05ec3cc>{:fsfilt_ldiskfs:fsfilt_ldiskfs_cb_func+28}
       <ffffffffa05af465>{:jbd:journal_put_journal_head+149}
       <ffffffffa05abf74>{:jbd:journal_commit_transaction+3940}
       <ffffffff8013eaa0>{autoremove_wake_function+0}
       <ffffffff8013eaa0>{autoremove_wake_function+0}
       <ffffffff8013adc5>{thread_return+0}
       <ffffffffa05b13c7>{:jbd:kjournald+375}
       <ffffffff8013eaa0>{autoremove_wake_function+0}
       <ffffffff8013eaa0>{autoremove_wake_function+0}
       <ffffffffa05b1640>{:jbd:commit_timeout+0}
       <ffffffff801103ef>{child_rip+8}
       <ffffffffa05b1250>{:jbd:kjournald+0}
       <ffffffff801103e7>{child_rip+0}

Code: f3 48 a5 89 d1 f3 a4 c3 66 66 66 90 66 66 66 90 66 66 66 90
RIP <ffffffff8024f76b>{__memcpy+11} RSP <000001005dbb3a60>
CR2: ffffffffa0580000
<4>LDISKFS-fs warning (device sdad1): ldiskfs_mb_new_blocks: too long searching: got 98 want 256
LDISKFS-fs warning (device sdad1): ldiskfs_mb_new_blocks: too long searching: got 94 want 158

--
Charles Wright, HPC Systems Administrator
Alabama Research and Education Network
Computer Sciences Corporation