Looks like it's time for the "Dilger Procedure", yes?
See http://wiki.hpc.ufl.edu/index.php/Lustre
At least, it sounds like the same problem we encountered, and this
procedure worked for us.
Charlie Taylor
UF HPC Center
On Sep 15, 2008, at 10:52 AM, Dan wrote:
> Hi,
>
> One of my OSSs crashed last week.  All OSTs on it recover and mount  
> except one that causes a kernel panic when it starts replaying  
> (after waiting for clients to connect).  I fscked it this weekend  
> and found no errors but still it panics the system.
>
> My only idea on how to fix it was to run lctl --device 12  
> abort_recovery.  The instant you run this it causes a kernel panic.   
> Something's not right about the replay info, I guess.  I've brought up
> the other OSTs and deactivated them on the MGS/MDT but I cannot get  
> the clients to mount anyway.  Do I need to deactivate it on the  
> clients too?
>
> Help!
>
> Dan