thr3ads.net - Lustre discuss - [Lustre-discuss] Lustre NOT HEALTHY [Jan 2009]

Brock Palen

2009-Jan-13 17:08 UTC

[Lustre-discuss] Lustre NOT HEALTHY

How common is it for servers to go NOT HEALTHY?  I feel it is  
happening much more often than it should be with us.  A few times a  
month.

If this happens, we reboot the servers.  Should we do something  
else?  Maybe it depends on what the problem was?

If we should not be getting NOT HEALTHY that often, what information  
should I collect to report to CFS?


Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
brockp at umich.edu
(734)936-1985

Cliff White

2009-Jan-14 04:09 UTC

Brock Palen wrote:
> How common is it for servers to go NOT HEALTHY?  I feel it is
> happening much more often than it should be with us.  A few times a
> month.

It should not happen at all, in the normal case. It indicates a problem.

> If this happens, we reboot the servers.  Should we do something
> else?  Maybe it depends on what the problem was?

Well, determining the actual problem that caused the NOT HEALTHY
state would be quite useful, yes. I would not just reboot.
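
As a minimal sketch of recording the state before any reboot, the status text can be read and classified. The path /proc/fs/lustre/health_check is where Lustre servers of this era expose the status; treat it as an assumption and confirm it on your version. The helper names are hypothetical.

```python
def is_healthy(health_text):
    """Return False if the status text contains the NOT HEALTHY marker."""
    return "NOT HEALTHY" not in health_text


def read_health(path="/proc/fs/lustre/health_check"):
    """Read the raw status text. The default path is an assumption."""
    with open(path) as handle:
        return handle.read()


# Illustrative strings only; on a server, pass read_health() instead.
for text in ("healthy\n", "NOT HEALTHY\n"):
    print(repr(text.strip()), "->", "ok" if is_healthy(text) else "investigate")
```

Running something like this from cron and saving the raw text with a timestamp would give a record of when each server changed state.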

- Examine consoles of _all_ servers for any error indications
- Examine syslogs of _all_ servers for any LustreErrors or LBUG
- Check network and hardware health. Are your disks happy?
  Is your network dropping packets?
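
The syslog check above could be sketched roughly as follows. This is only an illustration: the function name is hypothetical, and the sample lines are made up to show the matching, not real Lustre output.

```python
import re

# The two markers named in the checklist; matching is case-sensitive.
PATTERN = re.compile(r"LustreError|LBUG")


def find_lustre_errors(lines):
    """Return (line_number, text) pairs for lines with LustreError or LBUG."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if PATTERN.search(line):
            hits.append((number, line.rstrip("\n")))
    return hits


# Made-up sample lines for illustration only.
sample = [
    "Jan 13 02:10:01 oss1 kernel: eth0: link up\n",
    "Jan 13 02:14:37 oss1 kernel: LustreError: example message\n",
    "Jan 13 02:14:38 oss1 kernel: LBUG\n",
]

for number, text in find_lustre_errors(sample):
    print(number, text)
```

On a real server the same function accepts an open file object, for example one opened on /var/log/messages (that path is an assumption and depends on your syslog configuration). Run it on every server, not only the one reporting NOT HEALTHY.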

Try to figure out what was happening on the cluster. Does this relate to
a specific user workload or system load condition? Can you reproduce
the situation? Does it happen at a specific time of day, time of
month?

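
One way to test the time-of-day question is to tally past incidents by hour. The sketch below assumes you have noted each incident as a 'YYYY-MM-DD HH:MM' string; the function name and the timestamps are hypothetical placeholders.

```python
from collections import Counter
from datetime import datetime


def incidents_by_hour(timestamps):
    """Count incidents per hour of day (0-23) from 'YYYY-MM-DD HH:MM' strings."""
    hours = Counter()
    for stamp in timestamps:
        hours[datetime.strptime(stamp, "%Y-%m-%d %H:%M").hour] += 1
    return hours


# Hypothetical incident times, for illustration only.
incidents = ["2009-01-02 02:15", "2009-01-09 02:40", "2009-01-13 14:05"]

for hour, count in sorted(incidents_by_hour(incidents).items()):
    print(f"{hour:02d}:00  {count}")
```

A cluster of incidents in one hour would suggest looking at scheduled jobs (backups, cron tasks, a recurring user workload) that run at that time.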
> If we should not be getting NOT HEALTHY that often, what information  
> should I collect to report to CFS?
The lustre-diagnostics package is a good start for general system config.
Beyond that, most of what we would need is listed above.
cliffw

Brock Palen

2009-Jan-14 15:07 UTC

Ok thanks,

It happened again last night, sooner than normal.  I will send a new  
message with the details.

Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
brockp at umich.edu
(734)936-1985



