Our OSS went crazy today. It is attached to two OSTs. The load is normally
around 2-4; right now it is 123. I noticed this to be the cause:

root      6748  0.0  0.0     0    0 ?  D  May27   8:57 [ll_ost_io_123]

All of them are stuck in uninterruptible sleep. Has anyone seen this happen
before? Is this caused by a pending disk failure? I ask about disk failure
because I also see this message:

mptscsi: ioc1: attempting task abort! (sc=0000010038904c40)
scsi1 : destination target 0, lun 0
        command = Read (10) 00 75 94 40 00 00 10 00 00
mptscsi: ioc1: task abort: SUCCESS (sc=0000010038904c40)

and:

Lustre: 6698:0:(lustre_fsfilt.h:306:fsfilt_setattr()) nobackup-OST0001: slow setattr 100s
Lustre: 6698:0:(watchdog.c:312:lcw_update_time()) Expired watchdog for pid 6698 disabled after 103.1261s

Thanks

Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
brockp at umich.edu
(734) 936-1985
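[Editor's note: the Linux load average counts tasks in uninterruptible sleep (state D) as well as runnable tasks, which is why dozens of blocked ll_ost_io threads alone can push the load to 123. A minimal sketch of counting them, run here against an invented `ps -eo state,comm` snapshot rather than a live system:]

```shell
# Sample "ps -eo state,comm" output; the process names are invented.
sample='D ll_ost_io_123
D ll_ost_io_124
S sshd
R awk'

# Count tasks in uninterruptible sleep (state D). Each one contributes
# to the load average even though it consumes no CPU.
echo "$sample" | awk '$1 == "D" { n++ } END { print n+0, "tasks in D state" }'
```

On a real OSS, replacing the sample with live `ps -eo state,comm` output gives the number of threads currently blocked on I/O.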
On Fri, 2008-06-27 at 12:44 -0400, Brock Palen wrote:
> All of them are stuck in uninterruptible sleep. Has anyone seen this
> happen before? Is this caused by a pending disk failure?

Well, they are certainly stuck because of some blocking I/O. That could
be disk failure, indeed.

> mptscsi: ioc1: attempting task abort! (sc=0000010038904c40)
> scsi1 : destination target 0, lun 0
>         command = Read (10) 00 75 94 40 00 00 10 00 00
> mptscsi: ioc1: task abort: SUCCESS (sc=0000010038904c40)

That does not look like a picture of happiness, indeed, no. You have
SCSI commands aborting.

> Lustre: 6698:0:(lustre_fsfilt.h:306:fsfilt_setattr()) nobackup-OST0001: slow setattr 100s
> Lustre: 6698:0:(watchdog.c:312:lcw_update_time()) Expired watchdog for pid 6698 disabled after 103.1261s

Those are just fallout from the above disk situation.

b.
On Fri, Jun 27, 2008 at 01:07:32PM -0400, Brian J. Murrell wrote:
> > mptscsi: ioc1: attempting task abort! (sc=0000010038904c40)
> > scsi1 : destination target 0, lun 0
> >         command = Read (10) 00 75 94 40 00 00 10 00 00
> > mptscsi: ioc1: task abort: SUCCESS (sc=0000010038904c40)
>
> That does not look like a picture of happiness, indeed, no. You have
> SCSI commands aborting.

Well, these messages are not nice of course, since the mpt error handler
got activated, but in principle a SCSI device can recover from this.
Unfortunately, the verbosity level of SCSI makes it impossible to figure
out what the problem actually was. Since we suffered from severe SCSI
problems, I wrote quite a number of patches to improve the situation. We
now at least can understand where a problem came from and also have
slightly improved error handling. These are presently for 2.6.22 only,
but my plan is to send them upstream for 2.6.28.

> > Lustre: 6698:0:(lustre_fsfilt.h:306:fsfilt_setattr()) nobackup-OST0001: slow setattr 100s
> > Lustre: 6698:0:(watchdog.c:312:lcw_update_time()) Expired watchdog for pid 6698 disabled after 103.1261s
>
> Those are just fallout from the above disk situation.

Probably the device was offlined, and that should also have been printed
in the logs. Brock, can you check the device status
(cat /sys/block/sdX/device/state)?

Cheers,
Bernd
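[Editor's note: the per-device check Bernd suggests can be wrapped in a small loop over every `sd*` device. A sketch, with the sysfs base directory parameterized so it can be exercised against a mock tree; `check_scsi_states` is a hypothetical helper name, and the sysfs layout assumed is the standard `/sys/block/sdX/device/state`:]

```shell
# Report the SCSI midlayer state of every sd* block device under $base.
# Anything other than "running" (e.g. "offline") means the midlayer has
# given up on the device. Returns nonzero if any device is not running.
# On a live OSS: check_scsi_states /sys
check_scsi_states() {
    base="${1:-/sys}"
    bad=0
    for f in "$base"/block/sd*/device/state; do
        [ -e "$f" ] || continue            # no sd* devices: glob unmatched
        dev=$(basename "$(dirname "$(dirname "$f")")")
        state=$(cat "$f")
        printf '%s: %s\n' "$dev" "$state"
        [ "$state" = "running" ] || bad=1
    done
    return "$bad"
}
```

The nonzero return code makes the function easy to drop into a monitoring cron job.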
On Jun 27, 2008, at 1:39 PM, Bernd Schubert wrote:
> Probably the device was offlined and actually this also should have
> been printed in the logs. Brock, can you check the device status
> (cat /sys/block/sdX/device/state).

IO is still flowing from both OSTs on that OSS:

[root at nyx167 ~]# cat /sys/block/sd*/device/state
running
running

Sigh, it only needs to live till August when we install our x4500s.
I think it's safe to send a notice to users that they may want to copy
their data.
On Fri, Jun 27, 2008 at 01:44:13PM -0400, Brock Palen wrote:
> IO is still flowing from both OSTs on that OSS:
>
> [root at nyx167 ~]# cat /sys/block/sd*/device/state
> running
> running

So the device recovered. Is this parallel SCSI? If so, it now might run
at a lower SCSI speed level, but you should have got domain validation
messages about this (unless you are using a customized driver which has
DV disabled).

Cheers,
Bernd
On Jun 27, 2008, at 2:22 PM, Bernd Schubert wrote:
> So the device recovered. Is this parallel SCSI? If so, it now might run
> at a lower SCSI speed level, but you should have got domain validation
> messages about this (unless you are using a customized driver which has
> DV disabled).

It's Fibre Channel for the medium, direct connected (no loop or switch),
so I am not sure; the driver is the stock one with RHEL4.
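[Editor's note: for fibre channel attachments the SCSI FC transport class exposes per-HBA link state through sysfs, which is the FC-side analogue of the device-state check above. A sketch; `fc_link_report` is a hypothetical helper, the paths assume the standard `scsi_transport_fc` sysfs layout (`/sys/class/fc_host/hostN/{port_state,speed}`), and the base directory is parameterized for testing:]

```shell
# Print link state and negotiated speed for every FC HBA under $base.
# A port_state of "Online" means the port sees light; "Linkdown" or
# "Blocked" points at the fabric or cabling. On a live system:
# fc_link_report /sys
fc_link_report() {
    base="${1:-/sys}"
    for h in "$base"/class/fc_host/host*; do
        [ -e "$h/port_state" ] || continue   # no FC HBAs: glob unmatched
        printf '%s: state=%s speed=%s\n' "$(basename "$h")" \
            "$(cat "$h/port_state")" "$(cat "$h/speed")"
    done
}
```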
On Jun 27, 2008, at 1:07 PM, Brian J. Murrell wrote:
> That does not look like a picture of happiness, indeed, no. You have
> SCSI commands aborting.

While the array was reporting no problems, one of the disks was really
lagging the others. We have swapped it out. Thanks for the feedback,
everyone.
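[Editor's note: a disk can pass the array's own health checks and still drag the whole LUN down, as happened here. Per-device latency is what exposes it. A sketch that flags any device whose average wait time is more than 5x the best peer; the two columns are modeled on the device and `await` fields of `iostat -x`, and the numbers are invented:]

```shell
# Invented "device await(ms)" pairs; on a real host, feed in columns
# extracted from "iostat -x" instead.
sample='sda 4.1
sdb 3.8
sdc 212.5
sdd 4.4'

# Flag devices whose average wait dwarfs the fastest peer (> 5x min).
echo "$sample" | awk '
    { dev[NR] = $1; await[NR] = $2 + 0
      if (NR == 1 || await[NR] < min) min = await[NR] }
    END { for (i = 1; i <= NR; i++)
              if (await[i] > 5 * min) print dev[i], "lagging:", await[i] "ms" }'
```

With the sample data this prints only `sdc`, the kind of slow-but-not-failed disk described above.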
On Fri, Jun 27, 2008 at 02:29:24PM -0400, Brock Palen wrote:
> It's Fibre Channel for the medium, direct connected (no loop or
> switch), so I am not sure; the driver is the stock one with RHEL4.

Ok, quite different then. I only have very little experience with FC, so
no idea what's wrong with your system now.

Cheers,
Bernd