thr3ads.net - CentOS - [CentOS] SUMMARY : Repair Filesystem prompt , after inode has illegal blocks ; qla2xxx message on reboot [Sep 2009]


McCulloch, Alan

2009-Sep-15 02:25 UTC

[CentOS] SUMMARY : Repair Filesystem prompt , after inode has illegal blocks ; qla2xxx message on reboot

Hi all,

Thanks for the responses.

Yesterday morning I found the (SAN) filesystem in read-only mode, possibly
because of an HBA fault on the SAN. After a warm reboot (via "reboot") I was
dropped into the "Repair Filesystem" prompt, on account of
"inode 27344909 has illegal blocks", so I ran

fsck -r /data

(Linux version 2.6.18-92.1.18.el5, Red Hat 4.1.2-42, ext3 filesystem)

This took a couple of hours or so, prompting me for various changes, all of
which I accepted. It appeared to complete OK, but then the system would not
boot, with the following error from the qla2xxx driver:

.
.
qla2xxx 0000:05:0d.0: Mailbox command timeout occurred. Scheduling ISP abort.
qla2xxx 0000:05:0d.0: Mailbox command timeout occurred. Scheduling ISP abort.
.
etc
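
For anyone wanting to spot these messages in a saved boot log, a minimal
sketch follows. It assumes the messages look exactly like the two lines
quoted above; the function name and the sample text are illustrative only,
and other qla2xxx message formats are not covered.

```python
import re
from collections import Counter

# Pattern modeled on the two log lines quoted above (an assumption:
# other driver versions may word the message differently).
MAILBOX_TIMEOUT = re.compile(
    r"qla2xxx (?P<pci>[0-9a-fA-F:.]+): Mailbox command timeout occurred"
)


def count_mailbox_timeouts(log_text):
    """Return a Counter mapping PCI address -> number of timeout messages."""
    counts = Counter()
    for line in log_text.splitlines():
        match = MAILBOX_TIMEOUT.search(line)
        if match:
            counts[match.group("pci")] += 1
    return counts


# Sample text copied from the excerpt above.
sample = """\
qla2xxx 0000:05:0d.0: Mailbox command timeout occurred. Scheduling ISP abort.
qla2xxx 0000:05:0d.0: Mailbox command timeout occurred. Scheduling ISP abort.
"""

print(dict(count_mailbox_timeouts(sample)))
```

A repeated count against a single PCI address, as here, at least narrows
attention to one HBA port.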

However, after powering down the system and cold-booting, it was able to boot
and mount the repaired filesystem without any obvious damage, though with
abnormal (not to mention scary-looking) boot messages and ongoing warnings
from multipath.

This morning (as I sort of expected) the filesystem had dropped back to
read-only mode. Meanwhile the source of our woes was identified: a fibre port
on the SAN controller that was degraded but not completely failed. There had
therefore been no clean failover to the twin controller, and a degraded
virtual device was presented to the O/S, with consequences for the filesystem.
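
A filesystem that has dropped to read-only mode shows up with the "ro"
option in /proc/mounts. The sketch below parses text in that format and
lists the affected mount points. The function name and the sample device
name are made up for illustration, not taken from the system described here.

```python
def read_only_mounts(mounts_text):
    """Return mount points whose option list contains 'ro'.

    Expects text in /proc/mounts format:
    device mountpoint fstype options dump pass
    """
    result = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        mount_point = fields[1]
        options = fields[3].split(",")
        if "ro" in options:
            result.append(mount_point)
    return result


# Hypothetical sample; the device names are assumptions.
sample = """\
/dev/mapper/mpath0 /data ext3 ro,data=ordered 0 0
/dev/sda1 /boot ext3 rw,data=ordered 0 0
"""

print(read_only_mounts(sample))
```

On a live system the input would come from reading /proc/mounts, for
example with open("/proc/mounts").read().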

After that port and controller were quarantined, I did a cold power-off
reboot of the server. This time the boot looked more normal, and the
filesystem came up normally without any repair being requested.

(My hypothesis is that in this situation, i.e. when an ext3 filesystem has put
itself in read-only mode, a warm boot via "reboot" does not cleanly remount
the filesystem and apply the journal quite like a cold power-off reboot does.
I think it is likely that my lengthy session of answering "yes" to fsck's
interactive repair the first time around simply applied all of the fixes that
would have been done automatically from the journal, had I cold-rebooted in
the first place. However, that is only a hunch. In any case I will be making
sure to do cold power-off reboots in general in future.)

Another lesson is that a sophisticated system of twin SAN controllers with
failover does not protect against a situation where a device is degrading
rather than failing completely.

Thanks again for the responses, and sorry if my questions were a bit basic,
but I have been dropped in a little out of my depth with this system.

Cheers

AMcC






Ross Walker

2009-Sep-15 13:25 UTC


[CentOS] SUMMARY : Repair Filesystem prompt , after inode has illegal blocks ; qla2xxx message on reboot

On Sep 14, 2009, at 10:25 PM, "McCulloch, Alan" <alan.mcculloch at agresearch.co.nz> wrote:
> Another lesson is that a sophisticated system of twin SAN controllers
> with failover does not protect against a situation where a device is
> degrading rather than failing completely.
>
> Thanks again for the responses and sorry if my questions were a bit
> basic but I have been dropped in a little out of my depth with this system.

I always prefer round-robin mpath over fail-over where possible: a degraded
or failed path simply is not used, and when both paths are working you also
get twice the bandwidth, which is nice.
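
As a toy illustration of the round-robin idea (not how dm-multipath is
actually implemented), the sketch below rotates I/O across whichever paths
are currently marked healthy. The path names, the "healthy" map and the
function name are all hypothetical; how a path comes to be marked unhealthy
is outside this sketch.

```python
from itertools import cycle


def round_robin_paths(paths, healthy, count):
    """Pick `count` paths in rotation, skipping any not marked healthy."""
    usable = [p for p in paths if healthy.get(p, False)]
    if not usable:
        raise RuntimeError("no usable path")
    rotation = cycle(usable)
    return [next(rotation) for _ in range(count)]


# Both paths healthy: requests alternate between them.
print(round_robin_paths(["A", "B"], {"A": True, "B": True}, 4))

# Path B marked unhealthy: every request goes down path A.
print(round_robin_paths(["A", "B"], {"A": True, "B": False}, 3))
```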

-Ross
