thr3ads.net - Xen users - [Xen-users] Disk erros on one xen domain [May 2010]

Nicolas Michel

2010-May-25 06:43 UTC

[Xen-users] Disk erros on one xen domain

Hello,

I have 3 physical servers with some virtual machines on each.
When I look at dmesg on one of them I get these errors:

*************************************************************************
[34783.559174] hda: task_in_intr: status=0x51 { DriveReady SeekComplete Error }
[34783.559248] hda: task_in_intr: error=0x04 { AbortedCommand }
[34783.559289] ide: failed opcode was: 0xec
[121232.732355] hda: task_in_intr: status=0x51 { DriveReady SeekComplete Error }
[121232.732413] hda: task_in_intr: error=0x04 { AbortedCommand }
[121232.732455] ide: failed opcode was: 0xec
[207708.187565] hda: task_in_intr: status=0x51 { DriveReady SeekComplete Error }
[207708.187623] hda: task_in_intr: error=0x04 { AbortedCommand }
[207708.187664] ide: failed opcode was: 0xec
[294224.164969] hda: task_in_intr: status=0x51 { DriveReady SeekComplete Error }
[294224.165029] hda: task_in_intr: error=0x04 { AbortedCommand }
[294224.165075] ide: failed opcode was: 0xec
[380705.378232] hda: task_in_intr: status=0x51 { DriveReady SeekComplete Error }
[380705.378232] hda: task_in_intr: error=0x04 { AbortedCommand }
[380705.378232] ide: failed opcode was: 0xec
[467193.505658] hda: task_in_intr: status=0x51 { DriveReady SeekComplete Error }
[467193.505717] hda: task_in_intr: error=0x04 { AbortedCommand }
[467193.505758] ide: failed opcode was: 0xec
[553683.657031] hda: task_in_intr: status=0x51 { DriveReady SeekComplete Error }
[553683.657091] hda: task_in_intr: error=0x04 { AbortedCommand }
[553683.657132] ide: failed opcode was: 0xec
[640176.673218] hda: task_in_intr: status=0x51 { DriveReady SeekComplete Error }
[640176.673218] hda: task_in_intr: error=0x04 { AbortedCommand }
[640176.673218] ide: failed opcode was: 0xec
[726657.593721] hda: task_in_intr: status=0x51 { DriveReady SeekComplete Error }
[726657.593721] hda: task_in_intr: error=0x04 { AbortedCommand }
[726657.593721] ide: failed opcode was: 0xec:
*************************************************************************
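
The timestamps in the excerpt above can be checked for a pattern. The following is a minimal sketch, not part of the original message: `intervals` is a hypothetical helper name, and it assumes each log entry sits on one line with the kernel timestamp in square brackets as the first field. It prints the gap, in hours, between consecutive "status=" entries.

```shell
#!/bin/sh
# Print the gap (in hours) between consecutive "status=" entries of a dmesg
# excerpt read from stdin. "intervals" is a hypothetical helper name.
intervals() {
    awk '
        /task_in_intr: status=/ {
            # First field looks like "[34783.559174]": strip the brackets.
            ts = substr($1, 2, length($1) - 2)
            if (seen) printf "%.1f\n", (ts - prev) / 3600
            prev = ts
            seen = 1
        }
    '
}

# The first three "status=" entries from the excerpt, one per line.
intervals <<'EOF'
[34783.559174] hda: task_in_intr: status=0x51 { DriveReady SeekComplete Error }
[121232.732355] hda: task_in_intr: status=0x51 { DriveReady SeekComplete Error }
[207708.187565] hda: task_in_intr: status=0x51 { DriveReady SeekComplete Error }
EOF
```

For these entries the gaps come out at about 24 hours each. That may hint at something running once a day, but this is only a guess; nothing in the thread confirms what triggers the command.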

You'll find the full dmesg output in the attached file.
Searching with Google, I found some comments saying that these errors
mean the disk is dying. But this is a relatively recent server (1 year old)
with 6 disks in RAID 10.
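
For reference, the hexadecimal values in the log can be decoded mechanically. The following is a minimal sketch, not part of the original message. The function names are hypothetical, and only the bits and the single opcode that appear in the excerpt are handled, so it is not a full ATA decoder. The bit masks are the standard ATA status and error register bits, and opcode 0xec is the ATA IDENTIFY DEVICE command.

```shell
#!/bin/sh
# Decode the values seen in the log excerpt. Function names are hypothetical;
# only the bits and the opcode that appear in the excerpt are covered.

# Status register bits named in the log: 0x40, 0x10, 0x01.
decode_status() {
    value=$(( $1 ))
    names=""
    [ $(( value & 0x40 )) -ne 0 ] && names="$names DriveReady"
    [ $(( value & 0x10 )) -ne 0 ] && names="$names SeekComplete"
    [ $(( value & 0x01 )) -ne 0 ] && names="$names Error"
    echo "${names# }"
}

# Error register: bit 0x04 is the "aborted command" flag.
decode_error() {
    value=$(( $1 ))
    names=""
    [ $(( value & 0x04 )) -ne 0 ] && names="$names AbortedCommand"
    echo "${names# }"
}

# ATA command opcode 0xec is IDENTIFY DEVICE.
decode_opcode() {
    case "$1" in
        0xec|0xEC) echo "IDENTIFY DEVICE" ;;
        *)         echo "unknown ($1)" ;;
    esac
}

echo "status 0x51: $(decode_status 0x51)"
echo "error  0x04: $(decode_error 0x04)"
echo "opcode 0xec: $(decode_opcode 0xec)"
```

Read together, the three lines say that the device aborted an IDENTIFY DEVICE command. The decoding alone does not settle whether the hardware is failing.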

Since I put that server into production, it has crashed 3 times. It still
responds to pings, but there is no SSH access (neither to the Xen domain nor
to the virtual machines). Some services on the virtual machines continue to
respond, others don't. The only solution is a hard reboot.



_______________________________________________
Xen-users mailing list
Xen-users@lists.xensource.com
http://lists.xensource.com/xen-users

Nicolas Michel

2010-May-25 06:53 UTC

Re: [Xen-users] Disk erros on one xen domain

I forgot to say: I'm running Debian Lenny 64-bit and using the Xen version
shipped with the distro.


Fajar A. Nugraha

2010-May-25 07:04 UTC

Re: [Xen-users] Disk erros on one xen domain

On Tue, May 25, 2010 at 1:43 PM, Nicolas Michel
<nicolas.michel@lemail.be> wrote:
> I found with google some comments about these errors saying that it means
> the disk is dying. But this is a relatively recent server (1 year) with 6
> disks in RAID 10.

That doesn't guarantee it will be error-free.

> Since I started that server in prod, it crashed 3 times. It responds to
> pings but no ssh access (on xen domain and virtual machines either). Some
> services on virtual machines continue to respond, other don't. The only
> solution is a hard reboot.

Do the other working machines have a similar config (hardware, OS,
kernel, etc.)? If yes, then it's a hardware problem. No way around it.

There are cases when it's not actually a hardware problem but a kernel
problem (like when using openSUSE 11.2 with an HP Smart Array). In these
cases I'd try a live CD/DVD of another distro first. This does not
seem to be the case with your setup, though.

-- 
Fajar

Nicolas Michel

2010-May-25 09:05 UTC

Re: [Xen-users] Disk erros on one xen domain

I know RAID doesn't guarantee there are no errors.

My two other physical machines, each of which hosts a Xen domain controller,
are not the same hardware at all but run the same OS (Debian Lenny 64-bit).
They don't have these errors and have never crashed.

Do you think I should try a more up-to-date kernel?


Fajar A. Nugraha

2010-May-25 10:26 UTC

Re: [Xen-users] Disk erros on one xen domain

On Tue, May 25, 2010 at 4:05 PM, Nicolas Michel
<nicolas.michel@lemail.be> wrote:
> I know RAID don't guarantee there is no errors.
>
> My two others physical machines that hosts each a Xen domain controller are
> not the same hardware at all but the same OS (Debian Lenny 64 bits). They
> don't have these errors and never crashed.
>
> You think I should try another kernel more up-to-date?

One thing to confirm first. Is hda the first disk? AFAIK Lenny should
come with kernel 2.6.26, and newer kernels use sda instead of hda.

If it is the first disk, I'd start by picking a live CD/DVD of a
distro with a recent kernel. Ubuntu Lucid would do. Boot it, and do
something like

dd if=/dev/sda of=/dev/null bs=16M

... which basically reads all the disk contents. See whether it
completes without errors. If yes, then I'd try compiling a newer kernel
for this server, possibly 2.6.29 or 2.6.31 (since 2.6.32 needs Xen 4.0
to run correctly). If it shows read errors though, you'd know for sure
that it's a hardware problem.
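
The read test described above can be wrapped so that the result is reported explicitly. The following is a minimal sketch, not part of the original message: `read_test` is a hypothetical helper name, and /dev/sda is only the device name used in the message, so adjust it to the disk actually under test. It relies on dd exiting with a non-zero status when a read fails.

```shell
#!/bin/sh
# Read a whole block device (or file) and report whether the read completed.
# "read_test" is a hypothetical helper name.
read_test() {
    target=$1
    # Same command as in the message; dd's own statistics are discarded.
    if dd if="$target" of=/dev/null bs=16M 2>/dev/null; then
        echo "read of $target completed without I/O errors"
    else
        echo "read of $target FAILED, check dmesg for details" >&2
        return 1
    fi
}

# Example (run as root from the live system, device name is an assumption):
#   read_test /dev/sda
#   dmesg | tail -n 20
```

Checking dmesg afterwards is still worthwhile, since the kernel log carries the detail of any error that dd only reports as a failure.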

-- 
Fajar

Nicolas Michel

2010-May-25 12:10 UTC

Re: [Xen-users] Disk erros on one xen domain

Thank you for your help. I just checked my exact kernel version:
    2.6.26-2-xen-amd64
It is Lenny:
    ~# lsb_release -a
    No LSB modules are available.
    Distributor ID:	Debian
    Description:	Debian GNU/Linux 5.0.2 (lenny)
    Release:	5.0.2
    Codename:	lenny

I'll try your test, but I don't know when, because this server is in
production at the moment. Maybe this weekend.

Thank you,

