Artur Linhart - Linux communication
2007-Nov-15 15:37 UTC
[Xen-users] Problem with a DomU crash after one part of a RAID1 reported a read error
Hello,

I have the following problem on our server, which runs W2K3SRVx64 DomUs on Debian Etch under Xen 3.1.0. The storage configuration is:

omega:~# cat /proc/mdstat
Personalities : [raid1]
md3 : active raid1 sdc2[0] sde2[2](S) sdd2[1]
      488287552 blocks [2/2] [UU]
md2 : active raid1 sdc1[0] sde1[2](S) sdd1[1]
      96256 blocks [2/2] [UU]
md1 : active raid1 sda2[0] sdb2[1]
      488287552 blocks [2/2] [UU]
md0 : active raid1 sda1[0] sdb1[1]
      96256 blocks [2/2] [UU]

The arrays md1-md3 are used in a volume group, and the LVM-managed logical volumes on it serve as block devices for the virtual instances (a sketch of how such a stack is put together is in the P.S. below).

Today I ran into a problem with one physical disk in the RAID array, which caused one virtual domain to crash. The relevant output from kern.log looks like this:

Nov 15 14:14:19 omega kernel: sd 0:0:1:0: SCSI error: return code 0x08000002
Nov 15 14:14:19 omega kernel: sdb: Current: sense key: Medium Error
Nov 15 14:14:19 omega kernel: Additional sense: Unrecovered read error
Nov 15 14:14:19 omega kernel: Info fld=0x12832f4d
Nov 15 14:14:19 omega kernel: end_request: I/O error, dev sdb, sector 310587213
Nov 15 14:14:19 omega kernel: raid1: sdb2: rescheduling sector 310394432
Nov 15 14:14:19 omega kernel: raid1: sdb2: rescheduling sector 310394440
Nov 15 14:14:24 omega kernel: raid1: sda2: redirecting sector 310394432 to another mirror
Nov 15 14:14:28 omega kernel: raid1: sda2: redirecting sector 310394440 to another mirror
Nov 15 14:14:28 omega kernel: qemu-dm[6305]: segfault at 0000000000000000 rip 0000000000000000 rsp 0000000041000ca8 error 14
Nov 15 14:14:28 omega kernel: xenbr0: port 4(tap0) entering disabled state
Nov 15 14:14:28 omega kernel: device tap0 left promiscuous mode
Nov 15 14:14:28 omega kernel: audit(1195132468.260:16): dev=tap0 prom=0 old_prom=256 auid=4294967295
Nov 15 14:14:28 omega kernel: xenbr0: port 4(tap0) entering disabled state

The question is: even if the disk /dev/sdb fails, why does the virtual instance die with a segfault? Nothing about the problem is logged in xend.log. The instance was still reported by xm list and xm top, but it used 0 CPU time, it was impossible to connect to it over VNC, and it was unreachable by ping as well. An xm shutdown took some time, but afterwards the domain could be destroyed, and after xm create the instance continued to work as usual (the exact commands are sketched in the P.S. below).

What can I do to make this setup more stable? My theory is that a read timeout occurred during an operation on the failing device, and the instance segfaulted before the RAID subsystem could fetch the data from the disk mirror. I always thought a virtual instance should survive such a failure when it runs from an md device.

Any help or advice is appreciated.

With best regards
Archie
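P.S. For anyone reconstructing the setup: this is roughly how the stack is put together. It is only a sketch; the volume group name (vg0), the logical volume name (w2k3-disk), the LV size, and the domain config path (/etc/xen/w2k3) are placeholders, not necessarily my real names.

# mirrors as in /proc/mdstat above; md2/md3 carry sde partitions as hot spares
mdadm --create /dev/md1 --level=1 --raid-devices=2 /dev/sda2 /dev/sdb2
mdadm --create /dev/md3 --level=1 --raid-devices=2 \
      --spare-devices=1 /dev/sdc2 /dev/sdd2 /dev/sde2

# LVM on top of the md devices
pvcreate /dev/md1 /dev/md2 /dev/md3
vgcreate vg0 /dev/md1 /dev/md2 /dev/md3
lvcreate -L 40G -n w2k3-disk vg0

# the LV is handed to the HVM guest in /etc/xen/w2k3:
#   disk = [ 'phy:/dev/vg0/w2k3-disk,hda,w' ]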
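The state of the affected array can be checked like this (again only a sketch; smartctl needs the smartmontools package installed):

cat /proc/mdstat                    # members should still show [2/2] [UU]
mdadm --detail /dev/md1             # both members should be "active sync"
smartctl -a /dev/sdb                # look for pending/reallocated sectors
grep sdb /var/log/kern.log | tail   # any further medium errors?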
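And these are the recovery commands described above ("w2k3" again stands for the real domain name):

xm list                   # domain still listed, but stuck at 0 CPU time
xm shutdown w2k3          # hung for quite a while
xm destroy w2k3           # this finally removed the dead domain
xm create /etc/xen/w2k3   # afterwards the guest ran normally again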