Hello everyone,
I'm hoping to find an answer to my problem here. I'm currently installing
Xen 2.0.7 on 30 dual-Opteron machines with 4 GB of memory each. The machines
have Tyan K8SR motherboards with a Silicon Image 3114 SATA controller and
2 x 300 GB Western Digital SATA drives. Each Xen server runs 5 domUs, some of
which have high traffic and therefore heavy disk usage. On certain machines
I'm seeing the following problem: after a while, I get this error in dom0's
dmesg:
ATA: abnormal status 0xD0 on port 0xD080EC87
ATA: abnormal status 0xD0 on port 0xD080EC87
ATA: abnormal status 0xD0 on port 0xD080EC87
ata1: command 0x35 timeout, stat 0x51 host_stat 0x61
ata1: status=0x51 { DriveReady SeekComplete Error }
ata1: error=0x04 { DriveStatusError }
SCSI error : <0 0 0 0> return code = 0x8000002
sda: Current: sense key=0xb
ASC=0x0 ASCQ=0x0
end_request: I/O error, dev sda, sector 449410141
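The status, error, command, and host_stat bytes in these lines are raw ATA (and bus-master DMA) register values. As a reading aid, here is a minimal Python sketch that decodes them using the standard ATA bit assignments and the SFF-8038i bus-master status bits. The short mnemonics (DRDY, ABRT, ...) are the specifications' names, not libata's, which prints its own labels such as "DriveReady" and "DriveStatusError"; the tables are hand-written for illustration and are not taken from the kernel source.

```python
from typing import Dict, List

# ATA status register bits (specification mnemonics).
STATUS_BITS: Dict[int, str] = {
    0x80: "BSY",   # busy
    0x40: "DRDY",  # drive ready
    0x20: "DF",    # device fault
    0x10: "DSC",   # seek complete
    0x08: "DRQ",   # data request
    0x04: "CORR",  # corrected data (obsolete)
    0x02: "IDX",   # index (obsolete)
    0x01: "ERR",   # error; details are in the error register
}

# ATA error register bits.
ERROR_BITS: Dict[int, str] = {
    0x80: "ICRC",   # interface CRC error
    0x40: "UNC",    # uncorrectable data error
    0x20: "MC",     # media changed
    0x10: "IDNF",   # sector ID not found
    0x08: "MCR",    # media change requested
    0x04: "ABRT",   # command aborted
    0x02: "TK0NF",  # track 0 not found
    0x01: "AMNF",   # address mark not found
}

# Bus-master IDE status register bits (SFF-8038i), the "host_stat" value.
BMDMA_BITS: Dict[int, str] = {
    0x80: "SIMPLEX",
    0x40: "DRV1_DMA_CAP",
    0x20: "DRV0_DMA_CAP",
    0x04: "INTR",    # interrupt raised by the device
    0x02: "ERROR",   # DMA transfer error
    0x01: "ACTIVE",  # DMA transfer still in progress
}

# A few ATA command opcodes, enough for the "command 0x35" line.
COMMANDS: Dict[int, str] = {
    0x25: "READ DMA EXT",
    0x35: "WRITE DMA EXT",
    0xC8: "READ DMA",
    0xCA: "WRITE DMA",
}


def decode(value: int, bits: Dict[int, str]) -> List[str]:
    """Return the names of the bits set in `value`, highest bit first."""
    return [name for mask, name in sorted(bits.items(), reverse=True) if value & mask]


if __name__ == "__main__":
    print("command 0x35   ->", COMMANDS.get(0x35, "unknown"))
    print("status 0xD0    ->", decode(0xD0, STATUS_BITS))
    print("status 0x51    ->", decode(0x51, STATUS_BITS))
    print("error 0x04     ->", decode(0x04, ERROR_BITS))
    print("host_stat 0x61 ->", decode(0x61, BMDMA_BITS))
```

Decoded this way, 0x35 is WRITE DMA EXT, status 0x51 is DRDY + DSC + ERR with error 0x04 (ABRT, command aborted), and host_stat 0x61 shows the DMA transfer still active with no completion interrupt raised, which is consistent with the kernel reporting a timeout.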
Also, in dom0's /var/log/messages, I have this:
Sep 19 00:15:59 x8 kernel: ata1: status=0x51 { DriveReady SeekComplete Error }
Sep 19 00:15:59 x8 kernel: ata1: error=0x04 { DriveStatusError }
Sep 19 00:15:59 x8 kernel: SCSI error : <0 0 0 0> return code = 0x8000002
Sep 19 00:15:59 x8 kernel: end_request: I/O error, dev sda, sector 449410141
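Because the kernel logs the failing sector for each I/O error, one quick way to see whether errors keep hitting the same spot (which would point at the media) or are scattered is to tally those sector numbers. A minimal Python sketch, assuming the log lines keep the `end_request: I/O error, dev <name>, sector <n>` format shown above; `failing_sectors` is a hypothetical helper name, and with no file argument it just runs on sample lines copied from this thread.

```python
import re
import sys
from collections import Counter
from typing import Iterable

# Matches kernel lines such as:
#   end_request: I/O error, dev sda, sector 449410141
IO_ERROR_RE = re.compile(r"end_request: I/O error, dev (\w+), sector (\d+)")


def failing_sectors(lines: Iterable[str]) -> Counter:
    """Count I/O errors per (device, sector) pair."""
    hits: Counter = Counter()
    for line in lines:
        m = IO_ERROR_RE.search(line)
        if m:
            hits[(m.group(1), int(m.group(2)))] += 1
    return hits


if __name__ == "__main__":
    if len(sys.argv) > 1:
        # e.g. python failing_sectors.py /var/log/messages
        with open(sys.argv[1], errors="replace") as f:
            report = failing_sectors(f)
    else:
        report = failing_sectors([
            "Sep 19 00:15:59 x8 kernel: end_request: I/O error, dev sda, sector 449410141",
            "end_request: I/O error, dev sda1, sector 55399888",
        ])
    for (dev, sector), n in report.most_common():
        print(f"{dev} sector {sector}: {n} error(s)")
```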
Once that happens, I get these error messages in dmesg in some of the xenUs:
Buffer I/O error on device sda1, logical block 6848542
lost page write due to I/O error on sda1
end_request: I/O error, dev sda1, sector 55399888
After a while, the xenU instance either freezes or its filesystem goes
read-only, and I have to restart the xenU instance. The dom0 machine never
locks up; it's only the xenUs.
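From inside a domU, one way to notice the read-only state early (for example from a cron job) is to look for `ro` among the mount options in /proc/mounts. A minimal sketch, assuming the guest filesystems are ext3 or similar and mounted with `errors=remount-ro`, so that I/O errors flip them to read-only (the post does not state the mount options); `read_only_mounts` is a hypothetical helper name. Mount points containing spaces appear octal-escaped (`\040`) in /proc/mounts and are returned as-is.

```python
from typing import Iterable, List, Tuple


def read_only_mounts(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (device, mountpoint) for every /proc/mounts entry with the 'ro' option."""
    result: List[Tuple[str, str]] = []
    for line in lines:
        fields = line.split()
        if len(fields) < 4:
            continue  # malformed or empty line
        device, mountpoint, _fstype, options = fields[:4]
        # Exact match on "ro", so "errors=remount-ro" alone does not count.
        if "ro" in options.split(","):
            result.append((device, mountpoint))
    return result


if __name__ == "__main__":
    try:
        with open("/proc/mounts") as f:
            for device, mountpoint in read_only_mounts(f):
                print(f"read-only: {device} on {mountpoint}")
    except OSError:
        pass  # /proc/mounts not available (not Linux)
```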
Does anyone have any idea why these errors occur? Could it be libata-related?
Is it a kernel problem? Xen 2.0.7 ships with the 2.6.11.12 kernel, and I'm not
sure whether there is a bug in there.
Any help is appreciated!
Thanks,
Alex
_______________________________________________
Xen-users mailing list
Xen-users@lists.xensource.com
http://lists.xensource.com/xen-users
Ernst Bachmann
2005-Sep-21 11:02 UTC
Re: [Xen-users] Major problems with 2.0.7 and SATA drives
On Wednesday 21 September 2005 12:47, Alexander wrote:
> ata1: status=0x51 { DriveReady SeekComplete Error }
> ata1: error=0x04 { DriveStatusError }

Could you check /proc/interrupts to see whether your SATA IRQs are shared with
some other hardware, maybe the USB controller?

Oh, and with your hardware, you may want to consider running xen-3 (unstable)
instead of xen-2.0.7, so you can access all of your memory (with 64-bit mode
or PAE).

/Ernst
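A minimal Python sketch of the check Ernst suggests: parse /proc/interrupts and list which devices share an IRQ line with a given driver. It assumes the 2.6-era layout of `IRQ: per-CPU counts, one controller-type token, comma-separated device names`, which the Xen dom0 output posted in this thread (type `Phys-irq`) also follows; `parse_interrupts` and `neighbors` are hypothetical helper names.

```python
from typing import Dict, Iterable, List


def parse_interrupts(lines: Iterable[str]) -> Dict[int, List[str]]:
    """Map each numbered IRQ line to the device names registered on it."""
    irqs: Dict[int, List[str]] = {}
    for line in lines:
        label, sep, rest = line.partition(":")
        label = label.strip()
        if not sep or not label.isdigit():
            continue  # CPU header row or NMI/LOC/ERR summary rows
        tokens = rest.split()
        i = 0
        while i < len(tokens) and tokens[i].isdigit():
            i += 1  # skip the per-CPU interrupt counters
        # tokens[i] is the controller type (e.g. IO-APIC-level, Phys-irq);
        # everything after it is a comma-separated device list.
        names = " ".join(tokens[i + 1:]).split(",")
        irqs[int(label)] = [n.strip() for n in names if n.strip()]
    return irqs


def neighbors(irqs: Dict[int, List[str]], driver: str) -> List[str]:
    """Other devices sharing an IRQ line with `driver` (empty if none)."""
    others: List[str] = []
    for devices in irqs.values():
        if driver in devices:
            others.extend(d for d in devices if d != driver)
    return others


if __name__ == "__main__":
    try:
        with open("/proc/interrupts") as f:
            irqs = parse_interrupts(f)
    except OSError:
        irqs = {}
    for irq, devices in sorted(irqs.items()):
        if len(devices) > 1:
            print(f"IRQ {irq} shared by: {', '.join(devices)}")
```

On the two lines Alex posted, `neighbors(..., "libata")` comes back empty; IRQ 19 is shared, but only by the two USB host controllers.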
Ralph Passgang
2005-Sep-21 11:14 UTC
Re: [Xen-users] Major problems with 2.0.7 and SATA drives
Hi,
ata1: status=0x51 { DriveReady SeekComplete Error }
ata1: error=0x04 { DriveStatusError }
This is usually a warning that only appears when the hard drive is really
broken (and cannot read or write some blocks). I have seen this message a lot
recently, and every time the drive really was broken. But I have never seen it
on a Xen host, so maybe it's _only_ a Xen problem and not a hardware problem,
but you should check.
To be sure, you should check the hard drive with a dedicated testing tool. All
major hard drive manufacturers have their own tool; I guess WD has a similar
tool somewhere on their web page.
--Ralph
Steven Ellis
2005-Sep-21 11:59 UTC
Re: [Xen-users] Major problems with 2.0.7 and SATA drives
Ralph Passgang wrote:
> To be sure you should check the harddrive with a special testing tool. All
> major companies that sell harddrive have their own tool. I guess WD has a
> similar tool anywhere on their webpage.

Give ultimatebootcd (http://www.ultimatebootcd.com/) a try. It contains
bootable versions of most of the various manufacturers' HD tools on one handy
CD, along with a bootable Linux for debugging. An amazingly useful tool.

Steve
IRQs are not being shared; I just checked:

 17:    6039746   Phys-irq  libata
 19:          0   Phys-irq  ohci_hcd:usb1, ohci_hcd:usb2

I actually only have 4 GB of memory, so PAE is not necessary, and we don't
want to run 64-bit yet. Any other suggestions?
Ralph,

I checked the drive with dd, reading the whole thing, and also checked it with
badblocks; nothing shows up. The problem is that when the error occurs, it
never hits the same sectors; they are always different. I now suspect that it
always happens under high disk throughput: it happened twice yesterday on one
box while I was rsyncing from a different machine and writing to this one.

Thanks,
Alex
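A full-surface read like the dd pass Alex describes can be sketched in Python as below: read the device sequentially in fixed-size chunks with `os.pread` and record the offsets where the kernel returns an error, instead of stopping at the first one. This is an illustration under assumptions, not a replacement for badblocks or a vendor tool: reads may be served from the page cache rather than the disk, and the scan only exercises reads, whereas the command that timed out in the dom0 log (0x35) is the ATA WRITE DMA EXT opcode. `scan_for_read_errors` and the 1 MiB chunk size are hypothetical choices.

```python
import os
import sys
from typing import List


def scan_for_read_errors(path: str, block_size: int = 1 << 20) -> List[int]:
    """Read `path` from start to end; return byte offsets of chunks that failed."""
    bad: List[int] = []
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.lseek(fd, 0, os.SEEK_END)  # works for files and block devices
        offset = 0
        while offset < size:
            n = min(block_size, size - offset)
            try:
                chunk = os.pread(fd, n, offset)
            except OSError:
                bad.append(offset)  # e.g. EIO: note it and skip this chunk
                offset += n
                continue
            if not chunk:
                break  # unexpected end of data
            offset += len(chunk)
    finally:
        os.close(fd)
    return bad


if __name__ == "__main__":
    if len(sys.argv) > 1:
        # e.g. python scan.py /dev/sda   (needs read access to the device)
        failures = scan_for_read_errors(sys.argv[1])
        print(f"{len(failures)} unreadable chunk(s); first offsets: {failures[:20]}")
```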