Hello everyone,

I'm hoping to find an answer to my problem here. I'm currently installing Xen 2.0.7 on 30 dual-Opteron machines with 4 GB of memory each. The machines use a Tyan K8SR motherboard with a Silicon Image 3114 SATA controller and 2 x 300 GB Western Digital SATA drives. Each Xen server runs 5 domUs, some of which carry heavy traffic and therefore generate heavy disk I/O.

On certain machines I'm seeing the following problem: after a while, this error appears in dom0's dmesg:

    TA: abnormal status 0xD0 on port 0xD080EC87
    ATA: abnormal status 0xD0 on port 0xD080EC87
    ATA: abnormal status 0xD0 on port 0xD080EC87
    ata1: command 0x35 timeout, stat 0x51 host_stat 0x61
    ata1: status=0x51 { DriveReady SeekComplete Error }
    ata1: error=0x04 { DriveStatusError }
    SCSI error : <0 0 0 0> return code = 0x8000002
    sda: Current: sense key=0xb ASC=0x0 ASCQ=0x0
    end_request: I/O error, dev sda, sector 449410141

dom0's /var/log/messages also shows this:

    Sep 19 00:15:59 x8 kernel: ata1: status=0x51 { DriveReady SeekComplete Error }
    Sep 19 00:15:59 x8 kernel: ata1: error=0x04 { DriveStatusError }
    Sep 19 00:15:59 x8 kernel: SCSI error : <0 0 0 0> return code = 0x8000002
    Sep 19 00:15:59 x8 kernel: end_request: I/O error, dev sda, sector 449410141

Once that happens, some of the xenU guests report this in their dmesg:

    Buffer I/O error on device sda1, logical block 6848542
    lost page write due to I/O error on sda1
    end_request: I/O error, dev sda1, sector 55399888

After a while the affected xenU instance either freezes or its filesystem goes read-only, and I have to restart it. The dom0 machine never locks up; only the xenU instances do.

Does anyone have any idea why these errors occur? Could it be libata related, or a kernel bug? Xen 2.0.7 ships with the 2.6.11.12 kernel, and I'm not sure whether there is a bug in there.

Any help is appreciated!
Thanks,
Alex

_______________________________________________
Xen-users mailing list
Xen-users@lists.xensource.com
http://lists.xensource.com/xen-users
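The raw numbers in the dom0 log above can be decoded bit by bit to see which flags the drive actually raised. The sketch below is a hypothetical helper (decode, STATUS_BITS, ERROR_BITS and COMMANDS are illustrative names, not part of any kernel or Xen tool); it assumes the standard ATA Status and Error register bit layouts and lists only the four DMA command opcodes relevant here.

```python
# Hypothetical decoder for the ATA register values printed by the kernel.
# Bit layouts follow the standard ATA Status and Error register definitions.

STATUS_BITS = [
    (0x80, "BSY"),   # device busy
    (0x40, "DRDY"),  # device ready ("DriveReady")
    (0x20, "DF"),    # device fault
    (0x10, "DSC"),   # seek complete ("SeekComplete")
    (0x08, "DRQ"),   # data request
    (0x04, "CORR"),  # corrected data (obsolete)
    (0x02, "IDX"),   # index (obsolete)
    (0x01, "ERR"),   # error; see the Error register ("Error")
]

ERROR_BITS = [
    (0x80, "ICRC"),   # interface CRC error (BBK, bad block, in older specs)
    (0x40, "UNC"),    # uncorrectable data error
    (0x20, "MC"),     # media changed
    (0x10, "IDNF"),   # sector ID not found
    (0x08, "MCR"),    # media change requested
    (0x04, "ABRT"),   # command aborted (printed by the kernel as "DriveStatusError")
    (0x02, "TK0NF"),  # track 0 not found
    (0x01, "AMNF"),   # address mark not found
]

# Only the DMA opcodes relevant to this thread; not a complete command table.
COMMANDS = {
    0x25: "READ DMA EXT",
    0x35: "WRITE DMA EXT",
    0xC8: "READ DMA",
    0xCA: "WRITE DMA",
}


def decode(value, table):
    """Return the mnemonics of every bit set in value, highest bit first."""
    return [name for mask, name in table if value & mask]


if __name__ == "__main__":
    # Values copied from the dom0 log in the message above.
    print("command 0x35:", COMMANDS.get(0x35, "unknown"))
    print("status 0xD0:", decode(0xD0, STATUS_BITS))
    print("status 0x51:", decode(0x51, STATUS_BITS))
    print("error  0x04:", decode(0x04, ERROR_BITS))
```

Read this way, status 0x51 is DRDY + DSC + ERR, matching the kernel's "{ DriveReady SeekComplete Error }", and error 0x04 is the ABRT (command aborted) bit rather than UNC, the bit one would typically expect from an unreadable sector. The "abnormal status" value 0xD0 has BSY set, and the command that timed out, 0x35, is a write (WRITE DMA EXT).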
Ernst Bachmann
2005-Sep-21 11:02 UTC
Re: [Xen-users] Major problems with 2.0.7 and SATA drives
On Wednesday 21 September 2005 12:47, Alexander wrote:
> ata1: command 0x35 timeout, stat 0x51 host_stat 0x61
> ata1: status=0x51 { DriveReady SeekComplete Error }
> ata1: error=0x04 { DriveStatusError }

Could you check /proc/interrupts to see whether your SATA IRQs are shared with some other hardware, maybe the USB controller?

Oh, and with your hardware you might want to consider running Xen 3 (unstable) instead of Xen 2.0.7, so that you can access all of your memory (with 64-bit mode or PAE).

/Ernst
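The /proc/interrupts check suggested above can be scripted so it is easy to repeat on all 30 machines. This is a minimal sketch, assuming the 2.6-era layout: an IRQ label, one counter per CPU, a single controller-type token (such as IO-APIC-level, or Phys-irq in a Xen dom0), then a comma-separated list of handlers. Newer kernels print extra columns that this hypothetical parser does not handle; the function names are illustrative.

```python
import os


def parse_interrupts(text):
    """Return {irq_label: [handler, ...]} parsed from /proc/interrupts text."""
    irqs = {}
    for line in text.splitlines():
        if ":" not in line:
            continue  # the "CPU0 CPU1 ..." header line has no colon
        # Split on the first colon only; handler names like "ohci_hcd:usb1" contain colons.
        label, rest = line.split(":", 1)
        tokens = rest.split()
        # Skip the per-CPU interrupt counters.
        i = 0
        while i < len(tokens) and tokens[i].isdigit():
            i += 1
        if i == 0 or i >= len(tokens):
            continue  # lines such as "NMI: 0" or "ERR: 0" carry no handler list
        # tokens[i] is the controller type; everything after it is the handler list.
        handlers = " ".join(tokens[i + 1:]).split(",")
        irqs[label.strip()] = [h.strip() for h in handlers if h.strip()]
    return irqs


def shared_irqs(irqs):
    """Keep only the IRQ lines that have more than one handler registered."""
    return {irq: handlers for irq, handlers in irqs.items() if len(handlers) > 1}


if __name__ == "__main__":
    if os.path.exists("/proc/interrupts"):
        with open("/proc/interrupts") as f:
            parsed = parse_interrupts(f.read())
        for irq, handlers in sorted(shared_irqs(parsed).items()):
            print(f"IRQ {irq} shared by: {', '.join(handlers)}")
```

Run on dom0, it prints only the IRQ lines with more than one handler; a line listing libata next to another driver would be the kind of sharing being asked about here.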
Ralph Passgang
2005-Sep-21 11:14 UTC
Re: [Xen-users] Major problems with 2.0.7 and SATA drives
Hi,

    ata1: status=0x51 { DriveReady SeekComplete Error }
    ata1: error=0x04 { DriveStatusError }

In my experience this warning usually appears only when the hard drive is really broken (and cannot read or write some blocks). I've seen this message a lot recently, and every time the drive really was broken. But I have never seen it on a Xen host, so maybe it's _only_ a Xen problem and not a hardware problem; still, you should check.

To be sure, test the drive with a dedicated diagnostic tool. All the major hard drive manufacturers have their own, and I'd guess WD offers one somewhere on its website.

--Ralph

On Wednesday, 21 September 2005 13:02, Ernst Bachmann wrote:
> Could you check /proc/interrupts if your SATA IRQ's are shared with some
> other hardware, maybe the USB-Controller?
Steven Ellis
2005-Sep-21 11:59 UTC
Re: [Xen-users] Major problems with 2.0.7 and SATA drives
Ralph Passgang wrote:
> To be sure, test the drive with a dedicated diagnostic tool. All the major
> hard drive manufacturers have their own, and I'd guess WD offers one
> somewhere on its website.

Give UltimateBootCD (http://www.ultimatebootcd.com/) a try. It bundles bootable versions of most of the drive manufacturers' HD tools on one handy CD, along with a bootable Linux for debugging. An amazingly useful tool.

Steve
The IRQs are not being shared; I just checked:

     17:    6039746   Phys-irq  libata
     19:          0   Phys-irq  ohci_hcd:usb1, ohci_hcd:usb2

I actually only have 4 GB of memory, so PAE isn't necessary, and we don't want to run 64-bit yet.

Any other suggestions?

Ernst Bachmann writes:
> Could you check /proc/interrupts if your SATA IRQ's are shared with some other
> hardware, maybe the USB-Controller?
Ralph,

I checked the drive with dd, reading the whole thing, and also with badblocks; nothing shows up. The problem is that when the error occurs it never hits the same sectors; they're different every time. I now suspect it always happens under high disk throughput: it happened twice yesterday on one box while I was rsyncing from another machine and writing to this one.

Thanks,
Alex

Ralph Passgang writes:
> this is usually a warning that only comes if the harddrive is really broken
> (and cannot read/write some blocks).
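To check the observation that the errors land on different sectors each time, the kernel log can be tallied per device. The sketch below is hypothetical (the regex and function names are illustrative) and assumes the end_request line format shown in the logs earlier in this thread. One caveat: the command that timed out in dom0 was 0x35 (WRITE DMA EXT), while a dd read pass and badblocks in its default mode only read, so a clean read-only scan may not exercise the failing path.

```python
import re
from collections import Counter

# Matches kernel messages such as
#   "end_request: I/O error, dev sda, sector 449410141"
END_REQUEST = re.compile(r"end_request: I/O error, dev (\w+), sector (\d+)")


def failing_sectors(log_text):
    """Return (device, sector) pairs for every end_request I/O error, in log order."""
    return [(dev, int(sector)) for dev, sector in END_REQUEST.findall(log_text)]


def summarize(log_text):
    """Return (errors per device, number of distinct failing sectors per device)."""
    hits = failing_sectors(log_text)
    errors = Counter(dev for dev, _ in hits)
    distinct = {dev: len({s for d, s in hits if d == dev}) for dev in errors}
    return errors, distinct


def main(path="/var/log/messages"):
    try:
        with open(path, errors="replace") as f:
            text = f.read()
    except OSError as exc:
        print(f"cannot read {path}: {exc}")
        return
    errors, distinct = summarize(text)
    for dev in sorted(errors):
        print(f"{dev}: {errors[dev]} errors across {distinct[dev]} distinct sectors")


if __name__ == "__main__":
    main()
```

Many errors spread over many distinct sectors, clustered around heavy-write periods such as the rsync runs, would fit a controller or driver problem under load better than a fixed set of bad sectors, though on its own it does not rule out the drive.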