Hello,

We have some dedicated servers at layeredtech. A new 500 GB SATA drive was recently added to each server, and we are now seeing strange problems with all of them: every new SATA disk fails under normal load (3-6 Mbit/s of I/O). Engineers at the data center replaced the drives, but the replacements failed again and again.

Do you have any experience working with SATA drives of 500 GB or more?

OS: FreeBSD 6.1-STABLE #1: Sun Aug 13 20:29:03 CDT 2006

Here is the dmesg output from one of our servers:

ad4: 476940MB <WDC WD5000KS-00MNB0 07.02E07> at ata2-master SATA150
da0 at mpt0 bus 0 target 0 lun 0
da0: <SEAGATE ST373454LW 0005> Fixed Direct Access SCSI-3 device
da0: 320.000MB/s transfers (160.000MHz, offset 63, 16bit), Tagged Queueing Enabled
da0: 70007MB (143374744 512 byte sectors: 255H 63S/T 8924C)
Trying to mount root from ufs:/dev/da0s1a
WARNING: / was not properly dismounted
WARNING: /home was not properly dismounted
WARNING: /tmp was not properly dismounted
/tmp: mount pending error: blocks 0 files 4
WARNING: /usr was not properly dismounted
WARNING: /var was not properly dismounted
WARNING: /var/tmp was not properly dismounted
WARNING: /mnt/sata was not properly dismounted
ad4: TIMEOUT - WRITE_DMA48 retrying (1 retry left) LBA=663885183
ad4: TIMEOUT - WRITE_DMA48 retrying (0 retries left) LBA=663885183
ad4: FAILURE - WRITE_DMA48 timed out LBA=663885183
g_vfs_done():ad4s1d[WRITE(offset=339909181440, length=16384)]error = 5
ad4: TIMEOUT - WRITE_DMA48 retrying (1 retry left) LBA=828848959
ad4: TIMEOUT - WRITE_DMA48 retrying (0 retries left) LBA=828848959
ad4: FAILURE - WRITE_DMA48 timed out LBA=828848959
g_vfs_done():ad4s1d[WRITE(offset=424370634752, length=32768)]error = 5
ad4: TIMEOUT - WRITE_DMA48 retrying (1 retry left) LBA=885960859
ad4: TIMEOUT - WRITE_DMA48 retrying (0 retries left) LBA=885960859
ad4: FAILURE - WRITE_DMA48 timed out LBA=885960859
g_vfs_done():ad4s1d[WRITE(offset=453611927552, length=2048)]error = 5
ad4: TIMEOUT - WRITE_DMA48 retrying (1 retry left) LBA=857331839
ad4: TIMEOUT - WRITE_DMA48 retrying (0 retries left) LBA=857331839
ad4: FAILURE - WRITE_DMA48 timed out LBA=857331839
g_vfs_done():ad4s1d[WRITE(offset=438953869312, length=16384)]error = 5
ad4: TIMEOUT - WRITE_DMA48 retrying (1 retry left) LBA=862600415
ad4: TIMEOUT - WRITE_DMA48 retrying (0 retries left) LBA=862600415
ad4: FAILURE - WRITE_DMA48 timed out LBA=862600415
g_vfs_done():ad4s1d[WRITE(offset=441651380224, length=16384)]error = 5
ad4: TIMEOUT - WRITE_DMA48 retrying (1 retry left) LBA=907763295
ad4: TIMEOUT - WRITE_DMA48 retrying (0 retries left) LBA=907763295
ad4: FAILURE - WRITE_DMA48 timed out LBA=907763295
g_vfs_done():ad4s1d[WRITE(offset=464774774784, length=16384)]error = 5
ad4: TIMEOUT - WRITE_DMA48 retrying (1 retry left) LBA=951794815

-----
With best regards,      | The Power to Serve
Nguyen Tam Chinh        | http://www.FreeBSD.org
Loc: sp.cs.msu.su       |
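
As a cross-check on the log above, each failed LBA can be compared with the byte offset reported by the following g_vfs_done() line. The sketch below is only illustrative: it assumes 512-byte sectors on ad4 (the dmesg output states this for da0 only), and the helper name pair_failures is hypothetical. The sample lines are the first three failures copied from the log.

```python
import re

# Assumption: ad4 uses 512-byte sectors (dmesg only states this for da0).
SECTOR_SIZE = 512

# First three FAILURE / g_vfs_done pairs, copied from the dmesg output.
LOG = """\
ad4: FAILURE - WRITE_DMA48 timed out LBA=663885183
g_vfs_done():ad4s1d[WRITE(offset=339909181440, length=16384)]error = 5
ad4: FAILURE - WRITE_DMA48 timed out LBA=828848959
g_vfs_done():ad4s1d[WRITE(offset=424370634752, length=32768)]error = 5
ad4: FAILURE - WRITE_DMA48 timed out LBA=885960859
g_vfs_done():ad4s1d[WRITE(offset=453611927552, length=2048)]error = 5
"""

FAILURE_RE = re.compile(r"FAILURE - \w+ timed out LBA=(\d+)")
VFS_RE = re.compile(r"g_vfs_done\(\):(\w+)\[WRITE\(offset=(\d+), length=(\d+)\)\]")


def pair_failures(log):
    """Pair each FAILURE line with the g_vfs_done line that follows it.

    Returns a list of (lba, byte_offset, delta_sectors), where delta_sectors
    is the disk LBA minus the partition-relative offset expressed in sectors.
    """
    results = []
    pending_lba = None
    for line in log.splitlines():
        match = FAILURE_RE.search(line)
        if match:
            pending_lba = int(match.group(1))
            continue
        match = VFS_RE.search(line)
        if match and pending_lba is not None:
            offset = int(match.group(2))
            delta = pending_lba - offset // SECTOR_SIZE
            results.append((pending_lba, offset, delta))
            pending_lba = None
    return results


if __name__ == "__main__":
    for lba, offset, delta in pair_failures(LOG):
        print(f"LBA {lba}  offset {offset}  delta {delta} sectors")
```

Under that assumption, every pair differs by the same 63 sectors, which would be consistent with ad4s1d starting at sector 63 of the disk. The failing writes also land at byte offsets ranging from roughly 340 GB to 465 GB rather than at a single location.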
On Sat, Oct 28, 2006 at 10:26:25AM +0400, Nguyen Tam Chinh wrote:
> We have some dedicated servers at layeredtech. Currently new SATA 500Gb
> drive was added to each server and we now noticed strange problems with
> all of them: all new sata disks failed under normal load (3-6 Mbit/s I/O).
> Engineers at data center replaced the drives but they failed again and
> again.
> Do you have any experience in working with 500Gb or more SATA drive?

Can you tell us which ATA chipset this drive is connected to? The drive appears to be attached to ata2. Output like the following, for example, would suffice:

$ dmesg | grep ata2
ata2: <ATA channel 0> on atapci1
ad4: 114473MB <Seagate ST3120827AS 3.42> at ata2-master SATA150
$ dmesg | grep atapci1
atapci1: <nVidia nForce CK804 SATA300 controller> port 0x9f0-0x9f7,0xbf0-0xbf3,0x970-0x977,0xb70-0xb73,0xd800-0xd80f mem 0xd3002000-0xd3002fff irq 23 at device 7.0 on pci0
ata2: <ATA channel 0> on atapci1
ata3: <ATA channel 1> on atapci1

Additionally, please provide the output of `vmstat -i` so we can see whether the ATA controller shares an interrupt with any other device.

-- 
| Jeremy Chadwick                                 jdc at parodius.com |
| Parodius Networking                        http://www.parodius.com/ |
| UNIX Systems Administrator                   Mountain View, CA, USA |
| Making life hard for others since 1977.               PGP: 4BD6C0CB |
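
The two grep steps above (disk to ATA channel, then channel to controller) can also be done in one pass over saved dmesg text. This is a minimal sketch, not an official tool: the function name controller_for is hypothetical, and the sample lines are the example output from this message, not from the original poster's server.

```python
import re

# Example dmesg lines taken from the message above (a different machine
# from the one with the failing drives).
DMESG = """\
atapci1: <nVidia nForce CK804 SATA300 controller> port 0x9f0-0x9f7,0xbf0-0xbf3,0x970-0x977,0xb70-0xb73,0xd800-0xd80f mem 0xd3002000-0xd3002fff irq 23 at device 7.0 on pci0
ata2: <ATA channel 0> on atapci1
ata3: <ATA channel 1> on atapci1
ad4: 114473MB <Seagate ST3120827AS 3.42> at ata2-master SATA150
"""


def controller_for(disk, dmesg):
    """Return (channel, controller, description) for a disk, or None.

    Follows the same chain as the two greps: the disk line names its ATA
    channel, the channel line names its controller, and the controller
    line carries the chipset description in angle brackets.
    """
    disk_line = re.search(
        rf"^{re.escape(disk)}: .* at (ata\d+)-(?:master|slave)", dmesg, re.M
    )
    if not disk_line:
        return None
    channel = disk_line.group(1)

    channel_line = re.search(
        rf"^{re.escape(channel)}: <[^>]*> on (\w+)", dmesg, re.M
    )
    if not channel_line:
        return None
    controller = channel_line.group(1)

    ctrl_line = re.search(rf"^{re.escape(controller)}: <([^>]*)>", dmesg, re.M)
    description = ctrl_line.group(1) if ctrl_line else None
    return (channel, controller, description)


if __name__ == "__main__":
    print(controller_for("ad4", DMESG))
```

On a live system the same function could be fed the real output, for instance via subprocess.check_output(["dmesg"], text=True), assuming Python is installed on the server.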
On Sat, Oct 28, 2006 at 03:12:18PM -0400, Mike Jakubik wrote:
> Nguyen Tam Chinh wrote:
> >ad4: 476940MB <WDC WD5000KS-00MNB0 06.02E06> at ata2-master SATA150
> >%dmesg | grep atapci0
> >atapci0: <SiI 3112 SATA150 controller> port
> >0xdc00-0xdc07,0xd480-0xd483,0xd400-0xd407,0xd080-0xd083,0xd000-0xd00f
> >mem 0xff8fec00-0xff8fedff irq
>
> Unfortunately you have a broken chipset, i would not recommend you use
> the Sil 3112 for production. However, it may still be the cable, so try
> replacing the cable first.

I concur with this statement. The Silicon Image 3112 is incredibly buggy (regardless of OS), and should be avoided. Do these systems have a native SATA controller of some kind, such as the Intel ICH5/6/7/8 or nVidia nForce? For your sake, I hope so. If so, use them instead.

-- 
| Jeremy Chadwick                                 jdc at parodius.com |
| Parodius Networking                        http://www.parodius.com/ |
| UNIX Systems Administrator                   Mountain View, CA, USA |
| Making life hard for others since 1977.               PGP: 4BD6C0CB |