Hi,

I have had this ZFS server up for about 27 days, and it turns out that about 3 weeks ago (I was not really paying attention) it lost the SSD that I'm using for log and cache. There is also a poor and lonely memory stick for log, so the box did not really suffer any file loss.

The system is running:
FreeBSD zfs.digiware.nl 8.2-STABLE FreeBSD 8.2-STABLE #58: Thu Nov 17 09:43:46 CET 2011 root@zfs.digiware.nl:/home/obj/usr/src/src8/src/sys/ZFS amd64

More info, such as dmesg, pciconf, kernconf and zpool iostat, is at:
http://www.tegenbosch28.nl/FreeBSD/systems/ZFS/

But it is weird to just lose an SSD from the bus, and it has happened before. As the log shows, AHCI really banged on the front door...

The device is a Corsair 60GB Force GT, and thus far I have not found any suggestion that this series of devices is prone to doing this.

It was a really dead device. The only way to get it back was to:
power-cycle the device by pulling it and sticking it back in,
then run camcontrol rescan.

I've now upgraded it to a 120GB Corsair, to see whether that has the same problem.

Do other FreeBSD-ers have similar problems?
Regards,
--WjW

Jan 7 10:04:24 zfs kernel: ahcich3: Timeout on slot 27 port 0
Jan 7 10:04:24 zfs kernel: ahcich3: is 00000000 cs 20000000 ss 38000000 rs 38000000 tfd c0 serr 00000000 cmd 0004dd17
Jan 7 10:04:56 zfs kernel: ahcich3: AHCI reset: device not ready after 31000ms (tfd = 00000080)
Jan 7 10:05:26 zfs kernel: ahcich3: Timeout on slot 29 port 0
Jan 7 10:05:26 zfs kernel: ahcich3: is 00000000 cs 20000000 ss 00000000 rs 20000000 tfd 80 serr 00000000 cmd 0004dd17
Jan 7 10:05:57 zfs kernel: ahcich3: AHCI reset: device not ready after 31000ms (tfd = 00000080)
Jan 7 10:06:27 zfs kernel: ahcich3: Timeout on slot 29 port 0
Jan 7 10:06:27 zfs kernel: ahcich3: is 00000000 cs 20000000 ss 00000000 rs 20000000 tfd 80 serr 00000000 cmd 0004dd17
Jan 7 10:06:27 zfs kernel: (ada2:ahcich3:0:0:0): lost device
Jan 7 10:06:58 zfs kernel: ahcich3: AHCI reset: device not ready after 31000ms (tfd = 00000080)
Jan 7 10:07:28 zfs kernel: ahcich3: Timeout on slot 29 port 0
Jan 7 10:07:28 zfs kernel: ahcich3: is 00000000 cs e0000000 ss e0000000 rs e0000000 tfd 80 serr 00000000 cmd 0004dd17
Jan 7 10:08:16 zfs kernel: ahcich3: AHCI reset: device not ready after 31000ms (tfd = 00000080)
Jan 7 10:08:16 zfs kernel: ahcich3: Poll timeout on slot 31 port 0
Jan 7 10:08:16 zfs kernel: ahcich3: is 00000000 cs 80000000 ss 00000000 rs 80000000 tfd 80 serr 00000000 cmd 0004df17
Jan 7 10:08:46 zfs kernel: ahcich3: Timeout on slot 31 port 0
Jan 7 10:08:46 zfs kernel: ahcich3: is 00000000 cs 80000000 ss 00000000 rs 80000000 tfd 80 serr 00000000 cmd 0004df17
Jan 7 10:08:48 zfs kernel: (ada2:ahcich3:0:0:0): removing device entry
Jan 7 10:09:33 zfs kernel: ahcich3: AHCI reset: device not ready after 31000ms (tfd = 00000080)
Jan 7 10:09:33 zfs kernel: ahcich3: Poll timeout on slot 31 port 0
Jan 7 10:09:33 zfs kernel: ahcich3: is 00000000 cs 80000000 ss 00000000 rs 80000000 tfd 80 serr 00000000 cmd 0004df17
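Because the SSD dropped off the bus weeks before anyone noticed, one option is to scan the kernel log for these messages periodically. The following is a minimal Python sketch. It assumes the messages look exactly like the ahci(4) lines above and that syslog writes them to a file such as /var/log/messages (an assumption; check your syslog.conf). The function name summarize and the regular expressions are hypothetical and are not part of any FreeBSD tool.

```python
import re
import sys
from collections import Counter

# Hypothetical patterns modelled on the ahci(4) messages quoted above; other
# drivers (e.g. mps) word their timeouts differently and need their own.
TIMEOUT_RE = re.compile(r"kernel: (ahcich\d+): (?:Poll t|T)imeout on slot \d+")
RESET_RE = re.compile(r"kernel: (ahcich\d+): AHCI reset: device not ready")
LOST_RE = re.compile(r"kernel: \((\w+):(ahcich\d+):[\d:]+\): lost device")


def summarize(log_text):
    """Count command timeouts and failed resets per AHCI channel and
    collect (device, channel) pairs for every "lost device" message."""
    timeouts, resets, lost = Counter(), Counter(), []
    for line in log_text.splitlines():
        if m := TIMEOUT_RE.search(line):
            timeouts[m.group(1)] += 1
        elif m := RESET_RE.search(line):
            resets[m.group(1)] += 1
        elif m := LOST_RE.search(line):
            lost.append((m.group(1), m.group(2)))
    return timeouts, resets, lost


# A few lines copied from the log above, for trying the function out.
SAMPLE = """\
Jan 7 10:04:24 zfs kernel: ahcich3: Timeout on slot 27 port 0
Jan 7 10:04:56 zfs kernel: ahcich3: AHCI reset: device not ready after 31000ms (tfd = 00000080)
Jan 7 10:06:27 zfs kernel: (ada2:ahcich3:0:0:0): lost device
Jan 7 10:08:16 zfs kernel: ahcich3: Poll timeout on slot 31 port 0
"""

if __name__ == "__main__":
    # Pass one or more log files on the command line (paths are up to you).
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8", errors="replace") as fh:
            timeouts, resets, lost = summarize(fh.read())
        print(f"{path}: timeouts={dict(timeouts)} "
              f"failed_resets={dict(resets)} lost={lost}")
```

Run from cron and mailed to the admin, any non-empty lost list would have flagged this incident on the day it happened. The design matches on message text only, so a driver or kernel update that rewords the messages would silently defeat it.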
I had (probably) the same problem with a Crucial SSD running old firmware. In my case, the mps or mpslsi driver logs the timeouts rather than AHCI. With the new disk firmware it has worked fine so far (about 2-3 weeks). Here is my forum thread:
http://forums.freebsd.org/showthread.php?t=28252

I can cause a very similar error by hot-pulling the disk. After putting the disk back in, I can't use it until I reboot. (I reproduced the same problem with a Seagate Green spinning disk.) After the firmware upgrade this test passes, so I put the SSD back to work in the machine.

Here is a thread about a similar problem with an OCZ Vertex 3:
http://forums.freebsd.org/showthread.php?t=27128

I didn't try "camcontrol rescan", but I did try "camcontrol reset ....", which caused a kernel panic (meaning FreeBSD is most likely at least partially to blame). ;)

Peter

_______________________________________________
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscribe@freebsd.org"

--
--------------------------------------------
Peter Maloney
Brockmann Consult
Max-Planck-Str. 2
21502 Geesthacht
Germany
Tel: +49 4152 889 300
Fax: +49 4152 889 333
E-mail: peter.maloney@brockmann-consult.de
Internet: http://www.brockmann-consult.de
--------------------------------------------
On Wed, Feb 01, 2012 at 02:40:17PM +0100, Willem Jan Withagen wrote:
> The device is a Corsair 60Gb Force GT. And thusfar I have not found any
> suggestions that that serie of devices is prone to doing this.

Can you please provide the following output when that SSD is attached to the system? You will need to install ports/sysutils/smartmontools for this (please make sure it's version 5.42 or newer).

* smartctl -a /dev/whatever
* smartctl -l devstat /dev/whatever
* smartctl -l sataphy /dev/whatever
* smartctl -l ssd /dev/whatever

Thank you.

--
| Jeremy Chadwick                                   jdc@parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                   Mountain View, CA, US |
| Making life hard for others since 1977.             PGP 4BD6C0CB |
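Jeremy's four smartctl invocations can be run in one pass so that the output is kept together for posting back to the list. This is a minimal Python sketch and is not part of smartmontools. The helper names smartctl_commands and collect are hypothetical, and the sketch assumes smartctl is installed and on PATH.

```python
import shutil
import subprocess

# The four reports requested above. Per the request, smartmontools 5.42 or
# newer is needed; "devstat", "sataphy" and "ssd" are arguments to "-l".
REPORTS = (["-a"], ["-l", "devstat"], ["-l", "sataphy"], ["-l", "ssd"])


def smartctl_commands(device):
    """Build the smartctl argument lists for one device node."""
    return [["smartctl", *args, device] for args in REPORTS]


def collect(device):
    """Run every report for `device` and return {command line: output}.

    Usually needs root. smartctl's exit status is a bit mask of findings,
    so a non-zero status is recorded alongside the output, not raised.
    """
    if shutil.which("smartctl") is None:
        raise FileNotFoundError("smartctl not found; install sysutils/smartmontools")
    results = {}
    for cmd in smartctl_commands(device):
        proc = subprocess.run(cmd, capture_output=True, text=True)
        results[" ".join(cmd)] = f"(exit status {proc.returncode})\n{proc.stdout}"
    return results
```

Usage, as root: call collect("/dev/ada2") and print each entry. The device name is an assumption based on the ada2 in the kernel log. The replacement 120GB drive may attach under a different name, which `camcontrol devlist` will show.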
Just as a followup.
I reported the problem above, and today it occurred again. But this time I was able to find a firmware upgrade for the Corsair Force GT, from 1.2 to 1.3.3 (you need Win7 to be able to do the upgrade...). Hopefully that helps, and it will no longer disconnect about every 4 weeks.

Ciao,
--WjW