Jim Klimov
2011-May-22 15:31 UTC
[zfs-discuss] iSCSI pool timeouts during high latency moments
Hi all,

As I wrote before, I have a "dcpool" implemented as an iSCSI
device stored in a volume of my physical "pool". When there
are many operations, such as attempts to destroy a dataset
(which leads to many small IOs in my config), the iSCSI
device is 100% busy for hours, latencies can grow to seconds
even with at most one task in the queue, and I believe this may
lead to the following problem - the device is considered timed
out and reported as faulted:

==
# zpool status dcpool
  pool: dcpool
 state: ONLINE
status: One or more devices are faulted in response to IO failures.
action: Make sure the affected devices are connected, then run 'zpool clear'.
   see: http://www.sun.com/msg/ZFS-8000-HC
  scan: resilvered 111M in 0h5m with 0 errors on Wed May 18 04:03:02 2011
config:
        NAME                                     STATE     READ WRITE CKSUM
        dcpool                                   ONLINE       7  129K     0
          c0t600144F09844CF0000004D8376AE0002d0  ONLINE       0  397K     0
        cache
          c4t1d0p7                               ONLINE       0     0     0
errors: 132410 data errors, use '-v' for a list
==

As we can see above, there are no devices actually marked as
FAULTED, even though the status says so. At times I had dmesg
reporting the failure as well, but not always, like this:

==
May 22 12:51:32 bofh-sol scsi: [ID 243001 kern.warning] WARNING: /scsi_vhci (scsi_vhci0):
May 22 12:51:32 bofh-sol        /scsi_vhci/disk at g600144f09844cf0000004d8376ae0002 (sd10): Command Timeout on path iscsi0/disk at 0000iqn.1986-03.com.sun:02:380c2eb8-4a4a-6e78-e304-c8cd5e29a7a40001,0
May 22 12:51:32 bofh-sol iscsi: [ID 431120 kern.warning] WARNING: iscsi connection(7/3f) closing connection - target requested reason:0x7
May 22 12:51:32 bofh-sol iscsi: [ID 732673 kern.info] NOTICE: iscsi session(6) iqn.1986-03.com.sun:02:380c2eb8-4a4a-6e78-e304-c8cd5e29a7a4 offline
May 22 12:51:39 bofh-sol scsi: [ID 107833 kern.warning] WARNING: /scsi_vhci/disk at g600144f09844cf0000004d8376ae0002 (sd10):
May 22 12:51:39 bofh-sol        drive offline
May 22 12:51:54 bofh-sol scsi: [ID 107833 kern.warning] WARNING: /scsi_vhci/disk at g600144f09844cf0000004d8376ae0002 (sd10):
May 22 12:51:54 bofh-sol        drive offline
May 22 12:52:00 bofh-sol scsi: [ID 107833 kern.warning] WARNING: /scsi_vhci/disk at g600144f09844cf0000004d8376ae0002 (sd10):
...
May 22 13:04:35 bofh-sol scsi: [ID 107833 kern.warning] WARNING: /scsi_vhci/disk at g600144f09844cf0000004d8376ae0002 (sd10):
May 22 13:04:35 bofh-sol        drive offline
May 22 13:04:45 bofh-sol iscsi: [ID 559844 kern.info] NOTICE: iscsi session(6) iqn.1986-03.com.sun:02:380c2eb8-4a4a-6e78-e304-c8cd5e29a7a4 online
May 22 13:04:45 bofh-sol genunix: [ID 483743 kern.info] /scsi_vhci/disk at g600144f09844cf0000004d8376ae0002 (sd10) multipath status: degraded: path 1 iscsi0/disk at 0000iqn.1986-03.com.sun:02:380c2eb8-4a4a-6e78-e304-c8cd5e29a7a40001,0 is online
May 22 13:04:46 bofh-sol fmd: [ID 377184 daemon.error] SUNW-MSG-ID: ZFS-8000-HC, TYPE: Error, VER: 1, SEVERITY: Major
May 22 13:04:46 bofh-sol EVENT-TIME: Sun May 22 13:04:46 MSD 2011
May 22 13:04:46 bofh-sol PLATFORM: System-Product-Name, CSN: System-Serial-Number, HOSTNAME: bofh-sol
May 22 13:04:46 bofh-sol SOURCE: zfs-diagnosis, REV: 1.0
May 22 13:04:46 bofh-sol EVENT-ID: 435a2ccd-e710-cd37-a18e-a46743002e56
May 22 13:04:46 bofh-sol DESC: The ZFS pool has experienced currently unrecoverable I/O
May 22 13:04:46 bofh-sol        failures. Refer to http://sun.com/msg/ZFS-8000-HC for more information.
May 22 13:04:46 bofh-sol AUTO-RESPONSE: No automated response will be taken.
May 22 13:04:46 bofh-sol IMPACT: Read and write I/Os cannot be serviced.
May 22 13:04:46 bofh-sol REC-ACTION: Make sure the affected devices are connected, then run
May 22 13:04:46 bofh-sol        'zpool clear'.
==

The zillions of errors are in ZFS metadata as seen in "zpool status -v",
and this condition can be cleared by a forced reboot (uadmin 1 8) and
subsequent import of the "dcpool"; after a few hours it comes back
online with its deferred-free blocks deleted.

Neither "zpool export" nor "zpool clear" helps in this situation,
and either may instead hang the system (no IOs would succeed,
even to other pools).

I've tried the zpool attributes failmode=wait and =continue, with
the same results.

I am not sure what to make of the two reports (in dmesg and zpool
status) - possibly when the system bottleneck clears, the iSCSI
device becomes visible again, but IOs are no longer served.
I did not manage to restart the iSCSI target and/or initiator SMF
services, because in my case importing/mounting of dcpool is wrapped
in a service which depends on the iSCSI services, and
unmounting/exporting hangs - but I'd keep trying somehow
(suggestions welcome; so far I think of breaking the dependency
and/or disabling my dcpool-mount service).

While this particular system is at home and I can afford to reboot it,
I do wonder what should properly be done in production environments?

I.e. is it possible to multipath iSCSI instances on different local IP
addresses (i.e. localhost and a NIC IP) - though that would probably
fail at the same bottleneck moment - or to connect to the
zvol/rdsk/... device directly, without iSCSI?

Thanks for ideas,
//Jim Klimov
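P.S. In concrete terms, the failmode experiments and the last idea
above would look roughly like this (just a sketch - the zvol path is
assumed from my config, and I have not tested whether a pool imported
straight from the local zvol behaves any better at the bottleneck
moment):

==
# The failmode settings I tried on the iSCSI-backed pool:
zpool set failmode=wait dcpool
zpool set failmode=continue dcpool

# The backing zvol is also visible as a local block device, so in
# principle the pool could be imported from it directly, bypassing
# the iSCSI target/initiator round-trip:
zpool import -d /dev/zvol/dsk/pool dcpool
==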
Richard Elling
2011-May-22 16:39 UTC
[zfs-discuss] iSCSI pool timeouts during high latency moments
comment below...

On May 22, 2011, at 8:31 AM, Jim Klimov wrote:

> Hi all,
>
> As I wrote before, I have a "dcpool" implemented as an iSCSI
> device stored in a volume of my physical "pool". When there
> are many operations, such as attempts to destroy a dataset
> (which leads to many small IOs in my config), the iSCSI
> device is 100% busy for hours, latencies can grow to seconds
> even with at most one task in the queue, and I believe this may
> lead to the following problem - the device is considered timed
> out and reported as faulted:
[...]
> May 22 12:51:32 bofh-sol iscsi: [ID 431120 kern.warning] WARNING: iscsi connection(7/3f) closing connection - target requested reason:0x7

This means that the target closed the connection because there was
already a task in progress. Likely this was the retry after the
timeout. By default, these timeouts are quite long, so by now
performance is already terrible.

I'm not sure if you can make the iscsi target any slower. I've seen
some well-known manufacturers' iscsi arrays take 2-3 minutes to
respond, in a repeatable manner. The best advice is to fix it or
replace it. I don't think there is anything you can do on the
initiator side to get to a state of happiness.
 -- richard
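(For completeness: the "Command Timeout" messages quoted above should
be the sd(7D) per-command timeout on the initiator side. A rough way
to inspect it is sketched below - the values are only examples, and as
said above, raising the timeout merely postpones the point at which
commands are declared failed; it does not make a slow target any
faster.)

==
# Read the current sd per-command timeout (in seconds) from the live kernel:
echo "sd_io_time/D" | mdb -k

# A persistent override would go into /etc/system, e.g.:
#   set sd:sd_io_time = 120
# ...followed by a reboot; again, this only moves the timeout, not the latency.
==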
Jim Klimov
2011-May-22 17:37 UTC
[zfs-discuss] iSCSI pool timeouts during high latency moments
2011-05-22 20:39, Richard Elling wrote:
> This means that the target closed the connection because there was
> already a task in progress. Likely this was the retry after the
> timeout. By default, these timeouts are quite long, so by now
> performance is already terrible.
>
> I'm not sure if you can make the iscsi target any slower. I've seen
> some well-known manufacturers' iscsi arrays take 2-3 minutes to
> respond, in a repeatable manner. The best advice is to fix it or
> replace it. I don't think there is anything you can do on the
> initiator side to get to a state of happiness.
>  -- richard

Well, thanks for the advice. As my readers may know ;) I am now in the
process of evacuating data from this lagging "dcpool" into the physical
pool which contains the volume and serves iSCSI.

I am surprised that playing with "failmode" achieved nothing - I hoped
that the "dcpool" would continue working from where it hung at the
moment the iSCSI device was lost, as soon as the device is "found"
again... Or do the errors imply that it was not found after all?

So since I can't seem to "fix" the ZFS/iSCSI server performance, I'll
continue along the path of "replacing" it with native filesystem
datasets in the parent pool. The experiment taught me a lot, but the
practical end result is not enjoyable for me - such lags and overheads
for ZFS volume storage, and most of all unreliable performance.

Next time I'll think twice about whether iSCSI (at all, and via ZFS in
particular) is really a better solution than large files over NFS (for
VMs or whatever), and/or will plan better and try to stress-test first.
In particular, that should mean nearly filling the pool with several
TBs of (any) data... and such stress-testing is about what I would do
now before spec'ing a server for work and defining what it should and
should not do in order to perform :)

Maybe it is just my computer (dual-core 2.8GHz P4 with 8GB RAM) - but
if such specs repeatably fail at running a single volume, then I'm at
a loss to predict and construct machines for good, stable ZFS/iSCSI
performance ;\

Again, part of the lag is probably contributed by dedup inside this
"dcpool", and by using 4Kb data blocks along with 4Kb metadata blocks
for its container volume "pool/dcpool" - which probably leads to high
processing overheads (and certainly did lead to 2x storage
consumption), and maybe to high fragmentation.

-- 
Jim Klimov, CTO, JSC COS&HT
+7-903-7705859 (cellular)      mailto:jimklimov at cos.ru
CC: admin at cos.ru, jimklimov at mail.ru
() ascii ribbon campaign - against html mail
/\                       - against microsoft attachments
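P.S. For the record, the container-volume settings I blame are roughly
these (a sketch from memory - the property names are standard ZFS ones,
but the size and the "dcpool2" name below are only illustrative):

==
# Inspect the zvol that backs the iSCSI LUN - dedup plus the 4K
# volblocksize are my suspects for the overhead and the 2x space usage:
zfs get volblocksize,dedup,compression,used pool/dcpool

# volblocksize can only be chosen at creation time, so a rebuild with
# larger blocks and without dedup would look something like:
zfs create -V 500G -o volblocksize=64k -o dedup=off pool/dcpool2
==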