Hi,

My OmniOS host is experiencing slow ZFS writes (around 30 times slower than
normal). iostat reports the errors below even though the pool is healthy. This
started about four days ago, although no changes were made to the system. Are
the hard disks faulty? Please help.

root@host:~# zpool status -v
  pool: test
 state: ONLINE
status: The pool is formatted using a legacy on-disk format. The pool can
        still be used, but some features are unavailable.
action: Upgrade the pool using 'zpool upgrade'. Once this is done, the pool
        will no longer be accessible on software that does not support
        feature flags.
config:

        NAME         STATE     READ WRITE CKSUM
        test         ONLINE       0     0     0
          raidz1-0   ONLINE       0     0     0
            c2t0d0   ONLINE       0     0     0
            c2t1d0   ONLINE       0     0     0
            c2t2d0   ONLINE       0     0     0
            c2t3d0   ONLINE       0     0     0
            c2t4d0   ONLINE       0     0     0
          raidz1-1   ONLINE       0     0     0
            c2t5d0   ONLINE       0     0     0
            c2t6d0   ONLINE       0     0     0
            c2t7d0   ONLINE       0     0     0
            c2t8d0   ONLINE       0     0     0
            c2t9d0   ONLINE       0     0     0
          raidz1-3   ONLINE       0     0     0
            c2t12d0  ONLINE       0     0     0
            c2t13d0  ONLINE       0     0     0
            c2t14d0  ONLINE       0     0     0
            c2t15d0  ONLINE       0     0     0
            c2t16d0  ONLINE       0     0     0
            c2t17d0  ONLINE       0     0     0
            c2t18d0  ONLINE       0     0     0
            c2t19d0  ONLINE       0     0     0
            c2t20d0  ONLINE       0     0     0
            c2t21d0  ONLINE       0     0     0
            c2t22d0  ONLINE       0     0     0
            c2t23d0  ONLINE       0     0     0
          raidz1-4   ONLINE       0     0     0
            c2t24d0  ONLINE       0     0     0
            c2t25d0  ONLINE       0     0     0
            c2t26d0  ONLINE       0     0     0
            c2t27d0  ONLINE       0     0     0
            c2t28d0  ONLINE       0     0     0
            c2t29d0  ONLINE       0     0     0
            c2t30d0  ONLINE       0     0     0
          raidz1-5   ONLINE       0     0     0
            c2t31d0  ONLINE       0     0     0
            c2t32d0  ONLINE       0     0     0
            c2t33d0  ONLINE       0     0     0
            c2t34d0  ONLINE       0     0     0
            c2t35d0  ONLINE       0     0     0
            c2t36d0  ONLINE       0     0     0
            c2t37d0  ONLINE       0     0     0
          raidz1-6   ONLINE       0     0     0
            c2t38d0  ONLINE       0     0     0
            c2t39d0  ONLINE       0     0     0
            c2t40d0  ONLINE       0     0     0
            c2t41d0  ONLINE       0     0     0
            c2t42d0  ONLINE       0     0     0
            c2t43d0  ONLINE       0     0     0
            c2t44d0  ONLINE       0     0     0
        spares
          c5t10d0    AVAIL
          c5t11d0    AVAIL
          c2t45d0    AVAIL
          c2t46d0    AVAIL
          c2t47d0    AVAIL

root@host:~# iostat -En
c4t0d0  Soft Errors: 0 Hard Errors: 5 Transport Errors: 0
Vendor: iDRAC    Product: Virtual CD      Revision: 0323 Serial No:
Size: 0.00GB <0 bytes>
Media Error: 0 Device Not Ready: 5 No Device: 0 Recoverable: 0
Illegal Request: 1 Predictive Failure Analysis: 0
c3t0d0  Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: iDRAC    Product: LCDRIVE         Revision: 0323 Serial No:
Size: 0.00GB <0 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0
c4t0d1  Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: iDRAC    Product: Virtual Floppy  Revision: 0323 Serial No:
Size: 0.00GB <0 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0

root@host:~# fmadm faulty
--------------- ------------------------------------ -------------- ---------
TIME            EVENT-ID                              MSG-ID         SEVERITY
--------------- ------------------------------------ -------------- ---------
Jan 05 08:21:09 7af1ab3c-83c2-602d-d4b9-f9040db6944a  ZFS-8000-HC    Major

Host        : host
Platform    : PowerEdge-R810
Product_sn  :

Fault class : fault.fs.zfs.io_failure_wait
Affects     : zfs://pool=test
              faulted but still in service
Problem in  : zfs://pool=test
              faulted but still in service

Description : The ZFS pool has experienced currently unrecoverable I/O
              failures. Refer to http://illumos.org/msg/ZFS-8000-HC for
              more information.

Response    : No automated response will be taken.

Impact      : Read and write I/Os cannot be serviced.

Action      : Make sure the affected devices are connected, then run
              'zpool clear'.

Regards,
Ram
> root@host:~# fmadm faulty
> --------------- ------------------------------------ -------------- ---------
> TIME            EVENT-ID                              MSG-ID         SEVERITY
> --------------- ------------------------------------ -------------- ---------
> Jan 05 08:21:09 7af1ab3c-83c2-602d-d4b9-f9040db6944a  ZFS-8000-HC    Major
>
> Host        : host
> Platform    : PowerEdge-R810
> Product_sn  :
>
> Fault class : fault.fs.zfs.io_failure_wait
> Affects     : zfs://pool=test
>               faulted but still in service
> Problem in  : zfs://pool=test
>               faulted but still in service
>
> Description : The ZFS pool has experienced currently unrecoverable I/O
>               failures. Refer to http://illumos.org/msg/ZFS-8000-HC for
>               more information.
>
> Response    : No automated response will be taken.
>
> Impact      : Read and write I/Os cannot be serviced.
>
> Action      : Make sure the affected devices are connected, then run
>               'zpool clear'.

The pool looks healthy to me, but it isn't very well balanced. Have you been
adding new VDEVs over time to grow it? Check whether some of the VDEVs are
fuller than others. I don't have an OI/illumos system available at the moment,
but IIRC this can be done with zpool iostat -v. Older versions of ZFS striped
across all VDEVs regardless of fill, which slowed down write speeds rather
horribly once some VDEVs were nearly full (>90%). This shouldn't be the case
with OmniOS, but it *may* be the case with an old zpool version; I don't know.

I'd check the fill rate of the VDEVs first, then perhaps try to upgrade the
zpool - unless you need to be able to import it on an older zpool version
(on S10 or similar).

Vennlige hilsener / Best regards

roy
--
Roy Sigurd Karlsbakk
(+47) 98013356
roy@karlsbakk.net
http://blogg.karlsbakk.net/
GPG Public key: http://karlsbakk.net/roysigurdkarlsbakk.pubkey.txt
--
In all pedagogy it is essential that the curriculum be presented
intelligibly. It is an elementary imperative for all pedagogues to avoid
excessive use of idioms of xenotypic etymology. In most cases adequate and
relevant synonyms exist in Norwegian.
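As a minimal sketch of the checks Roy suggests (the pool name "test" is taken
from the original post; zpool list -v may not exist on very old tool versions,
and zpool upgrade is one-way - an upgraded pool can no longer be imported on
software without feature-flag support, as the zpool status warning says):

root@host:~# zpool iostat -v test   # per-vdev alloc/free and I/O counters
root@host:~# zpool list -v test     # per-vdev capacity summary, if supported
root@host:~# zpool upgrade          # list pools not at the latest on-disk version
root@host:~# zpool upgrade test     # upgrade this pool (irreversible)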
Hi Roy,

You are right, so it looks like a data-distribution issue. Initially there
were two vdevs with 24 disks (disks 0-23) for close to a year, after which we
added 24 more disks and created additional vdevs. The initial vdevs have
filled up, and so the write speed declined. Now, how do I find the files that
sit on a given vdev or disk? That way I can remove them and copy them back to
redistribute the data. Is there any other way to solve this?

Total capacity of pool: 98 TB
Used: 44 TB
Free: 54 TB

root@host:~# zpool iostat -v
               capacity     operations    bandwidth
pool        alloc   free   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
test        54.0T  62.7T     52  1.12K  2.16M  5.78M
  raidz1    11.2T  2.41T     13     30   176K   146K
    c2t0d0      -      -      5     18  42.1K  39.0K
    c2t1d0      -      -      5     18  42.2K  39.0K
    c2t2d0      -      -      5     18  42.5K  39.0K
    c2t3d0      -      -      5     18  42.9K  39.0K
    c2t4d0      -      -      5     18  42.6K  39.0K
  raidz1    13.3T   308G     13    100   213K   521K
    c2t5d0      -      -      5     94  50.8K   135K
    c2t6d0      -      -      5     94  51.0K   135K
    c2t7d0      -      -      5     94  50.8K   135K
    c2t8d0      -      -      5     94  51.1K   135K
    c2t9d0      -      -      5     94  51.1K   135K
  raidz1    13.4T  19.1T      9    455   743K  2.31M
    c2t12d0     -      -      3    137  69.6K   235K
    c2t13d0     -      -      3    129  69.4K   227K
    c2t14d0     -      -      3    139  69.6K   235K
    c2t15d0     -      -      3    131  69.6K   227K
    c2t16d0     -      -      3    141  69.6K   235K
    c2t17d0     -      -      3    132  69.5K   227K
    c2t18d0     -      -      3    142  69.6K   235K
    c2t19d0     -      -      3    133  69.6K   227K
    c2t20d0     -      -      3    143  69.6K   235K
    c2t21d0     -      -      3    133  69.5K   227K
    c2t22d0     -      -      3    143  69.6K   235K
    c2t23d0     -      -      3    133  69.5K   227K
  raidz1    2.44T  16.6T      5    103   327K   485K
    c2t24d0     -      -      2     48  50.8K  87.4K
    c2t25d0     -      -      2     49  50.7K  87.4K
    c2t26d0     -      -      2     49  50.8K  87.3K
    c2t27d0     -      -      2     49  50.8K  87.3K
    c2t28d0     -      -      2     49  50.8K  87.3K
    c2t29d0     -      -      2     49  50.8K  87.3K
    c2t30d0     -      -      2     49  50.8K  87.3K
  raidz1    8.18T  10.8T      5    295   374K  1.54M
    c2t31d0     -      -      2    131  58.2K   279K
    c2t32d0     -      -      2    131  58.1K   279K
    c2t33d0     -      -      2    131  58.2K   279K
    c2t34d0     -      -      2    132  58.2K   279K
    c2t35d0     -      -      2    132  58.1K   279K
    c2t36d0     -      -      2    133  58.3K   279K
    c2t37d0     -      -      2    133  58.2K   279K
  raidz1    5.42T  13.6T      5    163   383K   823K
    c2t38d0     -      -      2     61  59.4K   146K
    c2t39d0     -      -      2     61  59.3K   146K
    c2t40d0     -      -      2     61  59.4K   146K
    c2t41d0     -      -      2     61  59.4K   146K
    c2t42d0     -      -      2     61  59.3K   146K
    c2t43d0     -      -      2     62  59.2K   146K
    c2t44d0     -      -      2     62  59.3K   146K

On Mon, Feb 11, 2013 at 10:23 PM, Roy Sigurd Karlsbakk <roy@karlsbakk.net> wrote:

> The pool looks healthy to me, but it isn't very well balanced. Have you
> been adding new VDEVs over time to grow it? Check whether some of the
> VDEVs are fuller than others. [...] I'd check the fill rate of the VDEVs
> first, then perhaps try to upgrade the zpool - unless you need to be able
> to import it on an older zpool version (on S10 or similar).
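For reference, the fill level of each top-level vdev implied by this output
(alloc / (alloc + free)) is roughly:

  raidz1-0: 11.2T / 13.6T  ~ 82%
  raidz1-1: 13.3T / 13.6T  ~ 98%
  raidz1-3: 13.4T / 32.5T  ~ 41%
  raidz1-4: 2.44T / 19.0T  ~ 13%
  raidz1-5: 8.18T / 19.0T  ~ 43%
  raidz1-6: 5.42T / 19.0T  ~ 28%

so the two original vdevs are indeed close to full while the newer ones are
mostly empty, which matches Roy's suspicion.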
Ram Chander wrote:
> Hi Roy,
> You are right, so it looks like a data-distribution issue. Initially
> there were two vdevs with 24 disks (disks 0-23) for close to a year,
> after which we added 24 more disks and created additional vdevs. The
> initial vdevs have filled up, and so the write speed declined. Now, how
> do I find the files that sit on a given vdev or disk? That way I can
> remove them and copy them back to redistribute the data. Is there any
> other way to solve this?

The only way is to avoid the problem in the first place by not mixing vdev
sizes in a pool.

--
Ian.
On 2013-02-12 10:32, Ian Collins wrote:
> Ram Chander wrote:
>> Hi Roy,
>> You are right, so it looks like a data-distribution issue. Initially
>> there were two vdevs with 24 disks (disks 0-23) for close to a year,
>> after which we added 24 more disks and created additional vdevs. The
>> initial vdevs have filled up, and so the write speed declined. Now,
>> how do I find the files that sit on a given vdev or disk? That way I
>> can remove them and copy them back to redistribute the data. Is there
>> any other way to solve this?
>>
> The only way is to avoid the problem in the first place by not mixing
> vdev sizes in a pool.

Well, that imbalance is there - in the zpool status printout we see raidz1
top-level vdevs of 5, 5, 12, 7, 7 and 7 disks, plus 5 spares, which sums up
to 48 ;)

Depending on disk size, it is possible that the top-level vdev sizes in
gigabytes were kept the same (i.e. a raidz set with twice as many disks of
half the size), but we have no information on that detail and it is unlikely.
Even then, with the disk sets in one pool, this would still unbalance the
load across spindles and I/O buses.

Besides all that, with the "older" top-level vdevs being fuller than the
"newer" ones, there is an imbalance that would not have been avoided by
keeping vdev sizes equal: writes into the newer vdevs are likely to quickly
find available "holes", while writes into the older ones are more fragmented
and a longer search is needed to find a hole - if not outright gang-block
fragmentation. These two effects are, I believe, the basis of the performance
drop on "full" pools, the deciding factor being the mix of I/O patterns and
the fragmentation of data and holes.

I think there were developments in illumos ZFS to direct more writes onto
devices with more available space; I am not sure whether the average write
latency of a top-level vdev is monitored and taken into account in
write-targeting decisions (which would also cover the case of failing devices
that take longer to respond), nor which portions have been completed and
integrated into common illumos-gate.

As was suggested, you can use "zpool iostat -v 5" to monitor I/O to the pool
with a breakdown per top-level vdev and per disk, and look for patterns
there. Keep in mind, however, that in a healthy raidz set you should see
reads only from the data disks of a particular stripe; parity is not read
unless a checksum mismatch occurs. On average, data is laid out across all
disks so that there is no "dedicated" parity disk, but with small I/Os you
are likely to notice this effect.

If the budget permits, I'd suggest building (or leasing) another system with
balanced disk sets and replicating all the data onto it, then repurposing the
older system - for example, as a backup of the newer box (after remaking its
disk layout as well).

As for the question of which files are on the older disks: as a rule of thumb
you can compare file creation/modification times with the date when you
expanded the pool ;) Closer inspection could be done with a zdb walk to print
the DVA block addresses of a file's blocks (the DVA includes the number of
the top-level vdev), but that would take some time - first to determine which
files to inspect (likely some band of sizes) and then to do the zdb walks.

Good luck,
//Jim
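To make the zdb walk Jim mentions concrete, here is a rough sketch; the
dataset name test/data and the file path are made up for illustration. On ZFS
the inode number reported by ls -i is the object number zdb expects, and each
block pointer in the dump carries DVAs of the form <vdev:offset:asize>, where
the leading number is the top-level vdev index (0 and 1 being the nearly full
original raidz sets in this pool):

root@host:~# ls -i /test/data/largefile.bin        # inode number = ZFS object number
root@host:~# zdb -ddddd test/data <object-number>  # dump the object, including block pointers

In the zdb output, look at the DVA[...] fields of the L0 (data) blocks; the
first colon-separated number tells you which top-level vdev each block was
allocated on.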
Jim Klimov wrote:
> On 2013-02-12 10:32, Ian Collins wrote:
>> Ram Chander wrote:
>>> Hi Roy,
>>> You are right, so it looks like a data-distribution issue. Initially
>>> there were two vdevs with 24 disks (disks 0-23) for close to a year,
>>> after which we added 24 more disks and created additional vdevs. The
>>> initial vdevs have filled up, and so the write speed declined. Now,
>>> how do I find the files that sit on a given vdev or disk? That way I
>>> can remove them and copy them back to redistribute the data. Is there
>>> any other way to solve this?
>>>
>> The only way is to avoid the problem in the first place by not mixing
>> vdev sizes in a pool.

I was a bit quick off the mark there - I didn't notice that some vdevs were
older than others.

> Well, that imbalance is there - in the zpool status printout we see raidz1
> top-level vdevs of 5, 5, 12, 7, 7 and 7 disks, plus 5 spares, which sums
> up to 48 ;)

The vdev sizes are about (including parity space) 14, 14, 22, 19, 19 and
19 TB respectively, and 127 TB total. So even if the data were balanced, the
performance of this pool would still start to degrade once ~84 TB (about 2/3
full) are used. The only viable long-term solution is a rebuild, or putting
bigger drives in the two smallest vdevs.

In the short term, when I've had similar issues I used zfs send to copy a
large filesystem within the pool, renamed the copy to the original name and
deleted the original. This can be repeated until you have an acceptable
distribution.

One last thing: unless this is some form of backup pool, or the data on it
isn't important, avoid raidz vdevs in such a large pool!

--
Ian.
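One possible way to carry out the copy-and-rename approach Ian describes,
sketched for a hypothetical filesystem test/data; it needs enough free space
to hold a second copy while both exist, and the copy should be verified
before anything is destroyed:

root@host:~# zfs snapshot -r test/data@rebalance
root@host:~# zfs send -R test/data@rebalance | zfs receive -u test/data.new   # -u: don't mount the copy yet
root@host:~# zfs destroy -r test/data             # only after verifying test/data.new
root@host:~# zfs rename test/data.new test/data
root@host:~# zfs destroy -r test/data@rebalance   # drop the leftover snapshot

Because the copy is written with the pool in its current state, its blocks
are spread over all of today's vdevs, including the newer, mostly empty ones,
which is what redistributes the data.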