Hi,

I'm running OpenSolaris 2009.06, and I'm facing a serious performance loss with ZFS! It's a raidz1 pool, made of 4 x 1TB SATA disks:

        zfs_raid    ONLINE 0 0 0
          raidz1    ONLINE 0 0 0
            c7t2d0  ONLINE 0 0 0
            c7t3d0  ONLINE 0 0 0
            c7t4d0  ONLINE 0 0 0
            c7t5d0  ONLINE 0 0 0

In the beginning, when the pool was just created (and empty!), I had the following performance:
- Read : 200 MB/s
- Write : 20 MB/s (10 MB/s with compression enabled)

That was acceptable at the time. However, after 2 months of production use, and with only 1TB of data in the pool, performance is now around:
- Read : 5 MB/s
- Write : 500 KB/s !!!!

The write speed is so low that it breaks any network copy (Samba or SFTP). The only way I have found to copy large files to the pool without outages is SFTP via FileZilla with the bandwidth limit enabled (limit = 300 KB/s)!

In this pool, I have 18 filesystems defined:
- 4 FS have a recordsize of 16KB (with a total of 100 GB of data)
- 14 FS have a recordsize of 128KB (with a total of 900 GB of data)

There is a total of 284 snapshots on the pool, and compression is enabled. There is 3 GB of physical RAM.

The pool is used for daily backups, with rsync. Some big files are updated simultaneously, in different FS. So I suspect heavy fragmentation of the files! Or maybe... a need for more RAM??

Thank you for any thoughts!!
Philippe
--
This message posted from opensolaris.org
On Tue, 18 May 2010, Philippe wrote:

> The pool is used for daily backups, with rsync. Some big files are
> updated simultaneously, in different FS. So I suspect heavy
> fragmentation of the files! Or maybe... a need for more RAM??

You forgot to tell us what brand/model of disks you are using, and the controller type. It seems likely that one or more of your disks has been barely working since the initial installation. Even 20 MB/s is quite slow.

Use 'iostat -x 30' with an I/O load to see if one disk is much slower than the others.

Bob
--
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
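For reference, a minimal way to capture that is to keep a copy running in one terminal and sample the disks from another (the file path and interval below are only illustrative):

  # generate a steady write load, e.g. copy a large file into the pool
  cp /path/to/bigfile /zfs_raid/somefs/

  # in a second terminal, extended per-device statistics every 30 seconds
  iostat -x 30

A disk whose svc_t and %b columns stay far above its neighbours while handling the same or less work is the one to suspect.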
Hi,

The 4 disks are Western Digital ATA 1TB (one is slightly different):
1 x ATA-WDC WD10EACS-00D-1A01-931.51GB
3 x ATA-WDC WD10EARS-00Y-0A80-931.51GB

I've done lots of tests (speed tests + SMART reports) with each of these 4 disks on another system (another computer, running Windows 2003 x64), and everything was fine! The 4 disks operate well, at 50-100 MB/s (tested with HD Tune). And the access time: 14 ms.

The controller is an LSI Logic SAS 1068-IR (MPT BIOS 6.12.00.00 - 31/10/2006).

Here are some stats:

1) cp of a big file to a ZFS filesystem (128K recordsize):
============================================================
iostat -x 30
                 extended device statistics
device    r/s    w/s    kr/s    kw/s  wait  actv   svc_t  %w  %b
sd0       0.0    0.0     0.0     0.0   0.0   0.0     0.0   0   0
sd1       0.3    0.3    17.6     2.3   0.0   0.0    19.5   0   0
sd2      11.5    6.0   350.1   154.5   0.0   0.3    19.5   0   4
sd3      12.5    5.7   351.4   154.5   0.0   0.5    27.1   0   5
sd4      15.9    6.3   615.1   153.8   0.0   1.3    58.2   0   8
sd5      15.1    8.1   600.4   150.7   0.0   7.6   326.7   0  31
                 extended device statistics
device    r/s    w/s    kr/s    kw/s  wait  actv   svc_t  %w  %b
sd0       0.0    0.0     0.0     0.0   0.0   0.0     0.0   0   0
sd1      41.3    0.0  5289.7     0.0   0.0   1.3    31.0   0   4
sd2       4.2   24.1   214.0  1183.0   0.0   0.5    19.4   0   4
sd3       3.7   23.6   227.2  1183.0   0.0   2.1    78.5   0  12
sd4       6.6   26.4   374.2  1179.4   0.0  10.1   306.5   0  35
sd5       4.3   31.0   369.6   973.3   0.0  22.0   622.0   0  96
                 extended device statistics
device    r/s    w/s    kr/s    kw/s  wait  actv   svc_t  %w  %b
sd0       0.0    0.0     0.0     0.0   0.0   0.0     0.0   0   0
sd1      17.1    0.0  2184.6     0.0   0.0   0.5    30.6   0   2
sd2       1.6   12.3   116.4   570.9   0.0   0.6    41.3   0   3
sd3       1.6   12.1   107.6   570.9   0.0  10.3   754.7   0  33
sd4       2.1   12.6   187.1   569.4   0.0   9.4   634.7   0  28
sd5       0.4   21.7    25.6   700.6   0.0  29.5  1338.1   0  96

2) cp of a big file to a ZFS filesystem (16K recordsize):
============================================================
iostat -x 30
                 extended device statistics
device    r/s    w/s    kr/s    kw/s  wait  actv   svc_t  %w  %b
sd0       0.0    0.0     0.0     0.0   0.0   0.0     0.0   0   0
sd1       0.2    0.3    16.7     2.3   0.0   0.0    19.3   0   0
sd2      11.5    6.0   350.7   154.5   0.0   0.3    19.5   0   4
sd3      12.5    5.7   352.0   154.5   0.0   0.5    27.0   0   5
sd4      15.9    6.3   616.2   153.8   0.0   1.3    58.0   0   8
sd5      15.1    8.1   601.3   150.7   0.0   7.5   324.6   0  31
                 extended device statistics
device    r/s    w/s    kr/s    kw/s  wait  actv   svc_t  %w  %b
sd0       0.0    0.0     0.0     0.0   0.0   0.0     0.0   0   0
sd1      32.0    0.0  4095.9     0.0   0.0   1.0    30.8   0   3
sd2       2.0   22.4   124.2   425.0   0.0   0.1     2.3   0   2
sd3       1.9   19.4   115.9   425.0   0.0   0.6    28.7   0  14
sd4       2.3   23.6   170.9   421.8   0.0   3.2   124.7   0  15
sd5       3.2   24.5   290.6   306.6   0.0  22.5   810.5   0  94
                 extended device statistics
device    r/s    w/s    kr/s    kw/s  wait  actv   svc_t  %w  %b
sd0       0.0    0.0     0.0     0.0   0.0   0.0     0.0   0   0
sd1       0.0    0.0     0.0     0.0   0.0   0.0     0.0   0   0
sd2       0.0    2.0     0.0     3.0   0.0   0.0     0.7   0   0
sd3       0.1    1.1     4.3     2.0   0.0   0.0    15.9   0   1
sd4       0.1    1.4     4.3     1.9   0.0   0.0     2.9   0   0
sd5       0.2   19.8    10.7   101.8   0.0  32.1  1606.9   0 100
                 extended device statistics
device    r/s    w/s    kr/s    kw/s  wait  actv   svc_t  %w  %b
sd0       0.0    0.0     0.0     0.0   0.0   0.0     0.0   0   0
sd1       8.6    0.0  1096.2     0.0   0.0   0.3    29.7   0   1
sd2       0.2    4.8    10.7   267.2   0.0   0.0     7.8   0   0
sd3       0.2    5.5     6.8   268.2   0.0   0.6   107.0   0   3
sd4       0.2    9.1    11.0   265.4   0.0   6.3   678.4   0  21
sd5       0.2   21.4     6.8   104.5   0.0  31.6  1467.8   0  92
                 extended device statistics
device    r/s    w/s    kr/s    kw/s  wait  actv   svc_t  %w  %b
sd0       0.0    0.0     0.0     0.0   0.0   0.0     0.0   0   0
sd1       0.0    0.0     0.0     0.0   0.0   0.0     0.0   0   0
sd2       0.0    0.0     0.0     0.0   0.0   0.0     0.0   0   0
sd3       0.0    0.0     0.0     0.0   0.0   0.0     0.0   0   0
sd4       0.0    0.0     0.0     0.0   0.0   0.0     0.0   0   0
sd5       0.0   18.9     0.0   101.7   0.0  35.0  1851.6   0 100
                 extended device statistics
device    r/s    w/s    kr/s    kw/s  wait  actv   svc_t  %w  %b
sd0       0.0    0.0     0.0     0.0   0.0   0.0     0.0   0   0
sd1       0.0    0.0     0.0     0.0   0.0   0.0     0.0   0   0
sd2       0.3    5.9    15.3   279.2   0.0   0.0     6.7   0   1
sd3       0.4    5.7    23.5   279.2   0.0   1.0   161.5   0   5
sd4       0.4   11.6    23.8   275.6   0.0  11.6   964.3   0  36
sd5       0.2   20.6    13.1   107.2   0.0  30.2  1452.7   0  99
                 extended device statistics
device    r/s    w/s    kr/s    kw/s  wait  actv   svc_t  %w  %b
sd0       0.0    0.0     0.0     0.0   0.0   0.0     0.0   0   0
sd1       0.0    0.0     0.0     0.0   0.0   0.0     0.0   0   0
sd2       0.0    0.0     0.0     0.0   0.0   0.0     0.0   0   0
sd3       0.0    0.0     0.0     0.0   0.0   0.0     0.0   0   0
sd4       0.0    0.0     0.0     0.0   0.0   0.0     0.0   0   0
sd5       0.0   20.1     0.0   105.7   0.0  35.0  1741.2   0 100
                 extended device statistics
device    r/s    w/s    kr/s    kw/s  wait  actv   svc_t  %w  %b
sd0       0.0    0.0     0.0     0.0   0.0   0.0     0.0   0   0
sd1       0.0    0.0     0.0     0.0   0.0   0.0     0.0   0   0
sd2       0.1    4.3     6.4   196.4   0.0   0.1    13.8   0   1
sd3       0.2    4.0    10.7   196.4   0.0   0.1    15.9   0   1
sd4       0.3    5.5    17.1   195.6   0.0   0.2    31.3   0   3
sd5       0.3   22.5    17.1   127.4   0.0  31.4  1378.4   0  99

Thanks,
Philippe
--
This message posted from opensolaris.org
John J Balestrini
2010-May-18 15:11 UTC
[zfs-discuss] Very serious performance degradation
Howdy,

Is dedup on? I was having some pretty strange problems including slow performance when dedup was on. Disabling dedup helped out a whole bunch. My system only has 4gig of ram, so that may have played a part too.

Good luck!

John

On May 18, 2010, at 7:51 AM, Philippe wrote:

> Hi,
>
> The 4 disks are Western Digital ATA 1TB (one is slightly different):
> 1 x ATA-WDC WD10EACS-00D-1A01-931.51GB
> 3 x ATA-WDC WD10EARS-00Y-0A80-931.51GB
>
> [rest of the message, including the iostat output quoted in full earlier in the thread, snipped]
On 18 May, 2010 - Philippe sent me these 6,0K bytes:

> Hi,
>
> The 4 disks are Western Digital ATA 1TB (one is slightly different):
> 1 x ATA-WDC WD10EACS-00D-1A01-931.51GB
> 3 x ATA-WDC WD10EARS-00Y-0A80-931.51GB
> [...]
> 1) cp of a big file to a ZFS filesystem (128K recordsize):
> iostat -x 30
> [...]
> device    r/s    w/s    kr/s    kw/s  wait  actv   svc_t  %w  %b
> sd0       0.0    0.0     0.0     0.0   0.0   0.0     0.0   0   0
> sd1      17.1    0.0  2184.6     0.0   0.0   0.5    30.6   0   2
> sd2       1.6   12.3   116.4   570.9   0.0   0.6    41.3   0   3
> sd3       1.6   12.1   107.6   570.9   0.0  10.3   754.7   0  33
> sd4       2.1   12.6   187.1   569.4   0.0   9.4   634.7   0  28
> sd5       0.4   21.7    25.6   700.6   0.0  29.5  1338.1   0  96

Umm.. Service times of sd3..5 are waay too high to be good working disks. 21 writes shouldn't take 1.3 seconds.

Some of your disks are not feeling well, possibly doing block-reallocation like mad all the time, or block recovery of some form. Service times should be closer to what sd1 and sd2 are doing. sd2, 3, 4 seem to be getting about the same amount of read+write, but their service time is 15-20 times higher. This will lead to crap performance (and probably a broken array in a while).

/Tomas
--
Tomas Ögren, stric at acc.umu.se, http://www.acc.umu.se/~stric/
|- Student at Computing Science, University of Umeå
`- Sysadmin at {cs,acc}.umu.se
I note in your iostat data below that one drive (sd5) consistently performs MUCH worse than the others, even when doing less work.....

-----Original Message-----
From: zfs-discuss-bounces at opensolaris.org [mailto:zfs-discuss-bounces at opensolaris.org] On Behalf Of John J Balestrini
Sent: Tuesday, May 18, 2010 8:11 AM
To: OpenSolaris ZFS discuss
Subject: Re: [zfs-discuss] Very serious performance degradation

[John's message and the quoted iostat output snipped; both appear in full earlier in the thread]
On Tue, 18 May 2010, Philippe wrote:

> The 4 disks are Western Digital ATA 1TB (one is slightly different):
> 1 x ATA-WDC WD10EACS-00D-1A01-931.51GB
> 3 x ATA-WDC WD10EARS-00Y-0A80-931.51GB
>
> I've done lots of tests (speed tests + SMART reports) with each of these 4 disks on another system (another computer, running Windows 2003 x64), and everything was fine! The 4 disks operate well, at 50-100 MB/s (tested with HD Tune). And the access time: 14 ms.
>
> The controller is an LSI Logic SAS 1068-IR (MPT BIOS 6.12.00.00 - 31/10/2006)
>
> Here are some stats:
>
>                  extended device statistics
> device    r/s    w/s    kr/s    kw/s  wait  actv   svc_t  %w  %b
> sd0       0.0    0.0     0.0     0.0   0.0   0.0     0.0   0   0
> sd1       8.6    0.0  1096.2     0.0   0.0   0.3    29.7   0   1
> sd2       0.2    4.8    10.7   267.2   0.0   0.0     7.8   0   0
> sd3       0.2    5.5     6.8   268.2   0.0   0.6   107.0   0   3
> sd4       0.2    9.1    11.0   265.4   0.0   6.3   678.4   0  21
> sd5       0.2   21.4     6.8   104.5   0.0  31.6  1467.8   0  92

It looks like your 'sd5' disk is performing horribly bad and except for the horrible performance of 'sd5' (which bottlenecks the I/O), 'sd4' would look just as bad. Regardless, the first step would be to investigate 'sd5'. If 'sd4' is also a terrible performer, then resilvering a disk replacement of 'sd5' may take a very long time.

Use 'iostat -xen' to obtain more information, including the number of reported errors.

Bob
--
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
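A sketch of what that check could look like (the error counters show up as the s/w, h/w, trn and tot columns on the right of the output; the fmdump step assumes the standard Solaris fault manager is running):

  # extended statistics with device names and error counters, 30-second samples
  iostat -xen 30

  # any retries, transport errors or disk faults logged by FMA
  fmdump -eV | more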
Edward Ned Harvey
2010-May-19 00:55 UTC
[zfs-discuss] Very serious performance degradation
How full is your filesystem? Give us the output of "zfs list". You might be having a hardware problem, or maybe it's extremely full.

Also, if you have dedup enabled, on a 3TB filesystem, you surely want more RAM. I don't know if there's any rule of thumb you could follow, but offhand I'd say 16G or 32G. Numbers based on the vapor passing around the room I'm in right now.
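A quick way to check both points, assuming the pool name used in this thread (the dedup property will simply be rejected on releases that predate it):

  # pool-level capacity and usage
  zpool list zfs_raid

  # per-filesystem usage
  zfs list -r zfs_raid

  # only works if the installed ZFS version knows about dedup
  zfs get dedup zfs_raid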
> Umm.. Service times of sd3..5 are waay too high to be good working disks.
> 21 writes shouldn't take 1.3 seconds.
>
> Some of your disks are not feeling well, possibly doing
> block-reallocation like mad all the time, or block recovery of some
> form. Service times should be closer to what sd1 and sd2 are doing.
> sd2, 3, 4 seem to be getting about the same amount of read+write, but
> their service time is 15-20 times higher. This will lead to crap
> performance (and probably a broken array in a while).
>
> /Tomas

Hi!

It is strange, because I've checked the SMART data of the 4 disks, and everything seems really OK! (on another hardware/controller, because I needed Windows to check it). Maybe it's a problem with the SAS/SATA controller?!

One question: if I halt the server, and change the order of the disks on the SATA array, will RAIDZ still detect the array fine?

The idea is to check whether the results (big service times) depend on the drive positions, or on the hard drives themselves!

Thank you!
--
This message posted from opensolaris.org
> How full is your filesystem? Give us the output of "zfs list".
> You might be having a hardware problem, or maybe it's extremely full.

Hi Edward,

The "_db" filesystems have a recordsize of 16K (the others have the default 128K):

NAME               USED   AVAIL  REFER  MOUNTPOINT
zfs_raid           1,02T  1,65T  28,4K  /zfs_raid
zfs_raid/fs1_db    8,89G  1,65T  7,73G  /home/fs1_db
zfs_raid/fs2       2,68G  1,65T  1,73G  /home/fs2
zfs_raid/fs3       3,38G  1,65T  3,12G  /home/fs3
zfs_raid/fs4       10,1G  1,65T  10,0G  /home/fs4
zfs_raid/fs5        517G  1,65T   326G  /home/fs5
zfs_raid/fs6_db    35,1G  1,65T  28,0G  /home/fs6_db
zfs_raid/fs7       9,22G  1,65T  7,67G  /home/fs7
zfs_raid/fs8_db    22,7G  1,65T  21,6G  /home/fs8_db
zfs_raid/fs9        179G  1,65T   108G  /home/fs9
zfs_raid/fs10       115G  1,65T  97,0G  /home/fs10
zfs_raid/fs11_db   28,6G  1,65T  17,3G  /home/fs11_db
zfs_raid/fs12      17,1G  1,65T  4,70G  /home/fs12
zfs_raid/fs13      9,66G  1,65T  6,77G  /home/fs13
zfs_raid/fs14      4,13G  1,65T  3,12G  /home/fs14
zfs_raid/fs15      15,2G  1,65T  9,48G  /home/fs15
zfs_raid/fs16      14,7G  1,65T  6,59G  /home/fs16
zfs_raid/fs17      7,49G  1,65T  5,31G  /home/fs17
zfs_raid/fs18      41,0G  1,65T  21,6G  /home/fs18

> Also, if you have dedup enabled, on a 3TB filesystem, you surely want more
> RAM. I don't know if there's any rule of thumb you could follow, but
> offhand I'd say 16G or 32G.

It seems that the "dedup" property doesn't exist on my system! Are you sure this capability is supported in the version of ZFS included in OpenSolaris?

Thank you!
Philippe
--
This message posted from opensolaris.org
On 05/19/10 09:34 PM, Philippe wrote:

> Hi!
>
> It is strange, because I've checked the SMART data of the 4 disks, and everything seems really OK! (on another hardware/controller, because I needed Windows to check it). Maybe it's a problem with the SAS/SATA controller?!
>
> One question: if I halt the server, and change the order of the disks on the SATA array, will RAIDZ still detect the array fine?

Yes, it will.

--
Ian.
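A cautious way to do the shuffle, assuming the pool can be taken offline for a few minutes (ZFS finds the member disks again from the labels written on them, not from their controller positions):

  zpool export zfs_raid
  # power down, move the disks around, power back up
  zpool import zfs_raid

If the pool is not found by name on import, running 'zpool import' with no arguments lists the pools that are visible.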
> It looks like your 'sd5' disk is performing horribly bad and except
> for the horrible performance of 'sd5' (which bottlenecks the I/O),
> 'sd4' would look just as bad. Regardless, the first step would be to
> investigate 'sd5'.

Hi Bob!

I've already tried the pool without the sd5 disk (so the pool was in degraded mode), but the performance was still the same... So the sd5 disk itself is not the (only) bottleneck...

> Use 'iostat -xen' to obtain more information, including the number of
> reported errors.

iostat -xen
                           extended device statistics       ---- errors ---
  r/s   w/s    kr/s    kw/s wait  actv wsvc_t asvc_t  %w  %b s/w h/w trn tot device
  0.0   0.0     0.0     0.0  0.0   0.0    0.0    0.0   0   0   0   9   0   9 c8t0d0
  0.3   0.4    14.1     3.2  0.0   0.0    0.0   17.9   0   0   0   0   0   0 c7t0d0
 65.5   6.6  1234.3    97.6  0.0   1.0    0.0   14.5   0  14   0   0   0   0 c7t2d0
 70.6   6.1  1229.2    97.6  0.0   1.3    0.0   16.3   0  16   0   0   0   0 c7t3d0
 94.0   6.7  2349.2    97.0  0.0   3.6    0.0   36.1   0  23   0   0   0   0 c7t4d0
 80.4  12.1  2306.5    91.3  0.0  16.6    0.0  179.7   0  68   0   0   0   0 c7t5d0

Thanks!
--
This message posted from opensolaris.org
> > One question: if I halt the server, and change the order of the disks
> > on the SATA array, will RAIDZ still detect the array fine?
>
> Yes, it will.

Hi!

I've done the moves this morning, and the high service times followed the disks! So, I have 3 disks to replace urgently!!

I'm starting with the replacement of the very bad disk, and hope the resilvering won't take too long!! I have no choice!

Thanks to all!
Philippe
--
This message posted from opensolaris.org
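For anyone following along, the replacement sequence is roughly this, assuming the new drive ends up with the same c7t2d0 device name (adjust the names if it shows up elsewhere):

  # optionally take the failing disk out of service first
  zpool offline zfs_raid c7t2d0

  # swap the drive physically, then let ZFS rebuild onto the new one
  zpool replace zfs_raid c7t2d0

  # check resilver progress
  zpool status zfs_raid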
> I'm starting with the replacement of the very bad
> disk, and hope the resilvering won't take too long!!

Replacing c7t2d0, I get the following:

        NAME             STATE     READ WRITE CKSUM
        zfs_raid         DEGRADED     0     0     0
          raidz1         DEGRADED     0     0     0
            c7t5d0       ONLINE       0     0     0
            c7t4d0       ONLINE       0     0     0
            c7t3d0       ONLINE       0     0     0
            replacing    DEGRADED     0     0 1,42K
              c7t2d0s0/o FAULTED      0     0     0  corrupted data
              c7t2d0     ONLINE       0     0     0  2,92M resilvered

Why is there a faulted "c7t2d0s0/o" appearing?? Maybe this was on the disk before the replacement?

When I did the "zpool replace", I had to add "-f" to force it, because ZFS said that there was a ZFS label on the disk, and that the vdevs were not OK... How do I get rid of this, and get back a normal "c7t2d0"??

Thank you!
Philippe
--
This message posted from opensolaris.org
Current status:

  pool: zfs_raid
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
 scrub: resilver in progress for 0h17m, 3,72% done, 7h22m to go
config:

        NAME             STATE     READ WRITE CKSUM
        zfs_raid         DEGRADED     0     0     0
          raidz1         DEGRADED     0     0     0
            c7t5d0       ONLINE       0     0     0
            c7t4d0       ONLINE       0     0     0
            c7t3d0       ONLINE       0     0     0
            replacing    DEGRADED     0     0  339K
              c7t2d0s0/o FAULTED      0     0     0  corrupted data
              c7t2d0     ONLINE       0     0     0  12,5G resilvered

errors: No known data errors

Any idea?
--
This message posted from opensolaris.org
Edward Ned Harvey
2010-May-20 11:46 UTC
[zfs-discuss] Very serious performance degradation
> From: zfs-discuss-bounces at opensolaris.org [mailto:zfs-discuss-
> bounces at opensolaris.org] On Behalf Of Philippe
>
> c7t2d0s0/o FAULTED 0 0 0 corrupted data
>
> When I did the "zpool replace", I had to add "-f" to force it, because
> ZFS said that there was a ZFS label on the disk, and that the vdevs were
> not OK... How do I get rid of this, and get back a normal "c7t2d0"??

There's more than one way, but this should work:
  zpool create trashpooljunk c7t2d0
  zpool destroy trashpooljunk

and then repeat your zpool replace.

Also, since you've got "s0" on there, it means you've got some partitions on that drive. You could manually wipe all that out via format, but the above is pretty brainless and reliable.
On May 20, 2010, at 4:12 AM, Philippe wrote:

>> I'm starting with the replacement of the very bad
>> disk, and hope the resilvering won't take too long!!
>
> Replacing c7t2d0, I get the following:
>
>        NAME             STATE     READ WRITE CKSUM
>        zfs_raid         DEGRADED     0     0     0
>          raidz1         DEGRADED     0     0     0
>            c7t5d0       ONLINE       0     0     0
>            c7t4d0       ONLINE       0     0     0
>            c7t3d0       ONLINE       0     0     0
>            replacing    DEGRADED     0     0 1,42K
>              c7t2d0s0/o FAULTED      0     0     0  corrupted data
>              c7t2d0     ONLINE       0     0     0  2,92M resilvered
>
> Why is there a faulted "c7t2d0s0/o" appearing??

That represents the "old" c7t2d0.

> Maybe this was on the disk before the replacement?

Not on the disk, it is the disk.

> When I did the "zpool replace", I had to add "-f" to force it, because ZFS said that there was a ZFS label on the disk, and that the vdevs were not OK... How do I get rid of this, and get back a normal "c7t2d0"??

Yes, ZFS is trying to prevent you from making mistakes. By default, it will not clobber a disk that appears to be in use.

...and let the resilver complete.
 -- richard

--
ZFS and NexentaStor training, Rotterdam, July 13-15, 2010
http://nexenta-rotterdam.eventbrite.com/
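If you want to keep an eye on it without sitting at the console, a plain loop is enough (the 300-second interval is arbitrary):

  while true; do
      zpool status zfs_raid | grep resilver
      sleep 300
  done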
On May 20, 2010, at 4:46 AM, Edward Ned Harvey wrote:

>> From: zfs-discuss-bounces at opensolaris.org [mailto:zfs-discuss-
>> bounces at opensolaris.org] On Behalf Of Philippe
>>
>> c7t2d0s0/o FAULTED 0 0 0 corrupted data
>>
>> When I did the "zpool replace", I had to add "-f" to force it, because
>> ZFS said that there was a ZFS label on the disk, and that the vdevs were
>> not OK... How do I get rid of this, and get back a normal "c7t2d0"??
>
> There's more than one way, but this should work:
>   zpool create trashpooljunk c7t2d0
>   zpool destroy trashpooljunk

Please don't do this. ZFS identifies disks by their GUID, not their pathname. ZFS has recognized that the new disk with path c7t2d0 is different than the old disk at that path. This is a very common scenario. Once the replacement is complete, then the old disk will be forgotten.
 -- richard

> and then repeat your zpool replace.
>
> Also, since you've got "s0" on there, it means you've got some partitions on
> that drive. You could manually wipe all that out via format, but the above
> is pretty brainless and reliable.

--
ZFS and NexentaStor training, Rotterdam, July 13-15, 2010
http://nexenta-rotterdam.eventbrite.com/
On May 20, 2010, at 4:24 AM, Philippe wrote:

> Current status:
>
>  pool: zfs_raid
> state: DEGRADED
> scrub: resilver in progress for 0h17m, 3,72% done, 7h22m to go
> [rest of the zpool status output snipped; see previous message]
>
> Any idea?

action: Wait for the resilver to complete.
 -- richard

--
ZFS and NexentaStor training, Rotterdam, July 13-15, 2010
http://nexenta-rotterdam.eventbrite.com/
> > Any idea?
>
> action: Wait for the resilver to complete.
>  -- richard

Very fine! And thank you a lot for your answers!

Philippe
--
This message posted from opensolaris.org
On 20/05/2010 12:46, Edward Ned Harvey wrote:

> Also, since you've got "s0" on there, it means you've got some partitions on
> that drive.

There are always partitions once the disk is in use by ZFS, but there may be 1 or more of them and they may be SMI or EFI partitions. So just because there is an s0 on the end doesn't necessarily mean that there is a non-zero s1, s2, etc.

--
Darren J Moffat
On Thu, 20 May 2010, Edward Ned Harvey wrote:

> Also, since you've got "s0" on there, it means you've got some
> partitions on that drive. You could manually wipe all that out via
> format, but the above is pretty brainless and reliable.

The "s0" on the old disk is a bug in the way we're formatting the output. This was fixed in CR 6881631.

Regards,
markm
> ...and let the resilver complete.
>  -- richard

Hi!

  pool: zfs_raid
 state: ONLINE
 scrub: resilver completed after 16h34m with 0 errors on Fri May 21 05:39:42 2010
config:

        NAME        STATE     READ WRITE CKSUM
        zfs_raid    ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            c7t5d0  ONLINE       0     0     0
            c7t4d0  ONLINE       0     0     0
            c7t3d0  ONLINE       0     0     0
            c7t2d0  ONLINE       0     0     0  326G resilvered

errors: No known data errors

Now, I just have to do the same drive replacement for the 2 other failing drives...

Many thanks to you all!
Philippe
--
This message posted from opensolaris.org
> Now, I just have to do the same drive replacement for
> the 2 other failing drives...

For information, current iostat results:

                           extended device statistics       ---- errors ---
  r/s    w/s    kr/s    kw/s wait  actv wsvc_t  asvc_t  %w  %b s/w h/w trn tot device
  0.0    0.0     0.0     0.0  0.0   0.0    0.0     0.0   0   0   0  11   0  11 c8t0d0
 44.9    0.0  5738.1     0.0  0.0   1.4    0.0    32.2   0   5   0   0   0   0 c7t0d0
  0.4  241.9     3.2  1172.2  0.0   2.7    0.0    11.1   0  10   0   0   0   0 c7t2d0
  0.4   31.2     3.2   846.2  0.0  28.2    0.0   891.2   0  89   0   0   0   0 c7t3d0
  0.3   18.5     1.2   576.3  0.0   7.5    0.0   398.4   0  24   0   0   0   0 c7t4d0
  0.3   38.4     2.5  1289.1  0.0   0.8    0.0    19.6   0   4   0   0   0   0 c7t5d0
                           extended device statistics       ---- errors ---
  r/s    w/s    kr/s    kw/s wait  actv wsvc_t  asvc_t  %w  %b s/w h/w trn tot device
  0.0    0.0     0.0     0.0  0.0   0.0    0.0     0.0   0   0   0  11   0  11 c8t0d0
  0.0    0.0     3.2     0.0  0.0   0.0    0.0     4.2   0   0   0   0   0   0 c7t0d0
  0.0    0.0     0.1     0.0  0.0   0.0    0.0     9.7   0   0   0   0   0   0 c7t2d0
  0.0   27.6     0.1   701.3  0.0  35.0    0.0  1269.2   0 100   0   0   0   0 c7t3d0
  0.0   19.6     0.1   713.0  0.0  20.9    0.0  1066.5   0  61   0   0   0   0 c7t4d0
  0.0    0.0     0.0     0.0  0.0   0.0    0.0     0.0   0   0   0   0   0   0 c7t5d0

I really have to hurry up and replace c7t3d0 too (and next c7t4d0), because they are failing rapidly (compared to the results of yesterday)!!

Have a nice week-end,
Philippe
--
This message posted from opensolaris.org
Hi,

I know that ZFS is aware of I/O errors, and can alert on or disable a failing disk. However, ZFS didn't notice these "service time" problems at all. I think it would be a good idea to integrate service-time triggers into ZFS!

What do you think?

Best regards!
Philippe
--
This message posted from opensolaris.org
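Until something like that exists, a rough approximation can be scripted around iostat itself. The sketch below assumes the 'iostat -xn' column layout shown earlier in this thread (asvc_t in the 8th column, device name last) and an arbitrary 200 ms threshold; run it from cron and mail yourself the output:

  #!/bin/sh
  # warn when any disk's average service time exceeds the threshold (ms)
  THRESHOLD=200
  # take two 60-second samples; the first is the since-boot summary,
  # so only the second block is evaluated
  iostat -xn 60 2 | nawk -v t=$THRESHOLD '
      /extended device statistics/ { block++ }
      block == 2 && $1 ~ /^[0-9]/ && $8 > t + 0 {
          printf "disk %s: asvc_t %.1f ms\n", $NF, $8
      }'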
Hi,

Actually, it seems a common problem with WD "EARS" drives (Advanced Format)! Please see this other OpenSolaris thread:

https://opensolaris.org/jive/thread.jspa?threadID=126637

It is worth investigating! I quote:

> Just replacing back, and here is the iostat for the new EARS drive:
> http://pastie.org/889572
>
> Those asvc_t's are atrocious. As is the op/s throughput. All the other drives spend the vast
> majority of the time idle, waiting for the new EARS drive to write out data.
>
> This is after isolating another issue to my Dell PERC 5/i's - they apparently don't talk nicely
> with the EARS drives either. Streaming writes would push data for two seconds and pause
> for ten. Random writes ... give up.
> On the Intel chipset's SATA - streaming writes are acceptable, but random writes are as
> per the above url.
>
> Format tells me that the partition starts at sector 256. But given that ZFS writes variable
> size blocks, that really shouldn't matter.
>
> When plugged the EARS into a P45-based motherboard running Windows, HDTune
> presents a normal looking streaming writes graph, and the average seek time is 14ms -
> the drive seems healthy.

--
This message posted from opensolaris.org
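For anyone who wants to check where their own ZFS partition starts, a quick sanity check looks like this (the device path is just an example; the post quoted above reports a start at sector 256):

  # print the label; the 'First Sector' column shows the starting offset
  prtvtoc /dev/rdsk/c7t2d0s0

A start sector that is a multiple of 8 (such as 256) sits on a 4 KB boundary, which is what these 512-byte-emulating Advanced Format drives want; a misaligned start turns every 4 KB physical write into a read-modify-write.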