Currently I'm trying to figure out the best ZFS layout for a thumper wrt. read AND write performance.

I did some simple mkfile 512G tests and found out that on average ~500 MB/s seems to be the maximum one can reach (tried the initial default setup, all 46 HDDs as R0, etc.).

According to http://www.amd.com/us-en/assets/content_type/DownloadableAssets/ArchitectureWP_062806.pdf I would assume that much more should be possible; at least in theory a max. of ~2.5 GB/s with R0 (assuming the throughput of a single thumper HDD is ~54 MB/s)...

Is somebody able to enlighten me?

Thanx,
jel.

This message posted from opensolaris.org
Richard Elling
2007-Feb-27 02:36 UTC
[zfs-discuss] understanding zfs/thunoer "bottlenecks"?
Jens Elkner wrote:
> Currently I'm trying to figure out the best ZFS layout for a thumper wrt. read AND write performance.

First things first. What is the expected workload? Random, sequential, lots of little files, few big files, 1-byte iops, synchronous data, constantly changing access times, ...? In general, a striped mirror is the best bet for good performance with redundancy.

> I did some simple mkfile 512G tests and found out that on average ~500 MB/s seems to be the maximum one can reach (tried initial default setup, all 46 HDDs as R0, etc.).

How many threads? One mkfile thread may be CPU bound.
 -- richard
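For reference, a minimal sketch of what "multiple threads" means in practice: several mkfile streams started in parallel while zpool iostat reports the aggregate write bandwidth. The pool name, file sizes and stream count below are placeholders, not taken from the thread:

    # start four writer streams in parallel (pool name and sizes are placeholders)
    for i in 1 2 3 4; do
        mkfile 128g /tank/stream$i &
    done
    wait                                # returns once all streams have finished
    # meanwhile, in another terminal:   zpool iostat tank 10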
Jens Elkner writes:
> Currently I'm trying to figure out the best ZFS layout for a thumper wrt. read AND write performance.
>
> I did some simple mkfile 512G tests and found out that on average ~500 MB/s seems to be the maximum one can reach (tried initial default setup, all 46 HDDs as R0, etc.).

That might be a per-pool limitation due to

  http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6460622

This performance feature was fixed in Nevada last week. A workaround is to create multiple pools with fewer disks each.

Also, this

  http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6415647

is degrading the perf a bit (guesstimate of anywhere up to 10-20%).

-r
On Feb 27, 2007, at 2:35 AM, Roch - PAE wrote:
> Jens Elkner writes:
>> Currently I'm trying to figure out the best ZFS layout for a thumper wrt. read AND write performance.
>>
>> I did some simple mkfile 512G tests and found out that on average ~500 MB/s seems to be the maximum one can reach (tried initial default setup, all 46 HDDs as R0, etc.).
>
> That might be a per-pool limitation due to
>
>   http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6460622
>
> This performance feature was fixed in Nevada last week.

Yep, it will affect his performance if he has compression on (which I wasn't sure whether he did or not).

A striped mirror configuration is the best way to go (at least for read performance), plus you'll need multiple streams.

eric
It seems there isn't an algorithm in ZFS that detects sequential writes; in a traditional fs such as UFS, one would trigger directio. QFS can be set to switch to directio automatically if sequential I/O is detected.

The txg trigger of 5 sec is inappropriate in this case (as stated by bug 6415647); even a 1-sec trigger can be a limiting factor, especially if you want to go above 3 GBytes/sec sequential I/O.

sd.

On 2/27/07, Roch - PAE <Roch.Bourbonnais at sun.com> wrote:
> That might be a per-pool limitation due to
>
>   http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6460622
>
> This performance feature was fixed in Nevada last week.
> Workaround is to create multiple pools with fewer disks.
>
> Also this
>
>   http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6415647
>
> is degrading a bit the perf (guesstimate of anywhere up to 10-20%).
>
> -r
How do I remove myself from this zfs-discuss at opensolaris.org?

Selim Daoud wrote On 02/27/07 10:56 AM:
> it seems there isn't an algorithm in ZFS that detects sequential writes; in a traditional fs such as UFS, one would trigger directio.
> [...]
All writes in ZFS are sequential.

On February 27, 2007 7:56:58 PM +0100 Selim Daoud <selim.daoud at gmail.com> wrote:
> it seems there isn't an algorithm in ZFS that detects sequential writes; in a traditional fs such as UFS, one would trigger directio.
> qfs can be set to switch to directio automatically if sequential I/O is detected.
> the txg trigger of 5 sec is inappropriate in this case (as stated by bug 6415647); even a 1-sec trigger can be a limiting factor, especially if you want to go above 3 GBytes/sec sequential I/O.
> sd.
johansen-osdev at sun.com
2007-Feb-27 19:23 UTC
[zfs-discuss] understanding zfs/thunoer "bottlenecks"?
> it seems there isn't an algorithm in ZFS that detects sequential writes;
> in a traditional fs such as UFS, one would trigger directio.

There is no directio for ZFS. Are you encountering a situation in which you believe directio support would improve performance? If so, please explain.

-j
Indeed, a customer is doing 2 TB of daily backups on a ZFS filesystem. The throughput doesn't go above 400 MB/s; knowing that at raw speed the throughput goes up to 800 MB/s, the gap is quite wide.

Also, sequential I/O is very common in real life... unfortunately ZFS is still not performing well there.

sd

On 2/27/07, johansen-osdev at sun.com <johansen-osdev at sun.com> wrote:
> There is no directio for ZFS. Are you encountering a situation in which
> you believe directio support would improve performance? If so, please
> explain.
>
> -j
Richard Elling
2007-Feb-27 21:46 UTC
[zfs-discuss] understanding zfs/thunoer "bottlenecks"?
Selim Daoud wrote:
> Indeed, a customer is doing 2 TB of daily backups on a ZFS filesystem.
> The throughput doesn't go above 400 MB/s; knowing that at raw speed
> the throughput goes up to 800 MB/s, the gap is quite wide.

OK, I'll bite. What is the workload and what is the hardware (zpool) config? A 400 MB/s bandwidth is consistent with a single-threaded write workload.

The disks used in thumper (Hitachi E7K500) have a media bandwidth of 31-64.8 MBytes/s. To get 800 MBytes/s you would need a zpool setup with a minimum number of effective data disks of:

  N = 800 / 31
  N = 26

You would have no chance of doing this in a disk-to-disk backup internal to a thumper, so you'd have to source data from the network. 800 MBytes/s is possible on the network using the new Neptune 10GbE cards.

You've only got 48 disks to work with, so mirroring may not be feasible for such a sustained high rate.

> Also, sequential I/O is very common in real life... unfortunately ZFS
> is still not performing well there.

ZFS only does sequential writes. Why do you believe that the bottleneck is in the memory system? Are you seeing a high scan rate during the workload?
 -- richard
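As a quick sanity check of that estimate, using only the figures quoted above and taking 31 MB/s as the worst-case per-disk media bandwidth:

    # minimum effective data disks = target bandwidth / worst-case per-disk bandwidth
    echo "scale=1; 800 / 31" | bc      # ~25.8, i.e. at least 26 data disks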
Selim Daoud
2007-Feb-27 22:24 UTC
Fwd: [zfs-discuss] understanding zfs/thunoer "bottlenecks"?
My mistake, the system is not a Thumper but rather a 6140 disk array, using 4x HBA ports on a T2000.

I tried several configs: raid (zfs), raidz and mirror (zfs), using 8 disks. What I observe is a non-continuous stream of data using [zpool] iostat, so at some stage the I/O is interrupted, dropping the MB/s to very low values, then up again.

On 2/27/07, Richard Elling <Richard.Elling at sun.com> wrote:
> OK, I'll bite. What is the workload and what is the hardware (zpool) config?
> A 400 MB/s bandwidth is consistent with a single-threaded write workload.
>
> ZFS only does sequential writes. Why do you believe that the bottleneck
> is in the memory system? Are you seeing a high scan rate during the
> workload?
> -- richard
On Mon, Feb 26, 2007 at 06:36:47PM -0800, Richard Elling wrote:
> Jens Elkner wrote:
> > Currently I'm trying to figure out the best ZFS layout for a thumper wrt.
> > read AND write performance.
>
> First things first. What is the expected workload? Random, sequential,
> lots of little files, few big files, 1-byte iops, synchronous data,
> constantly changing access times, ...?

Mixed. I.e.:

1) As a home server for students' and staff's ~, so small and big files (BTW: what is small and what is big?) as well as compressed/text files (you know, the more space people have, the messier they get ...) - targeted at Samba and NFS.

2) "App server" in the sense of shared NFS space, where applications get installed once and can be used everywhere, e.g. eclipse, soffice, jdk*, TeX, Pro Engineer, Studio 11 and the like. Later I wanna have the same functionality for firefox, thunderbird, etc. for Windows clients via Samba, but this requires a little bit more tweaking to get it to work, aka time I do not have right now ... Anyway, when ~30 students start their monster apps like eclipse, oxygen, soffice at once (which happens in seminars quite frequently), I would be lucky to get the same performance via NFS as from a local HDD ...

3) Video streaming, i.e. capturing as well as broadcasting/editing via SMB/NFS.

> In general, a striped mirror is the best bet for good performance with
> redundancy.

Yes - thought about doing a

  mirror c0t0d0 c1t0d0 mirror c4t0d0 c6t0d0 mirror c7t0d0 c0t4d0 \
  mirror c0t1d0 c1t1d0 mirror c4t1d0 c5t1d0 mirror c6t1d0 c7t1d0 \
  mirror c0t2d0 c1t2d0 mirror c4t2d0 c5t2d0 mirror c6t2d0 c7t2d0 \
  mirror c0t3d0 c1t3d0 mirror c4t3d0 c5t3d0 mirror c6t3d0 c7t3d0 \
  mirror c1t4d0 c7t4d0 mirror c4t4d0 c6t4d0 \
  mirror c0t5d0 c1t5d0 mirror c4t5d0 c5t5d0 mirror c6t5d0 c7t5d0 \
  mirror c0t6d0 c1t6d0 mirror c4t6d0 c5t6d0 mirror c6t6d0 c7t6d0 \
  mirror c0t7d0 c1t7d0 mirror c4t7d0 c5t7d0 mirror c6t7d0 c7t7d0

(probably removing the 5th line and using those drives as hot spares). But perhaps it might be better to split the mirrors into 3 different pools (not sure why: my brain says no, my belly says yes ;-)).

> > I did some simple mkfile 512G tests and found out that on average ~500 MB/s
> > seems to be the maximum one can reach (tried initial default setup,
> > all 46 HDDs as R0, etc.).
>
> How many threads? One mkfile thread may be CPU bound.

Very good point! Using 2x mkfile 256G I got (min/max/avg) 473/750/630 MB/s (via zpool iostat 10) with the layout shown above and no compression enabled. Just to prove it: with 4x mkfile 128G I got 407/815/588, with 3x mkfile 170G 401/788/525; 1x mkfile 512G was 397/557/476.

Regards,
jel.
-- 
Otto-von-Guericke University     http://www.cs.uni-magdeburg.de/
Department of Computer Science   Geb. 29 R 027, Universitaetsplatz 2
39106 Magdeburg, Germany         Tel: +49 391 67 12768
On Tue, Feb 27, 2007 at 11:35:37AM +0100, Roch - PAE wrote:
> That might be a per-pool limitation due to
>
>   http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6460622

Not sure - did not use the compression feature...

> This performance feature was fixed in Nevada last week.
> Workaround is to create multiple pools with fewer disks.

Does this make sense for mirrors only as well?

> Also this
>
>   http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6415647
>
> is degrading a bit the perf (guesstimate of anywhere up to 10-20%).

Hmm - sounds similar (zpool iostat 10):

pool1  4.36G  10.4T  0  5.04K  0      636M
pool1  9.18G  10.4T  0  4.71K  204    591M
pool1  17.8G  10.4T  0  5.21K  0      650M
pool1  24.0G  10.4T  0  5.65K  0      710M
pool1  30.9G  10.4T  0  6.26K  0      786M
pool1  36.3G  10.4T  0  2.74K  0      339M
pool1  41.5G  10.4T  0  4.27K  1.60K  533M
pool1  46.7G  10.4T  0  4.19K  0      527M
pool1  46.7G  10.4T  0  2.28K  1.60K  290M
pool1  55.7G  10.4T  0  5.18K  0      644M
pool1  59.9G  10.4T  0  6.17K  0      781M
pool1  68.8G  10.4T  0  5.63K  0      702M
pool1  73.8G  10.3T  0  3.93K  0      492M
pool1  78.7G  10.3T  0  2.96K  0      366M
pool1  83.2G  10.3T  0  5.58K  0      706M
pool1  91.5G  10.3T  4  6.09K  6.54K  762M
pool1  96.4G  10.3T  0  2.74K  0      338M
pool1  101G   10.3T  0  3.88K  1.75K  485M
pool1  106G   10.3T  0  3.85K  0      484M
pool1  106G   10.3T  0  2.79K  1.60K  355M
pool1  110G   10.3T  0  2.97K  0      369M
pool1  119G   10.3T  0  5.20K  0      647M
pool1  124G   10.3T  0  3.64K  1.80K  455M
pool1  124G   10.3T  0  3.54K  0      453M
pool1  128G   10.3T  0  2.77K  0      343M
pool1  133G   10.3T  0  3.92K  102    491M
pool1  137G   10.3T  0  2.43K  0      300M
pool1  141G   10.3T  0  3.26K  0      407M
pool1  148G   10.3T  0  5.35K  0      669M
pool1  152G   10.3T  0  3.14K  0      392M
pool1  156G   10.3T  0  3.01K  0      374M
pool1  160G   10.3T  0  4.47K  0      562M
pool1  164G   10.3T  0  3.04K  0      379M
pool1  168G   10.3T  0  3.39K  0      424M
pool1  172G   10.3T  0  3.67K  0      459M
pool1  176G   10.2T  0  3.91K  0      490M
pool1  183G   10.2T  4  5.58K  6.34K  699M
pool1  187G   10.2T  0  3.30K  1.65K  406M
pool1  195G   10.2T  0  3.24K  0      401M
pool1  198G   10.2T  0  3.21K  0      401M
pool1  203G   10.2T  0  3.87K  0      486M
pool1  206G   10.2T  0  4.92K  0      623M
pool1  214G   10.2T  0  5.13K  0      642M
pool1  222G   10.2T  0  5.02K  0      624M
pool1  225G   10.2T  0  4.19K  0      530M
pool1  234G   10.2T  0  5.62K  0      700M
pool1  238G   10.2T  0  6.21K  0      787M
pool1  247G   10.2T  0  5.47K  0      681M
pool1  254G   10.2T  0  3.94K  0      488M
pool1  258G   10.2T  0  3.54K  0      442M
pool1  262G   10.2T  0  3.53K  0      442M
pool1  267G   10.2T  0  4.01K  0      504M
pool1  274G   10.2T  0  5.32K  0      664M
pool1  274G   10.2T  4  3.42K  6.69K  438M
pool1  278G   10.2T  0  3.44K  1.70K  428M
pool1  282G   10.1T  0  3.44K  0      429M
pool1  289G   10.1T  0  5.43K  0      680M
pool1  293G   10.1T  0  3.36K  0      419M
pool1  297G   10.1T  0  3.39K  306    423M
pool1  301G   10.1T  0  3.33K  0      416M
pool1  308G   10.1T  0  5.48K  0      685M
pool1  312G   10.1T  0  2.89K  0      360M
pool1  316G   10.1T  0  3.65K  0      457M
pool1  320G   10.1T  0  3.10K  0      386M
pool1  327G   10.1T  0  5.48K  0      686M
pool1  334G   10.1T  0  3.31K  0      406M
pool1  337G   10.1T  0  5.28K  0      669M
pool1  345G   10.1T  0  3.30K  0      402M
pool1  349G   10.1T  0  3.48K  1.60K  437M
pool1  349G   10.1T  0  3.42K  0      436M
pool1  353G   10.1T  0  3.05K  0      379M
pool1  358G   10.1T  0  3.81K  0      477M
pool1  362G   10.1T  0  3.40K  0      425M
pool1  366G   10.1T  4  3.23K  6.59K  401M
pool1  370G   10.1T  0  3.47K  1.65K  432M
pool1  376G   10.1T  0  4.98K  0      623M
pool1  380G   10.1T  0  2.97K  0      369M
pool1  384G   10.0T  0  3.52K  409    439M
pool1  390G   10.0T  0  5.00K  0      626M
pool1  398G   10.0T  0  3.38K  0      414M
pool1  404G   10.0T  0  5.09K  0      637M
pool1  408G   10.0T  0  3.18K  0      397M
pool1  412G   10.0T  0  3.19K  0      397M
pool1  416G   10.0T  0  4.96K  0      626M
pool1  421G   10.0T  0  4.25K  0      533M
pool1  427G   10.0T  0  5.06K  0      634M
pool1  435G   10.0T  0  5.08K  0      637M
pool1  440G   9.99T  0  5.61K  0      706M
pool1  448G   9.98T  0  5.75K  0      720M
pool1  457G   9.98T  0  5.59K  0      698M
pool1  462G   9.97T  5  5.59K  7.99K  703M
pool1  469G   9.96T  0  5.36K  357    669M
pool1  476G   9.96T  0  5.91K  0      744M
pool1  485G   9.95T  0  5.85K  102    731M
pool1  489G   9.94T  0  6.22K  13.4K  788M
pool1  498G   9.94T  1  4.06K  3.40K  498M
pool1  498G   9.94T  0  3.30K  12.9K  423M
pool1  503G   9.93T  0  2.53K  869    309M
pool1  510G   9.92T  0  3.73K  13.0K  466M

elkner.isis /pool1/elkner > mkfile 170G bla1
5.63u 511.07s 16:10.79 53.2%
elkner.isis /pool1/elkner > mkfile 170G bla2
4.99u 510.42s 16:35.73 51.7%
elkner.isis /pool1/elkner > mkfile 170G bla3
5.02u 504.54s 15:47.73 53.7%

Regards,
jel.
-- 
Otto-von-Guericke University     http://www.cs.uni-magdeburg.de/
Department of Computer Science   Geb. 29 R 027, Universitaetsplatz 2
39106 Magdeburg, Germany         Tel: +49 391 67 12768
On Wed, Feb 28, 2007 at 11:45:35AM +0100, Roch - PAE wrote:
> > > http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6460622

Any estimations when we'll see a [feature] fix for U3? Should I open a call to perhaps raise the priority of the fix?

> The bug applies to checksum as well. Although the fix now in
> the gate only addresses compression.
> There is a per pool limit on throughput due to checksum.
> Multiple pools may help.

Yepp. However, pooling the disks also means limiting the I/O for a single task to max. #disksOfPool*IOperDisk/2. So pooling would make sense to me if one has a lot of tasks and is able to force them to a "dedicated" pool...

So my conclusion is: the more pools, the more aggregate "bandwidth" (if one is able to distribute the work properly over all disks), but the less bandwidth for a single task :((

> > > This performance feature was fixed in Nevada last week.
> > > Workaround is to create multiple pools with fewer disks.
> >
> > Does this make sense for mirrors only as well?
>
> Yep.

OK, since I can't get out more than ~1 GB/s (only one PCI-X slot left for a 10Gbps NIC), I decided to split into 2m*12 + 2m*10 + s*2 (see below). But it does not raise the write perf limit: it already dropped to an average of ~345 MB/s :(((((((

> > > http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6415647
> > >
> > > is degrading a bit the perf (guesstimate of anywhere up to
> > > 10-20%).

I would guess even up to 45% ...

> Check out iostat 1 and you will see the '0s': not good.

Yes - saw even 5 consecutive 0s ... :(

Here the layout (incl. "test" results) I actually have in mind for production:

1. pool for big files (source tarballs, multimedia, iso images):
------------------------------------------------------------------
zpool create -n pool1 \
    mirror c0t0d0 c1t0d0 mirror c6t0d0 c7t0d0 \
    mirror c4t1d0 c5t1d0 \
    mirror c0t2d0 c1t2d0 mirror c6t2d0 c7t2d0 \
    mirror c4t3d0 c5t3d0 \
    mirror c0t4d0 c1t4d0 mirror c6t4d0 c7t4d0 \
    mirror c4t5d0 c5t5d0 \
    mirror c0t6d0 c1t6d0 mirror c6t6d0 c7t6d0 \
    mirror c4t7d0 c5t7d0 \
    spare c4t0d0 c4t4d0

(2x 256G) write (min/max/aver): 0 674 343.7

2. pool for mixed stuff (homes, apps):
--------------------------------------
zpool create -n pool2 \
    mirror c0t1d0 c1t1d0 mirror c6t1d0 c7t1d0 \
    mirror c4t2d0 c5t2d0 \
    mirror c0t3d0 c1t3d0 mirror c6t3d0 c7t3d0 \
    mirror c0t5d0 c1t5d0 mirror c6t5d0 c7t5d0 \
    mirror c4t6d0 c5t6d0 \
    mirror c0t7d0 c1t7d0 mirror c6t7d0 c7t7d0 \
    spare c4t0d0 c4t4d0

(2x 256G) write (min/max/aver): 0 600 386.0

1. + 2. (2x 256G) write (min/max/aver): 0   1440 637.9
1. + 2. (4x 128G) write (min/max/aver): 3.5 1268 709.5 (381+328.5)

Regards,
jel.
-- 
Otto-von-Guericke University     http://www.cs.uni-magdeburg.de/
Department of Computer Science   Geb. 29 R 027, Universitaetsplatz 2
39106 Magdeburg, Germany         Tel: +49 391 67 12768
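For completeness, a sketch of how the combined "1. + 2." test above can be driven once the pools are actually created; the file names and the exact invocation are assumptions, only the pool names come from the zpool commands above:

    # two 128G streams per pool, watching both pools at once
    mkfile 128g /pool1/t1 & mkfile 128g /pool1/t2 &
    mkfile 128g /pool2/t1 & mkfile 128g /pool2/t2 &
    zpool iostat pool1 pool2 10     # interrupt once the streams have finished
    wait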
On Mar 19, 2007, at 7:26 PM, Jens Elkner wrote:
> On Wed, Feb 28, 2007 at 11:45:35AM +0100, Roch - PAE wrote:
>
>>>> http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6460622
>
> Any estimations when we'll see a [feature] fix for U3?
> Should I open a call to perhaps raise the priority of the fix?

6460622 "zio_nowait() doesn't live up to its name" was putback into snv_59 and should make it into s10u4. Once that happens, the appropriate patches will be available.

eric
Wade.Stuart at fallon.com
2007-Mar-20 17:27 UTC
[zfs-discuss] ZFS resilver/snap/scrub resetting status?
Folks,

Is there any update on the progress of fixing the resilver/snap/scrub reset issues? If the bits have been pushed, is there a patch for Solaris 10U3?

  http://bugs.opensolaris.org/view_bug.do?bug_id=6343667

Also the scrub/resilver priority setting?

  http://bugs.opensolaris.org/view_bug.do?bug_id=6494473

Wade
eric kustarz
2007-Mar-20 19:42 UTC
[zfs-discuss] ZFS resilver/snap/scrub resetting status?
On Mar 20, 2007, at 10:27 AM, Wade.Stuart at fallon.com wrote:
> Folks,
>
> Is there any update on the progress of fixing the resilver/snap/scrub
> reset issues? If the bits have been pushed, is there a patch for Solaris
> 10U3?
>
>   http://bugs.opensolaris.org/view_bug.do?bug_id=6343667

Matt and Mark are working on this. The current rough estimate is s10u5.

> Also the scrub/resilver priority setting?
>
>   http://bugs.opensolaris.org/view_bug.do?bug_id=6494473

No one is actively working on this, but feel free to grab it.

eric