Hi,
I have a pool of 22 1 TB SATA disks in a RAIDZ3 configuration. It is filled with
files with an average size of 2 MB. I filled it randomly to resemble the expected
workload in production use.
Problems arise when I try to scrub or resilver this pool: the operation takes the
better part of a week (!). During this time the disk being resilvered is at 100%
utilisation with >300 writes/s but only 3 MB/s of throughput, which is about 3%
of its best-case performance.
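
As a rough sanity check on these numbers, here is a small Python sketch. It
assumes a best-case rate of about 100 MB/s (inferred from the 3% figure, not
measured) and, purely hypothetically, a completely full 1 TB disk. The function
names are made up for illustration.

```python
def avg_write_size_kb(throughput_mb_s, writes_per_s):
    """Average size of one write in KB (1 MB = 1000 KB)."""
    return throughput_mb_s * 1000 / writes_per_s


def days_to_write(capacity_tb, throughput_mb_s):
    """Days needed to write capacity_tb terabytes at a steady rate."""
    seconds = capacity_tb * 1e12 / (throughput_mb_s * 1e6)
    return seconds / 86400


if __name__ == "__main__":
    observed_mb_s = 3      # from the observation above
    writes_per_s = 300     # lower bound from the observation above
    best_case_mb_s = 100   # ASSUMPTION: implied by "about 3%"

    print("avg write size (KB):", avg_write_size_kb(observed_mb_s, writes_per_s))
    print("days for a full 1 TB disk at 3 MB/s:",
          round(days_to_write(1, observed_mb_s), 2))
    print("days for a full 1 TB disk at 100 MB/s:",
          round(days_to_write(1, best_case_mb_s), 2))
```

Under these assumptions each write is at most about 10 KB, and rewriting a full
1 TB disk at 3 MB/s takes just under four days, against a few hours at the
assumed best-case rate.
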
Having a window of one week with degraded redundancy is intolerable. It is quite
likely that more disks will fail during this period, eventually leading to a
total loss of the pool, not to mention the degraded performance in the
meantime. In fact, in previous tests I lost a pool in a 6x11 RAIDZ2 configuration.
I skimmed through the resilver code and found that it simply enumerates all
objects in the pool and checks them one by one, with up to maxinflight I/O
requests in parallel. Because this does not take the on-disk order of the data
into account, it leads to this pathological performance. I also found
Bug 6678033, which states that a prefetch might fix this.
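
To illustrate why ignoring the on-disk order hurts, here is a toy Python model.
It is not the ZFS code: it assumes that blocks visited in object order sit at
effectively random disk offsets, and it uses total head travel as a crude proxy
for time. Issuing the same blocks sorted by offset is a hypothetical
alternative for comparison only, not what the bug report proposes.

```python
import random


def total_seek_distance(offsets):
    """Sum of absolute head movements when visiting offsets in the given order."""
    pos = 0
    total = 0
    for off in offsets:
        total += abs(off - pos)
        pos = off
    return total


def simulate(num_blocks=10_000, disk_size=1_000_000, seed=0):
    """Return (head travel in object order, head travel in offset order)."""
    rng = random.Random(seed)
    # ASSUMPTION: object-order traversal yields effectively random offsets.
    object_order = [rng.randrange(disk_size) for _ in range(num_blocks)]
    # Hypothetical alternative: the same blocks, issued sorted by disk offset.
    offset_order = sorted(object_order)
    return total_seek_distance(object_order), total_seek_distance(offset_order)


if __name__ == "__main__":
    by_object, by_offset = simulate()
    print("head travel, object order:", by_object)
    print("head travel, offset order:", by_offset)
    print("ratio:", round(by_object / by_offset))
```

In this model the sorted pass sweeps the disk once, while the object-order pass
jumps across it for nearly every block. That matches the pattern I see: many
small writes and very little throughput.
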
Now my questions:
1) Are there tunables that could speed up resilver, even at the cost of normal
performance? I thought of raising recordsize to the expected file size of 2 MB.
Could this help?
2) What is the state of the fix for Bug 6678033? When will it be ready?
3) Do you have any configuration hints for setting up a pool layout which might
help resilver performance? (aside from using hardware RAID instead of RAIDZ)
Thanks for any hints.
sensille
--
This message posted from opensolaris.org