I know that performance has been discussed often here, but I
have just gone through some testing in preparation for deploying a
large configuration (120 drives is large for me) and I wanted to
share my results, both for general interest and to see if anyone
spots anything wrong in either my methodology or the results
themselves.
First the hardware and requirements. We have an M4000
production server and a T2000 test server. The storage resides in five
J4400 arrays, dual-attached to the T2000 (and soon to be connected
to the M4000 as well). The drives are all 750 GB SATA disks, 120 in
total. The data currently resides on other storage and will be
migrated to the new storage as soon as we are happy with the
configuration. There is about 20 TB of data today, and we need to grow
to at least 40 TB. We also need a small set of drives for testing. My
plan is to use 80 to 100 drives for production and 20 drives for test.
The I/O pattern is a small number of large sequential writes (to load
the data) followed by lots of random reads and some random writes (5%
sequential writes, 10% random writes, 85% random reads). The files are
relatively small, as they are scans (TIFF) of documents with a
median size of 23 KB. The data is divided into projects, each of
which ranges in size from almost nothing up to almost 50 million
objects. We
currently have multiple zpools (based on department) and multiple
datasets in each (based on project). The plan for the new storage is
to go with one zpool, and then have a dataset per department, and
datasets within the departments for each project.
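To make the intended hierarchy concrete, it would be created roughly
like this (pool and dataset names here are placeholders, not our real
ones; the zpool create for the pool itself is sketched further down):

    # one pool, one dataset per department, project datasets underneath
    zfs create tank/deptA
    zfs create tank/deptA/project01
    zfs create tank/deptA/project02
    zfs create tank/deptB
    zfs create tank/deptB/project01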
Based on recommendations from our local Sun / Oracle staff, we are
planning to use raidz2 rather than mirroring, for recoverability
reasons
(to get a comparable level of fault tolerance with mirrors would
require three-way mirrors, and that does not get us the capacity we
need; rough numbers are below). I have been testing various raidz2
configurations to confirm
the data I have found regarding performance vs. number of vdevs and
size of raidz2 vdevs. I used the same 40 of the 120 disks for every
test (after culling out any that showed unusual asvc_t via iostat).
I used filebench for the testing as it seemed to generate real
differences based on zpool configuration (other tools I tried showed
no statistically significant difference between zpool
configurations).
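For the capacity comparison mentioned above, the rough numbers I am
working from, assuming 100 production drives at 750 GB raw and
ignoring ZFS overhead and GB/GiB rounding:

    three-way mirrors: 100 / 3 = ~33 vdevs -> 33 x 750 GB     = ~25 TB usable
    5-disk raidz2:     100 / 5 =  20 vdevs -> 20 x 3 x 750 GB = ~45 TB usable

Only the raidz2 layout clears the 40 TB growth target.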
See
https://spreadsheets.google.com/pub?key=0AtReWsGW-SB1dFB1cmw0QWNNd0RkR1ZnN0JEb2RsLXc&hl=en&output=html
for a summary of the results. The random read numbers agree with what
is expected (performance scales linearly with the number of vdevs).
The random write numbers also agree with the expected result, except
for the 4 vdevs of 10-disk raidz2, which showed higher performance than
expected. Sequential write performance was actually fairly
consistent, and even showed a slight improvement with fewer vdevs of
more disks. Based on these results, and our capacity needs, I am
planning to go with 5-disk raidz2 vdevs. Since we have five J4400
arrays, I am considering using one disk from each of the five arrays
per vdev, so that a complete failure of a single J4400 does not cause
any loss of data (roughly what I have in mind is sketched below).
What is the general opinion of that approach, and does anyone know
how to map the MPxIO device names back to physical drives?
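For clarity, the zpool create I have in mind looks something like
the following. The device names are placeholders for the real MPxIO
(c0t<WWN>d0) names, each vdev takes one disk from each of the five
J4400s, and only the first two vdevs are shown:

    zpool create prodpool \
        raidz2 j1disk01 j2disk01 j3disk01 j4disk01 j5disk01 \
        raidz2 j1disk02 j2disk02 j3disk02 j4disk02 j5disk02
    # ...and so on for the remaining vdevs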
Does anyone see any issues with either the results or the
tentative plan for the new storage layout? Thanks in advance for
your feedback.
P.S. Let me know if you want me to post the filebench workloads I
used; they are the defaults with a few small tweaks (the random
workloads ran 64 threads, for example).
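As a rough idea of the tweaks, the thread-count change amounts to a
couple of 'set' lines near the top of the stock workload file (the
path below is just a placeholder):

    # placeholder path pointing at the test pool
    set $dir=/testpool/fbtest
    # run the random workloads with 64 threads instead of the default
    set $nthreads=64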
--
{--------1---------2---------3---------4---------5---------6---------7---------}
Paul Kraus
-> Senior Systems Architect, Garnet River ( http://www.garnetriver.com/ )
-> Sound Coordinator, Schenectady Light Opera Company (
http://www.sloctheater.org/ )
-> Technical Advisor, RPI Players