I had posted at the Sun forums, but it was recommended to me to try here as well. For reference, please see http://forums.sun.com/thread.jspa?threadID=5351916&tstart=0.

In the process of a large SAN migration project we are moving many large volumes from the old SAN to the new. We are making use of the 'replace' function to replace the old volumes with similar or larger new volumes. This process is moving very slowly, sometimes as slowly as one percent of the data every 10 minutes. Is there any way to streamline this method? The system is Solaris 10 08/07. How much depends on the activity of the box? How much on its architecture? The primary system in question at this point is a T2000 with 8GB of RAM and a 4-core CPU. This server has 6 4Gb fibre channel connections to our SAN environment. At times this server is quite busy because it is our backup server, but performance seems no better when the daily backup operations have ceased.

Our pools are only stripes. Would we expect better performance from a mirror or raidz pool? It is worrisome that, if the environment were compromised by a failed disk, it could take this long to replace it and restore the usual redundancy (if it were a mirror or raidz pool).

I have previously applied the kernel change described here: http://blogs.digitar.com/jjww/?itemid=52

I just moved a 1TB volume, which took approx. 27h.
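For reference, the replacement itself is just the standard zpool replace, and we watch its progress with zpool status; roughly the following, where the pool and device names are only placeholders, not our real ones:

    # swap an old-SAN LUN for a new-SAN LUN in the pool (names are placeholders)
    zpool replace tank c2t0d0 c3t0d0

    # check how far the replace/resilver has progressed
    zpool status -v tank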
Have you considered moving to 10/08? ZFS resilver performance is much improved in this release, and I suspect that code might help you. You can easily test upgrading with Live Upgrade. I did the transition using LU and was very happy with the results. For example, I added a disk to a mirror and resilvering the new disk took about 6 min for almost 300GB, IIRC. (A rough sketch of the LU steps follows the quoted message below.)

Blake

On Mon, Dec 1, 2008 at 11:04 PM, Alan Rubin <alan.rubin at nt.gov.au> wrote:
> I had posted at the Sun forums, but it was recommended to me to try here as well.
> [...]
> I just moved a 1TB volume which took approx. 27h.
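As mentioned above, a rough sketch of the Live Upgrade sequence. The boot environment name and the path to the 10/08 media are placeholders, and the exact invocation depends on your disk layout (on a UFS root you would typically also pass -m to tell lucreate where to place the new BE):

    # create an alternate boot environment from the running system
    lucreate -n s10u6

    # upgrade the new BE from the Solaris 10 10/08 install image (path is a placeholder)
    luupgrade -u -n s10u6 -s /mnt/sol-10-u6

    # activate the new BE and reboot into it
    luactivate s10u6
    init 6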
We will be considering it in the new year, but that will not happen in time to affect our current SAN migration.
Would any of this have to do with the system being a T2000? Would ZFS resilvering be affected by single-threadedness, the slowish US-T1 clock speed, or the lack of strong FPU performance?

On 12/1/08, Alan Rubin <alan.rubin at nt.gov.au> wrote:
> We will be considering it in the new year, but that will not happen in time
> to affect our current SAN migration.

--
Matt Walburn
http://mattwalburn.com
It's something we've considered here as well.
I think we found the choke point. The silver lining is that it isn't the T2000 or ZFS. We think it is the new SAN, an Hitachi AMS1000, which has 7200RPM SATA disks with the cache turned off. This system has a very small cache, and when we did turn it on for one of the replacement LUNs we saw a 10x improvement in zpool iostat - until the cache filled up about a minute later. Oh well.
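For what it's worth, the observation above was just from watching per-device throughput while the replace ran, along the lines of (pool name is a placeholder):

    # per-vdev bandwidth and IOPS, sampled every 5 seconds
    zpool iostat -v tank 5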
alan.rubin at nt.gov.au said:
> I think we found the choke point. The silver lining is that it isn't the
> T2000 or ZFS. We think it is the new SAN, an Hitachi AMS1000, which has
> 7200RPM SATA disks with the cache turned off. [...]

We have experience with a T2000 connected to the HDS 9520V, predecessor to the AMS arrays, with SATA drives, and it's likely that your AMS1000 SATA has similar characteristics. I didn't see if you're using Sun's drivers to talk to the SAN/array, but we are using Solaris-10 (and Sun drivers + MPxIO), and since the Hitachi storage isn't automatically recognized (sd/ssd, scsi_vhci), it took a fair amount of tinkering to get parameters adjusted to work well with the HDS storage.

The combination that has given us the best results with ZFS is:

(a) Tell the array to ignore SYNCHRONIZE_CACHE requests from the host.
(b) Balance drives within each AMS disk shelf across both array controllers.
(c) Set the host's max queue depth to 4 for the SATA LUNs (sd/ssd driver).
(d) Set the host's disable_disksort flag (sd/ssd driver) for the HDS LUNs.

(A sketch of how (c) and (d) can be expressed in the driver configuration is in the P.S. below.)

Here's the reference we used for setting the parameters in Solaris-10:

http://wikis.sun.com/display/StorageDev/Parameter+Configuration

Note that the AMS uses read-after-write verification on SATA drives, so you only get half the write IOPS that the drives are otherwise capable of. We've found that small RAID volumes (e.g. a two-drive mirror) are unbelievably slow, so you'd want to go toward having more drives per RAID group, if possible.

Honestly, if I recall correctly what I saw in your "iostat" listings earlier, your situation is not nearly as bad as with our older array. You don't seem to be driving those HDS LUNs to the extreme busy states that we have seen on our 9520V. It was not unusual for us to see LUNs at 100% busy, 100% wait, with 35 ops total in the "actv" and "wait" columns, and I don't recall seeing any 100%-busy devices in your logs. But getting the FC queue-depth (max-throttle) setting to match what the array's back-end I/O can handle greatly reduced the long "zpool status" and other I/O-related hangs that we were experiencing. And disabling the host-side FC queue-sorting greatly improved the overall latency of the system when busy. Maybe it'll help yours too.

Regards,

Marion
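P.S. A sketch of how (c) and (d) can be expressed as per-LUN overrides in /kernel/drv/ssd.conf (sd.conf on x86). The "HITACHI DF600F" inquiry string and the name:value tunable syntax are assumptions on my part - they depend on the array model and the sd/ssd patch level - so verify them against the wiki page above before using:

    # /kernel/drv/ssd.conf - per vendor/product overrides (sketch only;
    # the inquiry string and tunable syntax below are assumptions)
    ssd-config-list = "HITACHI DF600F", "throttle-max:4, disksort:false";

An alternative is the global /etc/system tunable (set ssd:ssd_max_throttle = 4), but that throttles every ssd device on the host, not just the HDS LUNs.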
Thanks for the tips. I'm not sure they will be relevant, though: we don't talk directly to the AMS1000. We are using a USP-VM to virtualize all of our storage, we are using the Sun drivers with MPxIO (already turned on), and we didn't have to add anything to the drv configuration files or do any other tinkering to see the new LUNs.
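In case it helps with the comparison, the path state of the new LUNs can be confirmed from the MPxIO side with mpathadm; the device name below is a placeholder:

    # list all MPxIO-managed logical units
    mpathadm list lu

    # show controller/path details for one of the new LUNs (name is a placeholder)
    mpathadm show lu /dev/rdsk/c4t0d0s2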
alan.rubin at nt.gov.au said:
> Thanks for the tips. I'm not sure if they will be relevant, though. We
> don't talk directly with the AMS1000. We are using a USP-VM to virtualize
> all of our storage [...]

Yes, the fact that the USP-VM was recognized automatically by the Solaris drivers is a good sign. I suggest that you check which queue-depth and disksort values you ended up with from the automatic settings:

    echo "*ssd_state::walk softstate |::print -t struct sd_lun un_throttle" \
        | mdb -k

The "ssd_state" would be "sd_state" on an x86 machine (Solaris-10). The "un_throttle" above shows the current max_throttle (queue depth); replace it with "un_min_throttle" to see the minimum, and with "un_f_disksort_disabled" to see the current queue-sort setting.

The HDS docs for the 9500 series suggested 32 as the max_throttle to use, and the default setting (Solaris-10) was 256 (hopefully with the USP-VM you get something more reasonable). And while 32 did work for us, i.e. no operations were ever lost as far as I could tell, the array back-end -- the drives themselves and the internal SATA shelf connections -- has an actual queue depth of four for each array controller. The AMS1000 has the same limitation for SATA shelves, according to our HDS engineer.

In short, Solaris, especially with ZFS, functions much better if it does not try to send more FC operations to the array than the actual physical devices can handle. We were actually seeing NFS client operations hang for minutes at a time when the SAN-hosted NFS server was making its ZFS devices busy -- and this was true even if the clients were using different devices than the busy ones. We do not see these hangs after making the described changes, and I believe this is because the OS is no longer waiting around for a response from devices that aren't going to respond in a reasonable amount of time.

Yes, having the USP between the host and the AMS1000 will affect things; there's probably some huge cache in there somewhere. But unless you've got a cache hundreds of GB in size, at some point a resilver operation is going to end up running at the speed of the actual back-end device.

Regards,

Marion
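P.S. Spelled out, the disksort check described above would be the following (same caveat about ssd_state vs. sd_state; the member name is the one mentioned above, so verify it against your kernel build):

    echo "*ssd_state::walk softstate |::print -t struct sd_lun un_f_disksort_disabled" \
        | mdb -k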