We have a virtualized environment of T-Series hosts where each host runs either zones or LDoms. All of the virtual systems will have their own dedicated storage on ZFS (and some may also get raw LUNs). All the SAN storage is delivered in fixed-size 33GB LUNs.

The question I have for the community is whether it would be better to have a pool per virtual system, or to create one large pool and carve out a ZFS file system per virtual system. The trade-off: with one large pool you'll be able to take advantage of dedup when it becomes available in Solaris, but you then have all your eggs in one basket. It was originally thought that with a pool per virtual host you could migrate the storage along with the host, but that's not looking feasible in this environment.

What is the probability of corruption with ZFS in Solaris 10 U6 and up in a SAN environment? Have people successfully recovered?

In this environment the redundancy is performed at the hardware level, not at the host, so there's no chance of self-healing here. Yes, it's been discussed, but because of the legacy storage environment that's shared with other non-ZFS systems, they require redundancy at the hardware level; they won't budge on that and won't do additional redundancy at the ZFS level.

So given the environment, would it be better to have lots of small pools, or one large shared pool?

Thanks,

Brian
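For concreteness, the two layouts under discussion would look roughly like this. Pool, dataset, and device names here are made up; each c#t#d# stands in for one of the 33GB SAN LUNs:

    # Option A: one pool per virtual system
    zpool create guest01 c2t0d0 c2t0d1      # ~66GB dedicated to this guest
    zpool create guest02 c2t0d2             # ~33GB dedicated to this guest

    # Option B: one large shared pool, one dataset per virtual system
    zpool create vmpool c2t0d0 c2t0d1 c2t0d2 c2t0d3
    zfs create -o quota=30g vmpool/guest01  # quota keeps one guest from
    zfs create -o quota=30g vmpool/guest02  # filling the shared pool

With Option B, a refreservation per dataset would additionally guarantee each guest its space, at the cost of the thin-provisioning benefit.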
On Tue, 2 Mar 2010, Brian Kolaci wrote:
>
> What is the probability of corruption with ZFS in Solaris 10 U6 and up
> in a SAN environment? Have people successfully recovered?

The probability of corruption in a "SAN environment" depends entirely on your SAN environment. With proper design, the probability should be "zero". If it is non-zero, then there must be a design defect in your SAN hardware which is liable to do harm at any time (aka "snake in the grass").

> In this environment the redundancy is performed at the hardware level,
> not at the host, so there's no chance of self-healing here. Yes, it's
> been discussed, but because of the legacy storage environment that's
> shared with other non-ZFS systems, they require redundancy at the
> hardware level; they won't budge on that and won't do additional
> redundancy at the ZFS level.

That is unfortunate. In this case, the probability of failure increases from "zero".

> So given the environment, would it be better to have lots of small
> pools, or one large shared pool?

I think that a larger shared pool will be more satisfying and less wasteful of resources. However, a large pool written to a single huge SAN LUN suffers from concurrency issues: ZFS loses the ability to intelligently schedule I/O for individual disks and instead must post a lot of (up to 35) simultaneous I/Os and hope for the best.

Bob

P.S. The term "zero" is quoted since it does not account for Murphy's Law.

--
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
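Bob's "up to 35 simultaneous I/Os" refers to the per-device queue depth ZFS used in this Solaris 10 timeframe, controlled by the zfs_vdev_max_pending tunable. If that queue proves too aggressive for LUNs that share array spindles, it can be inspected and lowered at runtime with mdb; a sketch, to be checked against the ZFS Evil Tuning Guide for your release before touching a live system:

    # Show the current per-vdev queue depth (35 was the default here)
    echo zfs_vdev_max_pending/D | mdb -k

    # Lower it to 10 on a running system (takes effect immediately)
    echo zfs_vdev_max_pending/W0t10 | mdb -kw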
On Mar 2, 2010, at 7:08 AM, Brian Kolaci wrote:
> We have a virtualized environment of T-Series hosts where each host runs
> either zones or LDoms. All of the virtual systems will have their own
> dedicated storage on ZFS (and some may also get raw LUNs). All the SAN
> storage is delivered in fixed-size 33GB LUNs.

There really isn't a losing proposition here; whatever you do, the data can be reasonably safe with good performance.

> The question I have for the community is whether it would be better to
> have a pool per virtual system, or to create one large pool and carve
> out a ZFS file system per virtual system. The trade-off: with one large
> pool you'll be able to take advantage of dedup when it becomes available
> in Solaris, but you then have all your eggs in one basket. It was
> originally thought that with a pool per virtual host you could migrate
> the storage along with the host, but that's not looking feasible in this
> environment.

Fewer pools are easier to manage and use space more efficiently.

> What is the probability of corruption with ZFS in Solaris 10 U6 and up
> in a SAN environment? Have people successfully recovered?
>
> In this environment the redundancy is performed at the hardware level,
> not at the host, so there's no chance of self-healing here. Yes, it's
> been discussed, but because of the legacy storage environment that's
> shared with other non-ZFS systems, they require redundancy at the
> hardware level; they won't budge on that and won't do additional
> redundancy at the ZFS level.

heh, how would they know you've set copies=2? ;-)

> So given the environment, would it be better to have lots of small
> pools, or one large shared pool?

I'd go for the easiest path to manage.
 -- richard

ZFS storage and performance consulting at http://www.RichardElling.com
ZFS training on deduplication, NexentaStor, and NAS performance
http://nexenta-atlanta.eventbrite.com (March 16-18, 2010)
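Richard's quip refers to the ZFS copies property, which stores extra copies of each block within the pool itself, restoring a measure of self-healing even when the pool sits on a single hardware-redundant LUN. A sketch with hypothetical dataset names; note that copies applies only to blocks written after the property is set, so it is best set at creation time:

    # Double the block copies for an existing dataset (new writes only)
    zfs set copies=2 vmpool/guest01

    # Or set it at creation so every block is covered from the start
    zfs create -o copies=2 vmpool/guest02

The trade-off is capacity: copies=2 roughly halves the usable space for that dataset, which matters when allocations come in fixed 33GB LUNs.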
On Mar 2, 2010, at 11:09 AM, Bob Friesenhahn wrote:
> On Tue, 2 Mar 2010, Brian Kolaci wrote:
>>
>> What is the probability of corruption with ZFS in Solaris 10 U6 and up
>> in a SAN environment? Have people successfully recovered?
>
> The probability of corruption in a "SAN environment" depends entirely on
> your SAN environment. With proper design, the probability should be
> "zero". If it is non-zero, then there must be a design defect in your
> SAN hardware which is liable to do harm at any time (aka "snake in the
> grass").

The only problem here is that they've already had a few corruptions. They were on U3, however, and I realize a lot of changes went in with U6; I'm not aware of any corruptions since then. Unfortunately I wasn't told about them until after they had rebuilt the pools and reused the devices. All of the corruptions I'm aware of were OS disk images on ZFS in the control domain. One, I believe, happened when they filled the pool because a snapshot was in place, and another when the control domain was patched and rebooted while it still had guests running. But I don't know the extent of the corruption, whether it was pool-wide or just files and/or metadata, and I don't know if anything could have been recovered. In my only experience with a corrupted pool (which I'm still working on), there were two files and one dataset found bad. A more recent one was due to someone accidentally adding the same devices to two different pools; I had him quickly copy off as much data as he could (he got 5 of the 7 disk images) before the system panicked.

So if there is corruption, can it be safely isolated so as not to affect other datasets or LDoms? Or would it be likely to take down the whole pool?

>> In this environment the redundancy is performed at the hardware level,
>> not at the host, so there's no chance of self-healing here. Yes, it's
>> been discussed, but because of the legacy storage environment that's
>> shared with other non-ZFS systems, they require redundancy at the
>> hardware level; they won't budge on that and won't do additional
>> redundancy at the ZFS level.
>
> That is unfortunate. In this case, the probability of failure increases
> from "zero".

Yes, and we need to expect the SAs to do the unexpected.

>> So given the environment, would it be better to have lots of small
>> pools, or one large shared pool?
>
> I think that a larger shared pool will be more satisfying and less
> wasteful of resources. However, a large pool written to a single huge
> SAN LUN suffers from concurrency issues: ZFS loses the ability to
> intelligently schedule I/O for individual disks and instead must post a
> lot of (up to 35) simultaneous I/Os and hope for the best.

This is what I had written in my document too. But all the SAN LUNs are only 33GB (yes, even when a host needs over a TB, it gets lots of 33GB LUNs pooled together). They're trying to get very efficient with storage, which is why I started down this path. But I'm also trying to measure the risk, if any, of moving to one larger pool rather than a bunch of smaller per-virtual-host pools. Regardless of the approach, the LUNs are all 33GB and that's the granularity of all allocations.
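One way to answer the isolation question after the fact: zpool status -v lists permanent (uncorrectable) errors by file path, which shows whether damage is confined to particular files or datasets or has reached pool-wide metadata. Pool name hypothetical:

    # Enumerate permanent errors and the files they affect
    zpool status -v vmpool

    # After repairing or deleting the affected files, clear the error
    # counters and scrub to confirm nothing else is damaged
    zpool clear vmpool
    zpool scrub vmpool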
On Tue, 2 Mar 2010, Brian Kolaci wrote:
>
> So if there is corruption, can it be safely isolated so as not to
> affect other datasets or LDoms? Or would it be likely to take down
> the whole pool?

It seems like you are asking if there could be a software bug, or a firmware/hardware bug in the SAN. It is always possible for there to be a bug. The bug might afflict the host system even if it uses many smaller pools, and the bug might not even be related to zfs.

From an administrative and performance standpoint, it seems like the large pool is better.

>> I think that a larger shared pool will be more satisfying and less
>> wasteful of resources. However, a large pool written to a single
>
> This is what I had written in my document too. But all the SAN LUNs
> are only 33GB (yes, even when a host needs over a TB, it gets lots of
> 33GB LUNs pooled together). They're trying to get very efficient with
> storage, which is why I started down this path. But I'm also trying
> to measure the risk, if any, of moving to one larger pool rather than
> a bunch of smaller per-virtual-host pools. Regardless of the approach,
> the LUNs are all 33GB and that's the granularity of all allocations.

The large pool will surely be more efficient with storage, since it is unlikely that all consumers will want or need to consume all of the space pre-allocated for them. In fact, if the expected per-consumer consumption averages less than 50%, you could use the large pool with mirroring and still have enough space. :-)

Bob

--
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
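Bob's mirroring suggestion, sketched against the fixed 33GB LUNs (device names hypothetical): each mirror pair yields about 33GB usable, so capacity is half of raw, but ZFS regains the redundant copy it needs to self-heal checksum errors.

    # Build the shared pool from mirrored pairs of 33GB LUNs,
    # ideally pairing LUNs from different array controllers/paths
    zpool create vmpool \
      mirror c2t0d0 c3t0d0 \
      mirror c2t0d1 c3t0d1

    # Grow later in ~33GB usable increments as guests are added
    zpool add vmpool mirror c2t0d2 c3t0d2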