Calculating the availability and economic trade-offs of configurations is
hard. Rule of thumb seems to rule.

I recently profiled an availability/reliability tool on StorageMojo.com that
uses Bayesian analysis to estimate datacenter availability. You can quickly
(minutes, not days) model systems and compare availability and recovery times
as well as OpEx and CapEx implications.

One hole: AFAIK, ZFS isn't in their product catalog. There's a free version
of the tool at http://www.twinstrata.com/

Feedback on the tool from this group is invited.

Robin
StorageMojo.com


Date: Tue, 17 Feb 2009 21:36:38 -0800
From: Richard Elling <richard.elling at gmail.com>
To: Toby Thain <toby at telegraphics.com.au>
Cc: zfs-discuss at opensolaris.org
Subject: Re: [zfs-discuss] ZFS on SAN?

Toby Thain wrote:
> Not at all. You've convinced me. Your servers will never, ever lose
> power unexpectedly.

Methinks living in Auckland has something to do with that :-)
http://en.wikipedia.org/wiki/1998_Auckland_power_crisis

When services are reliable, then complacency brings risk. My favorite
example recently is the levees in New Orleans. Katrina didn't top the
levees, they were undermined.
 -- richard
Robin,

From recollection, the business case for investment in power protection
technology was relatively simple. We calculated what the downtime per hour
was worth and how frequently it happened. We used to have several incidents
per year, if not more, and they would cause major system outages. When you
have over 1000 staff and multiple remote sites depending on your data center
(now data centers, plural), the cost per hour for staff wages alone makes it
quite easy to justify. (I am not even going to factor in loss of reputation,
the media, or our most important customers: our students.)

I cannot *stress* enough just how important power and environment protection
is to data. It is the main consideration I take into account when deploying
new sites. (This discussion went off list yesterday and I was mentioning
these same things there.)

My analogy here: what is the first thing NASA designs into a new spacecraft?
Life support. Without it you don't even leave the ground. Electricity *is*
the lifeblood of available storage.

Case in point: last year we had an arsonist set fire to a critical point in
our campus infrastructure, which burnt down a building that just happened to
have one of the main communication and power trenches running through it. It
knocked out around 5 buildings on that campus for two weeks, and immense
upheaval and disruption followed. Our brand new DR data center was on that
site. It kept running because of redundant fibre paths to the SAN switches
and core routers, so we could still provide service to the rest of the
campus and maintain active DR to our primary site. Emergency power via
generator was also available until main power could be rerouted to the data
center.

I will take a look at the TwinStrata website (as should others). Sorry to
all if we are diverging too much from zfs-discuss.

/Scott

This stuff does happen. When you have been around for a while you see it.

Robin Harris wrote:
> Calculating the availability and economic trade-offs of configurations
> is hard. Rule of thumb seems to rule.
>
> I recently profiled an availability/reliability tool on StorageMojo.com
> that uses Bayesian analysis to estimate datacenter availability. You
> can quickly (minutes, not days) model systems and compare availability
> and recovery times as well as OpEx and CapEx implications.
>
> One hole: AFAIK, ZFS isn't in their product catalog. There's a free
> version of the tool at http://www.twinstrata.com/
>
> Feedback on the tool from this group is invited.
>
> Robin
> StorageMojo.com

-- 
_______________________________________________________________________

Scott Lawson
Systems Architect
Manukau Institute of Technology
Information Communication Technology Services
Private Bag 94006
Manukau City
Auckland
New Zealand

Phone  : +64 09 968 7611
Fax    : +64 09 968 7641
Mobile : +64 27 568 7611

mailto:scott at manukau.ac.nz

http://www.manukau.ac.nz
________________________________________________________________________

perl -e 'print $i=pack(c5,(41*2),sqrt(7056),(unpack(c,H)-2),oct(115),10);'
________________________________________________________________________
>>>>> "sl" == Scott Lawson <Scott.Lawson at manukau.ac.nz> writes:

    sl> Electricity *is* the lifeblood of available storage.

I never meant to suggest computing machinery could run without
electricity. My suggestion is, if your focus is _reliability_ rather
than availability, meaning you don't want to lose the contents of a
pool, you should think about what happens when power goes out, not
just how to make sure power Never goes out Ever Absolutely because we
Paid and our power is PERFECT.

 * pools should not go corrupt when power goes out.

 * a UPS does not replace the need for NVRAMs to have batteries in
   them, because there are things between the UPS and the NVRAM like
   cords and power supplies, the UPSes themselves are not reliable
   enough if you have only one, and the controller containing the
   NVRAM may need to be hard-booted because of bugs.

 * supplying superexpensive futuristic infallible fancypower to all
   disk shelves does not mean the SYNC CACHE command can be thrown
   out (see the sketch at the end of this mail). Maybe the power is
   still not infallible, or maybe there will be SAN outages or blown
   controllers or shelves with junky software in them that hang the
   whole array when one drive goes bad.

If you really care about availability:

 * reliability crosses into availability if you are planning to have
   fragile pools backed by a single SAN LUN, which may become corrupt
   if they lose power. Maybe you're planning to destroy the pool and
   restore from backup in that case, and you have some
   carefully-planned offsite backup hierarchy that's always recent
   enough to capture all the data you care about. But a restore could
   take days, which turns two minutes of unavailable power into one
   day of unavailable data. If there were no reliability problem
   causing pool loss during power loss, two minutes of unavailable
   power maybe means 10 minutes of unavailable data.

 * there are reported problems with systems that take hours to boot
   up, e.g. with thousands of filesystems, snapshots, or NFS exports,
   which isn't exactly a reliability problem, but is a problem. That
   open issue falls into the above outage-magnification category, too.

I just don't like the idea that people are building fancy space-age
data centers and then thinking they can safely run crappy storage
software that won't handle power outages because they're above having
to worry about all that little-guy nonsense. A big selling point of
the last step forward in filesystems (metadata logging) was that
they'd handle power failures with better consistency guarantees and
faster reboots -- at the time, did metadata logging appeal only to
people with unreliable power? I hope not.

Never mind those of us who find these filesystem features important
because we'd like cheaper or smaller systems, with cords that we
sometimes trip over, that are still useful. I think having such
protections in the storage software, and having them actually fully
working, not just imaginary or fragile, is always useful. It isn't
something you can put yourself above by ``careful power design'' or
``paying for it'', because without them, in a disaster you've got this
brittle house-of-cards system that cracks once you deviate from the
specific procedures you've planned.

I'm glad your disaster planning has stood the test of practice so
well. But we're supposed to have an industry baseline right now that
databases and MTAs and NFS servers and their underlying filesystems
can lose power without losing any data, and I think we should stick to
that rather than letting it slip.
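To be concrete about the SYNC CACHE point: on Solaris the knob people use
to throw cache flushes away is the zfs_nocacheflush tunable. This is just a
sketch of what ``throwing it out'' looks like, not a recommendation -- it is
only even arguably safe when every device behind the pool has nonvolatile,
battery-backed write cache:

  # /etc/system -- stops ZFS from issuing cache-flush requests to devices.
  # leave this unset unless ALL pool devices have battery-backed NVRAM;
  # otherwise an outage can lose the last transactions, or worse.
  set zfs:zfs_nocacheflush = 1

and even then the array firmware still has to honour its own battery.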
On Wed, 18 Feb 2009, Miles Nordin wrote:

> I just don't like the idea that people are building fancy space-age
> data centers and then thinking they can safely run crappy storage
> software that won't handle power outages because they're above having
> to worry about all that little-guy nonsense. A big selling point of
> the last

Luckily that is not a concern for the members of this list, since you
posted to the ZFS mailing list and we all use ZFS rather than some
"crappy storage software".

Thanks for expressing your concern though.

Bob
-- 
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
Zoltan Farkas
2009-Feb-18 23:13 UTC
[zfs-discuss] zpool iostat not reporting correct info on snv_107
zpool iostat 5 reports:

rpool        115G   349G     91      0  45.7K      0
rpool        115G   349G     90      0  45.5K      0
rpool        115G   349G     89      0  44.6K      0
rpool        115G   349G     93      0  47.9K      0
rpool        115G   349G     90      0  45.0K      0
rpool        115G   349G     84     31  67.5K   144K
rpool        115G   349G     92      0   111K      0
rpool        115G   349G     90      0  45.3K      0

while iotop.d from the DTrace Toolkit reports:

2009 Feb 18 18:09:31,  load: 0.15,  disk_r:  26560 KB,  disk_w:      0 KB

  UID    PID   PPID  CMD              DEVICE   MAJ  MIN  D      BYTES
  101   8697      1  trackerd         sd5       50  320  R   13369344
  101   8697      1  trackerd         sd4       50  256  R   13828096

iostat -xn 5 also reports readings consistent with iotop.

I launched tracker to rebuild its index and the hard drives get decent use
(you can hear them), yet zpool iostat somehow does not see it ...
Is this a known problem?

regards
--zoly
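P.S. For anyone wanting to reproduce the comparison without the full DTrace
Toolkit, a rough stand-in for iotop.d (just the generic io provider pattern,
not the actual script) is:

  # sum physical I/O bytes per process, printed every 5 seconds
  dtrace -n 'io:::start { @[execname] = sum(args[0]->b_bcount); }
             tick-5sec { printa(@); trunc(@); }'

run in one terminal while zpool iostat 5 runs in another.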
Miles Nordin wrote:
>>>>>> "sl" == Scott Lawson <Scott.Lawson at manukau.ac.nz> writes:
>
>     sl> Electricity *is* the lifeblood of available storage.
>
> I never meant to suggest computing machinery could run without
> electricity. My suggestion is, if your focus is _reliability_ rather
> than availability, meaning you don't want to lose the contents of a
> pool, you should think about what happens when power goes out, not
> just how to make sure power Never goes out Ever Absolutely because we
> Paid and our power is PERFECT.

My focus is on both. And I understand that nothing is ever perfect, only
that one should strive for it if possible. But when one lives in a place
like NZ, where our power grid is creaky, it starts becoming a real
liability that needs mitigation, that's all. I am sure there are plenty
of ZFS users in the same boat.

>  * pools should not go corrupt when power goes out.

Absolutely agree.

>  * a UPS does not replace the need for NVRAMs to have batteries in
>    them, because there are things between the UPS and the NVRAM like
>    cords and power supplies, the UPSes themselves are not reliable
>    enough if you have only one, and the controller containing the
>    NVRAM may need to be hard-booted because of bugs.

Fully understand this too. If, as I do, you use hardware RAID arrays
behind zpool vdevs, then it is very important that this stuff is
maintained, that the batteries backing the RAID array write caches are
good, and that power is available long enough for them to flush cache to
disk before the batteries go flat. This is certainly true of any file
system built on LUNs from hardware-backed RAID arrays.

>  * supplying superexpensive futuristic infallible fancypower to all
>    disk shelves does not mean the SYNC CACHE command can be thrown
>    out. Maybe the power is still not infallible, or maybe there will
>    be SAN outages or blown controllers or shelves with junky software
>    in them that hang the whole array when one drive goes bad.

This is largely why I use mirrored vdevs built from LUNs provided by two
geographically isolated arrays; hopefully that is less likely to be a
problem (a rough sketch of what I mean is at the end of this mail). But
yes, anything that ignores SYNC CACHE could pose a serious problem if it
is hidden from ZFS by an array controller.

> If you really care about availability:
>
>  * reliability crosses into availability if you are planning to have
>    fragile pools backed by a single SAN LUN, which may become corrupt
>    if they lose power. Maybe you're planning to destroy the pool and
>    restore from backup in that case, and you have some
>    carefully-planned offsite backup hierarchy that's always recent
>    enough to capture all the data you care about. But a restore could
>    take days, which turns two minutes of unavailable power into one
>    day of unavailable data. If there were no reliability problem
>    causing pool loss during power loss, two minutes of unavailable
>    power maybe means 10 minutes of unavailable data.

Agreed, and that is why I would recommend against a single hardware RAID
SAN LUN for a zpool. At bare minimum you would want to use copies=2 if
you really care about your data. If you don't care about the data, then
no problem, go ahead. I do use zpools for transient data that I don't
care about, where I favor capacity over resiliency (the main thing I
want there is L2ARC; think Squid proxy server caches).

>  * there are reported problems with systems that take hours to boot
>    up, e.g. with thousands of filesystems, snapshots, or NFS exports,
>    which isn't exactly a reliability problem, but is a problem. That
>    open issue falls into the above outage-magnification category, too.

Have seen this myself. Not nice after a system reboot. Can't recall if I
have seen it recently though; seem to recall it was more around S10 U2
or U3.

> I just don't like the idea that people are building fancy space-age
> data centers and then thinking they can safely run crappy storage
> software that won't handle power outages because they're above having
> to worry about all that little-guy nonsense. A big selling point of
> the last step forward in filesystems (metadata logging) was that
> they'd handle power failures with better consistency guarantees and
> faster reboots -- at the time, did metadata logging appeal only to
> people with unreliable power? I hope not.

I am just trying to put forward the perspective of a big user here. I
have already had numerous off-list replies from people wanting more
information on the methodology we like to use. If I can be of help to
people, I will.

> Never mind those of us who find these filesystem features important
> because we'd like cheaper or smaller systems, with cords that we
> sometimes trip over, that are still useful. I think having such
> protections in the storage software, and having them actually fully
> working, not just imaginary or fragile, is always useful.

Absolutely. It is all part of the big picture. Albeit probably *the*
most important part. Consistency of your data is the paramount concern
for everyone who stores it. I just like to make sure it's also
available, not just consistent on disk.

> It isn't something you can put yourself above by ``careful power
> design'' or ``paying for it'', because without them, in a disaster
> you've got this brittle house-of-cards system that cracks once you
> deviate from the specific procedures you've planned.

Generally, a system's resilience to failure events is tested at
commissioning time and then not tested again once the system goes live
and has patches applied, security updates, bugs introduced, etc. So in
my experience trying to mitigate outage risks as much as possible is a
good idea, and that is the root of the idea I am trying to convey.
Calling this a house of cards because of adherence to operational
procedures is at least somewhat better than it being a mud hut.

> I'm glad your disaster planning has stood the test of practice so
> well. But we're supposed to have an industry baseline right now that
> databases and MTAs and NFS servers and their underlying filesystems
> can lose power without losing any data, and I think we should stick to
> that rather than letting it slip.

Absolutely.
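To make the mirrored vdev and copies=2 points above concrete, a rough
sketch (pool names and c#t#d# device names are made up; in reality each
device in a mirror pair would be a LUN presented by a different array on a
different site):

  # mirrored vdevs, one LUN from each of the two arrays in every pair
  zpool create tank mirror c2t0d0 c3t0d0 mirror c2t1d0 c3t1d0

  # for a pool that is stuck on a single SAN LUN, at least keep two
  # copies of each block so ZFS can repair bad blocks by itself
  zfs set copies=2 sanlun/data

Bear in mind copies=2 only protects against bad blocks within that one
LUN; it is no substitute for mirroring across arrays.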