Robin Axelsson
2010-Mar-16 11:35 UTC
[zfs-discuss] Usage of hot spares and hardware allocation capabilities.
I've been informed that newer versions of ZFS support the use of hot spares, i.e. drives that are not in use but are kept available for resynchronization/resilvering should one of the original drives in the assigned storage pool fail.

I'm a little sceptical about this, because even the hot spare will be running for the same duration as the other disks in the pool and will therefore be exposed to the same levels of hardware degradation and failure, unless it is put to sleep while it is not being used for storage. So, is there a sleep/hibernation/standby mode that the hot spares operate in, or are they on all the time regardless of whether they are in use or not?

Usually the hot spare is on a not-so-well-performing SAS/SATA controller. So, given the scenario of a hard drive failure after which a hot spare has been used for resilvering of, say, a raidz2 cluster, can I move the resilvered hot spare to the faster controller by letting it take the faulty hard drive's place, using the "zpool offline" and "zpool online" commands?

To be more general: are the hard drives in the pool "hard coded" to their SAS/SATA channels, or can I swap their connections arbitrarily if I want to? Will ZFS automatically identify the association of each drive with a given pool or tank and automatically reallocate them to put the pool/tank/filesystem back in place?
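(For reference, a minimal sketch of the sequence being asked about, assuming a pool named "tank", a faulted disk c2t3d0 and a hot spare c4t0d0 that has finished resilvering; all names are hypothetical, and whether a drive can be recabled while the pool is live depends on the controller, so the conservative route is to export first.)

  # Confirm the spare has taken over and resilvering has completed.
  zpool status tank

  # Detach the faulted original; the in-use spare then becomes a
  # permanent member of that vdev instead of reverting to spare duty.
  zpool detach tank c2t3d0

  # Move the (former) spare to the faster controller, ideally with the
  # pool exported; ZFS re-finds it by its on-disk label, not its channel.
  zpool export tank
  zpool import tank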
Khyron
2010-Mar-20 00:22 UTC
[zfs-discuss] Usage of hot spares and hardware allocation capabilities.
Responses inline...

On Tue, Mar 16, 2010 at 07:35, Robin Axelsson <gu99roax at student.chalmers.se> wrote:

> I've been informed that newer versions of ZFS support the use of hot spares, i.e. drives that are not in use but are kept available for resynchronization/resilvering should one of the original drives in the assigned storage pool fail.

That is the definition of a hot spare, at least informally. ZFS has supported this for some time (if not from the beginning; I'm not in a position to answer that). It is *not* new.

> I'm a little sceptical about this, because even the hot spare will be running for the same duration as the other disks in the pool and will therefore be exposed to the same levels of hardware degradation and failure, unless it is put to sleep while it is not being used for storage. So, is there a sleep/hibernation/standby mode that the hot spares operate in, or are they on all the time regardless of whether they are in use or not?

Not that I am aware of or have heard others report. No such "sleep mode" exists. Sounds like you want a Copan storage system. AFAIK, hot spares are always spinning; that's why they are hot.

> Usually the hot spare is on a not-so-well-performing SAS/SATA controller. So, given the scenario of a hard drive failure after which a hot spare has been used for resilvering of, say, a raidz2 cluster, can I move the resilvered hot spare to the faster controller by letting it take the faulty hard drive's place, using the "zpool offline" and "zpool online" commands?

Usually? That's not my experience, from multiple vendors' hardware RAID arrays. Usually it's on a channel used by storage disks. Maybe someone else has seen otherwise. I'd be personally curious to know what system puts a spare on a lower-performance channel; that risks slowing the entire device (RAID set/group) when the hot spare kicks in. As for your question, that doesn't make a lot of sense to me. I don't even get how that would work, but I'm not "Wile E. Coyote, Super Genius" either.

> To be more general: are the hard drives in the pool "hard coded" to their SAS/SATA channels, or can I swap their connections arbitrarily if I want to? Will ZFS automatically identify the association of each drive with a given pool or tank and automatically reallocate them to put the pool/tank/filesystem back in place?

No. Each disk in the pool has a unique ID, as I understand it. Thus, you should be able to move a disk to another location (channel, slot) and it would still be a part of the same pool and VDEV.

All of that said, I saw this post when it originally came in and notice no one had responded to it until now. I don't know about anyone else, but I was a little offended when I read this; I wasn't sure how to take it. Maybe you should not assume that people on this list don't know what hot sparing is, or that ZFS just learned it. Just a suggestion.

--
"You can choose your friends, you can choose the deals." - Equity Private
"If Linux is faster, it's a Solaris bug." - Phil Harman

Blog - http://whatderass.blogspot.com/
Twitter - @khyron4eva
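(To illustrate the "unique ID" point: each leaf vdev carries an on-disk ZFS label containing pool and vdev GUIDs, and the pool is reassembled from those labels rather than from controller/slot paths. A minimal sketch for inspecting this, with hypothetical device and pool names:)

  # Print the ZFS labels stored on the disk itself; the GUIDs travel with
  # the disk, whatever port it is plugged into.
  zdb -l /dev/rdsk/c2t3d0s0 | grep -i guid

  # The pool-level view of the same devices.
  zpool status -v tank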
Tonmaus
2010-Mar-20 10:24 UTC
[zfs-discuss] Usage of hot spares and hardware allocation capabilities.
> So, is there a sleep/hibernation/standby mode that the hot spares operate in, or are they on all the time regardless of whether they are in use or not?

This depends on the power-save options of your hardware, not on ZFS. Arguably, there is less wear on the heads for a hot spare. I guess that many modern disks will park the heads after a certain time, or even spin down, unless the controller prevents that. The question is whether the disk comes back fast enough when required - your bets are on the controller supporting that properly. As it seems, there is little focus on that matter at SUN and among community members. At least my own investigation into how best to make use of power-save options, like those most SoHo NAS boxes offer, returned only dire results.

> Usually the hot spare is on a not-so-well-performing SAS/SATA controller.

There is no room for "not so well-performing" controllers in my servers. I would not allow wasting PCIe slots or backplanes on anything that doesn't live up to spec (my requirements). That being said, JBOD HBAs are those that perform best with ZFS, and they happen to be not very expensive. Additionally, I avoid a checkerboard of components, striving to keep things as simple as possible.

> To be more general: are the hard drives in the pool "hard coded" to their SAS/SATA channels, or can I swap their connections arbitrarily if I want to? Will ZFS automatically identify the association of each drive with a given pool or tank and automatically reallocate them to put the pool/tank/filesystem back in place?

This works very well, given that your controller properly supports it. I tried that on an Areca 1170 a couple of weeks ago, with interesting results that turned out to be an Areca firmware flaw. You may find the thread on this list. I would recommend that you do such tests when implementing your array, before going into production with it. Analogous aspects may apply to:

- hot-swapping
- S.M.A.R.T.
- replacing failing components or changing the configuration
- transferring a whole array to another host

(the list is not comprehensive)

I think at this moment you have two choices to be sure that all "advertised" ZFS features will be available in your system:

- learn it the hard way, by trial and error
- use SUN hardware, or another turnkey solution that offers ZFS, such as NexentaStor

A popular approach is to follow along the rails of what is being used by SUN, a prominent example being the LSI 106x SAS HBAs in "IT" mode.

Regards,

Tonmaus
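(One way such a pre-production swap test might look, as a rough sketch; the pool layout and device names are hypothetical:)

  # Build a throwaway pool on the real hardware.
  zpool create testpool raidz2 c2t0d0 c2t1d0 c2t2d0 c2t3d0

  # Export it, physically swap two of the drives between ports or
  # controllers, then import it again.
  zpool export testpool
  zpool import testpool

  # A scrub verifies that every device is readable and the data checks out.
  zpool scrub testpool
  zpool status -v testpool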
Robin Axelsson
2010-Mar-20 13:14 UTC
[zfs-discuss] Usage of hot spares and hardware allocation capabilities.
I know about those SoHo boxes and whatnot; they keep spinning up and down all the time, and the worst thing is that you cannot disable this sleep/powersave feature on most of these devices.

I believe I have seen "sleep mode" support when I skimmed through the feature lists of the LSI controllers, but I'm not sure right now.

My idea is rather that the "hot spares" (or perhaps we should say "cold spares" then) are off all the time until they are needed, or when a user-initiated/scheduled system integrity check is being conducted. They could go up for a "test spin" each time a scrub is initiated, which is not too frequently.

Perhaps I was a little too quick with my assumptions regarding ZFS and OpenSolaris. I figured that real enterprise applications rather use Solaris together with carefully selected hardware, whereas OpenSolaris is aimed more at lower-budget/mainstream applications, as a way of gaining wider acceptance for OpenSolaris and ZFS (and of course to help the development of Solaris too, unless there are other plans ...). It has been discussed in many places that file systems do not change as frequently as operating systems, which is considered an issue when it comes to the implementation of newer and better technology.

I intend to use a raidz2 setup on 8 disks attached to an LSI SAS1068 (LSI SAS3081E-R) based controller, and if I decide to use a hot spare I will attach it to the SB750 controller of the system. If the hot spare kicks in, I would probably want to swap it with the faulty hard drive on the LSI controller.

> ... you should be able to move a disk to another location (channel, slot) and it would still be a part of the same pool and VDEV.
>
> This works very well, given that your controller properly supports it. ...

I hope you are absolutely sure about this. The main reason I asked this question comes from the thread "Intel SASUC8I worth every penny" in this forum section, where the thread starter warned that one should use "zpool export" and "zpool import" when migrating a tank from one (set of) controller(s) to another.

I didn't mean to hurt anyone's feelings here. I'm new to OpenSolaris and ZFS. When I asked these questions I had just finished reading the "OpenSolaris Bible" and the ZFS administration guide (819-5461) together with some of the pages on the opensolaris.com website, so I was merely quoting my sources when I said "newer versions of x support y".

Thank you for your replies; they have been insightful.
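(For what it's worth, the conservative migration sequence that thread recommends would look roughly like this; the pool name is hypothetical:)

  # Cleanly close the pool and release all of its devices.
  zpool export tank

  # ...recable the disks to the new controller(s)...

  # ZFS rescans the devices and reassembles the pool from the on-disk labels.
  zpool import tank

  # If the pool is not found automatically, point the import at a device directory:
  zpool import -d /dev/dsk tank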
Tonmaus
2010-Mar-20 14:35 UTC
[zfs-discuss] Usage of hot spares and hardware allocation capabilities.
> I know about those SoHo boxes and whatnot; they keep spinning up and down all the time, and the worst thing is that you cannot disable this sleep/powersave feature on most of these devices.

Whether that is a bad thing is in the eye of the beholder. We have a couple of Thecus NAS boxes and some LVM RAIDs on Ubuntu in the company which work like a charm with WD green drives spinning down on inactivity. A hot spare is typically inactive most of the time, and it does spin down unless required. That's because there are people in the Linux world who have a focus on implementing and maintaining power-save options. I think that's great.

> I believe I have seen "sleep mode" support when I skimmed through the feature lists of the LSI controllers, but I'm not sure right now.

Neither am I. LSI's feature list is centred around SAS support on their own drivers. What exactly will work after you have added SAS expanders (which I haven't, but might do in the future) and attached SATA drives (which I have, but might change some of them to SAS), running the native kernel driver (mpt) instead of LSI's proprietary one, and changing the firmware from LSI's default IR to IT mode (which you will typically do for MPxIO), is hard to predict given the possible permutations. Bottom line: you will have to try.

> My idea is rather that the "hot spares" (or perhaps we should say "cold spares" then) are off all the time until they are needed, or when a user-initiated/scheduled system integrity check is being conducted. They could go up for a "test spin" each time a scrub is initiated, which is not too frequently.

I don't know of anything that will work the way you figure, including ZFS. A hot spare is completely inactive during a scrub, but each 'zpool status' command will return the state of the spare, which will work for a spun-down drive as well, btw. The idea of test-spinning a hot spare is quite far-fetched on that background. Scrub, likewise, isn't a generic "system integrity" test; it has a specific function with respect to a zpool. Putting things in the usual categories, your spare requirements are closer to having a cold spare, I would say. Maybe you can find a method starting from there.

> I figured that real enterprise applications rather use Solaris together with carefully selected hardware, whereas OpenSolaris is aimed more at lower-budget/mainstream applications, as a way of gaining wider acceptance for OpenSolaris and ZFS

That's a selective perspective, I would say. Fishworks, for instance, is derived from OpenSolaris directly, while the feature set of Solaris is again quite different. The salient point for the enterprise solutions is that you pay for the services: among other things, that expensive engineers have figured out which components will provide the functions you have requested, and that there will be somebody you can bark up to if things don't work as advertised. For the white-box, open approach this is up to you.

> It has been discussed in many places that file systems do not change as frequently as operating systems, which is considered an issue when it comes to the implementation of newer and better technology.

FWIW, ZFS is mutating quite dynamically.

> > ... you should be able to move a disk to another location (channel, slot) and it would still be a part of the same pool and VDEV.
> >
> > This works very well, given that your controller properly supports it. ...
>
> I hope you are absolutely sure about this. The main reason I asked this question comes from the thread "Intel SASUC8I worth every penny" in this forum section, where the thread starter warned that one should use "zpool export" and "zpool import" when migrating a tank from one (set of) controller(s) to another.

It's not obvious that my assertion about mpt functions will apply when you want to move from a controller that I figure is an AMD onboard (?) to an LSI SAS mpt. Instead, you will have to investigate which driver module your onboard controller uses, and what the properties of that driver are. Additionally, you will have to cross-check whether the two drivers behave compatibly, i.e. that the receiving controller doesn't do anything based on implicit assumptions the sending controller didn't provide. To give you an example of what will happen when you have a drive on a controller driven by ahci(7D): the difference will be, among other things, that before you pull the drive you will have to unconfigure it in cfgadm as an additional step. If you don't observe that, you can blow things up. Moreover, ahci(7D), according to the specs I know, does not support power save. Bottom line: you will have to find out.

As far as the "warning" is concerned: migrating a whole pool is not the same thing as swapping slots within a pool. I.e., if you pull more drives than the allowed number (the failover resilience) from your pool at the same time while the pool is hot, you will simply destroy the pool.

Regards,

Tonmaus
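(A sketch of that extra cfgadm step on an ahci-driven port; the attachment-point name is hypothetical and will differ per system:)

  # List attachment points; SATA ports show up as e.g. sata0/3::dsk/c4t3d0.
  cfgadm -al

  # Take the disk offline before physically pulling it...
  cfgadm -c unconfigure sata0/3

  # ...and bring the replacement online after inserting it.
  cfgadm -c configure sata0/3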
Bob Friesenhahn
2010-Mar-20 18:00 UTC
[zfs-discuss] Usage of hot spares and hardware allocation capabilities.
On Sat, 20 Mar 2010, Robin Axelsson wrote:

> My idea is rather that the "hot spares" (or perhaps we should say "cold spares" then) are off all the time until they are needed, or when a user-initiated/scheduled system integrity check is being conducted. They could go up for a "test spin" each time a scrub is initiated, which is not too frequently.

Solaris does include a power management function which is capable of spinning down idle disks. The disks are then spun up if they are accessed.

> Perhaps I was a little too quick with my assumptions regarding ZFS and OpenSolaris. I figured that real enterprise applications rather use Solaris together with carefully selected hardware, whereas OpenSolaris is aimed more at lower-budget/mainstream applications, as a way of gaining wider acceptance for OpenSolaris and ZFS (and of course to help the development of Solaris too, unless there are other plans ...). It has been discussed in many places that file systems do not change as frequently as operating systems, which is considered an issue when it comes to the implementation of newer and better technology.

It seems that most of your assumptions about Solaris/OpenSolaris are completely bogus and based on some other operating system.

Bob
--
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
Robin Axelsson
2010-Mar-21 11:19 UTC
[zfs-discuss] Usage of hot spares and hardware allocation capabilities.
The motherboard is AMD-based and it has two controllers: one on-chip controller that is integrated into the SouthBridge chip (SB750), and an onboard controller using the JMicron JMB362 chip with a JMB322 port multiplier. Both controllers support both AHCI and native IDE mode, which can be configured via the BIOS.

I saw that Solaris has a power management tool called dtpower, but from what I can see it cannot use different settings for individual drives. If I could, I would set the drives in the pool to be always on and the spares to turn off after, say, one hour. The problem is that the same setting applies to *all* drives, which is not so useful, and if I want to access one drive I don't want to wake up all the drives in the same pool (in this case the hot spares) or drives belonging to any other pool, for that matter.

From what I can read in the OpenSolaris documentation, cfgadm is used for connecting and disconnecting PCI hardware. I cannot see how it is used for disconnecting individual drives without disconnecting the entire HBA.

Hard drives sure work like a charm until the day they fail. The WD EcoGreen has probably been around for about a year, and my guess is that you purchased yours about half a year ago. I'm willing to bet that you will not have the same opinion about them four years from now.

I don't know much about AHCI in OpenSolaris, but it seems like there is an intention to implement power management features in the future:

http://hub.opensolaris.org/bin/view/Community+Group+device_drivers/AHCI

It says that port multipliers are not supported yet; I'm not really sure what that is supposed to mean. Port multipliers are usually not visible to the system (if I understand it correctly), and I couldn't see any complaints about the JMB322 in the Hardware Compatibility Check Tool (it wasn't even visible).

I found the following in the LSI user manual, by the way:

  Supports power management
  - Supports PCI power management 1.2
  - Supports Active State Power Management (ASPM), including the
    L0, L0s, and L1 states, by placing links in a power-savings mode
    during times of no link activity

Thank you for reminding me to flash the HBA to IT-mode firmware; I was going to forget that. I'll try out the behavior of ZFS once I get the system together; I'm still waiting for additional computer parts (a CPU backplate to mount the heat sink is missing). I'll see if it is at least possible to cold-swap hard drives between controllers, and I already have a preconceived notion that this won't be a problem.
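(Regarding per-drive settings: the underlying /etc/power.conf does accept per-device thresholds via the device-thresholds keyword, even if the dtpower GUI does not expose them. A minimal sketch follows; the device paths and threshold values are placeholders, power.conf normally wants the physical device path that the /dev/dsk symlink points to, and whether the spare actually spins down still depends on the driver and controller.)

  # /etc/power.conf (excerpt) - keep the pool disks spinning and let the
  # spare idle down after an hour.  Paths and thresholds are illustrative.
  device-thresholds    /pci@0,0/pci1000,3140@4/sd@0,0      always-on
  device-thresholds    /pci@0,0/pci1022,4391@11/disk@1,0   1h

  # Apply the new configuration without rebooting (run as root):
  pmconfig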
zfs ml
2010-Mar-21 16:22 UTC
[zfs-discuss] Usage of hot spares and hardware allocation capabilities.
On 3/21/10 4:19 AM, Robin Axelsson wrote:

> The motherboard is AMD-based and it has two controllers: one on-chip controller that is integrated into the SouthBridge chip (SB750), and an onboard controller using the JMicron JMB362 chip with a JMB322 port multiplier. Both controllers support both AHCI and native IDE mode, which can be configured via the BIOS.
>
> I saw that Solaris has a power management tool called dtpower, but from what I can see it cannot use different settings for individual drives. If I could, I would set the drives in the pool to be always on and the spares to turn off after, say, one hour. The problem is that the same setting applies to *all* drives, which is not so useful, and if I want to access one drive I don't want to wake up all the drives in the same pool (in this case the hot spares) or drives belonging to any other pool, for that matter.
>
> From what I can read in the OpenSolaris documentation, cfgadm is used for connecting and disconnecting PCI hardware. I cannot see how it is used for disconnecting individual drives without disconnecting the entire HBA.
>
> Hard drives sure work like a charm until the day they fail. The WD EcoGreen has probably been around for about a year, and my guess is that you purchased yours about half a year ago. I'm willing to bet that you will not have the same opinion about them four years from now.
>
> I don't know much about AHCI in OpenSolaris, but it seems like there is an intention to implement power management features in the future:
>
> http://hub.opensolaris.org/bin/view/Community+Group+device_drivers/AHCI

We are living in the future:

http://constantin.glez.de/blog/2010/03/opensolaris-home-server-scripting-2-setting-power-management

> It says that port multipliers are not supported yet; I'm not really sure what that is supposed to mean. Port multipliers are usually not visible to the system (if I understand it correctly), and I couldn't see any complaints about the JMB322 in the Hardware Compatibility Check Tool (it wasn't even visible).
>
> I found the following in the LSI user manual, by the way:
>
>   Supports power management
>   - Supports PCI power management 1.2
>   - Supports Active State Power Management (ASPM), including the
>     L0, L0s, and L1 states, by placing links in a power-savings mode
>     during times of no link activity
>
> Thank you for reminding me to flash the HBA to IT-mode firmware; I was going to forget that. I'll try out the behavior of ZFS once I get the system together; I'm still waiting for additional computer parts (a CPU backplate to mount the heat sink is missing). I'll see if it is at least possible to cold-swap hard drives between controllers, and I already have a preconceived notion that this won't be a problem.