Hello.

We're using ZFS via iSCSI on an S10U8 system. As the ZFS Best Practices
Guide (http://j.mp/zfs-bp) states, it's advisable to use redundancy
(i.e. RAIDZ, mirroring or whatnot), even if the underlying storage does
its own RAID thing.

Now, our storage does RAID, and the storage people say it is impossible
to have it export iSCSI devices which have no redundancy/RAID. For
"political" reasons, they seem to be unwilling to export 3 devices to
the Sun server.

Would it be possible to use slices/format on this iSCSI device and then
use 3 slices to make up one RAIDZ vdev? I mean, I know it's "possible"
to do that, but would it be advisable? Or would it rather be advisable
to try to "force" the storage guys to export 3 devices (even if they
still do their own RAID) and then set up RAIDZ using these 3 devices as
"whole disks"?

Actually, where would there be a difference? I mean, those iSCSI
devices don't represent real disks/spindles anyway; they're just some
sort of abstraction. So, if they gave me 3x400 GB instead of 1200 GB in
one huge lump like they do now, it could be that those would use the
same spots on the real hard drives. Taking this into consideration,
would there actually be a difference between setting up a RAIDZ across
3 "whole disk" iSCSI devices and doing a RAIDZ using 3 slices on one
iSCSI "disk" device?

Best regards,

Alexander
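For illustration, a minimal sketch of the two layouts being compared.
The device names are made up, and the slices for option A would have to
be created with format(1M) first:

    # Option A: RAIDZ across three slices of the single 1200 GB iSCSI LUN
    # (slices s0, s1 and s3 laid out beforehand with format/partition)
    zpool create tank raidz c2t0d0s0 c2t0d0s1 c2t0d0s3

    # Option B: RAIDZ across three separate 400 GB iSCSI LUNs, each given
    # to ZFS as a "whole disk"
    zpool create tank raidz c2t0d0 c2t1d0 c2t2d0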
On Tue, Sep 21, 2010 at 05:48:09PM +0200, Alexander Skwar wrote:
>
> We're using ZFS via iSCSI on an S10U8 system. As the ZFS Best
> Practices Guide http://j.mp/zfs-bp states, it's advisable to use
> redundancy (ie. RAIDZ, mirroring or whatnot), even if the underlying
> storage does its own RAID thing.
>
> Now, our storage does RAID and the storage people say it is
> impossible to have it export iSCSI devices which have no redundancy/
> RAID.

If you have a reliable iSCSI SAN and a reliable storage device, you
don't need the additional redundancy provided by ZFS.

> Actually, where would there be a difference? I mean, those iSCSI
> devices don't represent real disks/spindles anyway; they're just
> some sort of abstraction. So, if they gave me 3x400 GB instead of
> 1200 GB in one huge lump like they do now, it could be that those
> would use the same spots on the real hard drives.

Suppose they gave you two huge lumps of storage from the SAN, and you
mirrored them with ZFS. What would you do if ZFS reported that one of
its two disks had failed and needed to be replaced? You can't do disk
management with ZFS in this situation anyway because those aren't real
disks. Disk management all has to be done on the SAN storage device.

-- 
-Gary Mills-        -Unix Group-        -Computer and Network Services-
Hi!

2010/9/23 Gary Mills <mills at cc.umanitoba.ca>:
> On Tue, Sep 21, 2010 at 05:48:09PM +0200, Alexander Skwar wrote:
> >
> > We're using ZFS via iSCSI on an S10U8 system. As the ZFS Best
> > Practices Guide http://j.mp/zfs-bp states, it's advisable to use
> > redundancy (ie. RAIDZ, mirroring or whatnot), even if the underlying
> > storage does its own RAID thing.
> >
> > Now, our storage does RAID and the storage people say it is
> > impossible to have it export iSCSI devices which have no redundancy/
> > RAID.
>
> If you have a reliable iSCSI SAN and a reliable storage device, you
> don't need the additional redundancy provided by ZFS.

Okay. This contradicts the ZFS Best Practices Guide, which states:

# For production environments, configure ZFS so that it can repair
# data inconsistencies. Use ZFS redundancy, such as RAIDZ, RAIDZ-2,
# RAIDZ-3, mirror, or copies > 1, regardless of the RAID level
# implemented on the underlying storage device. With such redundancy,
# faults in the underlying storage device or its connections to the
# host can be discovered and repaired by ZFS.

> > Actually, where would there be a difference? I mean, those iSCSI
> > devices don't represent real disks/spindles anyway; they're just
> > some sort of abstraction. So, if they gave me 3x400 GB instead of
> > 1200 GB in one huge lump like they do now, it could be that those
> > would use the same spots on the real hard drives.
>
> Suppose they gave you two huge lumps of storage from the SAN, and you
> mirrored them with ZFS. What would you do if ZFS reported that one of
> its two disks had failed and needed to be replaced? You can't do disk
> management with ZFS in this situation anyway because those aren't real
> disks. Disk management all has to be done on the SAN storage device.

Yes. I was rather thinking about RAIDZ instead of mirroring.

Anyway. Without redundancy, ZFS cannot do recovery, can it? As far as I
understand, it could detect block level corruption, even if there's no
redundancy. But it could not correct such a corruption.

Or is that a wrong understanding?

If I got the gist of what you wrote, it boils down to how reliable the
SAN is? But SANs could also have "block level" corruption, no? I'm a
bit confused because of the (perceived?) contradiction to the Best
Practices Guide. :)

Best regards,
Alexander

-- 
Lifestream (Twitter, Blog, ...): http://alexs77.soup.io/
Chat (Jabber/Google Talk): a.skwar at gmail.com, AIM: alexws77
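As a side note, the "copies > 1" option the guide mentions can be set
per dataset, so even a pool built on a single SAN LUN gets a limited
ability to repair bad blocks (though it obviously does not help if the
whole LUN disappears). A minimal sketch, with a hypothetical pool and
dataset name:

    # Store two copies of every data block in this file system; ZFS can
    # then repair a block that fails its checksum from the second copy.
    # Only data written after the property is set gets two copies.
    zfs set copies=2 tank/data

    # A scrub reads everything back, reports checksum errors and repairs
    # them where a good copy exists.
    zpool scrub tank
    zpool status -v tank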
On Fri, Sep 24, 2010 at 12:01:35AM +0200, Alexander Skwar wrote:
> >
> > Suppose they gave you two huge lumps of storage from the SAN, and you
> > mirrored them with ZFS. What would you do if ZFS reported that one of
> > its two disks had failed and needed to be replaced? You can't do disk
> > management with ZFS in this situation anyway because those aren't real
> > disks. Disk management all has to be done on the SAN storage device.
>
> Yes. I was rather thinking about RAIDZ instead of mirroring.

I was just using a simpler example.

> Anyway. Without redundancy, ZFS cannot do recovery, can it? As far as
> I understand, it could detect block level corruption, even if there's
> no redundancy. But it could not correct such a corruption.
>
> Or is that a wrong understanding?

That's correct, but it also should never happen.

> If I got the gist of what you wrote, it boils down to how reliable
> the SAN is? But SANs could also have "block level" corruption, no?
> I'm a bit confused because of the (perceived?) contradiction to the
> Best Practices Guide. :)

The real problem is that ZFS was not designed to run in a SAN
environment, that is, one where all of the disk management and
sufficient redundancy reside in the storage device on the SAN. ZFS
certainly can't do any disk management in this situation. Error
detection and correction is still a debatable issue, one that quickly
becomes exceedingly complex. The decision rests on probabilities
rather than certainties.

-- 
-Gary Mills-        -Unix Group-        -Computer and Network Services-
Alexander Skwar wrote:
> Okay. This contradicts the ZFS Best Practices Guide, which states:
>
> # For production environments, configure ZFS so that it can repair
> # data inconsistencies. Use ZFS redundancy, such as RAIDZ, RAIDZ-2,
> # RAIDZ-3, mirror, or copies > 1, regardless of the RAID level
> # implemented on the underlying storage device. With such redundancy,
> # faults in the underlying storage device or its connections to the
> # host can be discovered and repaired by ZFS.

<snip>

> Anyway. Without redundancy, ZFS cannot do recovery, can it? As far as
> I understand, it could detect block level corruption, even if there's
> no redundancy. But it could not correct such a corruption.
>
> Or is that a wrong understanding?
>
> If I got the gist of what you wrote, it boils down to how reliable
> the SAN is? But SANs could also have "block level" corruption, no?
> I'm a bit confused because of the (perceived?) contradiction to the
> Best Practices Guide. :)

This comes down to how much you trust your "storage device", whatever
that may be. If you have full faith in your SAN (and I don't have full
faith in it, no matter what its make/model), then ignore ZFS
redundancy.

When I first deployed a hardware RAID solution around 1995, the vendor
proudly stated that the device could scrub mirrors and correct errors.
I asked, when it found a discrepancy, how did it know which side of
the mirror was correct? He stammered for a while, but it basically
came down to the device flipping a coin. ZFS will ensure integrity,
even when the underlying device fumbles.

When you mirror the iSCSI devices, be sure that they are configured in
such a way that a failure on one iSCSI "device" does not imply a
failure on the other iSCSI device. As a simple example, if you sliced
a disk into three partitions and then presented them as a three-way
mirror to ZFS, then a single disk failure would wipe out everything,
even though you have the illusion of redundancy at the ZFS level. I
have seen some systems where the SAN presented what appeared to be
independent devices, but a failure on the underlying disk faulted both
devices, rendering ZFS helpless.

Good luck,
Marty
-- 
This message posted from opensolaris.org
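To make that three-slices example concrete, a sketch of the anti-pattern
(device names hypothetical): ZFS sees three vdevs, but one spindle
failure takes them all out at once.

    # Illusory redundancy: a three-way mirror built from three slices
    # of the SAME underlying disk.
    zpool create badpool mirror c2t0d0s0 c2t0d0s1 c2t0d0s3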
Hello.

2010/9/24 Marty Scholes <martyscholes at yahoo.com>:
> ZFS will ensure integrity, even when the underlying device fumbles.

Yes.

> When you mirror the iSCSI devices, be sure that they are configured
> in such a way that a failure on one iSCSI "device" does not imply a
> failure on the other iSCSI device.

Very good point!

What we're actually going to do is use mirroring. I found out that
"them storage people" have the storage mirrored at two locations, and
I'm going to get one device from each location. Since it's at two
locations in a fail-over environment (as far as the storage is
concerned), I'd say it's certain that one side of the mirror failing
doesn't harm the other side.

Thanks,
Alexander

-- 
Lifestream (Twitter, Blog, ...): http://alexs77.soup.io/
Chat (Jabber/Google Talk): a.skwar at gmail.com, AIM: alexws77
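A minimal sketch of that two-site layout, with hypothetical LUN names
(one LUN from site A, one from site B):

    # Mirror one iSCSI LUN from each site; either side can fail or go
    # away without losing the pool, and ZFS can repair checksum errors
    # from the surviving side.
    zpool create tank mirror c2t0d0 c3t0d0
    zpool status tank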
Hello again!

2010/9/24 Gary Mills <mills at cc.umanitoba.ca>:
> On Fri, Sep 24, 2010 at 12:01:35AM +0200, Alexander Skwar wrote:
> > Yes. I was rather thinking about RAIDZ instead of mirroring.
>
> I was just using a simpler example.

Understood. Like I just wrote, we're actually now going to use
mirroring, so that I've got "some space" from site A and an identical
amount from site B.

Got to correct my other posting - I don't know if they are doing fail
over on the storage side. Need to find out.

BTW, to refer back to my original question: if disk management is to
be ignored (because the SAN does it for us), are there any differences
(even if only internally) between using slices on "disk" devices and
using whole disks, as far as ZFS is concerned?

Of course, it's understood that it's generally a rather bad idea to
make a RAIDZ from 3 slices on the same disk device, if this actually
represents one spindle. Not to be recommended, at least not for
production, at least not generally. But what about a SAN environment?
How do 3 "pseudo devices" of 400 GB each differ from one huge 1200 GB
lump?

Even though it's not really an issue for me anymore, since we're going
to do mirroring from 2 sites, it's a question which I find interesting
;) From my limited knowledge, I'd say that there is *NO* difference.

> > Anyway. Without redundancy, ZFS cannot do recovery, can it? As far
> > as I understand, it could detect block level corruption, even if
> > there's no redundancy. But it could not correct such a corruption.
> >
> > Or is that a wrong understanding?
>
> That's correct, but it also should never happen.

"should" ;)

> > If I got the gist of what you wrote, it boils down to how reliable
> > the SAN is? But SANs could also have "block level" corruption, no?
> > I'm a bit confused because of the (perceived?) contradiction to the
> > Best Practices Guide. :)
>
> The real problem is that ZFS was not designed to run in a SAN
> environment, that is, one where all of the disk management and
> sufficient redundancy reside in the storage device on the SAN. ZFS
> certainly can't do any disk management in this situation.

Yes, disk management of course can't be done.

> Error detection and correction is still a debatable issue, one that
> quickly becomes exceedingly complex.

When I've got a mirror in ZFS, I've already got the needed redundancy
that ZFS requires, so that it can do its magic, don't I?

Thanks,
Alexander

-- 
Lifestream (Twitter, Blog, ...): http://alexs77.soup.io/
Chat (Jabber/Google Talk): a.skwar at gmail.com, AIM: alexws77
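On the whole-disk versus slice question, one internal difference worth
noting (behaviour can vary by Solaris build, so treat this as a rough
guide): when ZFS is given a whole disk it puts an EFI label on it and
takes responsibility for enabling the device's write cache, whereas
with slices it leaves the label and write cache alone. Whether that
matters at all for an iSCSI LUN fronted by a SAN's own cache is
doubtful. A quick way to check how a pool was built, pool name
hypothetical:

    # The vdev names tell the story: c2t0d0 (whole disk, EFI label)
    # vs. c2t0d0s0 (slice, SMI label).
    zpool status tank

    # On most builds the cached pool configuration also carries a
    # whole_disk flag per vdev.
    zdb -C tank | grep -i whole_disk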