The paragraph below is from the ZFS admin guide:

Traditional Volume Management

As described in "ZFS Pooled Storage" on page 18, ZFS eliminates the need for
a separate volume manager. ZFS operates on raw devices, so it is possible to
create a storage pool comprised of logical volumes, either software or
hardware. This configuration is not recommended, as ZFS works best when it
uses raw physical devices. Using logical volumes might sacrifice performance,
reliability, or both, and should be avoided.

Does this mean EMC/Hitachi and other SAN-provisioned storage (RAID LUNs) is
not suitable storage for ZFS? Please clarify, thanks.
Hello tester,

Tuesday, April 17, 2007, 11:09:34 AM, you wrote:

t> The paragraph below is from the ZFS admin guide:
t>
t> Traditional Volume Management
t>
t> As described in "ZFS Pooled Storage" on page 18, ZFS eliminates the need
t> for a separate volume manager. ZFS operates on raw devices, so it is
t> possible to create a storage pool comprised of logical volumes, either
t> software or hardware. This configuration is not recommended, as ZFS works
t> best when it uses raw physical devices. Using logical volumes might
t> sacrifice performance, reliability, or both, and should be avoided.
t>
t> Does this mean EMC/Hitachi and other SAN-provisioned storage (RAID
t> LUNs) is not suitable storage for ZFS? Please clarify

It's about *software* volume managers. It means that it's not generally
recommended to create a RAID volume using VxVM or SVM and then put ZFS on
top of it. It will work; it's just not recommended. In some situations it
would actually make sense (like creating RAID-5 in SVM or VxVM and putting
ZFS on top of it if you need much better random read I/O throughput).

-- 
Best regards,
Robert                          mailto:rmilkowski at task.gda.pl
                                http://milek.blogspot.com
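[To make the distinction concrete, here is a minimal sketch; the device and
metadevice names are hypothetical. It contrasts the recommended layout, where
ZFS gets the raw disks and provides the redundancy itself, with the
discouraged one, where ZFS sits on top of an SVM RAID-5 metadevice.]

  # Recommended: hand ZFS the raw disks and let it provide the redundancy.
  zpool create tank raidz c1t0d0 c1t1d0 c1t2d0 c1t3d0

  # Works, but discouraged by the admin guide: build a RAID-5 metadevice
  # in SVM and layer a ZFS pool on top of it.
  metainit d10 -r c1t0d0s0 c1t1d0s0 c1t2d0s0 c1t3d0s0
  zpool create tank /dev/md/dsk/d10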
Well, no; his quote did say "software or hardware". The theory is apparently
that ZFS can do better at detecting (and, with redundancy, correcting) errors
if it's dealing with raw hardware, or as nearly so as possible. Most SANs
_can_ hand out raw LUNs as well as RAID LUNs; the folks that run them are
just not used to doing it.

Another issue that may come up with SANs and/or hardware RAID: supposedly,
storage systems with large non-volatile caches will tend to have poor
performance with ZFS, because ZFS issues cache flush commands as part of
committing every transaction group; this is worse if the filesystem is also
being used for NFS service. Most such hardware can be configured to ignore
cache flushing commands, which is safe as long as the cache is non-volatile.

The above is simply my understanding of what I've read; I could be way off
base, of course.
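[As an aside, there is also a host-side workaround for the cache-flush issue:
later Solaris builds include a zfs_nocacheflush tunable that stops ZFS from
issuing the flushes at all. This is a blunt instrument, and it is only safe
if every device in every pool on the host has a non-volatile cache. A hedged
sketch, assuming a build that ships the tunable:]

  # /etc/system -- only safe when ALL pool devices have non-volatile
  # (battery-backed) write caches; affects every pool on the host.
  set zfs:zfs_nocacheflush = 1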
Richard L. Hamilton writes:

> Well, no; his quote did say "software or hardware". The theory is apparently
> that ZFS can do better at detecting (and, with redundancy, correcting) errors
> if it's dealing with raw hardware, or as nearly so as possible. Most SANs
> _can_ hand out raw LUNs as well as RAID LUNs; the folks that run them are
> just not used to doing it.
>
> Another issue that may come up with SANs and/or hardware RAID: supposedly,
> storage systems with large non-volatile caches will tend to have poor
> performance with ZFS, because ZFS issues cache flush commands as part of
> committing every transaction group; this is worse if the filesystem is also
> being used for NFS service. Most such hardware can be configured to ignore
> cache flushing commands, which is safe as long as the cache is non-volatile.
>
> The above is simply my understanding of what I've read; I could be way off
> base, of course.

Sounds good to me. The first point is easy to understand. If you rely on ZFS
for data reconstruction, carve virtual LUNs out of your storage, and mirror
those LUNs in ZFS, then it's possible that both copies of a mirrored block
end up on a single physical device.

Performance-wise, the ZFS I/O scheduler might interact in interesting ways
with the one in the storage, but I don't know if this has been studied in
depth.

-r
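[A hedged illustration of the pitfall Roch describes; the LUN names are
hypothetical. If both sides of a mirror are virtual LUNs carved from the same
array, the array may place them on the same physical spindles; taking each
side from a different array or disk group avoids that.]

  # Risky: both LUNs come from the same array/disk group, so one physical
  # failure can take out both halves of the mirror.
  zpool create tank mirror c2t0d0 c2t1d0

  # Better: mirror LUNs known to live on separate physical disks, e.g.
  # LUNs from two different arrays.
  zpool create tank mirror c2t0d0 c3t0d0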
Hello Richard,

Wednesday, April 18, 2007, 7:35:24 AM, you wrote:

RLH> Well, no; his quote did say "software or hardware".

Right, I missed that.

RLH> The theory is apparently that ZFS can do better at detecting (and with
RLH> redundancy, correcting) errors if it's dealing with raw hardware, or as
RLH> nearly so as possible. Most SANs _can_ hand out raw LUNs as well as
RLH> RAID LUNs, the folks that run them are just not used to doing it.

Detecting errors in ZFS is the same regardless of where redundancy is done:
by default, checksums are checked for each block in every pool configuration.
Correction is another story; basically you need to create a redundant pool
(the exceptions are metadata, and, with the introduction of ditto blocks for
user data, user data to some extent).

Now, when it comes to HW RAID, I wouldn't actually recommend doing RAID in
ZFS, at least not always and not now. The first reason is the lacking hot
spare support in ZFS right now. In many scenarios when one disk goes wild,
ZFS won't really notice and your hot spare won't kick in, and you end up
doing silly things while your pool is not serving data. As I understand it,
the hot spare problem is being worked on.

Then, if you want RAID-5 (yes, people want RAID-5), ZFS could give you much
lower performance for some workloads, and for such workloads HW RAID-5 (or
even SVM or VxVM RAID-5) with ZFS on top as a file system (optionally with
dynamic striping between HW RAID-5 LUNs) actually makes sense. If you need
RAID-10 and you need a lot of bandwidth for sequential writes (or even
non-sequential writes, in the case of ZFS), doing it in software will halve
your actual performance in most cases, as you have to move twice as much
data when doing software RAID.

Also, exposing disks as LUNs is not that well tested on arrays, simply
because it's been used much less. For example, I had a problem with an EMC
CX3-40 when each disk was exposed as a LUN and a RAID-10 ZFS pool was
created on them: when I pulled out a disk, the array didn't catch it at the
LUN level and neither did the host. I/Os were queuing up; I waited for about
an hour (this was a test system) and nothing happened. Of course the hot
spare didn't kick in. SVM, VxVM or HW hot spares just work. ZFS hot spare
support is far behind right now. It has better potential, but it just
doesn't work properly in many failure scenarios.

RLH> Another issue that may come up with SANs and/or hardware RAID:
RLH> supposedly, storage systems with large non-volatile caches will tend to
RLH> have poor performance with ZFS, because ZFS issues cache flush commands
RLH> as part of committing every transaction group; this is worse if the
RLH> filesystem is also being used for NFS service. Most such hardware can be
RLH> configured to ignore cache flushing commands, which is safe as long as
RLH> the cache is non-volatile.

In most arrays (if not in all), exposing a physical disk as a LUN won't solve
the above, so it's not a choice between doing RAID in HW or in ZFS.

As always, if you really care about performance and availability you have to
know what you are doing. And while ZFS does some miracles in some
environments, it actually makes sense to do RAID in HW or another volume
manager and use ZFS solely as a file system.

When doing RAID in HW and using ZFS only as a file system, I would recommend
always exposing three LUNs or more (or at least two) and then doing dynamic
striping on the ZFS side. Of course those LUNs should be on different
physical disks. That way you should have better protection for your metadata.
-- 
Best regards,
Robert Milkowski               mailto:rmilkowski at task.gda.pl
                               http://milek.blogspot.com
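[A minimal sketch of the layout Robert recommends; the LUN names are
hypothetical. Several hardware RAID LUNs, ideally on different physical
disks, are presented to ZFS as a plain dynamic stripe, so ZFS acts only as
the file system while the array handles the redundancy.]

  # Three hardware RAID LUNs from the array, dynamically striped by ZFS.
  # With three or more top-level vdevs, ZFS can place its ditto copies of
  # metadata on different LUNs, which is the extra metadata protection
  # mentioned above.
  zpool create tank c2t0d0 c2t1d0 c2t2d0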
Is cache flushing part of the SCSI protocol? If not, how does ZFS become
aware of the array-specific command?

Thanks
Yes, it is:

SYNCHRONIZE_CACHE(10)  opcode 0x35
SYNCHRONIZE_CACHE(16)  opcode 0x91

-- leon
On Apr 18, 2007, at 2:35 AM, Richard L. Hamilton wrote:

> Well, no; his quote did say "software or hardware". The theory is apparently
> that ZFS can do better at detecting (and, with redundancy, correcting) errors
> if it's dealing with raw hardware, or as nearly so as possible. Most SANs
> _can_ hand out raw LUNs as well as RAID LUNs; the folks that run them are
> just not used to doing it.
>
> Another issue that may come up with SANs and/or hardware RAID: supposedly,
> storage systems with large non-volatile caches will tend to have poor
> performance with ZFS, because ZFS issues cache flush commands as part of
> committing every transaction group; this is worse if the filesystem is also
> being used for NFS service. Most such hardware can be configured to ignore
> cache flushing commands, which is safe as long as the cache is non-volatile.

The non-volatile cache issues are being covered by:

6462690 sd driver should set SYNC_NV bit when issuing SYNCHRONIZE CACHE to
SBC-2 devices
PSARC 2007/053

The PSARC case has been approved, and Grant is finishing up the code changes.

eric
Hello eric,

Wednesday, April 18, 2007, 10:53:59 PM, you wrote:

ek> The non-volatile cache issues are being covered by:
ek>
ek> 6462690 sd driver should set SYNC_NV bit when issuing SYNCHRONIZE
ek> CACHE to SBC-2 devices
ek> PSARC 2007/053
ek>
ek> The PSARC case has been approved, and Grant is finishing up the code
ek> changes.

Has an analysis of the most common storage systems been done on how they
treat the SYNC_NV bit and whether any additional tweaking is needed? Would
such an analysis be publicly available?

-- 
Best regards,
Robert                          mailto:rmilkowski at task.gda.pl
                                http://milek.blogspot.com
> Has an analysis of the most common storage systems been done on how they
> treat the SYNC_NV bit and whether any additional tweaking is needed? Would
> such an analysis be publicly available?

I am not aware of any analysis and would love to see it done (I'm sure any
vendors who are lurking on this list and support SYNC_NV would surely want
to speak up now).

Because not every vendor supports SYNC_NV, our solution is to first see if
SYNC_NV is supported, and if not, provide a config file (as a short-term
necessity) in which you can hardcode certain products to act as if they
support SYNC_NV (in which case we would not send a flush of the cache). If
the SYNC_NV bit is not supported and the config file is not updated for the
device, then we do what we do today.

But if anyone knows for certain whether a particular device supports
SYNC_NV, please post...

eric
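[Purely as an illustration of the kind of config-file override eric
describes: the mechanism had not shipped at the time of this thread, so the
property name and the vendor/product strings below are hypothetical. An
entry in sd.conf might look something like this.]

  # /kernel/drv/sd.conf (hypothetical sketch)
  # Tell the sd driver to treat the matching devices' write cache as
  # non-volatile, so cache flushes can be handled accordingly.
  sd-config-list = "VENDOR  PRODUCT-ID", "cache-nonvolatile:true";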
Hello eric,

Friday, April 20, 2007, 3:36:20 PM, you wrote:

ek> I am not aware of any analysis and would love to see it done (I'm sure
ek> any vendors who are lurking on this list and support SYNC_NV would
ek> surely want to speak up now).
ek>
ek> Because not every vendor supports SYNC_NV, our solution is to first see
ek> if SYNC_NV is supported, and if not, provide a config file (as a
ek> short-term necessity) in which you can hardcode certain products to act
ek> as if they support SYNC_NV (in which case we would not send a flush of
ek> the cache). If the SYNC_NV bit is not supported and the config file is
ek> not updated for the device, then we do what we do today.
ek>
ek> But if anyone knows for certain whether a particular device supports
ek> SYNC_NV, please post...

Why a config file and not a property of the pool? Ahhhh.... a pool can have
disks from different arrays :)

A useful thing would be to be able to keep that config file in the pool, so
if one exports/imports it on a different server... you get the idea.

-- 
Best regards,
Robert                          mailto:rmilkowski at task.gda.pl
                                http://milek.blogspot.com