2012-03-26 7:58, Yuri Vorobyev wrote:
> Hello.
>
> What are the best practices for choosing the ZFS volume volblocksize
> setting for VMware VMFS-5?
> The VMFS-5 block size is 1 MB. Not sure how it corresponds with ZFS.
>
> Setup details follow:
> - 11 pairs of mirrors;
> - 600Gb 15k SAS disks;
> - SSDs for L2ARC and ZIL
> - COMSTAR FC target;
> - about 30 virtual machines, mostly Windows (so the underlying file
> system is NTFS with a 4k block size)
> - 3 ESXi hosts.
>
> Also, I will be glad to hear volume layout suggestions.
> I see several options:
> - one big zvol the size of the whole pool;
> - one big zvol sized at the pool minus 20% (to avoid fragmentation);
> - several zvols (size?).
>
> Thanks for your attention.
You will still see fragmentation, because that's the way
ZFS works - never overwriting live data in place. It will
try to combine new updates to the pool (a transaction
group, TXG) into as few large writes as possible, if
contiguous stretches of free space permit. And yes,
reserving some unused space should help against
fragmentation.
I asked on the list, but got no response, whether space
reserved as an unused zvol can be used for such an
anti-fragmentation reservation (to forbid other datasets'
writes into the bytes otherwise unallocated).
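For illustration, a minimal sketch of that idea (dataset names
are made up): a non-sparse zvol carries a refreservation equal
to its volsize, and an empty filesystem can carry a plain
reservation, so either should hold space away from other writers:

   # zfs create -V 200G tank/reserve        (unused zvol; refreservation=200G by default)
   # zfs create -o reservation=200G tank/hold   (or an empty filesystem with a reservation)
   # zfs destroy tank/reserve               (release the space when the pool gets tight)

Whether this actually keeps the remaining free space contiguous
enough to matter is exactly the open question above.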
Regarding the question on "many zvols": this can be useful.
For example, when using dedicated datasets (either zvols
or filesystem datasets) for each VM, you can easily clone
golden images into preconfigured VM guests on the ZFS
side with near-zero overhead (like Sun VDI does). You
can also easily expand a (cloned) zvol and then grow the
guest FS with its own tools, but shrinking is tough
(if at all possible), should you ever need it.
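As a rough sketch of that workflow (pool and dataset names
are hypothetical):

   # zfs snapshot tank/golden-win2008@deploy
   # zfs clone tank/golden-win2008@deploy tank/vm-web01   (near-zero-cost copy of the golden image)
   # zfs set volsize=60G tank/vm-web01                    (grow the cloned zvol later if needed)

After growing volsize, the consumers (the ESXi datastore and/or
the guest NTFS) can be extended with their own tools; there is
no comparable path back down.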
Also, you could store VMs or their parts (e.g. their system
disks vs. data disks, or critical VMs vs. testing VMs) in
differently configured datasets (perhaps hosted on different
disks in the future, like 15k RPM vs. 7k RPM) - then it
would make sense to use different zvols (and pools).
Or perhaps, if you'll make a backup store for Veeam Backup
or any other solution and emulate a tape drive for bulk
storage, you might want a zvol with the maximum volblocksize,
while using something else for live VM images...
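If you go that way, note that volblocksize can only be chosen
at creation time, e.g. (names hypothetical):

   # zfs create -V 4T -o volblocksize=128K tank/backupstore

128K is the largest volblocksize on the builds I have seen; it
cannot be changed on an existing zvol, only picked anew for a
new one.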
Also note that if you ever plan to use ZFS snapshots,
then in the case of zvols the system will reserve another
full zvol's worth of space when you make even the first
snapshot (e.g. you have a 1 GB swap zvol; if you snapshot
it, even while it's empty, the reservation becomes 2 GB,
with 1 GB available to the user as the block device).
This allows the zvol block device contents to be
completely rewritten while guaranteeing you'll have
enough space to keep both the snapshot and the new data.
So if snapshots are planned, zvols shouldn't exceed
half of the pool size. There is no such problem with
filesystem snapshots - those only consume the space
actually allocated and still referenced (not deleted)
by some snapshot.
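You can watch this accounting yourself with the usedby*
properties (dataset names hypothetical; the exact figures vary
between builds):

   # zfs create -V 1G tank/swap0
   # zfs get volsize,refreservation,usedbyrefreservation tank/swap0
   # zfs snapshot tank/swap0@s1
   # zfs get used,usedbysnapshots,usedbyrefreservation tank/swap0

The usedby* breakdown before and after the snapshot shows where
the extra reservation described above goes.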
Also beware that if you use really small volblocksizes,
the pool's metadata needed to address the volume's blocks
will add considerable overhead compared to your userdata
(I'd expect roughly 1:1 metadata to userdata at the
minimal blocksize). I ran into such problems (and numbers)
a year ago and wrote about it on the Sun Forums; parts of
my woes might have made it into this list's archive.
Again, this should not be a big problem with file-based
storage, because files use variable-length blocks and tend
to use large ones when there are enough pending writes,
so the metadata portion is smaller.
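To put a rough number on that (back-of-the-envelope, ignoring
metadata compression): each data block is addressed by a
128-byte block pointer, so at the minimal 512-byte volblocksize
the level-1 indirect blocks alone come to 128/512 = 25% of the
userdata, roughly doubled by the two copies ZFS keeps of
metadata by default, before counting higher indirect levels -
so metadata overheads of the same order of magnitude as the
userdata are believable.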
So... my counter-question to you and the list: are there
substantial benefits to using ZFS as an iSCSI/zvol/VMFS5
provider instead of publishing an NFS service and storing
VM images as files? Both resources can be shared with several
clients (ESX hosts). I think that, for a number of reasons, the
NFS/files variant is more flexible. What are its drawbacks?
I see that you plan to make a COMSTAR FC target, so that
networking nuance is one reason for the block-target route
vs. "IP over FC to make NFS"... but in general, over
jumbo-frame Ethernet - which tool suits the task better? :)
I heard that VMware has some smallish limit on the number
of NFS datastores (mounts) per host, but 30 should be bearable...
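For what it's worth, the NFS variant is only a couple of
commands on the ZFS side (dataset name and subnet are made up;
ESXi wants root access to the export):

   # zfs create tank/vmstore
   # zfs set sharenfs=rw=@192.168.10.0/24,root=@192.168.10.0/24 tank/vmstore

and then each ESXi host mounts tank/vmstore as an NFS datastore,
with the VM images living as plain files that you can snapshot,
clone and back up per filesystem.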
HTH,
//Jim Klimov