Hi, I'm asking for opinions here on any possible disasters or performance issues related to the setup described below. The point is to create a large pool and smaller pools within it, so that IOPS and bandwidth usage can be monitored easily without resorting to dtrace or similar techniques.

1. Create a pool

# zpool create testpool mirror c1t1d0 c1t2d0

2. Create a volume inside the pool we just created

# zfs create -V 500g testpool/testvolume

3. Create a pool from the volume we just made

# zpool create anotherpool /dev/zvol/dsk/testpool/testvolume

After this, anotherpool can be monitored nicely via zpool iostat, and compression can be used in testpool to save resources without compression affecting anotherpool.

zpool export/import seems to work, although the -d flag needs to be used. Are there any caveats in this setup? How are writes handled? Is it safe to create a pool consisting of several SSDs and use volumes from it as log devices? Is it even supported?

Yours
Markus Kovero
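For reference, the monitoring and export/import steps described above would look roughly like this - a sketch only, using the device paths from the example; a zvol-backed pool is found by pointing zpool import -d at the zvol device directory:

# zpool iostat -v anotherpool 5                       (per-vdev IOPS and bandwidth, sampled every 5 seconds)
# zfs set compression=on testpool/testvolume          (compress the backing zvol in the outer pool only)
# zpool export anotherpool
# zpool import -d /dev/zvol/dsk/testpool anotherpool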
On Wed, Sep 22, 2010 at 02:06:27PM +0000, Markus Kovero wrote:
> Hi, I'm asking for opinions here on any possible disasters or performance issues related to the setup described below.
> The point is to create a large pool and smaller pools within it, so that IOPS and bandwidth usage can be monitored easily without resorting to dtrace or similar techniques.
>
> 1. Create a pool
>
> # zpool create testpool mirror c1t1d0 c1t2d0
>
> 2. Create a volume inside the pool we just created
>
> # zfs create -V 500g testpool/testvolume
>
> 3. Create a pool from the volume we just made
>
> # zpool create anotherpool /dev/zvol/dsk/testpool/testvolume
>
> After this, anotherpool can be monitored nicely via zpool iostat, and compression can be used in testpool to save resources without compression affecting anotherpool.
>
> zpool export/import seems to work, although the -d flag needs to be used. Are there any caveats in this setup? How are writes handled?
> Is it safe to create a pool consisting of several SSDs and use volumes from it as log devices? Is it even supported?

Such a configuration was known to cause deadlocks. Even if it works now (which I don't expect to be the case), it will cause your data to be cached twice. The CPU utilization will also be much higher, etc.

All in all, I strongly recommend against such a setup.

--
Pawel Jakub Dawidek                       http://www.wheelsystems.com
pjd at FreeBSD.org                           http://www.FreeBSD.org
FreeBSD committer                         Am I Evil? Yes, I Am!
> Such a configuration was known to cause deadlocks. Even if it works now (which I don't expect to be the case), it will cause your data to be cached twice. The CPU utilization
> will also be much higher, etc.
> All in all, I strongly recommend against such a setup.
>
> --
> Pawel Jakub Dawidek                       http://www.wheelsystems.com
> pjd at FreeBSD.org                           http://www.FreeBSD.org
> FreeBSD committer                         Am I Evil? Yes, I Am!

Well, CPU utilization can be tuned downwards by disabling checksums in the inner pools, as checksumming is already done in the main pool. I'd be interested in bug IDs for the deadlock issues and everything related. Caching twice is not an issue; prefetching could be, and it can be disabled.

I don't understand what makes it difficult for ZFS to handle this kind of setup. The main pool (testpool) should just allow any writes/reads to/from the volume, not caring what they are, whereas anotherpool would just work like any other pool consisting of any other devices.

This is quite similar to an iSCSI-replicated mirror pool, where you have a redundant pool created from iSCSI volumes locally and remotely.

Yours
Markus Kovero
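A sketch of the tuning proposed above, assuming the pool names from the original example (note this is the step being debated in the rest of the thread, not a recommendation - checksums on the inner pool end up being kept):

# zfs set checksum=off anotherpool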
On 9/22/2010 11:15 AM, Markus Kovero wrote:
>> Such a configuration was known to cause deadlocks. Even if it works now (which I don't expect to be the case), it will cause your data to be cached twice. The CPU utilization
>> will also be much higher, etc.
>> All in all, I strongly recommend against such a setup.
>> --
>> Pawel Jakub Dawidek                       http://www.wheelsystems.com
>> pjd at FreeBSD.org                           http://www.FreeBSD.org
>> FreeBSD committer                         Am I Evil? Yes, I Am!
> Well, CPU utilization can be tuned downwards by disabling checksums in the inner pools, as checksumming is already done in the main pool. I'd be interested in bug IDs for the deadlock issues and everything related. Caching twice is not an issue; prefetching could be, and it can be disabled.
> I don't understand what makes it difficult for ZFS to handle this kind of setup. The main pool (testpool) should just allow any writes/reads to/from the volume, not caring what they are, whereas anotherpool would just work like any other pool consisting of any other devices.
> This is quite similar to an iSCSI-replicated mirror pool, where you have a redundant pool created from iSCSI volumes locally and remotely.
>
> Yours
> Markus Kovero

Actually, the mechanics of local pools inside pools are significantly different from using remote volumes (potentially exported ZFS volumes) to build a local pool from.

And, no, you WOULDN'T want to turn off the "inside" pool's checksums. You're assuming that this would be taken care of by the outside pool, but that's a faulty assumption, since the only way this would happen would be if the pools somehow understood they were being nested, and thus could "bypass" much of the caching and I/O infrastructure related to the inner pool.

Caching is also a huge issue, since ZFS isn't known for being memory-slim, and as caching is done (currently) on a per-pool level, nested pools will consume significantly more RAM. Without caching the inner pool, performance is going to suck (even if some blocks are cached in the outer pool, that pool has no way to do look-ahead, nor other actions). The nature of delayed writes can also wreak havoc with caching at both pool levels.

Stupid filesystems have no issues with nesting, as they're not doing anything besides (essentially) direct I/O to the underlying devices. UFS doesn't have its own I/O subsystem, nor do things like ext* or xfs. However, I've yet to see any "modern" filesystem do well with nesting itself - there's simply too much going on under the hood, and without being "nested-aware" (i.e. specifically coding the filesystem to understand when it's being nested), many of these backend optimizations are a recipe for conflict.

--
Erik Trimble
Java System Support
Mailstop: usca22-123
Phone: x17195
Santa Clara, CA
> Actually, the mechanics of local pools inside pools are significantly
> different from using remote volumes (potentially exported ZFS volumes)
> to build a local pool from.

I don't see how. I'm referring to the method where hostA shares a local iSCSI volume to hostB, where it is mirrored with ZFS against hostB's local volume, which is shared back through iSCSI - resulting in a synchronously mirrored pool.

> And, no, you WOULDN'T want to turn off the "inside" pool's checksums.
> You're assuming that this would be taken care of by the outside pool,
> but that's a faulty assumption, since the only way this would happen
> would be if the pools somehow understood they were being nested, and
> thus could "bypass" much of the caching and I/O infrastructure related
> to the inner pool.

Good point. Checksums it is then.

> Caching is also a huge issue, since ZFS isn't known for being
> memory-slim, and as caching is done (currently) on a per-pool level,
> nested pools will consume significantly more RAM. Without caching the
> inner pool, performance is going to suck (even if some blocks are cached
> in the outer pool, that pool has no way to do look-ahead, nor other
> actions). The nature of delayed writes can also wreak havoc with caching
> at both pool levels.

Well, again, I don't see how a nested pool would consume more RAM than a separate pool created from dedicated disks. Read caching takes place twice, but I don't see that as much of a problem nowadays - just double the RAM (depending on workload, of course). Look-ahead (prefetch) hasn't worked very well anyway, so it's going to be disabled; the cache hit rate isn't great enough to be worth it on any workload. Write caching also needs to be benchmarked, but I'd say that if it works like it should, there are no issues there; I have to test it thoroughly though.

> Stupid filesystems have no issues with nesting, as they're not doing
> anything besides (essentially) direct I/O to the underlying devices. UFS
> doesn't have its own I/O subsystem, nor do things like ext* or xfs.
> However, I've yet to see any "modern" filesystem do well with nesting
> itself - there's simply too much going on under the hood, and without
> being "nested-aware" (i.e. specifically coding the filesystem to
> understand when it's being nested), many of these backend optimizations
> are a recipe for conflict.

> --
> Erik Trimble
> Java System Support
> Mailstop: usca22-123
> Phone: x17195
> Santa Clara, CA

Thanks for your thoughts. If the issues are performance related, they can be dealt with to some extent; what worries me more is whether there are still deadlock issues or other general stability issues to consider. I haven't found anything useful from bugtraq yet, though.

Yours
Markus Kovero
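For what it's worth, a sketch of how prefetch would be disabled on the Solaris builds of that era - the standard zfs_prefetch_disable tunable set in /etc/system, followed by a reboot (whether disabling it is actually a win depends on the workload):

set zfs:zfs_prefetch_disable = 1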
On Wed, Sep 22, 2010 at 20:15, Markus Kovero <Markus.Kovero at nebula.fi> wrote:
>
>> Such a configuration was known to cause deadlocks. Even if it works now (which I don't expect to be the case), it will cause your data to be cached twice. The CPU utilization
>> will also be much higher, etc.
>> All in all, I strongly recommend against such a setup.
>
>> --
>> Pawel Jakub Dawidek                       http://www.wheelsystems.com
>> pjd at FreeBSD.org                           http://www.FreeBSD.org
>> FreeBSD committer                         Am I Evil? Yes, I Am!
>
> Well, CPU utilization can be tuned downwards by disabling checksums in the inner pools, as checksumming is already done in the main pool. I'd be interested in bug IDs for the deadlock issues and everything related. Caching twice is not an issue; prefetching could be, and it can be disabled.
> I don't understand what makes it difficult for ZFS to handle this kind of setup. The main pool (testpool) should just allow any writes/reads to/from the volume, not caring what they are, whereas anotherpool would just work like any other pool consisting of any other devices.
> This is quite similar to an iSCSI-replicated mirror pool, where you have a redundant pool created from iSCSI volumes locally and remotely.

ZFS needs free memory for writes. If you fill your memory with dirty data, ZFS has to flush that data to disk. If that disk is a virtual disk in ZFS on the same computer, those writes need more memory from the same memory pool, and you have a deadlock.

If you write to a zvol on a different host (via iSCSI), those writes use memory in a different memory pool (on the other computer). No deadlock.
> If you write to a zvol on a different host (via iSCSI), those writes
> use memory in a different memory pool (on the other computer). No
> deadlock.

I would expect in a usual configuration that one side of a mirrored iSCSI-based pool would be on the same host as its underlying zvol's pool.

--
Maurice Volaski, maurice.volaski at einstein.yu.edu
Computing Support, Rose F. Kennedy Center
Albert Einstein College of Medicine of Yeshiva University
>> If you write to a zvol on a different host (via iSCSI), those writes
>> use memory in a different memory pool (on the other computer). No
>> deadlock.

> I would expect in a usual configuration that one side of a mirrored
> iSCSI-based pool would be on the same host as its underlying zvol's
> pool.

That's what I was after. Would using a log device in the inner pool make things different then? The presumed workload is, e.g., serving NFS.

Yours
Markus Kovero
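For illustration, adding a separate log device to the inner pool would use the ordinary syntax below; the device name c3t0d0 is only a placeholder, and whether that device should itself be another zvol from the outer pool is exactly the open question here:

# zpool add anotherpool log c3t0d0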
Mattias Pantzare wrote:
> ZFS needs free memory for writes. If you fill your memory with dirty
> data, ZFS has to flush that data to disk. If that disk is a virtual
> disk in ZFS on the same computer, those writes need more memory from
> the same memory pool, and you have a deadlock.
> If you write to a zvol on a different host (via iSCSI), those writes
> use memory in a different memory pool (on the other computer). No
> deadlock.

Isn't this a matter of not keeping enough free memory as a workspace? By free memory, I am referring to unallocated memory and also recoverable main memory used for shrinkable read caches (shrinkable by discarding cached data). If the system keeps enough free and recoverable memory around for workspace, why should the deadlock case ever arise? Slowness and page swapping might be expected to arise (as a result of a shrinking read cache and high memory pressure), but deadlocks too?

It sounds like deadlocks from the described scenario indicate that the memory allocation and caching algorithms do not perform gracefully in the face of high memory pressure. If the deadlocks do not occur when different memory pools are involved (by using a second computer), that tells me that memory allocation decisions are playing a role. Additional data should not be accepted for writes when the system determines memory pressure is so high that it may not be able to flush everything to disk.

Here is one article about memory pressure (on Windows, but the issues apply cross-OS):
http://blogs.msdn.com/b/slavao/archive/2005/02/01/364523.aspx

(How does virtualization fit into this picture? If both OpenSolaris systems are actually running inside of different virtual machines, on top of the same host, have we isolated them enough to allow pools inside pools without risk of deadlocks?)
Erik Trimble wrote:
> Actually, the mechanics of local pools inside pools are significantly
> different from using remote volumes (potentially exported ZFS volumes)
> to build a local pool from.
>
> And, no, you WOULDN'T want to turn off the "inside" pool's checksums.
> You're assuming that this would be taken care of by the outside pool,
> but that's a faulty assumption, since the only way this would happen
> would be if the pools somehow understood they were being nested, and
> thus could "bypass" much of the caching and I/O infrastructure related
> to the inner pool.

What is an example of where a checksummed outside pool would not be able to protect a non-checksummed inside pool? Would an intermittent RAM/motherboard/CPU failure that only corrupted the inner pool's block before it was passed to the outer pool (and did not corrupt the outer pool's block) be a valid example?

If checksums are desirable in this scenario, then redundancy would also be needed to recover from checksum failures.

Pools understanding nesting would be a win. Another win that might benefit from this pool-to-pool communication interface would be a ZFS client (shim? driver?) that would extend ZFS checksum protection all the way out across the network to the workstations accessing ZFS pools. ZFS offers no protection against corruption between the CIFS/NFS server and the CIFS/NFS client. (The client would need to mount the pool directly in the current structure.)

----
To quote myself from May 2010:

If someone wrote a "ZFS client", it'd be possible to get over-the-wire data protection. This would be continuous from the client computer all the way to the storage device. Right now there is data protection from the server to the storage device. The best protected apps are those running on the same server that has mounted the ZFS pool containing the data they need (in which case they are protected by ZFS checksums and by ECC RAM, if present).

A "ZFS client" would run on the computer connecting to the ZFS server, in order to extend ZFS's protection and detection out across the network. In one model, the ZFS client could be a proxy for communication between the client and the server running ZFS.
It would extend the filesystem checksumming across the network, verifying checksums locally as data was requested, and calculating checksums locally before data was sent that the server would re-check. Recoverable checksum failures would be transparent except for performance loss; unrecoverable failures would be reported as unrecoverable using the standard OS unrecoverable-checksum error message (Windows has one that it uses for bad sectors on drives and optical media). The local client checksum calculations would be useful in detecting network failures and local hardware instability (i.e. if most/all clients start seeing checksum failures, look at the network; if only one client sees checksum failures, check that client's hardware).

An extension to the ZFS client model would allow multi-level ZFS systems to better coordinate their protection and recover from more scenarios. By multi-level ZFS, I mean ZFS stacked on ZFS, say via iSCSI. An example (I'm sure there are better ones) would be 3 servers, each with 3 data disks. Each disk is made into its own non-redundant pool (making 9 non-redundant pools). These pools are in turn shared via iSCSI. One of the servers creates RAIDZ1 groups using 1 disk from each of the 3 servers. With a means for ZFS systems to communicate, a failure of any non-redundant lower-level device need not trigger a system halt of that lower system, because it will know from the higher-level system that the device can be repaired/replaced using the higher-level redundancy. A key to making this happen is an interface to request a block and its related checksum (or, if speaking of CIFS, to request a file, its related blocks, and their checksums).
----

The ability to grow/shrink RAIDZ by adding and removing devices is still more important, and so is the ability to rebalance pools when a pool is grown.

> Caching is also a huge issue, since ZFS isn't known for being
> memory-slim, and as caching is done (currently) on a per-pool level,
> nested pools will consume significantly more RAM.

This tells me that nesting itself isn't a cause for additional RAM consumption. The number of pools is the cause. Minimize the number of pools to minimize RAM consumption.

> Without caching the inner pool, performance is going to suck (even if
> some blocks are cached in the outer pool, that pool has no way to do
> look-ahead, nor other actions). The nature of delayed writes can also
> wreak havoc with caching at both pool levels.

What about not caching the outer pool? Then we can view the inner pool as using a (now larger) cache to make up for a 'big slow storage' device. The inner pool knows which files are being used, so it can do look-ahead.

> Stupid filesystems have no issues with nesting, as they're not doing
> anything besides (essentially) direct I/O to the underlying devices.
> UFS doesn't have its own I/O subsystem, nor do things like ext* or
> xfs. However, I've yet to see any "modern" filesystem do well with
> nesting itself - there's simply too much going on under the hood, and
> without being "nested-aware" (i.e. specifically coding the filesystem
> to understand when it's being nested), many of these backend
> optimizations are a recipe for conflict.

Sounds like tunneling TCP over TCP vs. TCP over UDP. In the former case, optimizations and retries on errors can lead to quickly degraded performance. In the latter, the lower layer doesn't try to maintain integrity and instead leaves that job to the application.
TCP over TCP: ZFS over ZFS
TCP over UDP: ZFS over UFS
UDP over UDP: UFS over UFS
> Isn't this a matter of not keeping enough free memory as a workspace? By free memory, I am referring to unallocated memory and also recoverable main memory used for shrinkable read caches (shrinkable by discarding cached data). If the system keeps enough free and recoverable memory around for workspace, why should the deadlock case ever arise? Slowness and page swapping might be expected to arise (as a result of a shrinking read cache and high
> memory pressure), but deadlocks too?

> It sounds like deadlocks from the described scenario indicate that the memory allocation and caching algorithms do not perform gracefully in the face of high memory pressure. If the deadlocks do not occur when different memory pools are involved (by using a second computer), that tells me that memory allocation decisions are playing a role. Additional data should not be accepted for writes when the system determines memory pressure is so high that it may not
> be able to flush everything to disk.

> Here is one article about memory pressure (on Windows, but the issues apply cross-OS):
> http://blogs.msdn.com/b/slavao/archive/2005/02/01/364523.aspx

> (How does virtualization fit into this picture? If both OpenSolaris systems are actually running inside of different virtual machines, on top of the same host, have we isolated them enough to allow pools inside pools without risk of deadlocks?)

I haven't noticed any deadlock issues so far in low-memory conditions when using nested pools (in a replicated configuration), at least in snv_134. Maybe I haven't tried hard enough. Anyway, wouldn't a log device in the inner pool help in this situation?

Yours
Markus Kovero
> What is an example of where a checksummed outside pool would not be able
> to protect a non-checksummed inside pool? Would an intermittent
> RAM/motherboard/CPU failure that only corrupted the inner pool's block
> before it was passed to the outer pool (and did not corrupt the outer
> pool's block) be a valid example?

> If checksums are desirable in this scenario, then redundancy would also
> be needed to recover from checksum failures.

That is an excellent point too: what is the point of checksumming if you cannot recover from it? In this kind of configuration one would benefit performance-wise from not having to calculate checksums again. Checksums in the outer pool effectively protect from disk issues; if hardware fails so that data is corrupted, isn't the outer pool's redundancy going to handle it for the inner pool as well? The only thing that comes to mind is that IF something happens to the outer pool, the inner pool is no longer aware of possibly broken data, which can lead to issues.

Yours
Markus Kovero
Markus Kovero wrote:
>> What is an example of where a checksummed outside pool would not be able
>> to protect a non-checksummed inside pool? Would an intermittent
>> RAM/motherboard/CPU failure that only corrupted the inner pool's block
>> before it was passed to the outer pool (and did not corrupt the outer
>> pool's block) be a valid example?
>>
>> If checksums are desirable in this scenario, then redundancy would also
>> be needed to recover from checksum failures.
>
> That is an excellent point too: what is the point of checksumming if you cannot recover from it?

Checksum errors can tell you there is probably a problem worthy of attention. They can prevent you from making things worse by stopping you in your tracks until whatever triggered them is resolved, or until enough redundancy is available to overcome the errors. This is why operating system kernels panic/abend/BSOD when they detect that the system state has been changed in an unknown way which could have unpredictable (and likely bad) results on further operations.

Redundancy is useful when you can't recover the data by simply asking for it to be re-sent or by getting it from another source. Communications buses and protocols use checksums to detect corruption and resends/retries to recover from checksum failures. That strategy doesn't work when you are talking about your end storage media.

> In this kind of configuration one would benefit performance-wise from not having to calculate checksums again.
> Checksums in the outer pool effectively protect from disk issues; if hardware fails so that data is corrupted, isn't the outer pool's redundancy going to handle it for the inner pool as well?
> The only thing that comes to mind is that IF something happens to the outer pool, the inner pool is no longer aware of possibly broken data, which can lead to issues.
>
> Yours
> Markus Kovero
On Thu, Sep 23, 2010 at 08:48, Haudy Kazemi <kaze0010 at umn.edu> wrote:
> Mattias Pantzare wrote:
>>
>> ZFS needs free memory for writes. If you fill your memory with dirty
>> data, ZFS has to flush that data to disk. If that disk is a virtual
>> disk in ZFS on the same computer, those writes need more memory from
>> the same memory pool, and you have a deadlock.
>> If you write to a zvol on a different host (via iSCSI), those writes
>> use memory in a different memory pool (on the other computer). No
>> deadlock.
>
> Isn't this a matter of not keeping enough free memory as a workspace? By
> free memory, I am referring to unallocated memory and also recoverable main
> memory used for shrinkable read caches (shrinkable by discarding cached
> data). If the system keeps enough free and recoverable memory around for
> workspace, why should the deadlock case ever arise? Slowness and page
> swapping might be expected to arise (as a result of a shrinking read cache
> and high memory pressure), but deadlocks too?

Yes. But what is enough reserved free memory? If you need 1 MB for a normal configuration, you might need 2 MB when you are doing ZFS on ZFS (I am just guessing). This is the same problem as mounting an NFS server on itself via NFS - also not supported.

The system has shrinkable caches and so on, but that space will sometimes run out. All of it. There is also swap to use, but if that is on ZFS...

These things are also very hard to test.
On Thu, Sep 23, 2010 at 06:58:29AM +0000, Markus Kovero wrote:
> > What is an example of where a checksummed outside pool would not be able
> > to protect a non-checksummed inside pool? Would an intermittent
> > RAM/motherboard/CPU failure that only corrupted the inner pool's block
> > before it was passed to the outer pool (and did not corrupt the outer
> > pool's block) be a valid example?
>
> > If checksums are desirable in this scenario, then redundancy would also
> > be needed to recover from checksum failures.
>
> That is an excellent point too: what is the point of checksumming if
> you cannot recover from it? In this kind of configuration one would
> benefit performance-wise from not having to calculate checksums again.

The benefit of checksumming in the "inner tunnel", as it were (the inner pool), is to provide one more layer of protection relative to iSCSI. But without redundancy in the inner pool you cannot recover from failures, as you point out. And you must have checksumming in the outer pool, so that it can be scrubbed.

It's tempting to say that the inner pool should not checksum at all, and that iSCSI and IPsec should be configured correctly to provide sufficient protection to the inner pool. Another possibility is to have a remote ZFS protocol of sorts, but then you begin to wonder if something like Lustre (married to ZFS) isn't better.

> Checksums in the outer pool effectively protect from disk issues; if
> hardware fails so that data is corrupted, isn't the outer pool's redundancy
> going to handle it for the inner pool as well?

Yes.

Nico
> Yes. But what is enough reserved free memory? If you need 1 MB for a normal configuration, you might need 2 MB when you are doing ZFS on ZFS (I am just guessing).
> This is the same problem as mounting an NFS server on itself via NFS - also not supported.

> The system has shrinkable caches and so on, but that space will sometimes run out. All of it. There is also swap to use, but if that is on ZFS...

> These things are also very hard to test.

I was able to see OpenSolaris snv_134 become unresponsive due to lack of memory with a nested pool configuration today. It took around 12 hours of issuing writes at around 1.2-1.5 GB/s on a system with 48 GB of RAM. Anyway, setting zfs_arc_max in /etc/system seemed to do the trick; the system behaves as expected even under heavier load. Performance is actually pretty good.

Yours
Markus Kovero
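For illustration, capping the ARC is done with a line like the following in /etc/system, followed by a reboot. The 16 GB value here is only a placeholder - the actual cap used above is not stated, and on a 48 GB machine it would need to leave headroom for both the outer and inner pools' caches:

set zfs:zfs_arc_max = 0x400000000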