Hi, I'm asking for opinions here on any possible disasters or performance issues related to the setup described below. The point is to create a large pool and smaller pools within it, so that IOPS and bandwidth usage can be monitored easily without resorting to dtrace or similar techniques.

1. Create a pool

# zpool create testpool mirror c1t1d0 c1t2d0

2. Create a volume inside the pool we just created

# zfs create -V 500g testpool/testvolume

3. Create a pool from the volume we just made

# zpool create anotherpool /dev/zvol/dsk/testpool/testvolume

After this, anotherpool can be monitored nicely via zpool iostat, and compression can be used in testpool to save resources without compression affecting anotherpool.

zpool export/import seems to work, although the -d flag needs to be used. Are there any caveats in this setup? How are writes handled? Is it safe to create a pool consisting of several SSDs and use volumes from it as log devices? Is it even supported?

Yours
Markus Kovero
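For reference, the monitoring and export/import steps described above would look roughly like this - a sketch only, using the device paths from the example; a zvol-backed pool is found by pointing zpool import -d at the zvol device directory:

# zpool iostat -v anotherpool 5                       (per-vdev IOPS and bandwidth, sampled every 5 seconds)
# zfs set compression=on testpool/testvolume          (compress the backing zvol in the outer pool only)
# zpool export anotherpool
# zpool import -d /dev/zvol/dsk/testpool anotherpool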
On Wed, Sep 22, 2010 at 02:06:27PM +0000, Markus Kovero wrote:
> Hi, I'm asking for opinions here on any possible disasters or performance issues related to the setup described below.
> The point is to create a large pool and smaller pools within it, so that IOPS and bandwidth usage can be monitored easily without resorting to dtrace or similar techniques.
>
> 1. Create a pool
>
> # zpool create testpool mirror c1t1d0 c1t2d0
>
> 2. Create a volume inside the pool we just created
>
> # zfs create -V 500g testpool/testvolume
>
> 3. Create a pool from the volume we just made
>
> # zpool create anotherpool /dev/zvol/dsk/testpool/testvolume
>
> After this, anotherpool can be monitored nicely via zpool iostat, and compression can be used in testpool to save resources without compression affecting anotherpool.
>
> zpool export/import seems to work, although the -d flag needs to be used. Are there any caveats in this setup? How are writes handled?
> Is it safe to create a pool consisting of several SSDs and use volumes from it as log devices? Is it even supported?

Such a configuration was known to cause deadlocks. Even if it works now (which I don't expect to be the case), it will cause your data to be cached twice. The CPU utilization will also be much higher, etc.

All in all, I strongly recommend against such a setup.

--
Pawel Jakub Dawidek                       http://www.wheelsystems.com
pjd at FreeBSD.org                           http://www.FreeBSD.org
FreeBSD committer                         Am I Evil? Yes, I Am!
> Such a configuration was known to cause deadlocks. Even if it works now (which I don't expect to be the case), it will cause your data to be cached twice. The CPU utilization
> will also be much higher, etc.
> All in all, I strongly recommend against such a setup.
>
> --
> Pawel Jakub Dawidek                       http://www.wheelsystems.com
> pjd at FreeBSD.org                           http://www.FreeBSD.org
> FreeBSD committer                         Am I Evil? Yes, I Am!

Well, CPU utilization can be tuned downwards by disabling checksums in the inner pools, as checksumming is already done in the main pool. I'd be interested in bug IDs for the deadlock issues and everything related. Caching twice is not an issue; prefetching could be, and it can be disabled.

I don't understand what makes it difficult for ZFS to handle this kind of setup. The main pool (testpool) should just allow any writes/reads to/from the volume, not caring what they are, whereas anotherpool would just work like any other pool consisting of any other devices.

This is quite similar to an iSCSI-replicated mirror pool, where you have a redundant pool created from iSCSI volumes locally and remotely.

Yours
Markus Kovero
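A sketch of the tuning proposed above, assuming the pool names from the original example (note this is the step being debated in the rest of the thread, not a recommendation - checksums on the inner pool end up being kept):

# zfs set checksum=off anotherpool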
On 9/22/2010 11:15 AM, Markus Kovero wrote:
>> Such a configuration was known to cause deadlocks. Even if it works now (which I don't expect to be the case), it will cause your data to be cached twice. The CPU utilization
>> will also be much higher, etc.
>> All in all, I strongly recommend against such a setup.
>> --
>> Pawel Jakub Dawidek                       http://www.wheelsystems.com
>> pjd at FreeBSD.org                           http://www.FreeBSD.org
>> FreeBSD committer                         Am I Evil? Yes, I Am!
> Well, CPU utilization can be tuned downwards by disabling checksums in the inner pools, as checksumming is already done in the main pool. I'd be interested in bug IDs for the deadlock issues and everything related. Caching twice is not an issue; prefetching could be, and it can be disabled.
> I don't understand what makes it difficult for ZFS to handle this kind of setup. The main pool (testpool) should just allow any writes/reads to/from the volume, not caring what they are, whereas anotherpool would just work like any other pool consisting of any other devices.
> This is quite similar to an iSCSI-replicated mirror pool, where you have a redundant pool created from iSCSI volumes locally and remotely.
>
> Yours
> Markus Kovero

Actually, the mechanics of local pools inside pools are significantly different from using remote volumes (potentially exported ZFS volumes) to build a local pool from.

And, no, you WOULDN'T want to turn off the "inside" pool's checksums. You're assuming that this would be taken care of by the outside pool, but that's a faulty assumption, since the only way this would happen would be if the pools somehow understood they were being nested, and thus could "bypass" much of the caching and I/O infrastructure related to the inner pool.

Caching is also a huge issue, since ZFS isn't known for being memory-slim, and as caching is done (currently) on a per-pool level, nested pools will consume significantly more RAM. Without caching the inner pool, performance is going to suck (even if some blocks are cached in the outer pool, that pool has no way to do look-ahead, nor other actions). The nature of delayed writes can also wreak havoc with caching at both pool levels.

Stupid filesystems have no issues with nesting, as they're not doing anything besides (essentially) direct I/O to the underlying devices. UFS doesn't have its own I/O subsystem, nor do things like ext* or xfs. However, I've yet to see any "modern" filesystem do well with nesting itself - there's simply too much going on under the hood, and without being "nested-aware" (i.e. specifically coding the filesystem to understand when it's being nested), many of these backend optimizations are a recipe for conflict.

--
Erik Trimble
Java System Support
Mailstop: usca22-123
Phone: x17195
Santa Clara, CA
> Actually, the mechanics of local pools inside pools are significantly
> different from using remote volumes (potentially exported ZFS volumes)
> to build a local pool from.

I don't see how. I'm referring to the method where hostA shares a local iSCSI volume to hostB, where it is mirrored with ZFS against hostB's local volume, which is shared back through iSCSI - resulting in a synchronously mirrored pool.

> And, no, you WOULDN'T want to turn off the "inside" pool's checksums.
> You're assuming that this would be taken care of by the outside pool,
> but that's a faulty assumption, since the only way this would happen
> would be if the pools somehow understood they were being nested, and
> thus could "bypass" much of the caching and I/O infrastructure related
> to the inner pool.

Good point. Checksums it is then.

> Caching is also a huge issue, since ZFS isn't known for being
> memory-slim, and as caching is done (currently) on a per-pool level,
> nested pools will consume significantly more RAM. Without caching the
> inner pool, performance is going to suck (even if some blocks are cached
> in the outer pool, that pool has no way to do look-ahead, nor other
> actions). The nature of delayed writes can also wreak havoc with caching
> at both pool levels.

Well, again, I don't see how a nested pool would consume more RAM than a separate pool created from dedicated disks. Read caching takes place twice, but I don't see that as much of a problem nowadays - just double the RAM (depending on workload, of course). Look-ahead (prefetch) hasn't worked very well anyway, so it's going to be disabled; the cache hit rate isn't great enough to be worth it on any workload. Write caching also needs to be benchmarked, but I'd say that if it works like it should, there are no issues there; I have to test it thoroughly though.

> Stupid filesystems have no issues with nesting, as they're not doing
> anything besides (essentially) direct I/O to the underlying devices. UFS
> doesn't have its own I/O subsystem, nor do things like ext* or xfs.
> However, I've yet to see any "modern" filesystem do well with nesting
> itself - there's simply too much going on under the hood, and without
> being "nested-aware" (i.e. specifically coding the filesystem to
> understand when it's being nested), many of these backend optimizations
> are a recipe for conflict.

> --
> Erik Trimble
> Java System Support
> Mailstop: usca22-123
> Phone: x17195
> Santa Clara, CA

Thanks for your thoughts. If the issues are performance related, they can be dealt with to some extent; what worries me more is whether there are still deadlock issues or other general stability issues to consider. I haven't found anything useful from bugtraq yet, though.

Yours
Markus Kovero
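For what it's worth, a sketch of how prefetch would be disabled on the Solaris builds of that era - the standard zfs_prefetch_disable tunable set in /etc/system, followed by a reboot (whether disabling it is actually a win depends on the workload):

set zfs:zfs_prefetch_disable = 1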
On Wed, Sep 22, 2010 at 20:15, Markus Kovero <Markus.Kovero at nebula.fi> wrote:
>
>> Such a configuration was known to cause deadlocks. Even if it works now (which I don't expect to be the case), it will cause your data to be cached twice. The CPU utilization
>> will also be much higher, etc.
>> All in all, I strongly recommend against such a setup.
>
>> --
>> Pawel Jakub Dawidek                       http://www.wheelsystems.com
>> pjd at FreeBSD.org                           http://www.FreeBSD.org
>> FreeBSD committer                         Am I Evil? Yes, I Am!
>
> Well, CPU utilization can be tuned downwards by disabling checksums in the inner pools, as checksumming is already done in the main pool. I'd be interested in bug IDs for the deadlock issues and everything related. Caching twice is not an issue; prefetching could be, and it can be disabled.
> I don't understand what makes it difficult for ZFS to handle this kind of setup. The main pool (testpool) should just allow any writes/reads to/from the volume, not caring what they are, whereas anotherpool would just work like any other pool consisting of any other devices.
> This is quite similar to an iSCSI-replicated mirror pool, where you have a redundant pool created from iSCSI volumes locally and remotely.

ZFS needs free memory for writes. If you fill your memory with dirty data, ZFS has to flush that data to disk. If that disk is a virtual disk in ZFS on the same computer, those writes need more memory from the same memory pool, and you have a deadlock.

If you write to a zvol on a different host (via iSCSI), those writes use memory in a different memory pool (on the other computer). No deadlock.
> If you write to a zvol on a different host (via iSCSI), those writes
> use memory in a different memory pool (on the other computer). No
> deadlock.

I would expect in a usual configuration that one side of a mirrored iSCSI-based pool would be on the same host as its underlying zvol's pool.

--
Maurice Volaski, maurice.volaski at einstein.yu.edu
Computing Support, Rose F. Kennedy Center
Albert Einstein College of Medicine of Yeshiva University
>> If you write to a zvol on a different host (via iSCSI), those writes
>> use memory in a different memory pool (on the other computer). No
>> deadlock.

> I would expect in a usual configuration that one side of a mirrored
> iSCSI-based pool would be on the same host as its underlying zvol's
> pool.

That's what I was after. Would using a log device in the inner pool make things different then? The presumed workload is, e.g., serving NFS.

Yours
Markus Kovero
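For illustration, adding a separate log device to the inner pool would use the ordinary syntax below; the device name c3t0d0 is only a placeholder, and whether that device should itself be another zvol from the outer pool is exactly the open question here:

# zpool add anotherpool log c3t0d0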
Mattias Pantzare wrote:
> ZFS needs free memory for writes. If you fill your memory with dirty
> data, ZFS has to flush that data to disk. If that disk is a virtual
> disk in ZFS on the same computer, those writes need more memory from
> the same memory pool, and you have a deadlock.
> If you write to a zvol on a different host (via iSCSI), those writes
> use memory in a different memory pool (on the other computer). No
> deadlock.

Isn't this a matter of not keeping enough free memory as a workspace? By free memory, I am referring to unallocated memory and also recoverable main memory used for shrinkable read caches (shrinkable by discarding cached data). If the system keeps enough free and recoverable memory around for workspace, why should the deadlock case ever arise? Slowness and page swapping might be expected to arise (as a result of a shrinking read cache and high memory pressure), but deadlocks too?

It sounds like deadlocks from the described scenario indicate that the memory allocation and caching algorithms do not perform gracefully in the face of high memory pressure. If the deadlocks do not occur when different memory pools are involved (by using a second computer), that tells me that memory allocation decisions are playing a role. Additional data should not be accepted for writes when the system determines memory pressure is so high that it may not be able to flush everything to disk.

Here is one article about memory pressure (on Windows, but the issues apply cross-OS):
http://blogs.msdn.com/b/slavao/archive/2005/02/01/364523.aspx

(How does virtualization fit into this picture? If both OpenSolaris systems are actually running inside of different virtual machines, on top of the same host, have we isolated them enough to allow pools inside pools without risk of deadlocks?)
Erik Trimble wrote:
> Actually, the mechanics of local pools inside pools are significantly
> different from using remote volumes (potentially exported ZFS volumes)
> to build a local pool from.
>
> And, no, you WOULDN'T want to turn off the "inside" pool's checksums.
> You're assuming that this would be taken care of by the outside pool,
> but that's a faulty assumption, since the only way this would happen
> would be if the pools somehow understood they were being nested, and
> thus could "bypass" much of the caching and I/O infrastructure related
> to the inner pool.

What is an example of where a checksummed outside pool would not be able to protect a non-checksummed inside pool? Would an intermittent RAM/motherboard/CPU failure that only corrupted the inner pool's block before it was passed to the outer pool (and did not corrupt the outer pool's block) be a valid example?

If checksums are desirable in this scenario, then redundancy would also be needed to recover from checksum failures.

Pools understanding nesting would be a win. Another win that might benefit from this pool-to-pool communication interface would be a ZFS client (shim? driver?) that would extend ZFS checksum protection all the way out across the network to the workstations accessing ZFS pools. ZFS offers no protection against corruption between the CIFS/NFS server and the CIFS/NFS client. (The client would need to mount the pool directly in the current structure.)

----
To quote myself from May 2010:

If someone wrote a "ZFS client", it'd be possible to get over-the-wire data protection. This would be continuous from the client computer all the way to the storage device. Right now there is data protection from the server to the storage device. The best protected apps are those running on the same server that has mounted the ZFS pool containing the data they need (in which case they are protected by ZFS checksums and by ECC RAM, if present).

A "ZFS client" would run on the computer connecting to the ZFS server, in order to extend ZFS's protection and detection out across the network. In one model, the ZFS client could be a proxy for communication between the client and the server running ZFS.
It would extend the filesystem checksumming across the network, verifying checksums locally as data was requested, and calculating checksums locally before data was sent that the server would re-check. Recoverable checksum failures would be transparent except for performance loss; unrecoverable failures would be reported as unrecoverable using the standard OS unrecoverable-checksum error message (Windows has one that it uses for bad sectors on drives and optical media). The local client checksum calculations would be useful in detecting network failures and local hardware instability (i.e. if most/all clients start seeing checksum failures, look at the network; if only one client sees checksum failures, check that client's hardware).

An extension to the ZFS client model would allow multi-level ZFS systems to better coordinate their protection and recover from more scenarios. By multi-level ZFS, I mean ZFS stacked on ZFS, say via iSCSI. An example (I'm sure there are better ones) would be 3 servers, each with 3 data disks. Each disk is made into its own non-redundant pool (making 9 non-redundant pools). These pools are in turn shared via iSCSI. One of the servers creates RAIDZ1 groups using 1 disk from each of the 3 servers. With a means for ZFS systems to communicate, a failure of any non-redundant lower-level device need not trigger a system halt of that lower system, because it will know from the higher-level system that the device can be repaired/replaced using the higher-level redundancy. A key to making this happen is an interface to request a block and its related checksum (or, if speaking of CIFS, to request a file, its related blocks, and their checksums).
----

The ability to grow/shrink RAIDZ by adding and removing devices is still more important, and so is the ability to rebalance pools when a pool is grown.

> Caching is also a huge issue, since ZFS isn't known for being
> memory-slim, and as caching is done (currently) on a per-pool level,
> nested pools will consume significantly more RAM.

This tells me that nesting itself isn't a cause for additional RAM consumption. The number of pools is the cause. Minimize the number of pools to minimize RAM consumption.

> Without caching the inner pool, performance is going to suck (even if
> some blocks are cached in the outer pool, that pool has no way to do
> look-ahead, nor other actions). The nature of delayed writes can also
> wreak havoc with caching at both pool levels.

What about not caching the outer pool? Then we can view the inner pool as using a (now larger) cache to make up for a 'big slow storage' device. The inner pool knows which files are being used, so it can do look-ahead.

> Stupid filesystems have no issues with nesting, as they're not doing
> anything besides (essentially) direct I/O to the underlying devices.
> UFS doesn't have its own I/O subsystem, nor do things like ext* or
> xfs. However, I've yet to see any "modern" filesystem do well with
> nesting itself - there's simply too much going on under the hood, and
> without being "nested-aware" (i.e. specifically coding the filesystem
> to understand when it's being nested), many of these backend
> optimizations are a recipe for conflict.

Sounds like tunneling TCP over TCP vs. TCP over UDP. In the former case, optimizations and retries on errors can lead to quickly degraded performance. In the latter, the lower layer doesn't try to maintain integrity and instead leaves that job to the application.
TCP over TCP: ZFS over ZFS
TCP over UDP: ZFS over UFS
UDP over UDP: UFS over UFS
> Isn't this a matter of not keeping enough free memory as a workspace? By free memory, I am referring to unallocated memory and also recoverable main memory used for shrinkable read caches (shrinkable by discarding cached data). If the system keeps enough free and recoverable memory around for workspace, why should the deadlock case ever arise? Slowness and page swapping might be expected to arise (as a result of a shrinking read cache and high
> memory pressure), but deadlocks too?

> It sounds like deadlocks from the described scenario indicate that the memory allocation and caching algorithms do not perform gracefully in the face of high memory pressure. If the deadlocks do not occur when different memory pools are involved (by using a second computer), that tells me that memory allocation decisions are playing a role. Additional data should not be accepted for writes when the system determines memory pressure is so high that it may not
> be able to flush everything to disk.

> Here is one article about memory pressure (on Windows, but the issues apply cross-OS):
> http://blogs.msdn.com/b/slavao/archive/2005/02/01/364523.aspx

> (How does virtualization fit into this picture? If both OpenSolaris systems are actually running inside of different virtual machines, on top of the same host, have we isolated them enough to allow pools inside pools without risk of deadlocks?)

I haven't noticed any deadlock issues so far in low-memory conditions when using nested pools (in a replicated configuration), at least in snv_134. Maybe I haven't tried hard enough. Anyway, wouldn't a log device in the inner pool help in this situation?

Yours
Markus Kovero
> What is an example of where a checksummed outside pool would not be able
> to protect a non-checksummed inside pool? Would an intermittent
> RAM/motherboard/CPU failure that only corrupted the inner pool's block
> before it was passed to the outer pool (and did not corrupt the outer
> pool's block) be a valid example?

> If checksums are desirable in this scenario, then redundancy would also
> be needed to recover from checksum failures.

That is an excellent point too: what is the point of checksumming if you cannot recover from it? In this kind of configuration one would benefit performance-wise from not having to calculate checksums again. Checksums in the outer pool effectively protect from disk issues; if hardware fails so that data is corrupted, isn't the outer pool's redundancy going to handle it for the inner pool as well? The only thing that comes to mind is that IF something happens to the outer pool, the inner pool is no longer aware of possibly broken data, which can lead to issues.

Yours
Markus Kovero
Markus Kovero wrote:
>> What is an example of where a checksummed outside pool would not be able
>> to protect a non-checksummed inside pool? Would an intermittent
>> RAM/motherboard/CPU failure that only corrupted the inner pool's block
>> before it was passed to the outer pool (and did not corrupt the outer
>> pool's block) be a valid example?
>>
>> If checksums are desirable in this scenario, then redundancy would also
>> be needed to recover from checksum failures.
>
> That is an excellent point too: what is the point of checksumming if you cannot recover from it?

Checksum errors can tell you there is probably a problem worthy of attention. They can prevent you from making things worse by stopping you in your tracks until whatever triggered them is resolved, or until enough redundancy is available to overcome the errors. This is why operating system kernels panic/abend/BSOD when they detect that the system state has been changed in an unknown way which could have unpredictable (and likely bad) results on further operations.

Redundancy is useful when you can't recover the data by simply asking for it to be re-sent or by getting it from another source. Communications buses and protocols use checksums to detect corruption and resends/retries to recover from checksum failures. That strategy doesn't work when you are talking about your end storage media.

> In this kind of configuration one would benefit performance-wise from not having to calculate checksums again.
> Checksums in the outer pool effectively protect from disk issues; if hardware fails so that data is corrupted, isn't the outer pool's redundancy going to handle it for the inner pool as well?
> The only thing that comes to mind is that IF something happens to the outer pool, the inner pool is no longer aware of possibly broken data, which can lead to issues.
>
> Yours
> Markus Kovero
On Thu, Sep 23, 2010 at 08:48, Haudy Kazemi <kaze0010 at umn.edu> wrote:
> Mattias Pantzare wrote:
>>
>> ZFS needs free memory for writes. If you fill your memory with dirty
>> data, ZFS has to flush that data to disk. If that disk is a virtual
>> disk in ZFS on the same computer, those writes need more memory from
>> the same memory pool, and you have a deadlock.
>> If you write to a zvol on a different host (via iSCSI), those writes
>> use memory in a different memory pool (on the other computer). No
>> deadlock.
>
> Isn't this a matter of not keeping enough free memory as a workspace? By
> free memory, I am referring to unallocated memory and also recoverable main
> memory used for shrinkable read caches (shrinkable by discarding cached
> data). If the system keeps enough free and recoverable memory around for
> workspace, why should the deadlock case ever arise? Slowness and page
> swapping might be expected to arise (as a result of a shrinking read cache
> and high memory pressure), but deadlocks too?

Yes. But what is enough reserved free memory? If you need 1 MB for a normal configuration, you might need 2 MB when you are doing ZFS on ZFS (I am just guessing). This is the same problem as mounting an NFS server on itself via NFS - also not supported.

The system has shrinkable caches and so on, but that space will sometimes run out. All of it. There is also swap to use, but if that is on ZFS...

These things are also very hard to test.
On Thu, Sep 23, 2010 at 06:58:29AM +0000, Markus Kovero wrote:
> > What is an example of where a checksummed outside pool would not be able
> > to protect a non-checksummed inside pool? Would an intermittent
> > RAM/motherboard/CPU failure that only corrupted the inner pool's block
> > before it was passed to the outer pool (and did not corrupt the outer
> > pool's block) be a valid example?
>
> > If checksums are desirable in this scenario, then redundancy would also
> > be needed to recover from checksum failures.
>
> That is an excellent point too: what is the point of checksumming if
> you cannot recover from it? In this kind of configuration one would
> benefit performance-wise from not having to calculate checksums again.

The benefit of checksumming in the "inner tunnel", as it were (the inner pool), is to provide one more layer of protection relative to iSCSI. But without redundancy in the inner pool you cannot recover from failures, as you point out. And you must have checksumming in the outer pool, so that it can be scrubbed.

It's tempting to say that the inner pool should not checksum at all, and that iSCSI and IPsec should be configured correctly to provide sufficient protection to the inner pool. Another possibility is to have a remote ZFS protocol of sorts, but then you begin to wonder if something like Lustre (married to ZFS) isn't better.

> Checksums in the outer pool effectively protect from disk issues; if
> hardware fails so that data is corrupted, isn't the outer pool's redundancy
> going to handle it for the inner pool as well?

Yes.

Nico
> Yes. But what is enough reserved free memory? If you need 1 MB for a normal configuration, you might need 2 MB when you are doing ZFS on ZFS (I am just guessing).
> This is the same problem as mounting an NFS server on itself via NFS - also not supported.

> The system has shrinkable caches and so on, but that space will sometimes run out. All of it. There is also swap to use, but if that is on ZFS...

> These things are also very hard to test.

I was able to see OpenSolaris snv_134 become unresponsive due to lack of memory with a nested pool configuration today. It took around 12 hours of issuing writes at around 1.2-1.5 GB/s on a system with 48 GB of RAM. Anyway, setting zfs_arc_max in /etc/system seemed to do the trick; the system behaves as expected even under heavier load. Performance is actually pretty good.

Yours
Markus Kovero
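For illustration, capping the ARC is done with a line like the following in /etc/system, followed by a reboot. The 16 GB value here is only a placeholder - the actual cap used above is not stated, and on a 48 GB machine it would need to leave headroom for both the outer and inner pools' caches:

set zfs:zfs_arc_max = 0x400000000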