Hi,

When deploying ZFS in a cluster environment it would be nice to be able to have some SSDs as local drives (not on SAN), and when the pool switches over to the other node ZFS would pick up that node's local disk drives as L2ARC.

To better clarify what I mean, let's assume there is a 2-node cluster with one 2540 disk array. Now let's put 4x SSDs in each node (as internal/local drives), and let's assume one ZFS pool is created on top of a LUN exported from the 2540. The 4x local SSDs could be added as L2ARC, but because they are not visible on the 2nd node, when the cluster does a failover it should be able to pick up the SSDs which are local to the other node.

L2ARC doesn't contain any data which is critical to the pool, so it doesn't have to be shared between nodes. A SLOG would be a whole different story and generally it wouldn't be possible. But L2ARC should be.

--
Robert Milkowski
http://milek.blogspot.com
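A minimal sketch of the initial setup being described, assuming a hypothetical device name for the LUN exported from the 2540 and hypothetical names for the four local SSDs:

node-1# zpool create mysql c4t600A0B8000254000d0
node-1# zpool add mysql cache c1t1d0 c1t2d0 c1t3d0 c1t4d0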
Robert Milkowski wrote:
> When deploying ZFS in a cluster environment it would be nice to be able
> to have some SSDs as local drives (not on SAN), and when the pool
> switches over to the other node ZFS would pick up that node's local
> disk drives as L2ARC.
> [...]
> L2ARC doesn't contain any data which is critical to the pool, so it
> doesn't have to be shared between nodes.

Perhaps a scenario like below should be allowed:

node-1# zpool add mysql cache node-1-ssd1 node-1-ssd2 node-1-ssd3 node-1-ssd4
node-1# zpool export mysql
node-2# zpool import mysql
node-2# zpool add mysql cache node-2-ssd1 node-2-ssd2 node-2-ssd3 node-2-ssd4

This is assuming that a pool can be imported when some of its cache devices are not accessible. That way the pool would always have some L2ARC SSDs which are not accessible, but it would provide L2ARC cache on each node with local SSDs.

btw:

milek@r600:/rpool/tmp# mkfile 200m f1
milek@r600:/rpool/tmp# mkfile 100m s1
milek@r600:/rpool/tmp# zpool create test /rpool/tmp/f1
milek@r600:/rpool/tmp# zpool status test
  pool: test
 state: ONLINE
 scrub: none requested
config:

        NAME             STATE     READ WRITE CKSUM
        test             ONLINE       0     0     0
          /rpool/tmp/f1  ONLINE       0     0     0

errors: No known data errors
milek@r600:/rpool/tmp# zpool add test cache /rpool/tmp/s1
cannot add to 'test': cache device must be a disk or disk slice
milek@r600:/rpool/tmp#

Is there a reason why a cache device can't be set up on a file like for other vdevs?

--
Robert Milkowski
http://milek.blogspot.com
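If the permanently unreachable cache devices are not wanted in the configuration after a failover, they could presumably be dropped as well, since cache devices can be removed at any time; continuing with the hypothetical device names from the message above:

node-2# zpool remove mysql node-1-ssd1 node-1-ssd2 node-1-ssd3 node-1-ssd4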
Robert Milkowski wrote:
> milek@r600:/rpool/tmp# zpool add test cache /rpool/tmp/s1
> cannot add to 'test': cache device must be a disk or disk slice
>
> Is there a reason why a cache device can't be set up on a file like
> for other vdevs?

milek@r600:/rpool/tmp# zfs create -V 100m rpool/tmp/ssd1
milek@r600:/rpool/tmp# zpool add test cache /dev/zvol/rdsk/rpool/tmp/ssd1
cannot use '/dev/zvol/rdsk/rpool/tmp/ssd1': must be a block device or regular file
milek@r600:/rpool/tmp# zpool add test cache /dev/zvol/dsk/rpool/tmp/ssd1
milek@r600:/rpool/tmp#

So when I try to add a cache device on top of a file, I get an error that a cache device must be a disk or a disk slice. When I try to add a cache device on an rdsk device, I get an error that it must be a block device or regular file, which suggests a regular file should work... (dsk works fine).

--
Robert Milkowski
http://milek.blogspot.com
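As a possible (untested) workaround for the regular-file restriction, a file could be presented to ZFS as a block device via lofiadm and then offered as a cache device. Whether zpool actually accepts a lofi device for cache is not verified here, and the /dev/lofi/1 path is just an example of what lofiadm prints:

milek@r600:/rpool/tmp# lofiadm -a /rpool/tmp/s1
/dev/lofi/1
milek@r600:/rpool/tmp# zpool add test cache /dev/lofi/1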
Robert Milkowski wrote:
> Perhaps a scenario like below should be allowed:
>
> node-1# zpool add mysql cache node-1-ssd1 node-1-ssd2 node-1-ssd3 node-1-ssd4
> node-1# zpool export mysql
> node-2# zpool import mysql
> node-2# zpool add mysql cache node-2-ssd1 node-2-ssd2 node-2-ssd3 node-2-ssd4
>
> This is assuming that a pool can be imported when some of its cache
> devices are not accessible. That way the pool would always have some
> L2ARC SSDs which are not accessible, but it would provide L2ARC cache
> on each node with local SSDs.

Actually, it looks like it already works like that!

A pool imports just fine with its cache device unavailable. Then I added another cache device, and I can still import it with the first one available but not the 2nd one.

zpool status complains, of course, but other than that it seems to be working fine.

Any thoughts?

--
Robert Milkowski
http://milek.blogspot.com
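One way to sanity-check that the locally visible cache devices are actually doing work after an import would be to watch the L2ARC counters in the ARC kstats (statistic names such as l2_size and l2_hits exist in recent builds, though they may vary between releases):

# kstat -p zfs:0:arcstats | grep l2_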
Robert Milkowski wrote:
> Robert Milkowski wrote:
>> Robert Milkowski wrote:
>>> When deploying ZFS in a cluster environment it would be nice to be able
>>> to have some SSDs as local drives (not on SAN), and when the pool
>>> switches over to the other node ZFS would pick up that node's local
>>> disk drives as L2ARC.
>
> Any thoughts?

The 7310/7410 uses this type of configuration, so obviously it works. When in doubt, just think: What Would Fishworks Do?

Wes Felter
Robert Milkowski wrote:
> Actually, it looks like it already works like that!
> A pool imports just fine with its cache device unavailable.
> Then I added another cache device, and I can still import it with the
> first one available but not the 2nd one.
>
> zpool status complains, of course, but other than that it seems to be
> working fine.
>
> Any thoughts?

Ooo. That's a scenario I hadn't thought about.

Right now, I'm doing something similar on the cheap: I have an iSCSI LUN (big-ass SATA raidz2) mounted on host A, and am using a spare 15k SAS drive locally as the L2ARC. When I export it and import it on another host with an identical disk in the same location (e.g. c1t1d0), I've done a 'zpool remove/add', since they write different ZFS signatures on the cache drive. Works like a champ.

Given that I want to use the same device location (e.g. c1t1d0) on both hosts, is there a way I can somehow add both as cache devices, and have ZFS tell them apart by the ID signature?

That is, on Host A, I do this:

# zpool create tank <iSCSI LUN> cache c1t1d0
# zpool export tank

Then, on Host B, I'm currently doing:

# zpool import tank
# zpool remove tank c1t1d0
# zpool add tank cache c1t1d0

I'd obviously like to figure out some way that I don't need to do the 'zpool add/remove'.

Robert's idea looks great, but I'm assuming that all the SSD devices have different drive locations. What I need is some way of telling ZFS to use device X as a cache device based on its ZFS signature, rather than its physical device location, as that location might (in the past) be used by another vdev.

Theoretically, I'd like to do something like this:

hostA# zpool create tank <iSCSI LUN>
hostA# zpool add tank cache c1t1d0
hostA# zpool export tank

hostB# zpool import tank
hostB# zpool add tank cache c1t1d0

And from then on, I just import/export between the two hosts, and it auto-picks the correct c1t1d0 drive.

--
Erik Trimble
Java System Support
Mailstop: usca22-123
Phone: x17195
Santa Clara, CA
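Until ZFS can pick the right cache device by signature on its own, the remove/add dance described above could presumably be wrapped in a small script run at import time on whichever host takes over; a rough, untested sketch with the pool and device names hard-coded for the example:

#!/bin/ksh
# Re-attach this host's local drive as L2ARC after a failover import.
POOL=tank
CACHEDEV=c1t1d0

zpool import $POOL || exit 1
# Drop the stale cache vdev written by the other host, if one is present.
zpool remove $POOL $CACHEDEV 2>/dev/null
# Add this host's local drive as the cache device.
zpool add $POOL cache $CACHEDEV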