Hi,

When deploying ZFS in a cluster environment it would be nice to be able to have some SSDs as local drives (not on SAN), and when the pool switches over to the other node ZFS would pick up that node's local disk drives as L2ARC.

To better clarify what I mean, let's assume there is a 2-node cluster with one 2540 disk array. Now let's put 4x SSDs in each node (as internal/local drives), and let's assume one ZFS pool is created on top of a LUN exported from the 2540. The 4x local SSDs could be added as L2ARC, but because they are not visible on the 2nd node, when the cluster does a failover it should be able to pick up the SSDs which are local to the other node.

L2ARC doesn't contain any data which is critical to the pool, so it doesn't have to be shared between nodes. A SLOG would be a whole different story and generally it wouldn't be possible. But L2ARC should be.

--
Robert Milkowski
http://milek.blogspot.com
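A minimal sketch of the initial setup being described, assuming a hypothetical device name for the LUN exported from the 2540 and hypothetical names for the four local SSDs:

node-1# zpool create mysql c4t600A0B8000254000d0
node-1# zpool add mysql cache c1t1d0 c1t2d0 c1t3d0 c1t4d0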
Robert Milkowski wrote:
> When deploying ZFS in a cluster environment it would be nice to be able
> to have some SSDs as local drives (not on SAN), and when the pool
> switches over to the other node ZFS would pick up that node's local
> disk drives as L2ARC.
> [...]
> L2ARC doesn't contain any data which is critical to the pool, so it
> doesn't have to be shared between nodes.

Perhaps a scenario like below should be allowed:

node-1# zpool add mysql cache node-1-ssd1 node-1-ssd2 node-1-ssd3 node-1-ssd4
node-1# zpool export mysql
node-2# zpool import mysql
node-2# zpool add mysql cache node-2-ssd1 node-2-ssd2 node-2-ssd3 node-2-ssd4

This is assuming that a pool can be imported when some of its cache devices are not accessible. That way the pool would always have some L2ARC SSDs which are not accessible, but it would provide L2ARC cache on each node with local SSDs.

btw:

milek@r600:/rpool/tmp# mkfile 200m f1
milek@r600:/rpool/tmp# mkfile 100m s1
milek@r600:/rpool/tmp# zpool create test /rpool/tmp/f1
milek@r600:/rpool/tmp# zpool status test
  pool: test
 state: ONLINE
 scrub: none requested
config:

        NAME             STATE     READ WRITE CKSUM
        test             ONLINE       0     0     0
          /rpool/tmp/f1  ONLINE       0     0     0

errors: No known data errors
milek@r600:/rpool/tmp# zpool add test cache /rpool/tmp/s1
cannot add to 'test': cache device must be a disk or disk slice
milek@r600:/rpool/tmp#

Is there a reason why a cache device can't be set up on a file like for other vdevs?

--
Robert Milkowski
http://milek.blogspot.com
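If the permanently unreachable cache devices are not wanted in the configuration after a failover, they could presumably be dropped as well, since cache devices can be removed at any time; continuing with the hypothetical device names from the message above:

node-2# zpool remove mysql node-1-ssd1 node-1-ssd2 node-1-ssd3 node-1-ssd4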
Robert Milkowski wrote:
> milek@r600:/rpool/tmp# zpool add test cache /rpool/tmp/s1
> cannot add to 'test': cache device must be a disk or disk slice
>
> Is there a reason why a cache device can't be set up on a file like
> for other vdevs?

milek@r600:/rpool/tmp# zfs create -V 100m rpool/tmp/ssd1
milek@r600:/rpool/tmp# zpool add test cache /dev/zvol/rdsk/rpool/tmp/ssd1
cannot use '/dev/zvol/rdsk/rpool/tmp/ssd1': must be a block device or regular file
milek@r600:/rpool/tmp# zpool add test cache /dev/zvol/dsk/rpool/tmp/ssd1
milek@r600:/rpool/tmp#

So when I try to add a cache device on top of a file, I get an error that a cache device must be a disk or a disk slice. When I try to add a cache device on an rdsk device, I get an error that it must be a block device or regular file, which suggests a regular file should work... (dsk works fine).

--
Robert Milkowski
http://milek.blogspot.com
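As a possible (untested) workaround for the regular-file restriction, a file could be presented to ZFS as a block device via lofiadm and then offered as a cache device. Whether zpool actually accepts a lofi device for cache is not verified here, and the /dev/lofi/1 path is just an example of what lofiadm prints:

milek@r600:/rpool/tmp# lofiadm -a /rpool/tmp/s1
/dev/lofi/1
milek@r600:/rpool/tmp# zpool add test cache /dev/lofi/1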
Robert Milkowski wrote:
> Perhaps a scenario like below should be allowed:
>
> node-1# zpool add mysql cache node-1-ssd1 node-1-ssd2 node-1-ssd3 node-1-ssd4
> node-1# zpool export mysql
> node-2# zpool import mysql
> node-2# zpool add mysql cache node-2-ssd1 node-2-ssd2 node-2-ssd3 node-2-ssd4
>
> This is assuming that a pool can be imported when some of its cache
> devices are not accessible. That way the pool would always have some
> L2ARC SSDs which are not accessible, but it would provide L2ARC cache
> on each node with local SSDs.

Actually, it looks like it already works like that!

A pool imports just fine with its cache device unavailable. Then I added another cache device, and I can still import it with the first one available but not the 2nd one.

zpool status complains, of course, but other than that it seems to be working fine.

Any thoughts?

--
Robert Milkowski
http://milek.blogspot.com
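One way to sanity-check that the locally visible cache devices are actually doing work after an import would be to watch the L2ARC counters in the ARC kstats (statistic names such as l2_size and l2_hits exist in recent builds, though they may vary between releases):

# kstat -p zfs:0:arcstats | grep l2_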
Robert Milkowski wrote:
> Robert Milkowski wrote:
>> Robert Milkowski wrote:
>>> When deploying ZFS in a cluster environment it would be nice to be able
>>> to have some SSDs as local drives (not on SAN), and when the pool
>>> switches over to the other node ZFS would pick up that node's local
>>> disk drives as L2ARC.
>
> Any thoughts?

The 7310/7410 uses this type of configuration, so obviously it works. When in doubt, just think: What Would Fishworks Do?

Wes Felter
Robert Milkowski wrote:
> Actually, it looks like it already works like that!
> A pool imports just fine with its cache device unavailable.
> Then I added another cache device, and I can still import it with the
> first one available but not the 2nd one.
>
> zpool status complains, of course, but other than that it seems to be
> working fine.
>
> Any thoughts?

Ooo. That's a scenario I hadn't thought about.

Right now, I'm doing something similar on the cheap: I have an iSCSI LUN (big-ass SATA raidz2) mounted on host A, and am using a spare 15k SAS drive locally as the L2ARC. When I export it and import it on another host with an identical disk in the same location (e.g. c1t1d0), I've done a 'zpool remove/add', since they write different ZFS signatures on the cache drive. Works like a champ.

Given that I want to use the same device location (e.g. c1t1d0) on both hosts, is there a way I can somehow add both as cache devices, and have ZFS tell them apart by the ID signature?

That is, on Host A, I do this:

# zpool create tank <iSCSI LUN> cache c1t1d0
# zpool export tank

Then, on Host B, I'm currently doing:

# zpool import tank
# zpool remove tank c1t1d0
# zpool add tank cache c1t1d0

I'd obviously like to figure out some way that I don't need to do the 'zpool add/remove'.

Robert's idea looks great, but I'm assuming that all the SSD devices have different drive locations. What I need is some way of telling ZFS to use device X as a cache device based on its ZFS signature, rather than its physical device location, as that location might (in the past) be used by another vdev.

Theoretically, I'd like to do something like this:

hostA# zpool create tank <iSCSI LUN>
hostA# zpool add tank cache c1t1d0
hostA# zpool export tank

hostB# zpool import tank
hostB# zpool add tank cache c1t1d0

And from then on, I just import/export between the two hosts, and it auto-picks the correct c1t1d0 drive.

--
Erik Trimble
Java System Support
Mailstop: usca22-123
Phone: x17195
Santa Clara, CA
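Until ZFS can pick the right cache device by signature on its own, the remove/add dance described above could presumably be wrapped in a small script run at import time on whichever host takes over; a rough, untested sketch with the pool and device names hard-coded for the example:

#!/bin/ksh
# Re-attach this host's local drive as L2ARC after a failover import.
POOL=tank
CACHEDEV=c1t1d0

zpool import $POOL || exit 1
# Drop the stale cache vdev written by the other host, if one is present.
zpool remove $POOL $CACHEDEV 2>/dev/null
# Add this host's local drive as the cache device.
zpool add $POOL cache $CACHEDEV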