On 24/01/2017 12:09, Lindsay Mathieson wrote:
> On 24/01/2017 6:33 PM, Alessandro Briosi wrote:
>> I'm in the process of creating a 3 server cluster, and use gluster as a
>> shared storage between the 3.
>
> Exactly what I run - my three gluster nodes are also VM servers
> (Proxmox cluster);

Ok, I am also going to use Proxmox. Any advice on how to configure the
bricks? I plan to have a 2 node replica. Would appreciate you sharing your
full setup :-)

>> I have 2 switches and each server has 4 ethernet cards which I'd like
>> to dedicate to the storage.
>>
>> For redundancy I thought I could use multipath with gluster (like with
>> iscsi), but am not sure it can be done.
>
> I don't think so, and there isn't really a need for it. Each node in a
> gluster cluster is an active server, there is no SPOF. A gluster
> client (fuse or gfapi) connecting to the cluster will download the
> list of all servers. If the server it is connected to dies, it will
> fail over to another server. I have done this many times with rolling
> live upgrades. Additionally you can specify a list of servers for the
> initial connection.

Ok, the only thing I want to avoid is the switch going down - that is a
SPOF. Having 2 switches would let me take one switch down for maintenance
while the other handles the cluster.

>> So the question is:
>> can I use dm-multipath with gluster
>
> Probably not.
>
>> If not should I use nic bonding?
>
> Yes, balance-alb is recommended. With three servers 2 dedicated nics
> per server is optimal; I doubt you would get much benefit from 3 or 4
> nics except redundancy. With 2*1G nics I get a reliable 120 MB/s seq
> writes.

Ok, so having 2 bonds, one attached to each switch, would work. Though I
still can't see how to make gluster use both links (or at least one of
them as active/passive).
Should I work with RRDNS and keepalived? Or bond the two bonds across the
2 switches with balance-rr in this case?
How do others implement this?

> I experimented with balance-rr and got somewhat erratic results.
>
>> Is there a way to have it use 2 bonded interfaces (so if 1 switch goes
>> down, the other takes over, or better, use both for maximal throughput)?
>
> I'm pretty sure you could bond 4 nics with 2 through 1 switch and 2
> through the other. That should keep working if a switch goes down.

Well, I was going to use LACP, which seems to be the best option once
configured. It needs switch support, but that's not a problem.

Thanks,
Alessandro
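For context on the bonding option discussed above: balance-alb needs no
switch support (unlike LACP/802.3ad), which is why a single bond can span two
independent switches. A very rough sketch of what that might look like in
Debian/Proxmox /etc/network/interfaces terms - the NIC names eth0-eth3 and
the address are placeholders, and the exact option spelling (bond-slaves vs
slaves, bond-mode vs bond_mode) depends on the ifupdown/ifenslave version in
use:

    # storage bond; assumes eth0/eth1 cabled to switch A, eth2/eth3 to switch B
    auto bond0
    iface bond0 inet static
        address 10.10.10.11
        netmask 255.255.255.0
        bond-slaves eth0 eth1 eth2 eth3
        bond-mode balance-alb
        bond-miimon 100

The "list of servers for the initial connection" mentioned above is the
backup-volfile-servers option of the fuse mount (option name varies slightly
between gluster versions); hostnames here are simply the ones that appear
later in this thread, used as an example:

    mount -t glusterfs \
        -o backup-volfile-servers=vnb.proxmox.softlog:vng.proxmox.softlog \
        vna.proxmox.softlog:/datastore4 /mnt/datastore4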
On 24/01/2017 10:23 PM, Alessandro Briosi wrote:
> Ok, so having 2 bonds, one attached to each switch, would work. Though I
> still can't see how to make gluster use both links (or at least one of
> them as active/passive).
> Should I work with RRDNS and keepalived? Or bond the two bonds across the
> 2 switches with balance-rr in this case?
> How do others implement this?

I *think* you can have the 4 nics in one bond on a node, with two going
through one switch and two through the other. Might be worth asking on the
proxmox list as well. That way there is just one interface for gluster to
use.

I don't think it would work with LACP though.

--
Lindsay Mathieson
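On the Proxmox side specifically, the gluster storage definition can also
name a second server for the initial volfile fetch, independent of how the
nics are bonded. A sketch of an /etc/pve/storage.cfg entry, reusing the
hostnames and volume name from this thread purely as an example:

    glusterfs: datastore4
        server vna.proxmox.softlog
        server2 vnb.proxmox.softlog
        volume datastore4
        content images

That only covers the initial volfile download; once mounted, the fuse client
fails over between replicas on its own, as described earlier in the thread.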
On 24/01/2017 10:23 PM, Alessandro Briosi wrote:
> Ok, I am also going to use Proxmox. Any advice on how to configure the
> bricks?
> I plan to have a 2 node replica. Would appreciate you sharing your
> full setup :-)

Three node replica - preferred to two, as quorum works best with an odd
number of nodes. If storage on a third node is an issue, then use an
arbiter node:

https://gluster.readthedocs.io/en/latest/Administrator%20Guide/arbiter-volumes-and-quorum/

I use sharding with 64MB shards; it makes for very fast, efficient heals.

Just one brick per node, but each brick is 4 disks in ZFS RAID 10 with a
fast SSD log device.

zpool status
  pool: tank
 state: ONLINE
  scan: scrub repaired 100K in 16h15m with 0 errors on Tue Jan 3 15:21:10 2017
config:

        NAME                                               STATE     READ WRITE CKSUM
        tank                                               ONLINE       0     0     0
          mirror-0                                         ONLINE       0     0     0
            ata-WDC_WD30EFRX-68EUZN0_WD-WMC4N2874892       ONLINE       0     0     0
            ata-WDC_WD30EFRX-68EUZN0_WD-WCC4N4TKR8C2       ONLINE       0     0     0
          mirror-1                                         ONLINE       0     0     0
            ata-WDC_WD30EFRX-68EUZN0_WD-WCC4N4TKR3Y0       ONLINE       0     0     0
            ata-WDC_WD30EFRX-68EUZN0_WD-WCC4N4TKR84T       ONLINE       0     0     0
        logs
          ata-KINGSTON_SHSS37A240G_50026B7266074B8A-part1  ONLINE       0     0     0
        cache
          ata-KINGSTON_SHSS37A240G_50026B7266074B8A-part2  ONLINE       0     0     0

ZFS properties:

  compression=lz4
  atime=off
  xattr=sa
  sync=standard
  acltype=posixacl

gluster v info

Volume Name: datastore4
Type: Replicate
Volume ID: 0ba131ef-311d-4bb1-be46-596e83b2f6ce
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: vnb.proxmox.softlog:/tank/vmdata/datastore4
Brick2: vng.proxmox.softlog:/tank/vmdata/datastore4
Brick3: vna.proxmox.softlog:/tank/vmdata/datastore4
Options Reconfigured:
performance.readdir-ahead: on
cluster.data-self-heal: on
features.shard: on
cluster.quorum-type: auto
cluster.server-quorum-type: server
nfs.disable: on
nfs.addr-namelookup: off
nfs.enable-ino32: off
performance.strict-write-ordering: off
performance.stat-prefetch: on
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
cluster.eager-lock: enable
network.remote-dio: enable
features.shard-block-size: 64MB
cluster.granular-entry-heal: yes
cluster.locking-scheme: granular

--
Lindsay Mathieson
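For anyone reproducing a setup along these lines, a hedged outline of the
commands involved (not necessarily the exact ones used here) - brick paths,
hostnames and option values are taken from the output above:

    # ZFS tuning on each node (one property per call, for older ZoL releases)
    zfs set compression=lz4 tank
    zfs set atime=off tank
    zfs set xattr=sa tank
    zfs set acltype=posixacl tank

    # replica-3 volume, one brick per node
    gluster volume create datastore4 replica 3 \
        vnb.proxmox.softlog:/tank/vmdata/datastore4 \
        vng.proxmox.softlog:/tank/vmdata/datastore4 \
        vna.proxmox.softlog:/tank/vmdata/datastore4

    # sharding and quorum settings matching the volume info above
    gluster volume set datastore4 features.shard on
    gluster volume set datastore4 features.shard-block-size 64MB
    gluster volume set datastore4 cluster.quorum-type auto
    gluster volume set datastore4 cluster.server-quorum-type server

    gluster volume start datastore4

If the third node only holds metadata, the create step becomes
"gluster volume create datastore4 replica 3 arbiter 1 <brick1> <brick2>
<arbiter-brick>", with the arbiter brick listed last.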