I'm revisiting Gluster for the purpose of hosting virtual machine images (KVM). I was considering the following configuration:

2 Nodes
- 1 brick per node (replica 2)
- 2 x 1 GbE, LACP bonded
- Bricks hosted on ZFS
- VM images accessed via the block driver (gfapi)

ZFS config:
- RAID 10 (striped mirrors)
- SSD SLOG and L2ARC
- 4 GB RAM
- Compression (lz4)

Does that seem like a sane layout?

Question: with the gfapi driver, does the VM image appear as a file on the host (ZFS) file system?

Background: I currently have our VMs hosted on Ceph using a similar config as above, minus ZFS. I've found that the performance for such a small setup is terrible, the maintenance headache is high, and when a drive drops out the performance gets *really* bad. Last time I checked, Gluster was much slower at healing large files than Ceph; I'm hoping that has improved :)

--
Lindsay
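A rough sketch of how the layout above would be built; hostnames, device, pool and volume names are placeholders rather than anything from the thread, so treat it as an illustration, not a recipe:

    # ZFS: striped mirrors ("RAID 10"), SSD log and cache devices, lz4 compression
    zpool create tank mirror sda sdb mirror sdc sdd log sde cache sdf
    zfs set compression=lz4 tank
    zfs create tank/brick

    # Gluster: one brick per node, two-way replication
    gluster volume create vmstore replica 2 node1:/tank/brick node2:/tank/brick
    gluster volume start vmstore

    # QEMU attaching an image over libgfapi; the image is still stored as an
    # ordinary file under /tank/brick on each node holding a replica, it is
    # just accessed without going through a FUSE mount
    qemu-system-x86_64 ... -drive file=gluster://node1/vmstore/vm1.qcow2,format=qcow2,if=virtio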
Hello Lindsay,

From personal experience: a two-node volume can get you into trouble when one of the nodes goes down unexpectedly/crashes. At the very least, you should have an arbiter volume (introduced in Gluster 3.7) on a separate physical node.

We are running oVirt VMs on top of a two-node Gluster cluster, and a few months ago I ended up transferring several terabytes from one node to the other because it was the fastest way to resolve the split-brain issues after a crash of Gluster on one of the nodes. In effect, the second node did not give us any redundancy, because the VM images in split-brain would not be available for writes.

I don't think 4 GB is enough RAM, especially if you have a large L2ARC: every L2ARC entry needs an entry in the ARC as well, which is always in RAM. RAM is relatively cheap nowadays, so go for at least 16 or 32 GB.

You should also count the number of spindles you have and make sure it doesn't exceed the number of VMs you're running by much, to get decent disk IO performance.

--
Tiemen Ruiten
Systems Engineer
R&D Media
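For reference, the arbiter configuration described above adds a third brick that holds only file names and metadata, no data, and is created roughly like this (host and volume names are placeholders):

    gluster volume create vmstore replica 3 arbiter 1 \
        node1:/tank/brick node2:/tank/brick node3:/arbiter/brick
    gluster volume start vmstore

    # Listing files currently in split-brain on an existing volume
    gluster volume heal vmstore info split-brain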
Hi,

I'm actually doing this on a pretty similar system. I'm using oVirt with KVM, RAIDZ on 4 disks with lz4 and also dedup. My brick nodes are also oVirt nodes (VM hosts) and have 32/48 GB RAM; ZFS may use up to 18 GB of it, so a little more than your setup. oVirt needs rep=3; I have 3 bricks per node.

I have no complaints about speed. oVirt is using thin-provisioned VM disks as one big file per disk, and self-heal operations do need their time, but with little impact as far as I have seen. Performance is all about network speed, I would say. Running VMs directly on the bricks may improve your VMs by using the L2ARC...

Frank
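A ZFS layer like the one Frank describes might look roughly like this on ZFS on Linux; device names and the 18 GB ARC cap are illustrative values, not taken from his setup:

    # RAIDZ across 4 disks, lz4 compression, deduplication enabled
    zpool create tank raidz sda sdb sdc sdd
    zfs set compression=lz4 tank
    zfs set dedup=on tank

    # Cap the ARC at roughly 18 GB (value in bytes), e.g. in /etc/modprobe.d/zfs.conf
    options zfs zfs_arc_max=19327352832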
On 30 September 2015 at 18:36, Tiemen Ruiten <t.ruiten at rdmedia.com> wrote:

> At the very least, you should have an arbiter volume (introduced in
> Gluster 3.7) on a separate physical node.

Running Proxmox (Debian Wheezy), so I'm limited to 3.6; however, I do have a third peer node for voting purposes.

> I don't think 4 GB is enough RAM, especially if you have a large L2ARC

Learned my lesson with earlier ZFS setups :) 1 GB ZIL, 10 GB L2ARC.

> You should also count the number of spindles you have and make sure it
> doesn't exceed the number of VMs you're running by much, to get decent
> disk IO performance.

New one to me - did you mean the reverse? The number of VMs should not exceed the number of spindles?

thanks,

--
Lindsay
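A third peer used purely for quorum voting would be added and put to use along these lines; hostname and volume name are placeholders, and server-side quorum has been available since well before the 3.6 line:

    gluster peer probe node3
    gluster volume set vmstore cluster.server-quorum-type server
    gluster volume set all cluster.server-quorum-ratio 51%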
On 30 September 2015 at 18:50, Frank Rothenstein <f.rothenstein at bodden-kliniken.de> wrote:

> My brick nodes are also oVirt nodes (VM hosts)

I should have said that my brick nodes are also VM nodes (Proxmox). 64 GB RAM, E5-2620 CPU.

> oVirt needs rep=3; I have 3 bricks per node.

Are your bricks separate disks? I assumed it would be better to let ZFS handle multiple disks with striping/caching and just present one brick (zpool dataset) to Gluster.

> Running VMs directly on the bricks may improve your VMs by using the
> L2ARC...

Not sure what you mean by that - you run the VM directly from the brick on ZFS rather than via the Gluster mount? Doesn't that mess with the replication?

thanks,

--
Lindsay
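The two brick layouts being compared here would look roughly like this, with placeholder device, pool and dataset names:

    # One brick per node: ZFS aggregates the disks, Gluster sees a single path
    zpool create tank mirror sda sdb mirror sdc sdd
    zfs create tank/brick

    # One brick per disk: each spindle becomes its own pool/dataset and brick
    zpool create d1 sda
    zpool create d2 sdb
    zfs create d1/brick
    zfs create d2/brick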
On 30 September 2015 at 18:36, Tiemen Ruiten <t.ruiten at rdmedia.com> wrote:

> From personal experience: a two-node volume can get you into trouble when
> one of the nodes goes down unexpectedly/crashes. At the very least, you
> should have an arbiter volume (introduced in Gluster 3.7) on a separate
> physical node.

I've introduced a third node for full replica 3 now. Surprised and pleased that there is no real performance drop.

This is where the docs get a bit frustrating :(
https://gluster.readthedocs.org/en/release-3.7.0/Features/server-quorum/ discusses server quorum and its settings, but doesn't say what the default values are, and there is no way of reading a volume's effective settings in Gluster.

--
Lindsay
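On reading settings: `gluster volume info <vol>` lists only options that have been explicitly reconfigured, so untouched defaults never show up there; if memory serves, later 3.7 releases add a `gluster volume get` command that dumps effective values including defaults. A sketch, with a placeholder volume name:

    gluster volume info vmstore        # shows only options that have been set on the volume
    gluster volume get vmstore all     # 3.7.x and later: effective values, including defaults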