Hi Xavi

At this time I'm using 'plain' bricks with XFS. I'll be moving to LVM cached
bricks. There is no RAID for the data bricks, but I'll be using hardware RAID10
for the SSD cache disks (I can use 'writeback' cache in this case).

'Small file performance' is the main reason I'm looking at different options,
i.e. using formatted sparse files. I spent a considerable amount of time tuning
10GbE/kernel/Gluster to reduce latency - small file performance improved by
~50%, but it's still not good enough, especially since I need to use Gluster
for /home folders.

I understand the limitations and the single point of failure in the sparse
file case. I'm considering different options to provide HA (pacemaker/corosync,
keepalived, or using VMs - RHEV - to deliver storage).

Thank you for your reply.

On Tue, Sep 26, 2017 at 3:55 AM, Xavi Hernandez <jahernan at redhat.com> wrote:

> Hi Dmitri,
>
> On 22/09/17 17:07, Dmitri Chebotarov wrote:
>
>> Hello
>>
>> I'm running some tests to compare performance between a Gluster FUSE mount
>> and formatted sparse files (located on the same Gluster FUSE mount).
>>
>> The Gluster volume is EC (same for both tests).
>>
>> I'm seeing a HUGE difference and am trying to figure out why.
>
> Could you explain what hardware configuration you are using?
>
> Do you have a plain disk for each brick formatted in XFS, or do you have
> some RAID configuration?
>
>> Here is an example:
>>
>> GlusterFUSE mount:
>>
>> # cd /mnt/glusterfs
>> # rm -f testfile1 ; dd if=/dev/zero of=testfile1 bs=1G count=1
>> 1+0 records in
>> 1+0 records out
>> 1073741824 bytes (1.1 GB) copied, 9.74757 s, *110 MB/s*
>>
>> Sparse file (located on the GlusterFUSE mount):
>>
>> # truncate -s 100GB /mnt/glusterfs/xfs-100G.img
>> # mkfs.xfs /mnt/glusterfs/xfs-100G.img
>> # mount -o loop /mnt/glusterfs/xfs-100G.img /mnt/xfs-100G
>> # cd /mnt/xfs-100G
>> # rm -f testfile1 ; dd if=/dev/zero of=testfile1 bs=1G count=1
>> 1+0 records in
>> 1+0 records out
>> 1073741824 bytes (1.1 GB) copied, 1.20576 s, *891 MB/s*
>>
>> The same goes for working with small files (i.e. code files, make, etc.)
>> with the same data located on the FUSE mount vs. a formatted sparse file
>> on the same FUSE mount.
>>
>> What would explain such a difference?
>
> First of all, doing tests with relatively small files tends to be
> misleading because of the caching capacity of the operating system (to
> minimize that, you can add the 'conv=fsync' option to dd). You should do
> tests with file sizes bigger than the amount of physical memory on the
> servers. This way you minimize cache effects and see the real sustained
> performance.
>
> A second important point to note is that Gluster is a distributed file
> system that can be accessed simultaneously by more than one client. This
> means that consistency must be assured in all cases, which makes writes go
> to the bricks sooner than local filesystems normally do.
>
> In your case, all data written to the FUSE volume will most probably be
> present on the bricks once the dd command completes. On the other hand,
> the test through the formatted sparse file is most probably keeping most
> of the data in the cache of the client machine.
>
> Note that using the formatted sparse file allows better use of the local
> cache, improving (relatively) small file access, but on the other hand
> this filesystem can only be used from a single client (single mount). If
> this client fails for some reason, you will lose access to your data.
>
>> How does Gluster work with sparse files in general? I may move some of
>> the data on Gluster volumes to formatted sparse files.
>
> Gluster works fine with sparse files. However, you should consider the
> previous points before choosing the formatted sparse files option. I guess
> that the sustained throughput will be very similar for bigger files.
>
> Regards,
>
> Xavi
>
>> Thank you.
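As a concrete illustration of the retest Xavi suggests, a run along the
following lines should largely take the client-side page cache out of the
picture (the 64 GiB size is only a placeholder - use whatever exceeds the
physical memory of the servers involved):

# cd /mnt/glusterfs
# rm -f testfile1
# dd if=/dev/zero of=testfile1 bs=1G count=64 conv=fsync
# echo 3 > /proc/sys/vm/drop_caches

The conv=fsync option makes dd flush the data to the output file before it
reports the result, so the quoted throughput includes writing dirty data out,
and dropping the caches afterwards ensures any later read-back test is not
served from client memory.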
Have you done any testing with replica 2/3? IIRC my replica 2/3 tests
outperformed EC on smallfile workloads, so it may be worth looking into if
you can't get EC up to where you need it to be.

-b

----- Original Message -----
> From: "Dmitri Chebotarov" <4dimach at gmail.com>
> Cc: "gluster-users" <Gluster-users at gluster.org>
> Sent: Tuesday, September 26, 2017 9:57:55 AM
> Subject: Re: [Gluster-users] sparse files on EC volume
>
> [...]
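For anyone who wants to reproduce the comparison Ben suggests, a sketch of a
replica 3 test volume could look like the following (the server names and
brick paths are placeholders, not taken from this thread, and the brick
layout should of course mirror the EC volume being compared against):

# gluster volume create test-r3 replica 3 \
      server1:/bricks/r3/brick1 server2:/bricks/r3/brick1 server3:/bricks/r3/brick1
# gluster volume start test-r3
# mount -t glusterfs server1:/test-r3 /mnt/test-r3

Running the same smallfile workload against /mnt/test-r3 and the existing EC
mount then gives a like-for-like comparison on the same hardware.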
Hi Ben

Thank you. I just ran some tests with the same data on EC and R3 volumes
(same hardware). R3 is a lot faster:

          EC         R3
  untar   48.879s    10.938s
  find     2.993s     0.722s
  rm      11.244s     4.144s

On Wed, Sep 27, 2017 at 3:12 AM, Ben Turner <bturner at redhat.com> wrote:

> Have you done any testing with replica 2/3? IIRC my replica 2/3 tests
> outperformed EC on smallfile workloads, so it may be worth looking into if
> you can't get EC up to where you need it to be.
>
> -b
>
> [...]
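For reference, the untar/find/rm comparison above can be reproduced with a
sequence roughly like the one below. The kernel source tarball is only an
assumed example workload - the thread does not say which data set produced
the numbers above - but any tree with many small files will show the same
pattern:

# cd /mnt/glusterfs
# time tar xf /tmp/linux-4.13.tar.xz
# time find linux-4.13 -type f > /dev/null
# time rm -rf linux-4.13

Repeating the same three steps on the replica 3 mount (and, if desired, on a
formatted sparse file loop mount) keeps the comparison consistent across the
configurations discussed in this thread.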