Willem
2013-Apr-18 18:28 UTC
[Gluster-users] Low (<0.2ms) latency reads, is it possible at all?
I'm testing GlusterFS viability for use with a typical PHP webapp (i.e. lots
of small files). I don't care so much for the C in the CAP theorem, as I
have very few writes. I could live with a write propagation delay of 5
minutes (or dirty caches for up to 5 minutes).
So I'm optimizing for low-latency reads of small files. My test setup is a
2-node replicated volume. Each node is both server and Gluster client, and both
are in sync. I stop glusterfs-server on node2. On node1, I run a simple
benchmark: repeatedly (to prime the cache) open and close 1000 small files. I
have enabled the client-side io-cache and quick-read translators (see below
for the config).
The results are consistently 2 ms per open (O_RDONLY) call. That is too
slow, unfortunately, as I need < 0.2 ms.
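(For completeness, a per-call timing loop along these lines can be used to verify the per-open figure; it is only a sketch and assumes the test_NNN files created by the benchmark script at the end of this mail already exist under the mount point:)

import os, time

MOUNT = "/mnt/glusterfs"   # same default mount point as the benchmark below
NUM = 1000                 # number of test files (test_000 .. test_999)

for run in range(2):       # pass 0 primes the cache, pass 1 is the one measured
    samples = []
    for i in range(NUM):
        path = os.path.join(MOUNT, "test_%03d" % i)
        t0 = time.time()
        f = open(path)     # plain O_RDONLY open
        samples.append(time.time() - t0)
        f.close()

print "avg open() latency: %.3f ms" % (1000.0 * sum(samples) / len(samples))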
With the same test against a local Gluster server over an NFS mount, I get
somewhat better performance, but still 0.6 ms.
With the same test against a Linux NFS server (v3) and a local mount, I get
0.12 ms per open.
I can't explain the lag with Gluster, because I can't see any traffic being
sent to node2. I would expect that, with the io-cache translator and
local-only operation, performance would approach that of the kernel FS cache.
Is this assumption correct? If yes, how would I profile the client subsystem
to detect the bottleneck?
If not, then I have to accept that 0.8 ms open calls are the best I can
squeeze out of this system. In that case I'll probably look into AFS,
userspace async replication, or a Gluster NFS mount with cachefilesd. Which
would you recommend?
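(On the profiling question: the client volfile below ends in a debug/io-stats translator with latency-measurement off, so I assume that turning it on, or using the volume profile CLI, would expose per-fop latencies. Something along these lines, though I haven't verified the exact commands on 3.2.5:)

# assumed commands for per-fop latency statistics; not verified on 3.2.5
gluster volume profile testvol start
# ... run the benchmark against the mount ...
gluster volume profile testvol info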
Thanks a lot!
BTW I like Gluster a lot, and I hope that it is also suitable for this
small-files use case ;)
//Willem
PS: I am testing with kernel 3.5.0-17-generic (64-bit) and Gluster 3.2.5-1ubuntu1.
Client volfile:
+------------------------------------------------------------------------------+
1: volume testvol-client-0
2: type protocol/client
3: option remote-host g1
4: option remote-subvolume /data
5: option transport-type tcp
6: end-volume
7:
8: volume testvol-client-1
9: type protocol/client
10: option remote-host g2
11: option remote-subvolume /data
12: option transport-type tcp
13: end-volume
14:
15: volume testvol-replicate-0
16: type cluster/replicate
17: subvolumes testvol-client-0 testvol-client-1
18: end-volume
19:
20: volume testvol-write-behind
21: type performance/write-behind
22: option flush-behind on
23: subvolumes testvol-replicate-0
24: end-volume
25:
26: volume testvol-io-cache
27: type performance/io-cache
28: option max-file-size 256KB
29: option cache-timeout 60
30: option priority *.php:3,*:0
31: option cache-size 256MB
32: subvolumes testvol-write-behind
33: end-volume
34:
35: volume testvol-quick-read
36: type performance/quick-read
37: option cache-size 256MB
38: subvolumes testvol-io-cache
39: end-volume
40:
41: volume testvol
42: type debug/io-stats
43: option latency-measurement off
44: option count-fop-hits off
45: subvolumes testvol-quick-read
46: end-volume
Server volfile:
+------------------------------------------------------------------------------+
1: volume testvol-posix
2: type storage/posix
3: option directory /data
4: end-volume
5:
6: volume testvol-access-control
7: type features/access-control
8: subvolumes testvol-posix
9: end-volume
10:
11: volume testvol-locks
12: type features/locks
13: subvolumes testvol-access-control
14: end-volume
15:
16: volume testvol-io-threads
17: type performance/io-threads
18: subvolumes testvol-locks
19: end-volume
20:
21: volume testvol-marker
22: type features/marker
23: option volume-uuid bc89684f-569c-48b0-bc67-09bfd30ba253
24: option timestamp-file /etc/glusterd/vols/testvol/marker.tstamp
25: option xtime off
26: option quota off
27: subvolumes testvol-io-threads
28: end-volume
29:
30: volume /data
31: type debug/io-stats
32: option latency-measurement off
33: option count-fop-hits off
34: subvolumes testvol-marker
35: end-volume
36:
37: volume testvol-server
38: type protocol/server
39: option transport-type tcp
40: option auth.addr./data.allow *
41: subvolumes /data
42: end-volume
My benchmark to simulate PHP webapp i/o:
#!/usr/bin/env python

import sys
import os
import time
import optparse

def print_timing(func):
    # Decorator: print the wall-clock time (in ms) taken by each test method.
    def wrapper(*arg):
        t1 = time.time()
        res = func(*arg)
        t2 = time.time()
        print '%-15.15s %6d ms' % (func.func_name, int((t2 - t1) * 1000.0))
        return res
    return wrapper

def parse_options():
    parser = optparse.OptionParser()
    parser.add_option("--path", '-p',
                      default="/mnt/glusterfs",
                      help="Base directory for running tests (default: /mnt/glusterfs)",
                      )
    parser.add_option("--num", '-n', type="int",
                      default=100,
                      help="Number of files per test (default: 100)",
                      )
    (options, args) = parser.parse_args()
    return options

class FSBench():
    def __init__(self, path="/tmp", num=100):
        self.path = path
        self.num = num

    def get_files(self):
        # Generate the test file names: <path>/test_000 .. test_NNN
        for i in range(self.num):
            filename = self.path + "/test_%03d" % i
            yield filename

    @print_timing
    def test_open_read(self):
        for filename in self.get_files():
            f = open(filename)
            data = f.read()
            f.close()

    @print_timing
    def test_stat(self):
        for filename in self.get_files():
            os.stat(filename)

    @print_timing
    def test_stat_nonexist(self):
        for filename in self.get_files():
            try:
                os.stat(filename + "blkdsflskdf")
            except OSError:
                pass

    @print_timing
    def test_write(self):
        for filename in self.get_files():
            f = open(filename, 'w')
            f.write('hi there\n')
            f.close()

    @print_timing
    def test_delete(self):
        for filename in self.get_files():
            os.unlink(filename)

if __name__ == '__main__':
    options = parse_options()
    bench = FSBench(path=options.path, num=options.num)
    bench.test_write()
    bench.test_open_read()
    bench.test_stat()
    bench.test_stat_nonexist()
    bench.test_delete()
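(Invocation against the FUSE mount looks roughly like this; "bench.py" is just a placeholder name for the script above, and --path/--num are the options defined in parse_options():)

python bench.py --path /mnt/glusterfs --num 1000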
Raghavendra Gowdappa
2013-Apr-24 03:31 UTC
[Gluster-users] Low (<0.2ms) latency reads, is it possible at all?
Hi Willem,

Please find the inlined comments:

----- Original Message -----
> From: "Willem" <gwillem at gmail.com>
> To: gluster-users at gluster.org
> Sent: Thursday, April 18, 2013 11:58:46 PM
> Subject: [Gluster-users] Low (<0.2ms) latency reads, is it possible at all?
>
> [...]
>
> 35: volume testvol-quick-read
> 36: type performance/quick-read

The default value of the option "max-file-size" is 64KB. It seems your files are bigger than 64KB. Can you add this option and rerun the tests?

Also, can you rerun the tests with quick-read disabled and compare the results?

> 37: option cache-size 256MB
> 38: subvolumes testvol-io-cache
> 39: end-volume
>
> [...]
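To make the suggestion concrete, the quick-read block in the client volfile would look roughly like the snippet below. The 256KB value is only illustrative and just mirrors the io-cache setting in your volfile; on a glusterd-managed volume the change would normally be applied through volume options rather than by hand-editing the volfile.

volume testvol-quick-read
    type performance/quick-read
    option max-file-size 256KB     # assumed value; the quick-read default is 64KB
    option cache-size 256MB
    subvolumes testvol-io-cache
end-volume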
Marcus Bointon
2013-Apr-24 12:18 UTC
[Gluster-users] Low (<0.2ms) latency reads, is it possible at all?
On 24 Apr 2013, at 14:00, "Willem" <gwillem at gmail.com> wrote:

>> I'm testing GlusterFS viability for use with a typical PHP webapp (i.e. lots
>> of small files). I don't care so much for the C in the CAP theorem, as I
>> have very few writes. I could live with a write propagation delay of 5
>> minutes (or dirty caches for up to 5 minutes).

We know that gluster's small-file performance isn't good, and since you can live with such long write propagation, reciprocal rsync could be a better and simpler solution. That way you'd get much faster local performance.

The only real issue is that deletes can't be propagated correctly with 2-way rsync (because a delete at one end is indistinguishable from an add at the other), but you may be able to live with that. csync2 aims to solve the delete issue with a transaction database, but I could never make it work.

To get a consistent 0.2 ms off anything you're going to need to be on SSDs - you can't fit everything in your cache.

Marcus
--
Marcus Bointon
Synchromedia Limited: Creators of http://www.smartmessages.net/
UK info at hand CRM solutions
marcus at synchromedia.co.uk | http://www.synchromedia.co.uk/
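A minimal sketch of the reciprocal-rsync idea above, assuming the PHP docroot lives at /var/www on both nodes and there is passwordless ssh between them (host names and paths are illustrative). It would run from cron on node1, with a mirror-image job on node2; there is deliberately no --delete, so removals never propagate:

# push local changes, then pull remote ones; -a preserves metadata and
# -u skips files that are newer on the receiving side
rsync -au /var/www/ node2:/var/www/
rsync -au node2:/var/www/ /var/www/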