Harald Hannelius
2012-Mar-02 09:00 UTC
[Gluster-users] Write performance in a replicated/distributed setup with KVM?
This has probably been discussed before, but since I'm new on the list I hope you have patience with me.

I have a four-brick distributed/replicated setup. The machines are multi-core with 16 GB of memory and 2 x 2.0 TB SATA disks in RAID 1 locally. The nodes are connected by 1 Gbit Ethernet. All nodes run glusterfs 3.3beta2 on Debian 6 (64-bit). The underlying filesystems are XFS.

I set up a volume like so:

  gluster volume create virtuals replica 2 transport tcp \
    adraste:/data/brick alcippe:/data/brick aethra:/data/brick helen:/data/brick

Which resulted in a nice volume:

  # gluster volume info virtuals

  Volume Name: virtuals
  Type: Distributed-Replicate
  Status: Started
  Number of Bricks: 2 x 2 = 4
  Transport-type: tcp
  Bricks:
  Brick1: adraste:/data/brick
  Brick2: alcippe:/data/brick
  Brick3: aethra:/data/brick
  Brick4: helen:/data/brick

All seems OK so far, but write performance is very slow. When writing to localhost:/virtuals I get single-digit MB/s, which isn't really what I had expected. I know that the write has to go to at least two (?) nodes at the same time, but still? A single scp of a 1 GB file from one node to another gives something like ~100 MB/s.

A copy of a virtual image took 17 minutes:

  # time cp debtest.raw /gluster/debtest.img

  real    17m36.727s
  user    0m1.832s
  sys     0m14.081s

  # ls -lah /gluster/debtest.img
  -rw------- 1 root root 20G Mar  1 12:35 /gluster/debtest.img

  # du -ah /gluster/debtest.img
  4.5G    /gluster/debtest.img

I noticed that the process list shows that direct-io-mode is disabled. The default should be on, or should it?

Any help is really appreciated!

--
Harald Hannelius | harald.hannelius/a\arcada.fi | +358 50 594 1020
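(A note on the direct-io-mode question above: in the 3.x FUSE client this is a mount-time option rather than a volume option, and in most 3.x releases it defaults to disabled. A minimal sketch of checking and toggling it follows, assuming 3.3beta2 behaves like the released 3.x clients and reusing the /gluster mount point and the virtuals volume from the post.)

  # The current setting shows up in the client process arguments:
  ps ax | grep '[g]lusterfs.*virtuals'

  # Remount with direct I/O explicitly enabled (FUSE mount option;
  # in the 3.x client it is disabled unless requested):
  umount /gluster
  mount -t glusterfs -o direct-io-mode=enable localhost:/virtuals /gluster

(Whether enabling it helps a KVM image workload is workload-dependent; it mainly bypasses the client-side page cache.)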
Harald Hannelius
2012-Mar-02 12:41 UTC
[Gluster-users] Write performance in a replicated/distributed setup with KVM?
On Fri, 2 Mar 2012, Brian Candler wrote:

> On Fri, Mar 02, 2012 at 01:02:39PM +0200, Harald Hannelius wrote:
>>> If both are fast: then retest using a two-node replicated volume.
>>
>> gluster volume create test replica 2 transport tcp
>> aethra:/data/single alcippe:/data/single
>>
>> Volume Name: test
>> Type: Replicate
>> Status: Started
>> Number of Bricks: 2
>> Transport-type: tcp
>> Bricks:
>> Brick1: aethra:/data/single
>> Brick2: alcippe:/data/single
>>
>> # time dd if=/dev/zero bs=1M count=20000 of=/mnt/testfile
>> 20000+0 records in
>> 20000+0 records out
>> 20971520000 bytes (21 GB) copied, 426.62 s, 49.2 MB/s
>>
>> real    7m6.625s
>> user    0m0.040s
>> sys     0m12.293s
>>
>> As expected, roughly half of the single-node setup. I could live
>> with that too.
>
> So next is back to the four-node setup you had before. I would expect that
> to perform about the same.

So would I. But:

  # time dd if=/dev/zero bs=1M count=20000 of=/gluster/testfile
  20000+0 records in
  20000+0 records out
  20971520000 bytes (21 GB) copied, 1058.22 s, 19.8 MB/s

  real    17m38.357s
  user    0m0.040s
  sys     0m12.501s

  # gluster volume info

  Volume Name: virtuals
  Type: Distributed-Replicate
  Status: Started
  Number of Bricks: 2 x 2 = 4
  Transport-type: tcp
  Bricks:
  Brick1: adraste:/data/brick
  Brick2: alcippe:/data/brick
  Brick3: aethra:/data/brick
  Brick4: helen:/data/brick
  Options Reconfigured:
  cluster.data-self-heal-algorithm: diff
  cluster.self-heal-window-size: 1
  performance.io-thread-count: 64
  performance.cache-size: 536870912
  performance.write-behind-window-size: 16777216
  performance.flush-behind: on

At the same time Nagios tries to empty my cell phone battery because the virtual hosts no longer respond to ping. One of those virtual hosts is a mail server, and it receives e-mail; I guess sendmail+procmail+imapd generates some I/O. At least I got double-digit readings this time. Sometimes I get write speeds of 5-6 MB/s.

> If you have problems with high levels of concurrency, this might be a
> problem with the number of I/O threads which gluster creates. You actually
> only get log(2) of the number of outstanding requests in the queue.
>
> I made a (stupid, non-production) patch which got around this problem in
> my benchmarking:
> http://gluster.org/pipermail/gluster-users/2012-February/009590.html
>
> IMO it would be better to be able to configure the *minimum* number of I/O
> threads to spawn. You can configure the maximum but it will almost never be
> reached.
>
> Regards,
>
> Brian.

--
Harald Hannelius | harald.hannelius/a\arcada.fi | +358 50 594 1020
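(For reference, the "Options Reconfigured" block above corresponds to volume options applied with the gluster CLI; a sketch of how those values would be set, with option names and values taken verbatim from the volume info output in the post:)

  # gluster volume set virtuals cluster.data-self-heal-algorithm diff
  # gluster volume set virtuals cluster.self-heal-window-size 1
  # gluster volume set virtuals performance.io-thread-count 64
  # gluster volume set virtuals performance.cache-size 536870912
  # gluster volume set virtuals performance.write-behind-window-size 16777216
  # gluster volume set virtuals performance.flush-behind on

(Note, per Brian's remark above, that performance.io-thread-count only caps the io-threads pool; the maximum is rarely reached in practice, which is what his linked patch works around.)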