Hi,

I am facing a rather strange problem with two servers that have identical configuration and hardware. The servers are connected by bonded 1GE. I have one volume:

[root@nodef02i 103]# gluster volume info

Volume Name: ph-fs-0
Type: Replicate
Volume ID: f8f569ea-e30c-43d0-bb94-b2f1164a7c9a
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: 10.11.100.1:/gfs/s3-sata-10k/fs
Brick2: 10.11.100.2:/gfs/s3-sata-10k/fs
Options Reconfigured:
storage.owner-gid: 498
storage.owner-uid: 498
network.ping-timeout: 2
performance.io-thread-count: 3
cluster.server-quorum-type: server
network.remote-dio: enable
cluster.eager-lock: enable
performance.stat-prefetch: off
performance.io-cache: off
performance.read-ahead: off
performance.quick-read: off

The volume is intended to host virtual servers (KVM); the configuration follows the gluster blog.

Currently I have only one virtual server deployed on top of this volume, so that I can see the effects of my stress tests. During the tests I write to the volume, mounted through FUSE, with dd (only one write at a time):

dd if=/dev/zero of=test2.img bs=1M count=20000 conv=fdatasync

Test 1) I run dd on nodef02i. The load on nodef02i is at most 1 erl, but on nodef01i it is around 14 erl (the CPU has 12 threads). After the write finishes, the load on nodef02i goes down, but on nodef01i it climbs to 28 erl and stays there for 20 minutes. In the meantime I can see:

[root@nodef01i 103]# gluster volume heal ph-fs-0 info
Volume ph-fs-0 is not started (Or) All the bricks are not running.
Volume heal failed

[root@nodef02i 103]# gluster volume heal ph-fs-0 info
Brick nodef01i.czprg:/gfs/s3-sata-10k/fs/
/3706a2cb0bb27ba5787b3c12388f4ebb - Possibly undergoing heal
/test.img - Possibly undergoing heal
Number of entries: 2

Brick nodef02i.czprg:/gfs/s3-sata-10k/fs/
/3706a2cb0bb27ba5787b3c12388f4ebb - Possibly undergoing heal
/test.img - Possibly undergoing heal
Number of entries: 2

[root@nodef01i 103]# gluster volume status
Status of volume: ph-fs-0
Gluster process                                 Port    Online  Pid
------------------------------------------------------------------------------
Brick 10.11.100.1:/gfs/s3-sata-10k/fs           49152   Y       56631
Brick 10.11.100.2:/gfs/s3-sata-10k/fs           49152   Y       3372
NFS Server on localhost                         2049    Y       56645
Self-heal Daemon on localhost                   N/A     Y       56649
NFS Server on 10.11.100.2                       2049    Y       3386
Self-heal Daemon on 10.11.100.2                 N/A     Y       3387

Task Status of Volume ph-fs-0
------------------------------------------------------------------------------
There are no active volume tasks

This very high load lasts another 20-30 minutes. During the first test I restarted the glusterd service after 10 minutes because it seemed to me that the service was not working, yet the load on nodef01i stayed very high. As a consequence, the virtual server reports EXT4 filesystem errors and MySQL stops.

When the load had peaked, I ran the same test in the opposite direction, i.e. I wrote (dd) test2 from nodef01i. More or less the same thing happened: extremely high load on nodef01i and minimal load on nodef02i. The heal output was more or less the same.

I would like to tune this, but I don't know what I should focus on. Thank you for your help.

Milos
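PS: If it helps with the diagnosis, I can also dump the AFR changelog xattrs for the test files directly on the bricks while the load is high, e.g. something like this on both nodes (I am only guessing that the trusted.afr.* keys are the relevant ones):

# dump all extended attributes of the test files on the brick itself, hex-encoded
getfattr -d -m . -e hex /gfs/s3-sata-10k/fs/test.img
getfattr -d -m . -e hex /gfs/s3-sata-10k/fs/test2.img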
-------------- next part --------------
Non-text attachments were scrubbed:
Name: test1-nodef02i.tar.bz2  Type: application/x-bzip  Size: 41399 bytes
URL: <http://supercolony.gluster.org/pipermail/gluster-users/attachments/20140903/1b7dbcf3/attachment.bin>
Name: test1-nodef01i.tar.bz2  Type: application/x-bzip  Size: 38546 bytes
URL: <http://supercolony.gluster.org/pipermail/gluster-users/attachments/20140903/1b7dbcf3/attachment-0001.bin>
Name: test2-nodef01i.tar.bz2  Type: application/x-bzip  Size: 53799 bytes
URL: <http://supercolony.gluster.org/pipermail/gluster-users/attachments/20140903/1b7dbcf3/attachment-0002.bin>
Name: test2-nodef02i.tar.bz2  Type: application/x-bzip  Size: 42824 bytes
URL: <http://supercolony.gluster.org/pipermail/gluster-users/attachments/20140903/1b7dbcf3/attachment-0003.bin>
Hi,

I also had some issues with files generated from /dev/zero. Try real files or /dev/urandom :) I don't know whether there is a real issue/bug with files generated from /dev/zero; the devs should check that out, /me thinks.
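Something along these lines might be worth trying (the paths and sizes are only examples; /dev/urandom is CPU-bound, so I would generate a random source file once and reuse it rather than reading urandom for the full 20 GB):

# generate ~2 GiB of random data once, outside the gluster volume
dd if=/dev/urandom of=/var/tmp/random-src.img bs=1M count=2048
# then write it onto the FUSE mount, syncing at the end as before
dd if=/var/tmp/random-src.img of=test2.img bs=1M conv=fdatasync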
2014-09-03 16:11 GMT+03:00 Milos Kozak <milos.kozak@lejmr.com>:
> [original message quoted in full - snipped]

--
Best regards,
Roman.