Hi,

I've got this kind of setup (servers run replica):

@ 10G backend:
  gluster storage1
  gluster storage2
  gluster client1

@ 1G backend:
  other gluster clients

The servers have HW RAID5 with SAS disks.

Today I decided to create a 900 GB file for an iSCSI target that will live on a separate GlusterFS volume, using dd (just a dummy file filled with zeros, bs=1G count=900).

First of all, the process took quite a long time; the writing speed was 130 MB/sec (the client port was 2 Gbps, the server ports were running at 1 Gbps). Then it reported something like "endpoint is not connected" and all of my VMs on the other volume started giving me I/O errors. Server load was around 4.6 (12 cores total).

Maybe it was due to the timeout of 2 seconds, so I've raised it a bit, to 10 seconds.

Also, during the dd image creation the VMs very often reported that their disks were slow, with warnings like:

WARNINGs: Read IO Wait time is -0.02 (outside range [0:1]).

Is 130 MB/sec the maximum bandwidth for all of the volumes in total? If so, why would we need 10G backends? The HW RAID's local speed is 300 MB/sec, so that should not be the bottleneck. Any ideas or advice?

Maybe someone has an optimized sysctl.conf for a 10G backend? Mine is pretty simple, the kind you can find by googling.

Just to mention: those VMs were connected over a separate 1 Gbps interface, which means they should not be affected by the client with the 10G backend.

The logs are pretty useless; they just say this during the outage:

[2014-10-13 12:09:18.392910] W [client-handshake.c:276:client_ping_cbk] 0-HA-2TB-TT-Proxmox-cluster-client-0: timer must have expired
[2014-10-13 12:10:08.389708] C [client-handshake.c:127:rpc_client_ping_timer_expired] 0-HA-2TB-TT-Proxmox-cluster-client-0: server 10.250.0.1:49159 has not responded in the last 2 seconds, disconnecting.
[2014-10-13 12:10:08.390312] W [client-handshake.c:276:client_ping_cbk] 0-HA-2TB-TT-Proxmox-cluster-client-0: timer must have expired

That's why I decided to set the timeout a bit higher.

So it seems to me that under high load GlusterFS is not usable? 130 MB/s is not that much to be getting timeouts, or to make the system so slow that the VMs suffer.

Of course, after the disconnection the healing process started, but since the VMs had lost the connection to both servers it was pretty useless; they could not run anymore. And by the way, when you load the server with such a huge job (dd of 900 GB), the healing process goes really slowly. :)

--
Best regards,
Roman.
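[For reference, a minimal sketch of the steps described above. The mount path and volume name are placeholders, not taken from the post, and the oflag=direct / fallocate variants are suggested alternatives rather than what was actually run:

  # Roughly the dd invocation from the post (path is hypothetical)
  dd if=/dev/zero of=/mnt/gluster/iscsi/target.img bs=1G count=900

  # Possible ways to reduce page-cache pressure while writing a file this
  # large (assumptions, not from the post): write with O_DIRECT, or simply
  # pre-allocate the space instead of streaming zeros.
  dd if=/dev/zero of=/mnt/gluster/iscsi/target.img bs=1G count=900 oflag=direct
  fallocate -l 900G /mnt/gluster/iscsi/target.img

  # Raising the ping timeout from 2 to 10 seconds (volume name is a placeholder)
  gluster volume set <volname> network.ping-timeout 10

And a commonly suggested, generic (not Gluster-specific) set of sysctl.conf entries for 10 GbE links, offered only as an assumption about what an "optimized sysctl.conf" might contain here:

  # /etc/sysctl.conf -- larger socket buffers for 10 GbE
  net.core.rmem_max = 16777216
  net.core.wmem_max = 16777216
  net.ipv4.tcp_rmem = 4096 87380 16777216
  net.ipv4.tcp_wmem = 4096 65536 16777216
  net.core.netdev_max_backlog = 30000]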
Pranith Kumar Karampuri
2014-Oct-13 16:09 UTC
[Gluster-users] glusterfs under high load failing?
Could you give your 'gluster volume info' output?

Pranith

On 10/13/2014 09:36 PM, Roman wrote:
> Hi,
>
> I've got this kind of setup (servers run replica)
> [...]
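[For readers following the thread: the diagnostics being asked for can be gathered on one of the storage nodes with commands along these lines; the volume name is a placeholder:

  gluster volume info
  gluster volume status <volname> detail
  # and, after the disconnect, the self-heal backlog:
  gluster volume heal <volname> info]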
----- Original Message -----
> Hi,
>
> I've got this kind of setup (servers run replica)

As a thought, just because the info doesn't seem to be in the posts so far... which version(s) of GlusterFS are you using, and on which OSes? Just in case you're not using the latest release(s) and this is a known bug. :)

Regards and best wishes,

Justin Clift

--
GlusterFS - http://www.gluster.org
An open source, distributed file system scaling to several
petabytes, and handling thousands of clients.

My personal twitter: twitter.com/realjustinclift
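[A quick way to collect the version and OS details being asked about; these are standard commands, not something specified in the thread:

  glusterfs --version     # installed GlusterFS version
  gluster --version       # CLI version, normally the same
  cat /etc/os-release     # distribution and release
  uname -r                # running kernel]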