Yuhao Zhang
2018-Aug-05 07:52 UTC
[Gluster-users] Gluster High CPU/Clients Hanging on Heavy Writes
This is a semi-production server and I can't bring it down right now. I will try to get the monitoring output when I get a chance. As I recall, the high-CPU processes were the brick daemons (glusterfsd), and htop showed them in status D. However, I saw zero zpool I/O while the clients were all hanging.

> On Aug 5, 2018, at 02:38, Raghavendra Gowdappa <rgowdapp at redhat.com> wrote:
>
> On Sun, Aug 5, 2018 at 12:44 PM, Yuhao Zhang <zzyzxd at gmail.com> wrote:
>
> Hi,
>
> I am running into a situation where heavy writes send the Gluster server into a zombie-like state with many high-CPU processes, and all clients hang. It is almost 100% reproducible on my machine. I hope someone can help.
>
> Can you give us the output of monitoring these processes with high CPU usage, captured while your tests are running?
>
> MON_INTERVAL=10 # can be increased for very long runs
> top -bd $MON_INTERVAL > /tmp/top_proc.${HOSTNAME}.txt # CPU utilization by process
> top -bHd $MON_INTERVAL > /tmp/top_thr.${HOSTNAME}.txt # CPU utilization by thread
>
> I started to observe this issue when running rsync to copy files from another server, and I thought it might be because Gluster doesn't like rsync's delta transfer with its many small writes. However, I was able to reproduce it with "rsync --whole-file --inplace", and even with cp or scp. It usually appears a few hours after starting the transfer, but sometimes happens within minutes.
>
> Since this is a single-node Gluster distributed volume, I tried transferring files directly onto the server, bypassing the Gluster clients, but that caused the same issue.
>
> The volume runs on top of a ZFS RAIDZ2 dataset. I have attached the volume options, along with the statedump generated when my clients hung.
>
> - Ubuntu 16.04 x86_64 / 4.4.0-116-generic
> - GlusterFS 3.12.8
>
> Thank you,
> Yuhao
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-users
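For anyone following along, the two observations above (brick daemons stuck in state D, zero pool I/O) can be captured with standard tools. A minimal sketch, assuming procps and the ZFS utilities are installed; the 5-second interval is arbitrary:

    # List processes in uninterruptible sleep (state D) and the kernel
    # function each one is waiting in
    ps -eo state,pid,comm,wchan:32 | awk '$1 == "D"'

    # Print pool-wide I/O statistics every 5 seconds; all-zero bandwidth
    # columns confirm the bricks are blocked rather than busy
    zpool iostat 5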
Raghavendra Gowdappa
2018-Aug-05 07:55 UTC
[Gluster-users] Gluster High CPU/Clients Hanging on Heavy Writes
On Sun, Aug 5, 2018 at 1:22 PM, Yuhao Zhang <zzyzxd at gmail.com> wrote:

> This is a semi-production server and I can't bring it down right now. I
> will try to get the monitoring output when I get a chance.

Collecting top output doesn't require bringing the servers down.

> As I recall, the high-CPU processes were the brick daemons (glusterfsd),
> and htop showed them in status D. However, I saw zero zpool I/O while the
> clients were all hanging.
>
> On Aug 5, 2018, at 02:38, Raghavendra Gowdappa <rgowdapp at redhat.com>
> wrote:
>
> On Sun, Aug 5, 2018 at 12:44 PM, Yuhao Zhang <zzyzxd at gmail.com> wrote:
>
>> Hi,
>>
>> I am running into a situation where heavy writes send the Gluster server
>> into a zombie-like state with many high-CPU processes, and all clients
>> hang. It is almost 100% reproducible on my machine. I hope someone can help.
>
> Can you give us the output of monitoring these processes with high CPU
> usage, captured while your tests are running?
>
> MON_INTERVAL=10 # can be increased for very long runs
> top -bd $MON_INTERVAL > /tmp/top_proc.${HOSTNAME}.txt # CPU utilization
> by process
> top -bHd $MON_INTERVAL > /tmp/top_thr.${HOSTNAME}.txt # CPU utilization
> by thread
>
>> I started to observe this issue when running rsync to copy files from
>> another server, and I thought it might be because Gluster doesn't like
>> rsync's delta transfer with its many small writes. However, I was able to
>> reproduce it with "rsync --whole-file --inplace", and even with cp or scp.
>> It usually appears a few hours after starting the transfer, but sometimes
>> happens within minutes.
>>
>> Since this is a single-node Gluster distributed volume, I tried
>> transferring files directly onto the server, bypassing the Gluster
>> clients, but that caused the same issue.
>>
>> The volume runs on top of a ZFS RAIDZ2 dataset. I have attached the
>> volume options, along with the statedump generated when my clients hung.
>>
>> - Ubuntu 16.04 x86_64 / 4.4.0-116-generic
>> - GlusterFS 3.12.8
>>
>> Thank you,
>> Yuhao
>>
>> _______________________________________________
>> Gluster-users mailing list
>> Gluster-users at gluster.org
>> https://lists.gluster.org/mailman/listinfo/gluster-users
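For reference, the statedump attached earlier in the thread is generated on the server with the gluster CLI. A minimal sketch, assuming a volume named myvol (hypothetical name); the dump files land under /var/run/gluster by default:

    # Ask all brick processes of the volume to dump their internal state
    gluster volume statedump myvol

    # The dump files are named after the brick path and process ID
    ls -l /var/run/gluster/*.dump.*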