Yuhao Zhang
2018-Aug-05 07:59 UTC
[Gluster-users] Gluster High CPU/Clients Hanging on Heavy Writes
Sorry, what I meant was, if I start the transfer now and get glusterd into zombie status, it's unlikely that I can fully recover the server without a reboot.

> On Aug 5, 2018, at 02:55, Raghavendra Gowdappa <rgowdapp at redhat.com> wrote:
>
> On Sun, Aug 5, 2018 at 1:22 PM, Yuhao Zhang <zzyzxd at gmail.com> wrote:
>> This is a semi-production server and I can't bring it down right now. Will try to get the monitoring output when I get a chance.
>
> Collecting top output doesn't require bringing down servers.
>
>> As I recall, the high-CPU processes are brick daemons (glusterfsd), and htop showed they were in status D. However, I saw zero zpool I/O, as clients were all hanging.
>>
>> On Aug 5, 2018, at 02:38, Raghavendra Gowdappa <rgowdapp at redhat.com> wrote:
>>
>> On Sun, Aug 5, 2018 at 12:44 PM, Yuhao Zhang <zzyzxd at gmail.com> wrote:
>>> Hi,
>>>
>>> I am running into a situation where heavy writes cause the Gluster server to go into a zombie state with many high-CPU processes, and all clients hang. It is almost 100% reproducible on my machine. Hope someone can help.
>>
>> Can you give us the output of monitoring these high-CPU processes, captured while your tests are running?
>>
>> MON_INTERVAL=10 # can be increased for very long runs
>> top -bd $MON_INTERVAL > /tmp/top_proc.${HOSTNAME}.txt # CPU utilization by process
>> top -bHd $MON_INTERVAL > /tmp/top_thr.${HOSTNAME}.txt # CPU utilization by thread
>>
>>> I started to observe this issue when running rsync to copy files from another server, and I thought it might be because Gluster doesn't like rsync's delta transfer with a lot of small writes. However, I was able to reproduce it with "rsync --whole-file --inplace", or even with cp or scp. It usually appears a few hours after starting the transfer, but sometimes can happen within several minutes.
>>>
>>> Since this is a single-node Gluster distributed volume, I tried transferring files directly onto the server, bypassing Gluster clients, but it still caused the same issue.
>>>
>>> It is running on top of a ZFS RAIDZ2 dataset. Options are attached. Also, I attached the statedump generated when my clients hung, and the volume options.
>>>
>>> - Ubuntu 16.04 x86_64 / 4.4.0-116-generic
>>> - GlusterFS 3.12.8
>>>
>>> Thank you,
>>> Yuhao
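A minimal sketch of the data collection requested above, run on the Gluster server while the transfer is in progress. The volume name "myvol" is a placeholder; the top commands are taken from the thread, and the statedump step assumes the default dump location under /var/run/gluster.

    #!/bin/bash
    # Collect the requested top output while the heavy-write test runs, then
    # grab a brick statedump once the hang reproduces. "myvol" is a placeholder.
    MON_INTERVAL=10   # seconds between samples; can be increased for very long runs

    top -bd  "$MON_INTERVAL" > /tmp/top_proc.${HOSTNAME}.txt &   # CPU utilization by process
    PROC_PID=$!
    top -bHd "$MON_INTERVAL" > /tmp/top_thr.${HOSTNAME}.txt &    # CPU utilization by thread
    THR_PID=$!

    read -rp "Press Enter once the clients hang, to take a statedump... "

    # Dump files land under /var/run/gluster by default, one per brick process.
    gluster volume statedump myvol

    kill "$PROC_PID" "$THR_PID"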
Raghavendra Gowdappa
2018-Aug-05 08:03 UTC
[Gluster-users] Gluster High CPU/Clients Hanging on Heavy Writes
On Sun, Aug 5, 2018 at 1:29 PM, Yuhao Zhang <zzyzxd at gmail.com> wrote:

> Sorry, what I meant was, if I start the transfer now and get glusterd into
> zombie status, it's unlikely that I can fully recover the server without a
> reboot.

I missed it. Thanks for the explanation :).
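Since the brick daemons were reported as burning CPU while sitting in state D, a rough sketch of how to see what they are blocked on is below. It assumes root access on the server and a kernel that exposes /proc/<pid>/stack (the stock Ubuntu 16.04 kernel does).

    # Inspect the glusterfsd (brick) processes showing high CPU or state D.
    for pid in $(pgrep -f glusterfsd); do
        echo "=== glusterfsd pid $pid ==="
        # Per-thread state and CPU: STAT "D" means uninterruptible sleep,
        # typically a thread waiting on disk I/O (here, the ZFS dataset).
        ps -L -o tid,stat,pcpu,wchan:30,comm -p "$pid"
        # Kernel stack of the main thread; shows where a D-state process is stuck.
        cat "/proc/$pid/stack"
    done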
Atin Mukherjee
2018-Aug-05 13:39 UTC
[Gluster-users] Gluster High CPU/Clients Hanging on Heavy Writes
On Sun, 5 Aug 2018 at 13:29, Yuhao Zhang <zzyzxd at gmail.com> wrote:

> Sorry, what I meant was, if I start the transfer now and get glusterd into
> zombie status,

glusterd or glusterfsd?

> it's unlikely that I can fully recover the server without a reboot.

--
--Atin
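For readers unsure how to answer that question on their own systems: glusterd is the management daemon (one per server), glusterfsd processes serve the individual bricks, and glusterfs processes back client mounts and auxiliary daemons. A quick check, as a sketch:

    # List Gluster-related processes with state and CPU usage, so the
    # problem daemon can be named precisely (glusterd vs. glusterfsd vs. glusterfs).
    ps -C glusterd,glusterfsd,glusterfs -o pid,stat,pcpu,etime,comm,args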