David Cunningham
2020-Aug-25 03:24 UTC
[Gluster-users] Geo-replication log file not closed
Hello,

We're having an issue with the rotated gsyncd.log not being released. Here's the output of 'lsof':

    # lsof | grep 'gsyncd.log.1'
    python2 4495 root 3w REG 8,1 991675023 4332241 /var/log/glusterfs/geo-replication/gvol0_nvfs10_gvol0/gsyncd.log.1 (deleted)
    python2 4495 4496 root 3w REG 8,1 991675023 4332241 /var/log/glusterfs/geo-replication/gvol0_nvfs10_gvol0/gsyncd.log.1 (deleted)
    python2 4495 4507 root 3w REG 8,1 991675023 4332241 /var/log/glusterfs/geo-replication/gvol0_nvfs10_gvol0/gsyncd.log.1 (deleted)
    python2 4508 root 3w REG 8,1 991675023 4332241 /var/log/glusterfs/geo-replication/gvol0_nvfs10_gvol0/gsyncd.log.1 (deleted)
    python2 4508 root 5w REG 8,1 991675023 4332241 /var/log/glusterfs/geo-replication/gvol0_nvfs10_gvol0/gsyncd.log.1 (deleted)
    python2 4508 4511 root 3w REG 8,1 991675023 4332241 /var/log/glusterfs/geo-replication/gvol0_nvfs10_gvol0/gsyncd.log.1 (deleted)
    ... etc...

Those processes are:

    # ps -ef | egrep '4495|4508'
    root 4495 1 0 Aug10 ? 00:00:59 /usr/bin/python2 /usr/lib/x86_64-linux-gnu/glusterfs/python/syncdaemon/gsyncd.py --path=/nodirectwritedata/gluster/gvol0 --monitor -c /var/lib/glusterd/geo-replication/gvol0_nvfs10_gvol0/gsyncd.conf --iprefix=/var :gvol0 --glusterd-uuid=b7521445-ee93-4fed-8ced-6a609fa8c7d4 nvfs10::gvol0
    root 4508 4495 0 Aug10 ? 00:01:56 python2 /usr/lib/x86_64-linux-gnu/glusterfs/python/syncdaemon/gsyncd.py agent gvol0 nvfs10::gvol0 --local-path /nodirectwritedata/gluster/gvol0 --local-node cafs30 --local-node-id b7521445-ee93-4fed-8ced-6a609fa8c7d4 --slave-id cdcdb210-839c-4306-a4dc-e696b165ed17 --rpc-fd 9,12,11,10

And here's the relevant part of the /etc/logrotate.d/glusterfs-georep script:

    /var/log/glusterfs/geo-replication/*/*.log {
        sharedscripts
        rotate 52
        missingok
        compress
        delaycompress
        notifempty
        postrotate
        for pid in `ps -aef | grep glusterfs | egrep "\-\-aux-gfid-mount" | awk '{print $2}'`; do
            /usr/bin/kill -HUP $pid > /dev/null 2>&1 || true
        done
        endscript
    }

If I run the postrotate part manually:

    # ps -aef | grep glusterfs | egrep "\-\-aux-gfid-mount" | awk '{print $2}'
    4520

    # ps -aef | grep 4520
    root 4520 1 0 Aug10 ? 01:24:23 /usr/sbin/glusterfs --aux-gfid-mount --acl --log-level=INFO --log-file=/var/log/glusterfs/geo-replication/gvol0_nvfs10_gvol0/mnt-nodirectwritedata-gluster-gvol0.log --volfile-server=localhost --volfile-id=gvol0 --client-pid=-1 /tmp/gsyncd-aux-mount-Tq_3sU

Perhaps the problem is that the kill -HUP in the logrotate script doesn't act on the right process? If so, does anyone have a command to get the right PID?

Thanks in advance for any help.

--
David Cunningham, Voisonics Limited
http://voisonics.com/
USA: +1 213 221 1092
New Zealand: +64 (0)28 2558 3782
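On the question of finding the right PID: the following is a minimal sketch, not a Gluster-provided tool, that extracts the unique PIDs from 'lsof' output like the listing above. The function name pids_holding_deleted is hypothetical, and it assumes the default lsof column layout in which the PID is the second field (thread rows carry an extra TID column after it, which does not affect field 2). It only identifies the processes; it does not signal them.

```shell
#!/bin/sh
# pids_holding_deleted PATTERN
# Hypothetical helper: reads lsof output on stdin and prints the unique PIDs
# of processes that still hold a deleted file whose line contains PATTERN.
# Assumes default lsof columns, where field 2 is the PID.
pids_holding_deleted() {
    pattern=$1
    # index() does a fixed-string match, so dots in the file name are literal.
    awk -v pat="$pattern" \
        'index($0, pat) && /\(deleted\)[[:space:]]*$/ { print $2 }' |
        sort -un
}

# Example usage (commented out so the snippet has no side effects):
#   lsof | pids_holding_deleted 'gsyncd.log.1'
```

With the lsof listing from this message as input, this would print 4495 and 4508. On systems where lsof supports it, 'lsof +L1' narrows the listing to unlinked files before filtering.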
David Cunningham
2020-Aug-31 04:11 UTC
[Gluster-users] Geo-replication log file not closed
Hello all,

Apparently we don't want to "kill -HUP" the two processes that have the rotated log file still open:

    root 4495 1 0 Aug10 ? 00:00:59 /usr/bin/python2 /usr/lib/x86_64-linux-gnu/glusterfs/python/syncdaemon/gsyncd.py --path=/nodirectwritedata/gluster/gvol0 --monitor -c /var/lib/glusterd/geo-replication/gvol0_nvfs10_gvol0/gsyncd.conf --iprefix=/var :gvol0 --glusterd-uuid=b7521445-ee93-4fed-8ced-6a609fa8c7d4 nvfs10::gvol0
    root 4508 4495 0 Aug10 ? 00:01:56 python2 /usr/lib/x86_64-linux-gnu/glusterfs/python/syncdaemon/gsyncd.py agent gvol0 nvfs10::gvol0 --local-path /nodirectwritedata/gluster/gvol0 --local-node cafs30 --local-node-id b7521445-ee93-4fed-8ced-6a609fa8c7d4 --slave-id cdcdb210-839c-4306-a4dc-e696b165ed17 --rpc-fd 9,12,11,10

A kill -HUP on those processes stops them rather than making them re-open the log file.

Does anyone know if these processes are supposed to have gsyncd.log open? If so, how do we tell them to close and re-open their file handle?

Thanks in advance!
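One general approach when a daemon cannot safely be signalled to re-open its log is to leave the file in place and truncate it after copying, which is what logrotate's 'copytruncate' directive does. The sketch below imitates that behaviour in plain shell to show why the writer's file handle stays valid. The name rotate_copytruncate is hypothetical, and whether this suits gsyncd is an assumption, not something confirmed in this thread.

```shell
#!/bin/sh
# rotate_copytruncate LOGFILE
# Hypothetical helper imitating logrotate's copytruncate behaviour:
# copy the live log aside, then truncate the original in place.
# The original inode is kept, so a process holding it open keeps writing
# to the same (now empty) file and no re-open is needed.
rotate_copytruncate() {
    log=$1
    # Only truncate if the copy succeeded, so no data is discarded on failure.
    cp -p -- "$log" "$log.1" && : > "$log"
}

# Example usage (commented out so the snippet has no side effects):
#   rotate_copytruncate /path/to/some.log
```

Two caveats apply to this technique in general: lines written between the copy and the truncate can be lost, and the space is only reclaimed cleanly if the writer opened the file in append mode. Otherwise it keeps writing at its old offset and leaves a sparse file.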