thr3ads.net - CentOS - [CentOS] ssh stalls/hangs instead of exiting [Apr 2021]

Simon Matter

2021-Apr-14 06:16 UTC

[CentOS] ssh stalls/hangs instead of exiting

>> On 4/13/21 11:36 PM, Chris Schanzle via CentOS wrote:
>>> On 4/13/21 5:00 PM, Frank Cox wrote:
>>>> On Tue, 13 Apr 2021 22:29:26 +0200
>>>> Simon Matter wrote:
>>>>
>>>>> You could try running strace on the hanging process to see what it's
>>>>> doing.
>>>> [frankcox@mutt temp]$ rsync -avv ../temp/ jeff:temp
>>>> opening connection using: ssh jeff rsync --server -vvlogDtpre.iLsfxC . temp  (7 args)
>>>> sending incremental file list
>>>> delta-transmission enabled
>>>> abc is uptodate
>>>> total: matches=0  hash_hits=0  false_alarms=0 data=0
>>>>
>>>> Leaving that sit there apparently doing nothing (but still not giving
>>>> me my cursor back) I switched to another terminal window and did the
>>>> following:
>>>>
>>>> [frankcox@mutt ~]$ ps -FA | grep rsync
>>>> frankcox    5400    2435  0 60586  3160   5 14:52 pts/0    00:00:00 rsync -avv ../temp/ jeff:temp
>>>> frankcox    5401    5400  0 67980  7440   1 14:52 pts/0    00:00:00 ssh jeff rsync --server -vvlogDtpre.iLsfxC . temp
>>>> frankcox    5526    5416  0 55476  1076   3 14:53 pts/1    00:00:00 grep --color=auto rsync
>>>>
>>>> [frankcox@mutt ~]$ strace -p 5401
>>>> strace: Process 5401 attached
>>>> select(11, [5 9 10], [], NULL, NULL
>>>>
>>>> Then it just sits there with no further action.  I get my cursor back
>>>> when I hit ctrl-c.
>>>>
>>>> [frankcox@mutt ~]$ strace -p 5400
>>>> strace: Process 5400 attached
>>>> restart_syscall(<... resuming interrupted nanosleep ...>) = 0
>>>> wait4(5401, 0x7ffd45105564, WNOHANG, NULL) = 0
>>>> nanosleep({tv_sec=0, tv_nsec=20000000}, NULL) = 0
>>>> wait4(5401, 0x7ffd45105564, WNOHANG, NULL) = 0
>>>> nanosleep({tv_sec=0, tv_nsec=20000000}, NULL) = 0
>>>> wait4(5401, 0x7ffd45105564, WNOHANG, NULL) = 0
>>>> nanosleep({tv_sec=0, tv_nsec=20000000}, NULL) = 0
>>>> wait4(5401, 0x7ffd45105564, WNOHANG, NULL) = 0
>>>> nanosleep({tv_sec=0, tv_nsec=20000000}, NULL) = 0
>>>> wait4(5401, 0x7ffd45105564, WNOHANG, NULL) = 0
>>>> nanosleep({tv_sec=0, tv_nsec=20000000}, NULL) = 0
>>>> wait4(5401, 0x7ffd45105564, WNOHANG, NULL) = 0
>>>> nanosleep({tv_sec=0, tv_nsec=20000000}, NULL) = 0
>>>> wait4(5401, 0x7ffd45105564, WNOHANG, NULL) = 0
>>>>
>>>> The wait4-etc line just keeps repeating endlessly until I hit ctrl-c.
>>>>
>>>> Unfortunately, I have no idea what any of the above actually means.
>>>> Does it tell us anything interesting?
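
For reference, the two traces can be read as follows. The ssh process
(5401) is blocked in select() with no timeout, waiting for file
descriptors 5, 9 or 10 to become readable. The rsync process (5400) calls
wait4() with WNOHANG, gets 0 back because its ssh child has not exited,
sleeps 20 ms and tries again. A minimal sketch for finding out what those
descriptors are, assuming a Linux /proc filesystem; describe_fds is a
hypothetical helper name, not something from the thread:

```shell
# Hypothetical helper (not from the thread): print what each file
# descriptor of a process points to, using the Linux /proc filesystem.
# Usage: describe_fds PID FD [FD...]
describe_fds() {
    pid=$1
    shift
    for fd in "$@"; do
        # readlink fails if the fd is closed or the process is not ours.
        target=$(readlink "/proc/$pid/fd/$fd" 2>/dev/null) ||
            target="(not open or not readable)"
        echo "$fd -> $target"
    done
}

# For the hanging ssh in the trace above this would be:
#   describe_fds 5401 5 9 10
```

A descriptor that resolves to a socket or pipe still held open by some
other process would be one possible reason why select() never returns.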
>>>
>>> Yay!  I am glad someone else on the planet is experiencing this.
>>> I noticed this started happening to me after updating some CentOS Linux 8
>>> systems today.
>>>
>>> I discovered if I set ForwardX11=no (either on ssh command line or in
>>> ~/.ssh/config) the hang does not happen.  But why does that matter?  No
>>> updates to openssh.
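
To see whether X11 forwarding is actually in effect for a given host,
ssh -G prints the client configuration after evaluating ~/.ssh/config,
without connecting. A small sketch that extracts the ForwardX11 value from
that output; x11_forwarding_setting is a hypothetical helper name and
somehost is a placeholder:

```shell
# Hypothetical helper: read `ssh -G HOST` output on stdin and print the
# effective ForwardX11 value ("yes" or "no").
x11_forwarding_setting() {
    awk 'tolower($1) == "forwardx11" { print $2 }'
}

# Usage (somehost is a placeholder):
#   ssh -G somehost | x11_forwarding_setting
#   ssh -G -o ForwardX11=no somehost | x11_forwarding_setting
```

The exact match on the keyword keeps forwardx11timeout and
forwardx11trusted out of the result.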
>>>
>>> It is not the systemd update doing something silly with session
>>> management.  I painfully downgraded manually and rebooted to no
>>> effect.
>>
>>> As an aside, why can't we have nice things in life like 'dnf downgrade
>>> systemd\*' actually work?  I did the below - might be dumb, but it
>>> worked -- alternate suggestions to downgrade are appreciated -
>>> searching the list and my google-fu was off the mark today.
>>>
>>>   cd [path-to-repo]/centos/8/BaseOS/x86_64/os/Packages
>>>   dnf downgrade $(rpm -qa systemd\* | grep 239-41.el8_3.2 | sed -e 's/3\.2/3.1/' -e 's/^/.\//' -e 's/$/.rpm/')
>>>
>>> Chris
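
The sed part of that downgrade command rewrites each installed package
name into the file name of the previous build: the first "3.2" becomes
"3.1", "./" is prepended and ".rpm" is appended. A sketch of just that
step, wrapped in a function so it can be tried without touching rpm or
dnf; previous_build_files is a hypothetical name:

```shell
# Hypothetical wrapper around the sed expressions quoted above: turn
# installed package names (one per line on stdin) into the local rpm file
# names of the previous build.
previous_build_files() {
    sed -e 's/3\.2/3.1/' -e 's/^/.\//' -e 's/$/.rpm/'
}

# Example:
#   echo systemd-239-41.el8_3.2.x86_64 | previous_build_files
#   prints ./systemd-239-41.el8_3.1.x86_64.rpm
```

Note that the first expression replaces the first "3.2" it finds on each
line, so it relies on the package name itself not containing that string.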
>>
>>
>> [adjusted the subject, hope that is OK.]
>>
>> Found it!  It's the dbus update to 1.12.8-12.  Downgrade to -11
>> and ssh connections close normally.
>>
>> To clarify the problem, with the new dbus, simple ssh's like:
>>
>> ssh somehost uptime
>>
>> will print the uptime, but do not return to the local shell prompt until
>> you hit ctrl-c.  It works normally if you downgrade dbus or
>>
>> ssh -o forwardx11=no somehost uptime
>>
>> I'm sure a bug report exists somewhere, but that's something to dig for
>> or create tomorrow.
>>
>> To downgrade, packages were scattered in different locations, so I
>> copied them to one directory and did
>>
>> dnf downgrade ./*
>>
>> The packages I needed to downgrade on an x86_64 system were:
>>
>> dbus-1.12.8-11.el8.x86_64.rpm
>> dbus-common-1.12.8-11.el8.noarch.rpm
>> dbus-daemon-1.12.8-11.el8.x86_64.rpm
>> dbus-devel-1.12.8-11.el8.x86_64.rpm
>> dbus-libs-1.12.8-11.el8.x86_64.rpm
>> dbus-tools-1.12.8-11.el8.x86_64.rpm
>> dbus-x11-1.12.8-11.el8.x86_64.rpm
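
Since 'dnf downgrade ./*' acts on whatever is in the directory, it may be
worth checking that all of the files are there first. A minimal sketch of
such a check, assuming the seven file names listed above (other systems
may have a different set of dbus subpackages installed);
missing_dbus_rpms is a hypothetical name:

```shell
# Hypothetical pre-flight check: print any of the seven -11 packages
# listed above that are missing from the given directory (default: .).
missing_dbus_rpms() {
    dir=${1:-.}
    for f in \
        dbus-1.12.8-11.el8.x86_64.rpm \
        dbus-common-1.12.8-11.el8.noarch.rpm \
        dbus-daemon-1.12.8-11.el8.x86_64.rpm \
        dbus-devel-1.12.8-11.el8.x86_64.rpm \
        dbus-libs-1.12.8-11.el8.x86_64.rpm \
        dbus-tools-1.12.8-11.el8.x86_64.rpm \
        dbus-x11-1.12.8-11.el8.x86_64.rpm
    do
        # Print the name only when the file is absent.
        [ -f "$dir/$f" ] || echo "$f"
    done
}

# Usage: run in the directory holding the copied packages; no output
# means all seven files are present.
#   missing_dbus_rpms .
```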
>
> Now that's really interesting, I was wondering why I don't see this on
> OL8. The thing is that certain OL8 packages have an additional RPM
> revision added like .0.1. Just checked dbus and its changelog shows:
>
> * Tue Feb 16 2021 Kevin Lyons <kevin.x.lyons at oracle.com> -1.12.8-12.0.1
> - bus: raise fd limits before dropping privs [Orabug: 31175643]
> - fix netlink poll: error 4 (Zhenzhong Duan)
>
> So OL is definitely not 100% bug to bug compatible like the other clones :-)
>
> And it makes me a bit worried why O* fixed this on Feb 16 and the broken
> dbus packages are now (in April) installed on CentOS servers?
Sorry, maybe I'm wrong here and the OL8 addons are fixing other things?
Could someone who experiences the issue test with the OL8 dbus packages?
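
For anyone running that test, one way to tell a hang from a normal exit
without sitting at the terminal is to put a deadline on the command with
timeout(1) from GNU coreutils, which exits with status 124 when it has to
kill the command. A rough sketch; exits_within is a hypothetical helper
name and somehost is a placeholder:

```shell
# Hypothetical helper: run a command with a deadline in seconds and report
# whether it exited on its own or had to be killed by timeout(1).
exits_within() {
    secs=$1
    shift
    timeout "$secs" "$@" >/dev/null 2>&1
    status=$?
    # GNU timeout returns 124 when the deadline expired.
    if [ "$status" -eq 124 ]; then
        echo "hung"
    else
        echo "exited"
    fi
}

# Usage (somehost is a placeholder):
#   exits_within 15 ssh somehost uptime
#   exits_within 15 ssh -o ForwardX11=no somehost uptime
```

Running both lines before and after swapping the dbus packages would show
whether a given build makes a difference.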

Thanks,
Simon

Simon Matter

2021-Apr-14 06:22 UTC

[CentOS] ssh stalls/hangs instead of exiting

> Sorry, maybe I'm wrong here and the OL8 addons are fixing other things?
> Could someone who experiences the issue test with the OL8 dbus packages?
>
Could it be BZ #1940067?

https://bugzilla.redhat.com/show_bug.cgi?id=1940067

Regards,
Simon
