thr3ads.net - CentOS - [CentOS] ssh stalls/hangs instead of exiting [Apr 2021]

Chris Schanzle

2021-Apr-14 04:32 UTC

[CentOS] ssh stalls/hangs instead of exiting

On 4/13/21 11:36 PM, Chris Schanzle via CentOS wrote:
> On 4/13/21 5:00 PM, Frank Cox wrote:
>> On Tue, 13 Apr 2021 22:29:26 +0200
>> Simon Matter wrote:
>>
>>> You could try running strace on the hanging process to see what
>>> it's doing.
>> [frankcox at mutt temp]$ rsync -avv ../temp/ jeff:temp
>> opening connection using: ssh jeff rsync --server -vvlogDtpre.iLsfxC . temp  (7 args)
>> sending incremental file list
>> delta-transmission enabled
>> abc is uptodate
>> total: matches=0  hash_hits=0  false_alarms=0 data=0
>>
>> Leaving that sitting there, apparently doing nothing (but still not giving
>> me my cursor back), I switched to another terminal window and did the following:
>>
>> [frankcox at mutt ~]$ ps -FA | grep rsync
>> frankcox    5400    2435  0 60586  3160   5 14:52 pts/0    00:00:00 rsync -avv ../temp/ jeff:temp
>> frankcox    5401    5400  0 67980  7440   1 14:52 pts/0    00:00:00 ssh jeff rsync --server -vvlogDtpre.iLsfxC . temp
>> frankcox    5526    5416  0 55476  1076   3 14:53 pts/1    00:00:00 grep --color=auto rsync
>>
>> [frankcox at mutt ~]$ strace -p 5401
>> strace: Process 5401 attached
>> select(11, [5 9 10], [], NULL, NULL
>>
>> Then it just sits there with no further action. I get my cursor back
>> when I hit ctrl-c.
>>
>> [frankcox at mutt ~]$ strace -p 5400
>> strace: Process 5400 attached
>> restart_syscall(<... resuming interrupted nanosleep ...>) = 0
>> wait4(5401, 0x7ffd45105564, WNOHANG, NULL) = 0
>> nanosleep({tv_sec=0, tv_nsec=20000000}, NULL) = 0
>> wait4(5401, 0x7ffd45105564, WNOHANG, NULL) = 0
>> nanosleep({tv_sec=0, tv_nsec=20000000}, NULL) = 0
>> wait4(5401, 0x7ffd45105564, WNOHANG, NULL) = 0
>> nanosleep({tv_sec=0, tv_nsec=20000000}, NULL) = 0
>> wait4(5401, 0x7ffd45105564, WNOHANG, NULL) = 0
>> nanosleep({tv_sec=0, tv_nsec=20000000}, NULL) = 0
>> wait4(5401, 0x7ffd45105564, WNOHANG, NULL) = 0
>> nanosleep({tv_sec=0, tv_nsec=20000000}, NULL) = 0
>> wait4(5401, 0x7ffd45105564, WNOHANG, NULL) = 0
>> nanosleep({tv_sec=0, tv_nsec=20000000}, NULL) = 0
>> wait4(5401, 0x7ffd45105564, WNOHANG, NULL) = 0
>>
>> The wait4-etc line just keeps repeating endlessly until I hit ctrl-c.
>>
>> Unfortunately, I have no idea what any of the above actually means.
>> Does it tell us anything interesting?
>
> Yay! I am glad someone else on the planet is experiencing this.
> I noticed this started happening to me after updating some CentOS Linux 8
> systems today.
>
> I discovered that if I set ForwardX11=no (either on the ssh command line or
> in ~/.ssh/config), the hang does not happen. But why does that matter? There
> were no updates to openssh.
>
> It is not the systemd update doing something silly with session
> management. I painfully downgraded it manually and rebooted, to no effect.
>
> As an aside, why can't we have nice things in life like 'dnf downgrade
> systemd\*' actually working? I did the following; it might be dumb, but it
> worked. Alternative suggestions for downgrading are appreciated, since
> searching the list and my google-fu were off the mark today.
>
>   cd [path-to-repo]/centos/8/BaseOS/x86_64/os/Packages
>   dnf downgrade $(rpm -qa systemd\* | grep 239-41.el8_3.2 | sed -e 's/3\.2/3.1/' -e 's/^/.\//' -e 's/$/.rpm/')
>
> Chris
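Reading the two traces quoted above: rsync (5400) calls wait4() on its ssh child with WNOHANG, which returns 0 while the child is still running, then sleeps 20 ms (tv_nsec=20000000) and asks again, about 50 times a second. So rsync is only waiting for ssh to exit. ssh (5401) is blocked in select() on read descriptors 5, 9 and 10 with no timeout (the final NULL), so it will not exit until one of them has something to read. On Linux, the /proc/PID/fd symlinks show what those descriptors are. The function below is a minimal sketch of that check; show_fds is a hypothetical helper name, and inspecting another user's process may require root.

```shell
#!/bin/sh
# Sketch: show what each listed file descriptor of a process refers to.
# Usage: show_fds PID FD...   (Linux only; relies on /proc/PID/fd/N symlinks)
show_fds() {
    pid=$1
    shift
    for fd in "$@"; do
        # readlink prints the link target, e.g. socket:[123456], pipe:[7890]
        # or /dev/pts/0; a failure means the descriptor is not open (or not readable).
        target=$(readlink "/proc/$pid/fd/$fd" 2>/dev/null) || target='(not open)'
        printf '%s -> %s\n' "$fd" "$target"
    done
}
```

For the stuck ssh in this thread that would be `show_fds 5401 5 9 10`; a `socket:[...]` target is a network or Unix socket, and `pipe:[...]` is a pipe.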

[adjusted the subject, hope that is OK.]

Found it! It's the dbus update to 1.12.8-12. Downgrade to -11
and ssh connections close normally.

To clarify the problem: with the new dbus, simple ssh commands like

ssh somehost uptime

will print the uptime but do not return to the local shell prompt until you hit
ctrl-c. It works normally if you downgrade dbus, or if you run

ssh -o forwardx11=no somehost uptime
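When the workaround is put in ~/.ssh/config instead of on the command line, it can be confirmed without connecting: `ssh -G somehost` (OpenSSH 6.8 and later) prints the configuration ssh would use for that host as lowercase "keyword value" lines and exits. A minimal sketch that pulls the ForwardX11 value out of that output (forwardx11_setting is a hypothetical name):

```shell
#!/bin/sh
# Sketch: read `ssh -G HOST` output on stdin and print the resolved
# ForwardX11 value ("yes" or "no").
forwardx11_setting() {
    # Exact match on the first field, so forwardx11trusted and
    # forwardx11timeout lines are ignored.
    awk '$1 == "forwardx11" { print $2 }'
}
```

For example, `ssh -G somehost | forwardx11_setting` should print `no` once the setting is in effect for that host.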

I'm sure a bug report exists somewhere, but that's something to dig for
or create tomorrow.

The packages needed for the downgrade were scattered in different locations, so I
copied them to one directory and ran

dnf downgrade ./*

The packages I needed to downgrade on an x86_64 system were:

dbus-1.12.8-11.el8.x86_64.rpm
dbus-common-1.12.8-11.el8.noarch.rpm
dbus-daemon-1.12.8-11.el8.x86_64.rpm
dbus-devel-1.12.8-11.el8.x86_64.rpm
dbus-libs-1.12.8-11.el8.x86_64.rpm
dbus-tools-1.12.8-11.el8.x86_64.rpm
dbus-x11-1.12.8-11.el8.x86_64.rpm
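Because `dnf downgrade ./*` acts on whatever files happen to be in the directory, it can help to confirm first that all seven packages listed above were actually copied there. A minimal sketch, assuming the x86_64 list from this message (other architectures would need a different list); missing_pkgs is a hypothetical helper name:

```shell
#!/bin/sh
# Package files listed in this message for an x86_64 system.
pkgs='dbus-1.12.8-11.el8.x86_64.rpm
dbus-common-1.12.8-11.el8.noarch.rpm
dbus-daemon-1.12.8-11.el8.x86_64.rpm
dbus-devel-1.12.8-11.el8.x86_64.rpm
dbus-libs-1.12.8-11.el8.x86_64.rpm
dbus-tools-1.12.8-11.el8.x86_64.rpm
dbus-x11-1.12.8-11.el8.x86_64.rpm'

# missing_pkgs DIR: print every listed package that is not a regular file in DIR.
missing_pkgs() {
    dir=$1
    for p in $pkgs; do    # unquoted on purpose: split the list on newlines
        [ -f "$dir/$p" ] || printf '%s\n' "$p"
    done
}
```

Run from the package directory as root, something like `[ -z "$(missing_pkgs .)" ] && dnf downgrade ./*` only starts the downgrade when nothing is missing. The directory should also hold no other files, since `./*` picks them up too.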

Simon Matter

2021-Apr-14 06:10 UTC

[CentOS] ssh stalls/hangs instead of exiting

> Found it! It's the dbus update to 1.12.8-12. Downgrade to -11
> and ssh connections close normally.

Now that's really interesting. I was wondering why I don't see this on
OL8. The thing is that certain OL8 packages have an additional RPM
revision appended, like .0.1. I just checked dbus, and its changelog shows:

* Tue Feb 16 2021 Kevin Lyons <kevin.x.lyons at oracle.com> -1.12.8-12.0.1
- bus: raise fd limits before dropping privs [Orabug: 31175643]
- fix netlink poll: error 4 (Zhenzhong Duan)

So OL is definitely not 100% bug-for-bug compatible like the other clones :-)

And it makes me a bit worried: why did O* fix this on Feb 16, while the broken
dbus packages are only now (in April) being installed on CentOS servers?

Regards,
Simon
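Following Simon's point about release strings, one way to tell which of the dbus builds mentioned in this thread a host is running is to match the output of `rpm -q dbus`. This is a sketch under assumptions: the thread does not give the exact dist tags (for example whether the CentOS build is 12.el8 or 12.el8_3), so the patterns below only anchor on the version-release prefix, and dbus_build is a hypothetical name.

```shell
#!/bin/sh
# Sketch: classify a dbus NVRA string (e.g. from `rpm -q dbus`) against the
# builds discussed in this thread. The patterns are assumptions, not
# confirmed package names; the .0.1 case must be tested before the -12 case.
dbus_build() {
    case "$1" in
        dbus-1.12.8-11.el8*)  echo "1.12.8-11: reported working" ;;
        dbus-1.12.8-12.0.1.*) echo "1.12.8-12.0.1 (Oracle Linux rebuild with extra patches)" ;;
        dbus-1.12.8-12.el8*)  echo "1.12.8-12: reported to hang ssh with X11 forwarding" ;;
        *)                    echo "other: not covered by this thread" ;;
    esac
}
```

Usage: `dbus_build "$(rpm -q dbus)"`.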
