On 18/08/2021 11:02, Darren Tucker wrote:
> I have not been able to reproduce this.  I've tried:
>  - disabling HAVE_PSELECT on a Linux system,
>  - disabling HAVE_PSELECT on a 32bit Solaris 10 VM
>  - disabling HAVE_PSELECT on a 64bit Solaris 11 VM
>  - restoring an old Solaris 7 backup onto a qemu 32bit sparc VM
>
> Can I get some more details?  Compiler, OpenSSL version, configure
> options, exact command used to invoke the test?  Oh, and are they
> multiprocessor systems (maybe it's a race)?
>
The Solaris 7 system has:
$ gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/tgcware/libexec/gcc/sparc-sun-solaris2.7/4.5.4/lto-wrapper
Target: sparc-sun-solaris2.7
Configured with: ../gcc-4.5.4/configure --enable-obsolete
--prefix=/usr/tgcware --with-local-prefix=/usr/tgcware/gcc45
--bindir=/usr/tgcware/gcc45/bin --mandir=/usr/tgcware/gcc45/man
--infodir=/usr/tgcware/gcc45/info --disable-nls --enable-shared
--enable-threads=posix --with-gmp=/usr/tgcware --with-mpfr=/usr/tgcware
--with-mpc=/usr/tgcware --with-cloog=/usr/tgcware
--with-ppl=/usr/tgcware --without-gnu-ld --with-ld=/usr/ccs/bin/ld
--with-gnu-as --with-as=/usr/tgcware/bin/gas
--enable-languages=all,ada,obj-c++ --with-x --enable-java-awt=xlib
--with-cpu=v7
Thread model: posix
gcc version 4.5.4 (tgcware 4.5.4-2)
$ openssl version
OpenSSL 1.0.2u 20 Dec 2019
$ file /usr/tgcware/lib/libssl.so
/usr/tgcware/lib/libssl.so: ELF 32-bit MSB dynamic lib SPARC32PLUS
Version 1, V8+ Required, dynamically linked, stripped
$
The system is a multi-processor system with 4x336MHz US-II CPUs.
The Solaris 9 system has:
$ gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/tgcware/libexec/gcc/sparc-sun-solaris2.9/4.9.4/lto-wrapper
Target: sparc-sun-solaris2.9
Configured with: ../gcc-4.9.4/configure --enable-obsolete
--prefix=/usr/tgcware --with-local-prefix=/usr/tgcware/gcc49
--bindir=/usr/tgcware/gcc49/bin --mandir=/usr/tgcware/gcc49/man
--infodir=/usr/tgcware/gcc49/info --disable-nls --enable-shared
--enable-threads=posix --with-gmp=/usr/tgcware --with-mpfr=/usr/tgcware
--with-mpc=/usr/tgcware --with-cloog=/usr/tgcware
--with-isl=/usr/tgcware --with-cloog-backend=isl --without-gnu-ld
--with-ld=/usr/ccs/bin/ld --with-gnu-as --with-as=/usr/tgcware/bin/gas
--enable-languages=all,ada,obj-c++,go --with-x --enable-java-awt=xlib
--with-cpu=v9 --with-pkgversion='tgcware 4.9.4-1'
--with-bugurl=http://jupiterrise.com/tgcware
Thread model: posix
gcc version 4.9.4 (tgcware 4.9.4-1)
$ openssl version
OpenSSL 1.1.1k 25 Mar 2021
$ file /usr/tgcware/lib/libssl.so
/usr/tgcware/lib/libssl.so: ELF 32-bit MSB dynamic lib SPARC32PLUS
Version 1, V8+ Required, dynamically linked, stripped
$
The OS is running in a branded zone under Solaris 10 and the host system
is a multi-processor system with 4x900MHz US-III+ CPUs.
On both systems, for testing purposes, I am building OpenSSH like this:
./configure CC=gcc LDFLAGS="-L/usr/tgcware/lib -R/usr/tgcware/lib" \
    CPPFLAGS="-I/usr/tgcware/include" --prefix=/tmp/ossh
make -j4
I then run the test suite with 'make tests' or, for just the rekey tests,
'make tests LTESTS=rekey SKIP_UNIT=1'
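
Since the failure mode is a hang rather than an error, it can help to run the test under a time limit. Here is a minimal sketch, assuming a POSIX shell. run_with_watchdog is a hypothetical helper and not part of the OpenSSH regress suite, and any time limit passed to it is an arbitrary choice.

```shell
#!/bin/sh
# Hypothetical helper (not part of OpenSSH): run a command in the
# background and send it SIGTERM if it is still running after LIMIT seconds.
# Usage: run_with_watchdog LIMIT command [args...]
run_with_watchdog() {
    limit=$1
    shift
    "$@" &
    cmd_pid=$!
    # Watchdog: sleep, then terminate the command if it still exists.
    ( sleep "$limit"; kill -TERM "$cmd_pid" ) >/dev/null 2>&1 &
    dog_pid=$!
    wait "$cmd_pid"
    status=$?
    # The command is done (or was killed); stop the watchdog if it is
    # still sleeping.
    kill "$dog_pid" >/dev/null 2>&1
    # Non-zero if the command failed or was killed by the watchdog.
    return "$status"
}
```

For example, 'run_with_watchdog 600 make tests LTESTS=rekey SKIP_UNIT=1' would return non-zero if the run fails or is still going after ten minutes. Note that this only signals the top-level process, so any hung ssh/sshd children stay around and can still be inspected with ps and truss.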
I don't have any single-processor SPARC systems to test with, but I
can off-line CPUs. I just did that on the Solaris 7 system: with only
a single CPU online and no revert applied, the rekey test ran to
completion with no hangs.
> Also a copy of the ssh.log and sshd.log from a hung instance (off-list
> is fine)?
>
This is from the Solaris 9 system with all 4 cpus online.
It hung almost immediately:
make[1]: Entering directory
`/export/home/tgc/buildpkg/openssh/src/openssh-git/regress'
run test rekey.sh ...
client rekey KexAlgorithms=diffie-hellman-group1-sha1
client rekey KexAlgorithms=diffie-hellman-group14-sha1
client rekey KexAlgorithms=diffie-hellman-group14-sha256
At this point all ssh(d) processes are idle.
I've uploaded the logs here:
https://jupiterrise.com/tmp/?C=M;O=D
They should be at the top of the list.
>>     the other, and then a <defunct> child of the still running child.
>>     With truss I see that the client is still doing poll().
>
> if you truss the sshd that's still alive and hung what's it doing?
>
From ps these are the relevant processes:
F UID PID PPID %C PRI NI SZ RSS WCHAN S TT TIME COMMAND
0 3000 27640 27639 0 59 20 5376 3040 300d5f53020 S pts/13 0:00
/bin/bash
/export/home/tgc/buildpkg/openssh/src/openssh-git/regress/test-exec.sh
/export/home/tgc/buildpkg/openssh/src/openssh-git/regress
/export/home/tgc/buildpkg/openssh/src/openssh-git/regress/rekey.sh
0 3000 27640 27639 0 59 20 5376 3040 300d5f53020 S pts/13 0:00
/bin/bash
/export/home/tgc/buildpkg/openssh/src/openssh-git/regress/test-exec.sh
/export/home/tgc/buildpkg/openssh/src/openssh-git/regress
/export/home/tgc/buildpkg/openssh/src/openssh-git/regress/rekey.sh
0 3000 27772 27640 0 59 20 7200 4640 301226e40b2 S pts/13 0:02
/export/home/tgc/buildpkg/openssh/src/openssh-git/ssh
-E/export/home/tgc/buildpkg/openssh/src/openssh-git/regress/ssh.log
-oRekeyLimit=256k -oCompression=no -v -F
/export/home/tgc/buildpkg/openssh/src/openssh-git/regress/ssh_proxy
somehost cat >
/export/home/tgc/buildpkg/openssh/src/openssh-git/regress/copy
0 3000 27773 27772 0 59 20 7336 4696 30043c38b02 S pts/13 0:00
/export/home/tgc/buildpkg/openssh/src/openssh-git/sshd -i -f
/export/home/tgc/buildpkg/openssh/src/openssh-git/regress/sshd_proxy
-E/export/home/tgc/buildpkg/openssh/src/openssh-git/regress/sshd.log
0 3000 27775 27773 0 59 20 7616 2512 300f315a502 S pts/13 0:00
/export/home/tgc/buildpkg/openssh/src/openssh-git/sshd -i -f
/export/home/tgc/buildpkg/openssh/src/openssh-git/regress/sshd_proxy
-E/export/home/tgc/buildpkg/openssh/src/openssh-git/regress/sshd.log
0 3000 27776 27775 0 0 0 0 Z 0:00
<defunct>
Not much to see with truss:
$ truss -p 27772
poll(0xFFBFCD28, 1, -1) (sleeping...)
$ truss -p 27773
poll(0xFFBFDB5C, 1, -1) (sleeping...)
$ truss -p 27775
poll(0xFFBFD8C8, 1, -1) (sleeping...)
-tgc