Yu He
2010-Apr-01 06:47 UTC
OpenSSH Coredump and "Bad packet length" errors seen on 5.10 sparc sun4v (Generic_125100-10)
Hi,

An OpenSSH core dump was seen at our customer's site. It made ssh logins slow and manual commands unusable. We need help identifying the root cause. Thanks!

>> Background:

1) Server info:

    # uname -a
    SunOS owtnmncccm0cnmo 5.10 Generic_125100-10 sun4v sparc SUNW,Netra-CP3060
    bash-3.00# /usr/local/bin/ssh -v
    OpenSSH_4.6p1, OpenSSL 0.9.8e 23 Feb 2007
    bash-3.00# cat /usr/local/etc/sshd_config | grep -v "^#" | grep -v "^$"
    Subsystem sftp /usr/local/libexec/sftp-server

2) There was no user activity around the time of the issue. A daemon script was running in the background to sync certain files between our server pair (from unit A to unit B; both have the same configuration and OS level).

3) After the error happened, the system hung and responded very slowly to things like ssh login.

4) People had to reboot the server, and everything looked fine afterwards.

>> Coredump:

    # cd /var/core
    # ls -ltr
    -rw------- 1 root root 5199389 Feb 24 07:01 core_owtnmncccm0cnmo_ssh_0_0_1267016512_6729
    # pstack core_owtnmncccm0cnmo_ssh_0_0_1267016512_6729
    core 'core_owtnmncccm0cnmo_ssh_0_0_1267016512_6729' of 6729: /usr/local/bin/ssh root at server0-unit1 rm -f /etc/init.d/staticroutes
     ff1ee314 AES_decrypt (3c, d1, aaa5d0a5, 314, 74, 3b0) + 2f4
     ff1ee66c AES_cbc_encrypt (74490, 774a8, 10, 6a358, 61fb8, 61fb8) + 2c
     ff238abc aes_128_cbc_cipher (1, 774a8, 74490, 10, f0, ff2d9a18) + 1c
     ff23dfb8 EVP_Cipher (61f98, 774a8, 74490, 10, 61800, 62400) + 18
     0002f3e4 cipher_crypt (61f94, 774a8, 74490, 10, f0, 7b528) + 34
     000338a4 packet_read_poll_seqnr (ffbfe474, 62000, 62000, 620f0, 61800, 62400) + 258
     00033f94 packet_read_seqnr (0, 6, ffbfe510, 628a8, f0, 3c) + 40
     00038bbc dispatch_run (0, ffbfe524, ffbfe510, ffbfe4f0, 624ac, ff) + 1c
     00025988 ssh_userauth2 (64568, 65250, 72e08, 628a8, 1, 0) + 52c
     00021a20 ssh_login (72e08, 4, 45400, 14, 45400, a) + 3a4
     000196b4 main (62b14, 647e4, 42a60, 42a58, 42800, 62800) + 8a4
     00017e48 _start (0, 0, 0, 0, 0, 0) + 5

Around the time of the core dump, many "Bad packet length" errors can be seen in /var/adm/messages, like:

    Disconnecting: Bad packet length 2298694383.
    Feb 24 07:00:36 owtnmncccm0cnmo sshd[860]: [ID 800047 auth.info] Disconnecting: Bad packet length 604783901.
    Feb 24 07:00:36 owtnmncccm0cnmo sshd[873]: [ID 800047 auth.info] Disconnecting: Bad packet length 2577232018.

>> More:

The ssh call is made by the sync daemon script (from server0-unit0 to server0-unit1), which was trying to remove /etc/init.d/staticroutes, a file that exists on unit1 but not on unit0. A snippet:

    message_out ${MSG_TRACE} "The replicated file $a_file does not exist; remove it from the other unit: $OTHERUNIT." yes
    /usr/local/bin/ssh root@${OTHERUNIT} "rm -f ${a_file}"
    rc=$?
    if [ $rc -ne 0 ]; then
        message_out $MSG_ERROR "Failed to delete ${a_file} on ${OTHERUNIT} with return code $rc." yes
        return 1
    fi

FYI: There is no record of whether this call (rm -f /etc/init.d/staticroutes) succeeded. The core dump and message files are backed up. Tell me if I need to upload them somewhere.

Looking forward to your help and advice.

Regards,
Yu
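
Since the sync snippet keeps no record of whether the remote rm succeeded, one possible way to capture that for a future occurrence is sketched below. This is only an illustration, not what the script currently does: SSH_BIN, SYNC_LOG and remove_on_other_unit are hypothetical names, and the log path is an assumption. It stores the return code and whatever ssh wrote to stderr next to a UTC timestamp.

```shell
#!/bin/sh
# Sketch only: SSH_BIN, SYNC_LOG and remove_on_other_unit are hypothetical
# names introduced for illustration; they are not part of the original script.
SSH_BIN=${SSH_BIN:-/usr/local/bin/ssh}
SYNC_LOG=${SYNC_LOG:-/tmp/sync_ssh.log}

remove_on_other_unit() {
    other_unit=$1
    a_file=$2

    # Keep only stderr of the ssh call (stdout is discarded), so messages
    # such as "Bad packet length" end up in the log next to the return code.
    err=`"$SSH_BIN" "root@${other_unit}" "rm -f ${a_file}" 2>&1 >/dev/null`
    rc=$?

    # One line per call: UTC timestamp, target unit, file, return code, stderr.
    echo "`date -u '+%Y-%m-%d %H:%M:%S'` unit=${other_unit} file=${a_file} rc=${rc} stderr=${err}" >> "$SYNC_LOG"

    return $rc
}
```

In the snippet above, the direct ssh line would then become something like remove_on_other_unit "${OTHERUNIT}" "${a_file}", with the existing rc=$? check left unchanged.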