On Fri, Jun 22, 2001 at 03:03:27PM -0500, pete at more.net wrote:
> Hello,
>
> I'm running several Sparcs with Solaris 2.7 on them. They all have
> Openssh 2.9 installed, and all work fine. Except one. Every once in a
> while I get this:
>
> "Disconnecting: Corrupted check bytes on input."
>
> When I truss or I am at the console I get this:
>
> "rsa_private_decrypt() failed"
>
> I've tried reinstalling with OpenSSH 2.5, 2.9 and the commercial
> versions. I get these errors from Linux and x86 Solaris boxes too in
> random intervals. It's just this one box.
>
> Obviously it seems like a lib is screwed up. All of my boxes are at the
> same patch level. I've also installed new OpenSSL and zlib's. I've seen
> different requests at the OpenSSH site's mailing list archives for a fix,
> and I haven't found any that work.
>
> I've generated several sets of keys, and replaced all the configs with
> one's known to work fine everywhere else. Any clues or suggestions
> would be greatly appreciated.

I had this problem on SSH 1.2.27, and it turned out to be limited to the
Sun compiler; compiling with gcc fixed the problem.
I too had it happening on only one machine, and only about 25% of the time;
machines with identical hardware and software did not fail. I narrowed
down the problem to the section of code that calculates MD5 checksums; some
percentage of the time it simply calculated an incorrect value. I found
that if I recompiled with gcc, the problem went away. I tried Sun compiler
versions SC3.0.1, SC4.0, and SC4.2 and they all had the problem. My theory
is that there is some processor register that is not being saved by the Sun
compiler which sometimes got clobbered by a context switch or swap, and it
only happened on the one machine because it was more heavily loaded.
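
A minimal sketch of a repeat-and-compare check for this kind of intermittent
miscalculation, assuming Python with hashlib is available on the suspect
machine. It is not the procedure described above: it exercises whatever MD5
implementation the Python interpreter was built against, not the code inside
the ssh binary, and the function name count_md5_mismatches is hypothetical.
The expected digests are the RFC 1321 test vectors.

```python
import hashlib

# RFC 1321 test vectors: input bytes -> expected MD5 hex digest.
VECTORS = {
    b"": "d41d8cd98f00b204e9800998ecf8427e",
    b"abc": "900150983cd24fb0d6963f7d28e17f72",
}


def count_md5_mismatches(iterations=10000):
    """Hash each known input repeatedly and count digests that are wrong."""
    mismatches = 0
    for _ in range(iterations):
        for data, expected in VECTORS.items():
            if hashlib.md5(data).hexdigest() != expected:
                mismatches += 1
    return mismatches


if __name__ == "__main__":
    # A healthy machine should report zero; run it while the box is loaded,
    # since the failures described above were intermittent.
    print("mismatches:", count_md5_mismatches())
```

Any nonzero count on one machine, with zero on its identically configured
peers, would point at the build or the hardware rather than at keys or
configuration.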
- Dave Dykstra