Good morning!

We're experiencing very bad latency spikes on busy Linux systems, for example if one machine is the jumphost (ssh -J) for a few hundred connections while at the same time handling CPU-intensive tasks.

Would RT/Linux SCHED_FIXED or SCHED_RR be of help in such a case, e.g. put all ssh processes into the SCHED_FIXED scheduling class, with a priority higher than the non-interactive compute processes?

Also, do I interpret it correctly that each forwarded TCP connection has its own process?!

Ced
--
Cedric Blancher <cedric.blancher at gmail.com>
[https://plus.google.com/u/0/+CedricBlancher/]
Institute Pasteur
On Thu, 10 Aug 2023, Cedric Blancher wrote:

> We're experiencing very bad latency spikes on busy Linux
> systems, for example if one machine is the jumphost (ssh -J) for a few
> hundred connections while at the same time handling CPU-intensive
> tasks.
>
> Would RT/Linux SCHED_FIXED or SCHED_RR be of help in such a case, e.g.

Did you already check the tried and tested method of nice(2)? If the other load is CPU-intensive, this is usually sufficient.

Normally you'd nice the CPU-intensive load (so that nothing else on the system is affected), but you can also negative-nice the sshd processes (and therefore their children). That, however, may not be sufficient and could require negative-nicing some other processes or kernel tasks as well, so see if your scenario can just positive-nice the load instead.

gl hf,
//mirabilos
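A minimal sketch of the positive-nice approach suggested above. The inner `nice` call is only a stand-in for the real compute job (with no command, `nice` prints the niceness it is running at), and the increment of 10 is an arbitrary choice for illustration; `./compute_job` in the comment is a hypothetical name.

```shell
#!/bin/sh
# Niceness of the current shell (0 unless something already changed it).
base=$(nice)

# Run a stand-in "job" with its niceness raised by 10. A real invocation
# would look like:  nice -n 10 ./compute_job   (hypothetical job name)
job_nice=$(nice -n 10 nice)

echo "shell niceness: $base, job niceness: $job_nice"
```

Raising niceness needs no special privileges, which is one reason to prefer it over negative-nicing sshd. An already running job can be adjusted with renice(1) instead of being restarted.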
On Thu, 10 Aug 2023, Cedric Blancher wrote:

> Good morning!
>
> We're experiencing very bad latency spikes on busy Linux
> systems, for example if one machine is the jumphost (ssh -J) for a few
> hundred connections while at the same time handling CPU-intensive
> tasks.
>
> Would RT/Linux SCHED_FIXED or SCHED_RR be of help in such a case, e.g.
> put all ssh processes into the SCHED_FIXED scheduling class, with a
> priority higher than the non-interactive compute processes?

If the problem is load caused by the ssh connections then a different scheduling class isn't likely to help.

> Also, do I interpret it correctly that each forwarded TCP connection
> has its own process?!

Usually yes. If you're using connection multiplexing (ControlPath/ControlMaster/ControlPersist) then connections from the same user through the same jump host can be shared.

-d
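A rough sketch of what the multiplexing options mentioned above might look like in a client configuration. The host name `jump.example.org`, the file name `jump-mux.conf`, the socket path and the 10 minute persistence are all assumptions chosen for illustration; the block would normally live in `~/.ssh/config` and is written to a separate file here only to keep the sketch free of side effects.

```shell
#!/bin/sh
# Hypothetical file name; normally this block would go into ~/.ssh/config.
cfg=./jump-mux.conf

cat > "$cfg" <<'EOF'
# Reuse one connection per user to the jump host instead of opening a
# new one (and a new sshd process) for every session.
Host jump.example.org
    ControlMaster auto
    ControlPath ~/.ssh/cm-%C
    ControlPersist 10m
EOF

echo "wrote $cfg"
```

With such a block in place, the first `ssh -J jump.example.org target.example.org` from a given user would set up the shared connection to the jump host and later ones should reuse it until it has been idle for the ControlPersist time.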
On Wed, Aug 9, 2023 at 10:42 PM Cedric Blancher <cedric.blancher at gmail.com> wrote:

> Good morning!
>
> We're experiencing very bad latency spikes on busy Linux
> systems, for example if one machine is the jumphost (ssh -J) for a few
> hundred connections while at the same time handling CPU-intensive
> tasks.
>
> Would RT/Linux SCHED_FIXED or SCHED_RR be of help in such a case, e.g.
> put all ssh processes into the SCHED_FIXED scheduling class, with a
> priority higher than the non-interactive compute processes?

Real Time Linux doesn't solve these problems. It attempts to handle the variety of interrupts more consistently and equitably, but precisely the sort of "I'm too danged busy with this high priority process" issues of a highly burdened server are likely to be *aggravated* by incorrect guesses about which processes really matter on a "real-time" system.

If you know which processes are most important and can raise their priority, great, but unpredictable delays are usually the sign of a "too busy with too many processes" kernel.
On Thu, 10 Aug 2023 at 12:47, Cedric Blancher <cedric.blancher at gmail.com> wrote:
[...]

> We're experiencing very bad latency spikes on busy Linux
> systems, for example if one machine is the jumphost (ssh -J) for a few
> hundred connections while at the same time handling CPU-intensive
> tasks.

Are these hundreds of connections started around the same time? Connection establishment is the most computationally expensive part of the process by some margin, and if your clients are synchronized I could imagine that causing load spikes.

If that's the case you could try disabling the more expensive key exchange algorithms (KexAlgorithms in the config of either the client or server) or host key algorithms (HostKeyAlgorithms in the server config). Try benchmarking the available options, but I'd bet the post-quantum safe default KexAlgorithm (sntrup761x25519-sha512@openssh.com) is the most expensive one.

--
Darren Tucker (dtucker at dtucker.net)
GPG key 11EAA6FA / A86E 3E07 5B19 5880 E860 37F4 9357 ECEF 11EA A6FA
Good judgement comes with experience. Unfortunately, the experience usually comes from bad judgement.
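One rough way to do the benchmarking suggested above, sketched as a shell function. The function name `bench_kex`, the `SSH` override variable and the default of 20 runs are assumptions made for this example. It times whole connections (network, authentication and a trivial remote command included) with one-second resolution, so it only gives a coarse comparison and assumes non-interactive (key-based) login to the host.

```shell
#!/bin/sh
# Usage: bench_kex HOST [RUNS]
# For every key exchange algorithm the local client supports, open RUNS
# fresh connections to HOST and report the total wall-clock time.
bench_kex() {
    host=$1
    runs=${2:-20}
    ssh_bin=${SSH:-ssh}   # hypothetical override, handy for dry runs

    for kex in $("$ssh_bin" -Q kex); do
        ok=0
        i=0
        start=$(date +%s)
        while [ "$i" -lt "$runs" ]; do
            # ControlPath=none forces a new connection even when
            # multiplexing is configured; BatchMode avoids prompts.
            if "$ssh_bin" -o BatchMode=yes -o ControlPath=none \
                   -o KexAlgorithms="$kex" "$host" true \
                   </dev/null 2>/dev/null; then
                ok=$((ok + 1))
            fi
            i=$((i + 1))
        done
        end=$(date +%s)
        echo "$kex: $ok/$runs ok, $((end - start))s total"
    done
}
```

It could be called as `bench_kex jump.example.org 10` (hypothetical host). Algorithms the server does not offer show up with 0 successful runs, and CPU load on the server during the run is probably a more telling figure than the client-side wall-clock time.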