thr3ads.net - openssh unix dev - Instrumentation for metrics [Jan 2020]

Craig Miskell

2020-Jan-21 00:59 UTC

Instrumentation for metrics

Hi,

We serve a fairly substantial number[1] of ssh connections across our 
fleet. We have hit MaxStartups limits in the past and bumped it up a
few times (currently at 300), but we have no warning before the limit is 
reached and connections start being dropped. What I would love is some
sort of instrumentation that could let us see the highest number of 
concurrent pre-auth connections the current running instance of the 
daemon has seen, so we can graph it and alert on it pro-actively (e.g. 
when we get within some reasonable percentage of the actual limit), and 
then decide if we need to increase MaxStartups further, scale our fleet 
horizontally, or do something else.

I'm more than happy to write & contribute the code to do this 
instrumentation, but I'd like to get some guidance on 
direction/implementation options first, so I don't spend time writing 
code which is never going to be accepted.

The most trivial approach would be to add logging to the main daemon, 
either when we get within X% of MaxStartups (X being possibly 
configurable), or just log the current max value every X minutes or Y 
connections (perhaps at Verbose logging level?). Either would be 
functional, but both feel a little bit unwieldy.
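
As a rough illustration of the logging option (not OpenSSH code; written
in Python for brevity, and every name here is hypothetical), a tracker
could keep a high-water mark and log only when a new peak crosses a
configurable percentage of MaxStartups:

```python
import logging

log = logging.getLogger("startups")


class StartupTracker:
    """Tracks concurrent pre-auth connections and their high-water mark."""

    def __init__(self, max_startups, warn_percent=50):
        self.max_startups = max_startups
        # Threshold (in connections) at which new peaks start being logged.
        self.warn_at = max_startups * warn_percent / 100
        self.current = 0
        self.peak = 0

    def connection_opened(self):
        """Count a new pre-auth connection; return True if a warning was logged."""
        self.current += 1
        if self.current > self.peak:
            self.peak = self.current
            if self.peak >= self.warn_at:
                log.warning(
                    "pre-auth connections at new peak %d (MaxStartups %d)",
                    self.peak, self.max_startups,
                )
                return True
        return False

    def connection_finished(self):
        """Count a connection that authenticated or went away."""
        self.current = max(0, self.current - 1)
```

Logging only on a new peak keeps the volume low: a daemon that hovers
just above the threshold produces one line per new maximum rather than
one line per connection.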

Alternatively, we could go a more complex and flexible route such as the 
way haproxy does it, with a local unix socket that responds to a 'stats'
command with some simple text format. This would be more generally
usable and extensible to other metrics in future, and seems more robust 
to me, although it would be a more noticeable amount of work than just logging.

Are either of these approaches in keeping with current design 
preferences? I'm open to any (other) approach; once the info is exposed
in *some* fashion, anyone can get it into their monitoring system of 
choice via various hooks, and I think being agnostic about the actual 
monitoring system is the right choice (e.g. a prometheus HTTP endpoint 
exporter embedded in OpenSSH would be very very wrong).

Thanks,
Craig Miskell
SRE, GitLab

[1] ~26M/day, ~300/s avg

Damien Miller

2020-Jan-21 05:03 UTC

On Tue, 21 Jan 2020, Craig Miskell wrote:
> Hi,
> 
> We serve a fairly substantial number[1] of ssh connections across our 
> fleet. We have hit MaxStartups limits in the past and bumped it up a
> few times (currently at 300), but we have no warning before the limit is 
> reached and connections start being dropped. What I would love is some
> sort of instrumentation that could let us see the highest number of 
> concurrent pre-auth connections the current running instance of the 
> daemon has seen, so we can graph it and alert on it pro-actively (e.g. 
> when we get within some reasonable percentage of the actual limit), and 
> then decide if we need to increase MaxStartups further, scale our fleet 
> horizontally, or do something else.
> 
> I'm more than happy to write & contribute the code to do this 
> instrumentation, but I'd like to get some guidance on 
> direction/implementation options first, so I don't spend time writing 
> code which is never going to be accepted.
> 
> The most trivial approach would be to add logging to the main daemon, 
> either when we get within X% of MaxStartups (X being possibly 
> configurable), or just log the current max value every X minutes or Y 
> connections (perhaps at Verbose logging level?). Either would be 
> functional, but both feel a little bit unwieldy.
It would be trivial to make sshd.c:drop_connection() log a little more,
e.g. when the number of authenticating connections exceeds say 50% of
MaxStartups.
> Alternatively, we could go a more complex and flexible route such as the 
> way haproxy does it, with a local unix socket that responds to a 'stats'
> command with some simple text format. This would be more generally
> usable and extensible to other metrics in future, and seems more robust 
> to me, although would be a more noticeable amount of work than just logging.
I'm reluctant to add more interfaces to the sshd listener, especially ones
that accept any sort of command. The sshd listener has to be reliable
and (IMO) as simple as possible. Maybe some write-only interface where
sshd can dump stats could be simple enough?
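
One possible shape for such a write-only interface, sketched in Python
rather than sshd's C: the daemon periodically rewrites a small text file
and never reads anything back. The file path, the metric names and the
"name value" line format below are all assumptions made for illustration.

```python
import os
import tempfile


def dump_stats(path, stats):
    """Atomically replace `path` with one 'name value' line per metric."""
    directory = os.path.dirname(os.path.abspath(path))
    # Write to a temporary file in the same directory, then rename it over
    # the target so a reader never sees a half-written file.
    fd, tmp = tempfile.mkstemp(dir=directory, prefix=".stats-")
    try:
        with os.fdopen(fd, "w") as f:
            for name in sorted(stats):
                f.write(f"{name} {stats[name]}\n")
        os.replace(tmp, path)
    except BaseException:
        os.unlink(tmp)
        raise
```

A monitoring agent would then only need to read the file, e.g. after
`dump_stats("/run/sshd.stats", {"startups_current": 3, "startups_peak": 7})`
(a hypothetical path), with no command channel into the listener.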

There aren't many stats to dump ATM though, just the active number of
startups - everything else of potential interest is in separate forked
sshd processes where they aren't accessible without quite a lot of work.

This makes me think that the syslog approach is probably the way to go
unless someone can come up with other stuff that would be a) worth reading
and b) accessible.

-d

Philipp Marek

2020-Jan-21 07:05 UTC

> This makes me think that the syslog approach is probably the way to go
Yeah, right.
Another idea is to mirror the current preauth load via setproctitle()...
That makes the data available without a syscall on the writing side
(querying it still needs syscalls, of course), so it can be kept
up to date cheaply and allows a high monitoring frequency as well.

Multiple instances of SSHd (on different ports) are easily distinguished
as well.
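
On the monitoring side, a collector would read the title (for example
from `ps -o args=` or /proc/<pid>/cmdline) and parse the numbers out of
it. A minimal Python sketch follows; the title format
"[listener] N of M startups" is purely hypothetical here, since nothing
in this thread defines one.

```python
import re

# Hypothetical title format: "sshd: [listener] 42 of 300 startups"
TITLE_RE = re.compile(r"\[listener\] (\d+) of (\d+) startups")


def parse_title(title):
    """Return (current, limit) parsed from a process title, or None."""
    m = TITLE_RE.search(title)
    if m is None:
        return None
    return int(m.group(1)), int(m.group(2))
```

Titles that do not match (such as per-session sshd processes) simply
yield None, so the collector can skip them.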

> unless someone can come up with other stuff that would be a) worth reading
> and b) accessible.
Data that I would like to see logged is the utime information of child
processes - how much user/sys time the processes took, memory usage,
and some more.

I imagine a single-line output with SSHd pid, session ID, user,
child PID, and the accounting data - that would be nice to have.
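
To make the idea concrete, here is a small Python sketch (POSIX only, and
not how sshd is structured) that forks a child, reaps it with wait4() to
get its resource usage, and formats one such line. The field names and
the `user`/`session_id` parameters are made up for the example.

```python
import os


def account_child(argv, user="unknown", session_id="-"):
    """Run argv as a child process and return a one-line accounting record."""
    pid = os.fork()
    if pid == 0:
        try:
            os.execvp(argv[0], argv)
        finally:
            os._exit(127)  # only reached if exec failed
    # wait4() reaps the child and returns its rusage in one call.
    _, _status, usage = os.wait4(pid, 0)
    # ru_maxrss is in kilobytes on Linux but in bytes on macOS.
    return (
        f"sshd_pid={os.getpid()} session={session_id} user={user} "
        f"child_pid={pid} utime={usage.ru_utime:.3f} "
        f"stime={usage.ru_stime:.3f} maxrss={usage.ru_maxrss}"
    )
```

The key=value layout is just one option; it keeps the record on a single
line and is easy to split in a log pipeline.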


The parallel ongoing discussion about ControlMaster reminds me that
one SSH connection might drop multiple such log lines...
