thr3ads.net - dovecot - Foreman component [Apr 2016]

If this information is useful, please help other people find it:
Share via:

Aki Tuomi

2016-Apr-19 09:55 UTC

Foreman component

Hi!

I am planning to add foreman component to dovecot core and I am hoping
for some feedback:

Foreman - generic per-user worker handling component

It is responsible for managing worker pools per protocol, lifecycle
management of workers and handing out instances to others, as in, unix
socket connections.

Each time a request is made to foreman for a worker, the worker is
instantiated (if possible) and locked. Then connection is created to the
socket of the worker, and the file descriptor is returned to requestor.

When the requestor has completed the task, it's responsibility is to
close the DATA channel and also ask foreman to unlock the worker.

Components can register new workers using lib-foreman's API. Each worker
is registered with

struct foreman_worker {
    const char *protocol;
    const char *path;
    const char *version;
    unsigned int max_instances;
    unsigned int max_requests; /* how many requests one worker can handle */
    unsigned int max_idletime_secs; /* how long worker can idle */
    unsigned int max_processtime_secs; /* how long worker can process
something */
    unsigned int max_lifetime_secs; /* how long a worker can live
                       this can be exceeded if the worker
                       has work to do. */
};

/* minimum lifetime of worker: max_idletime_secs */
/* absolute maximum lifetime of worker: max_lifetime_secs +
max_processtime_secs */

The unsigned ints are optional. If they are not defined, the workers are
kept until foreman exceeds the total number of workers permitted across
pools.

Pools are per-worker-class pools, and are generated when a worker is
registered.

Version specifies the protocol version and is going to be "1000" now. 
Next version will always be 1001, 1002 etc.

Any feedback or questions are welcome!

Aki

Steffen Kaiser

2016-Apr-20 05:35 UTC

head link

Foreman component

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Tue, 19 Apr 2016, Aki Tuomi wrote:

Hi Aki,>
> I am planning to add foreman component to dovecot core and I am hoping
> for some feedback:
>
> Foreman - generic per-user worker handling component
>
> It is responsible for managing worker pools per protocol, lifecycle
> management of workers and handing out instances to others, as in, unix
> socket connections.
>
> Each time a request is made to foreman for a worker, the worker is
> instantiated (if possible) and locked. Then connection is created to the
> socket of the worker, and the file descriptor is returned to requestor.
this is not like a multiplexor, that forwards the data between client and 
worker, but Forman will pass the new fd back to the client and may close 
the connection to it, right?

Like: http://man7.org/tlpi/code/online/dist/sockets/scm_rights_send.c.html
http://keithp.com/blogs/fd-passing/
> When the requestor has completed the task, it's responsibility is to
> close the DATA channel and also ask foreman to unlock the worker.
If the client abends, Foreman would held the worker locked forever. 
Workers should notify their Foreman, that they are ready to take another 
task. Or, when a new client wants to make a connection to a worker, 
Foreman could probe unlocked workers first, then locked workers, if they 
are available. So the "locked" state is more a soft rather than a hard
condition.
> Components can register new workers using lib-foreman's API. Each
worker
> is registered with
>
> struct foreman_worker {
>    const char *protocol;
>    const char *path;
>    const char *version;
>    unsigned int max_instances;
>    unsigned int max_requests; /* how many requests one worker can handle */
>    unsigned int max_idletime_secs; /* how long worker can idle */
>    unsigned int max_processtime_secs; /* how long worker can process
> something */
>    unsigned int max_lifetime_secs; /* how long a worker can live
>                       this can be exceeded if the worker
>                       has work to do. */
> };
>
> /* minimum lifetime of worker: max_idletime_secs */
> /* absolute maximum lifetime of worker: max_lifetime_secs +
> max_processtime_secs */
>
> The unsigned ints are optional. If they are not defined, the workers are
> kept until foreman exceeds the total number of workers permitted across
> pools.
>
> Pools are per-worker-class pools, and are generated when a worker is
> registered.
>
> Version specifies the protocol version and is going to be "1000"
now.
> Next version will always be 1001, 1002 etc.
> Any feedback or questions are welcome!
>
> Aki
>
>
- -- 
Steffen Kaiser
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1

iQEVAwUBVxcVKnz1H7kL/d9rAQJEnAf9H7qUT5AGVaaUkuRO24rtymCxGa2Pqupl
jyYARY4P7hwUVohe+f4WeaPtppq7hzpzpJeVsh3oSsDOm1q96SFGzlGMZALurW78
P4Yqn0OwJXAfGF6g/84PKd+kNum+a9aXVOjHcGMsqpXg8UlVTnz3NrrrG/sGdKna
4cLOlorrITQ38Mo/bjPBxP2BMhUhGRib8pIk6fND25OOJwN0NH420AvXOl80aGa6
a6jJhE80qHaAWDwhpU8Oj09AvD8/WNY172Bg7UjHQhrczQKRRp4dGWIDXVMnm2wd
HK9MriOubvdZ4hcSs56W0GsFnY/p8saT+e+94ruT8OPbUrEyYFu1zg==B3I3
-----END PGP SIGNATURE-----

Timo Sirainen

2016-Apr-20 19:39 UTC

head link

Foreman component

On 19 Apr 2016, at 12:55, Aki Tuomi <aki.tuomi at dovecot.fi>
wrote:> 
> I am planning to add foreman component to dovecot core and I am hoping
> for some feedback:
> 
> Foreman - generic per-user worker handling component
First an explanation of what this was planned to be used for: Think about many
short-lived JMAP (HTTP) connections with each connection creating a new jmap
process that opens the user's mailbox, processes the JMAP command, closes
the mailbox and kills the process. Repeat for each command. Not very efficient
when the same jmap process could handle all of the user's JMAP requests. The
same problem exists also with most webmails' IMAP connections that are very
short-lived.

One annoying problem with the foreman concept is that it requires an open UNIX
socket for all the worker processes. Which could mean >10k open UNIX sockets,
which all too often runs into file descriptor limits. We could of course just
increase it high enough, and it probably would work ok.. But I also hate adding
more of these "master" processes because they don't scale easily
to multiple CPUs so they might become bottlenecks at some point (and some of
these existing master processes already have become bottlenecks).

I've been trying to figure out a nice solution for the above problem for
years already, but never really came up with anything better. Except today
finally I had the new realization that anvil process already contains all of the
needed information. We don't need a new process containing duplicated data,
just some expansion of anvil and master. Of course, anvil is still kind of a
"master" process that knows about all users, but it's already
there anyway. And there's the new idea of how to avoid a single process
using a ton of sockets:

(Talking only about IMAP here for clarity, but the same applies to POP3, JMAP
and others.)

 - Today anvil already keeps track of (user, protocol, imap-process-pid), which
is where "doveadm who" gets the user list.
 - Today imap-login process already does anvil lookup to see if the user has too
many open connections. This lookup could be changed to also return the
imap-process-pid[] array.
 - We'll add a new feature to Dovecot master: Ability to specify service
imap { unix_listener /var/run/dovecot/login/imap-%{pid} { .. } }, which would
cause such a UNIX socket path to be dynamically created for each created
process. Only that one process is listening in the socket, master process itself
wouldn't keep it open. When the process gets destroyed, the socket gets
deleted automatically.
 - When imap process starts serving an IMAP connection, it does fchmod(socket,
0) for its imap-%{pid} listener. When it stops serving an active IMAP connection
it does fchmod(socket, original-permissions).
 - imap-login process attempts to connect to each imap-%{pid} socket based on
the imap-process-pid[] list returned by anvil. It ignores each EACCES failure,
because those are already serving IMAP connections. If it succeeds in
connecting, it sends the IMAP connection fd to it. If not, it connects to the
default imap socket to create a new process.
 - The above method of trying to connect to every imap-process-pid[] is probaly
efficient enough, although it probably ends up doing a lot of unnecessary
connect() syscalls to sockets that are already handling existing connections. If
this needs to be optimized, we could also enhance anvil to keep track of the
"does this process have an active connection" flag and it would only
return imap-process-pid[] for the processes without an active connection. There
are of course some race conditions with this in any case but the worst that can
happen is that a new imap process is created when there was another existing one
already that could have served the connection, so slightly worse performance in
some rare situations.

These same per-process sockets might be useful for other purposes too.. I've
many times wanted an ability to communicate with an existing process. The
"ipc" process was an attempt to do something about it, but it's
not very nice and has the same problems with potentially using a huge number of
fds.

Then there's the issue of how the management of idle processes (= processes
with no active IMAP connections) goes:
 - service { idle_kill } already specifies when processes without clients are
killed. We can use this here as well, so when IMAP connection has closed the
process stays alive for idle_kill number of seconds until it gets closed.
 - If idle_kill times are set large enough on a busy system, we're usually
reaching service { process_limit } constantly. So when no new processes can be
created, we need the ability to kill an existing process instead. I think this
is master process's job. When connection comes to "imap" and
process_limit is reached, master picks the imap process with the longest
idle-time and kills it (*). Then it waits for it to die and creates a new
process afterwards. There's race condition here though and the process may
not die but instead notify master that it's serving a new client. In this
case master needs to retry with the next process. The process destroying might
also not be fast always. To avoid unnecessarily large latencies due to waiting
for process destruction, I think master should always try to stay a bit below
process_limit (= a new service setting).
 - (*) I'm not sure if longest idle-time is the ideal algorithm. Some more
heuristics would be useful, but this complicates master process too much. The
processes themselves could try to influence master's decisions with some
status notifications. For example if we've determined that user at
example.com constantly logs in every 5 minutes, and the process has been idle
for 4mins59 seconds, which is also the oldest idling process, we still don't
want to kill it because we know that it's going to be recreated in 1 second
anyway. This is probably not going to be in the first version though.

dovecot - Apr 2016 - Foreman component

Foreman component

Foreman component

Foreman component