Steffen Nurpmeso
2021-Sep-10 12:06 UTC
ssh-agent: perform AGENTC_REMOVE_ALL_IDENTITIES on SIGUSR1
Hello Peter. Peter Stuge wrote in <20210909225708.2121.qmail at stuge.se>: |Steffen Nurpmeso wrote: |>|I think the lack of feedback about removal success/failure inherent |>|with signals is problematic. |> |> Bah. What hair-splitting is that? Sorry. | |To clarify, my comment is about using a signal to trigger the deletion; Well in the end i am only def and blind and need some time for wrapping my head around a problem. I think in fact that de Raadt's initial comment pointed into the right direction, and that requires nothing such, but the patch would be diff --git a/ssh-agent.1 b/ssh-agent.1 index ed8c870969..dd2b0e8af1 100644 --- a/ssh-agent.1 +++ b/ssh-agent.1 @@ -178,6 +178,9 @@ will automatically use them if present. is also used to remove keys from .Nm and to query the keys that are held in one. +.Nm +will remove all keys when it receives the user signal +.Dv SIGUSR1 . .Pp Connections to .Nm diff --git a/ssh-agent.c b/ssh-agent.c index 48a47d45a9..014cf7b38c 100644 --- a/ssh-agent.c +++ b/ssh-agent.c @@ -171,6 +171,11 @@ static int fingerprint_hash = SSH_FP_HASH_DEFAULT; /* Refuse signing of non-SSH messages for web-origin FIDO keys */ static int restrict_websafe = 1; +/* Request to remove_all_identities() pending */ +static volatile sig_atomic_t remove_all_asap = 0; + +static enum run_state { RS_RUN, RS_POLL } run_state = RS_RUN; + static void close_socket(SocketEntry *e) { @@ -529,11 +534,10 @@ process_remove_identity(SocketEntry *e) } static void -process_remove_all_identities(SocketEntry *e) +remove_all_identities(void) { Identity *id; - debug2_f("entering"); /* Loop over all identities and clear the keys. */ for (id = TAILQ_FIRST(&idtab->idlist); id; id = TAILQ_FIRST(&idtab->idlist)) { @@ -543,6 +547,13 @@ process_remove_all_identities(SocketEntry *e) /* Mark that there are no identities. */ idtab->nentries = 0; +} + +static void +process_remove_all_identities(SocketEntry *e) +{ + debug2_f("entering"); + remove_all_identities(); /* Send success. */ send_status(e, 1); @@ -1342,6 +1353,16 @@ cleanup_handler(int sig) _exit(2); } +/*ARGSUSED*/ +static void +remove_all_handler(int sig) +{ + if (run_state == RS_POLL) + remove_all_identities(); + else + remove_all_asap = 1; +} + static void check_parent_exists(void) { @@ -1619,6 +1640,7 @@ skip: ssh_signal(SIGINT, (d_flag | D_flag) ? cleanup_handler : SIG_IGN); ssh_signal(SIGHUP, cleanup_handler); ssh_signal(SIGTERM, cleanup_handler); + ssh_signal(SIGUSR1, remove_all_handler); if (pledge("stdio rpath cpath unix id proc exec", NULL) == -1) fatal("%s: pledge: %s", __progname, strerror(errno)); @@ -1626,11 +1648,17 @@ skip: while (1) { prepare_poll(&pfd, &npfd, &timeout, maxfds); + run_state = RS_POLL; result = poll(pfd, npfd, timeout); + run_state = RS_RUN; saved_errno = errno; if (parent_alive_interval != 0) check_parent_exists(); - (void) reaper(); /* remove expired keys */ + if (remove_all_asap) { + remove_all_asap = 0; + remove_all_identities(); + } else + (void) reaper(); /* remove expired keys */ if (result == -1) { if (saved_errno == EINTR) continue; And if you really want to be super exact, one would add diff --git a/ssh-agent.c b/ssh-agent.c index 014cf7b38c..1ca8d3229e 100644 --- a/ssh-agent.c +++ b/ssh-agent.c @@ -1647,6 +1647,10 @@ skip: platform_pledge_agent(); while (1) { + if (remove_all_asap) { + remove_all_asap = 0; + remove_all_identities(); + } prepare_poll(&pfd, &npfd, &timeout, maxfds); run_state = RS_POLL; result = poll(pfd, npfd, timeout); So with these two there is the security guarantee you were asking for, that the removal happens instantly. It is also portable. Since all signals are blocked, all we need to know is whether we are hanging on poll(2), in which case it is "safe" to remove from within the signal handler in that no other code is currently running in user space. I'd say. (Of course on Cygwin poll(2) could be implemented like grazy, i have not looked in that for i would say really twenty years, select(2) by then, but then again i think they should block signals while working this tremendous piece of code in something that people think are system calls.) |with a signal the sender can not know whether removal completed successf\ |ully, |failed or is ongoing. They can merel hope for the best. That's a very weak |security promise. But why so Peter? Signals are used for communication on Unix since ever? If you send a signal to a process, and it has not ignored the signal (if it can), the action behind happens. You can see that easily on your Linux system, it kill aborts any program you have rights for, just specify a realtime signal. |I for one would expect suspend to wait for successful removal and |that the suspend is cancelled with an error message should removal |fail, to know the state of agents. What should fail when zeroing memory? There is no status for this operation anyhow, is it? It is merely you calling ssh-add, which then reports interaction with the agent was successful. ... |>|direct interaction with the agents look a lot better, with the |> |> How do you do this, Mister, from an ACPI script running as root? | |for SSH_AUTH_SOCK in /tmp/ssh-*/agent.*; do ssh-add -D; done # maybe? This is overly funny. |>|significant additional advantage of needing zero new agent code. |> |> Oh nooo! Spoiler!!! | |When in doubt, keep the trusted codebase small. Done. I always hate it to look in codebases that i do not know. There is so much work and time in such a codebase, done by people knowing the context exactly, and i just hate it. I mean this is effectively an easy thing here (but for me), but in general someone who knows a codebase can implement something _right_ (avoiding pitfalls) in a very short time, a state that third parties can achieve only with real effort. So the patch above should possibly have been it from the start. Maybe it will be considered, that would be nice. (Just 's/^ //.) Ciao from Germany, --steffen | |Der Kragenbaer, The moon bear, |der holt sich munter he cheerfully and one by one |einen nach dem anderen runter wa.ks himself off |(By Robert Gernhardt)
Steffen Nurpmeso
2021-Sep-10 13:56 UTC
ssh-agent: perform AGENTC_REMOVE_ALL_IDENTITIES on SIGUSR1
Steffen Nurpmeso wrote in <20210910120610._EV-u%steffen at sdaoden.eu>: ... ||failed or is ongoing. They can merel hope for the best. That's a very weak ||security promise. And ... a test addition that somehow also escaped me until now. (Btw i tried "make tests LTESTS=agent" after having run the complete test once, and for me it seems to run all tests starting with the one given in LTESTS, or at least: make[1]: Entering directory '/tmp/x/openssh.tar_bomb_git/regress' run test agent.sh ... ok simple agent test make[1]: Leaving directory '/tmp/x/openssh.tar_bomb_git/regress' all t-exec passed BUILDDIR=`pwd`; \ cd ./regress || exit $?; \ EGREP='/usr/bin/grep -E' ...interop-tests ...test_sshbuf: ......^Cmake[1]: *** [Makefile:251: unit] Interrupt make: *** [Makefile:713: unit] Interrupt So i hope i finally made my homework and can now stop making noise. Ciao. And a nice weekend everybody. diff --git a/regress/agent.sh b/regress/agent.sh index f187b67572..2544f932eb 100644 --- a/regress/agent.sh +++ b/regress/agent.sh @@ -157,30 +157,42 @@ done ## Deletion tests. +delete_cycle() { + # make sure they're gone + ${SSHADD} -l > /dev/null 2>&1 + r=$? + if [ $r -ne 1 ]; then + fail "ssh-add -l returned unexpected exit code: $r" + fi + trace "readd keys" + # re-add keys/certs to agent + for t in ${SSH_KEYTYPES}; do + ${SSHADD} $OBJ/$t-agent-private >/dev/null 2>&1 || \ + fail "ssh-add failed exit code $?" + done + # make sure they are there + ${SSHADD} -l > /dev/null 2>&1 + r=$? + if [ $r -ne 0 ]; then + fail "ssh-add -l failed: exit code $r" + fi +} + trace "delete all agent keys" ${SSHADD} -D > /dev/null 2>&1 r=$? if [ $r -ne 0 ]; then fail "ssh-add -D failed: exit code $r" fi -# make sure they're gone -${SSHADD} -l > /dev/null 2>&1 -r=$? -if [ $r -ne 1 ]; then - fail "ssh-add -l returned unexpected exit code: $r" -fi -trace "readd keys" -# re-add keys/certs to agent -for t in ${SSH_KEYTYPES}; do - ${SSHADD} $OBJ/$t-agent-private >/dev/null 2>&1 || \ - fail "ssh-add failed exit code $?" -done -# make sure they are there -${SSHADD} -l > /dev/null 2>&1 +delete_cycle + +trace "delete all agent keys via SIGUSR1" +kill -USR1 $SSH_AGENT_PID >/dev/null 2>&1 r=$? if [ $r -ne 0 ]; then - fail "ssh-add -l failed: exit code $r" + fail "kill -USR1: exit code $r" fi +delete_cycle check_key_absent() { ${SSHADD} -L | grep "^$1 " >/dev/null --steffen | |Der Kragenbaer, The moon bear, |der holt sich munter he cheerfully and one by one |einen nach dem anderen runter wa.ks himself off |(By Robert Gernhardt)
Peter Stuge
2021-Sep-11 16:47 UTC
ssh-agent: perform AGENTC_REMOVE_ALL_IDENTITIES on SIGUSR1
Hi Steffen, Steffen Nurpmeso wrote:> So with these two there is the security guarantee you were asking > for, that the removal happens instantly.Sorry but no, as long as the trigger is a signal there's no guarantee. See below.> It is also portable.Nice!> |with a signal the sender can not know whether removal completed > |successfully, failed or is ongoing. They can merel hope for the best. > |That's a very weak security promise. > > But why so Peter? Signals are used for communication on Unix since ever?In your stated use case of suspend on laptop lid close there is a race condition between the actual suspend and SIGUSR1 being handled in agents. The ACPI script runs various commands to prepare for suspend, one of them would be pkill or killall. That command would find all agent processes and call kill(2) to send SIGUSR1 to each one. There's a race because signal handling is asynchronous. kill(2) returns as soon as the signal has been *generated*, not when the signal handler in the receiving process has completed, in any case quite possibly before the signal handler has even started. After generating signals the pkill/killall command exits and from there execution of the suspend script (which continues towards suspending) and execution of signal handlers in agents (clearing keys) are decoupled without any way to synchronize - the suspend script can't get feedback from the agents. Consider as an exaggerated example the pkill/killall command being the very last command in the ACPI script before echo mem > /sys/power/state and a system with two CPU threads and 100+ running agents. I wouldn't be surprised if the kernel will have suspended the machine before all 100+ signal handlers have finished clearing keys. Maybe some or many kernels will reliably deliver all pending signals before suspending, but that doesn't mean much beyond scheduling the signal handlers, right? In particular I don't expect any kernel to wait for all processes to be idle or otherwise wait for signal handlers to return. I also wouldn't be surprised if the kernel would suspend the machine at least sometimes before keys were removed with just one agent running. That may be less likely, but the point is that with a signal there's no way to know *for sure* that signal handlers have completed before suspend.> ignored the signal (if it can), the action behind happens.Yes. The signal handler will run at some point, but maybe not until after resume, leaving keys in the agent while the machine is suspended, ie. silently failing to do what was intended.> |I for one would expect suspend to wait for successful removal and > |that the suspend is cancelled with an error message should removal > |fail, to know the state of agents. > > What should fail when zeroing memory?Zeroing can't fail - later patch iterations have improved in this regard! In earlier iterations at least socket() and connect() could have failed.> There is no status for this operation anyhow, is it?There is; anything but successful exit of ssh-add -D indicates that the intended operation was not (or not yet) completed as intended. The operation is logically simple, but technically complex enough that there can be many reasons for failure. Rather than trying to explicitly check them all I like that if ssh-add -D exits with success then, and only then, I do know that the intended operation has completed.> |>|direct interaction with the agents look a lot better, with the > |> > |> How do you do this, Mister, from an ACPI script running as root? > | > |for SSH_AUTH_SOCK in /tmp/ssh-*/agent.*; do ssh-add -D; done # maybe? > > This is overly funny.Why? Did you test it and/or spot a problem? It does work for me. If this is run in the suspend script then there's no race condition. With added error checking of ssh-add exit status the machine could be made to not suspend unless *after* all keys had been removed.> |>|significant additional advantage of needing zero new agent code. > |> > |> Oh nooo! Spoiler!!! > | > |When in doubt, keep the trusted codebase small. > > Done.I disagree; every addition grows the codebase.> I always hate it to look in codebases that i do not know.I can understand that! I see that a bit differently: I find it interesting to look in code that I don't know, because I may learn from it. You're right that it requires effort, but so does working with my own code. :) Kind regards //Peter