thr3ads.net - openssh unix dev - ssh-agent: perform AGENTC_REMOVE_ALL

If this information is useful, please help other people find it:
Share via:

Steffen Nurpmeso

2021-Sep-10 12:06 UTC

ssh-agent: perform AGENTC_REMOVE_ALL_IDENTITIES on SIGUSR1

Hello Peter.

Peter Stuge wrote in
 <20210909225708.2121.qmail at stuge.se>:
 |Steffen Nurpmeso wrote:
 |>|I think the lack of feedback about removal success/failure inherent
 |>|with signals is problematic.
 |> 
 |> Bah.  What hair-splitting is that?  Sorry.
 |
 |To clarify, my comment is about using a signal to trigger the deletion;

Well in the end i am only def and blind and need some time for
wrapping my head around a problem.  I think in fact that de
Raadt's initial comment pointed into the right direction, and that
requires nothing such, but the patch would be

  diff --git a/ssh-agent.1 b/ssh-agent.1
  index ed8c870969..dd2b0e8af1 100644
  --- a/ssh-agent.1
  +++ b/ssh-agent.1
  @@ -178,6 +178,9 @@ will automatically use them if present.
   is also used to remove keys from
   .Nm
   and to query the keys that are held in one.
  +.Nm
  +will remove all keys when it receives the user signal
  +.Dv SIGUSR1 .
   .Pp
   Connections to
   .Nm
  diff --git a/ssh-agent.c b/ssh-agent.c
  index 48a47d45a9..014cf7b38c 100644
  --- a/ssh-agent.c
  +++ b/ssh-agent.c
  @@ -171,6 +171,11 @@ static int fingerprint_hash = SSH_FP_HASH_DEFAULT;
   /* Refuse signing of non-SSH messages for web-origin FIDO keys */
   static int restrict_websafe = 1;
   
  +/* Request to remove_all_identities() pending */
  +static volatile sig_atomic_t remove_all_asap = 0;
  +
  +static enum run_state { RS_RUN, RS_POLL } run_state = RS_RUN;
  +
   static void
   close_socket(SocketEntry *e)
   {
  @@ -529,11 +534,10 @@ process_remove_identity(SocketEntry *e)
   }
   
   static void
  -process_remove_all_identities(SocketEntry *e)
  +remove_all_identities(void)
   {
   	Identity *id;
   
  -	debug2_f("entering");
   	/* Loop over all identities and clear the keys. */
   	for (id = TAILQ_FIRST(&idtab->idlist); id;
   	    id = TAILQ_FIRST(&idtab->idlist)) {
  @@ -543,6 +547,13 @@ process_remove_all_identities(SocketEntry *e)
   
   	/* Mark that there are no identities. */
   	idtab->nentries = 0;
  +}
  +
  +static void
  +process_remove_all_identities(SocketEntry *e)
  +{
  +	debug2_f("entering");
  +	remove_all_identities();
   
   	/* Send success. */
   	send_status(e, 1);
  @@ -1342,6 +1353,16 @@ cleanup_handler(int sig)
   	_exit(2);
   }
   
  +/*ARGSUSED*/
  +static void
  +remove_all_handler(int sig)
  +{
  +	if (run_state == RS_POLL)
  +		remove_all_identities();
  +	else
  +		remove_all_asap = 1;
  +}
  +
   static void
   check_parent_exists(void)
   {
  @@ -1619,6 +1640,7 @@ skip:
   	ssh_signal(SIGINT, (d_flag | D_flag) ? cleanup_handler : SIG_IGN);
   	ssh_signal(SIGHUP, cleanup_handler);
   	ssh_signal(SIGTERM, cleanup_handler);
  +	ssh_signal(SIGUSR1, remove_all_handler);
   
   	if (pledge("stdio rpath cpath unix id proc exec", NULL) == -1)
   		fatal("%s: pledge: %s", __progname, strerror(errno));
  @@ -1626,11 +1648,17 @@ skip:
   
   	while (1) {
   		prepare_poll(&pfd, &npfd, &timeout, maxfds);
  +		run_state = RS_POLL;
   		result = poll(pfd, npfd, timeout);
  +		run_state = RS_RUN;
   		saved_errno = errno;
   		if (parent_alive_interval != 0)
   			check_parent_exists();
  -		(void) reaper();	/* remove expired keys */
  +		if (remove_all_asap) {
  +			remove_all_asap = 0;
  +			remove_all_identities();
  +		} else
  +			(void) reaper();	/* remove expired keys */
   		if (result == -1) {
   			if (saved_errno == EINTR)
   				continue;

And if you really want to be super exact, one would add

  diff --git a/ssh-agent.c b/ssh-agent.c
  index 014cf7b38c..1ca8d3229e 100644
  --- a/ssh-agent.c
  +++ b/ssh-agent.c
  @@ -1647,6 +1647,10 @@ skip:
   	platform_pledge_agent();
   
   	while (1) {
  +		if (remove_all_asap) {
  +			remove_all_asap = 0;
  +			remove_all_identities();
  +		}
   		prepare_poll(&pfd, &npfd, &timeout, maxfds);
   		run_state = RS_POLL;
   		result = poll(pfd, npfd, timeout);

So with these two there is the security guarantee you were asking
for, that the removal happens instantly.  It is also portable.
Since all signals are blocked, all we need to know is whether we
are hanging on poll(2), in which case it is "safe" to remove from
within the signal handler in that no other code is currently
running in user space.  I'd say.  (Of course on Cygwin poll(2)
could be implemented like grazy, i have not looked in that for
i would say really twenty years, select(2) by then, but then again
i think they should block signals while working this tremendous
piece of code in something that people think are system calls.)

 |with a signal the sender can not know whether removal completed successf\
 |ully,
 |failed or is ongoing. They can merel hope for the best. That's a very weak
 |security promise.

But why so Peter?  Signals are used for communication on Unix
since ever?  If you send a signal to a process, and it has not
ignored the signal (if it can), the action behind happens.

You can see that easily on your Linux system, it kill aborts any
program you have rights for, just specify a realtime signal.

 |I for one would expect suspend to wait for successful removal and
 |that the suspend is cancelled with an error message should removal
 |fail, to know the state of agents.

What should fail when zeroing memory?  There is no status for this
operation anyhow, is it?  It is merely you calling ssh-add, which
then reports interaction with the agent was successful.

  ...
 |>|direct interaction with the agents look a lot better, with the
 |> 
 |> How do you do this, Mister, from an ACPI script running as root?
 |
 |for SSH_AUTH_SOCK in /tmp/ssh-*/agent.*; do ssh-add -D; done # maybe?

This is overly funny.

 |>|significant additional advantage of needing zero new agent code.
 |> 
 |> Oh nooo!  Spoiler!!!
 |
 |When in doubt, keep the trusted codebase small.

Done.  I always hate it to look in codebases that i do not know.
There is so much work and time in such a codebase, done by people
knowing the context exactly, and i just hate it.  I mean this is
effectively an easy thing here (but for me), but in general
someone who knows a codebase can implement something _right_
(avoiding pitfalls) in a very short time, a state that third
parties can achieve only with real effort.

So the patch above should possibly have been it from the start.
Maybe it will be considered, that would be nice.  (Just 's/^  //.)

Ciao from Germany,

--steffen
|
|Der Kragenbaer,                The moon bear,
|der holt sich munter           he cheerfully and one by one
|einen nach dem anderen runter  wa.ks himself off
|(By Robert Gernhardt)

Steffen Nurpmeso

2021-Sep-10 13:56 UTC

head link

ssh-agent: perform AGENTC_REMOVE_ALL_IDENTITIES on SIGUSR1

Steffen Nurpmeso wrote in
 <20210910120610._EV-u%steffen at sdaoden.eu>:
 ...
 ||failed or is ongoing. They can merel hope for the best. That's a very
weak
 ||security promise.

And ... a test addition that somehow also escaped me until now.
(Btw i tried "make tests LTESTS=agent" after having run the
complete test once, and for me it seems to run all tests starting
with the one given in LTESTS, or at least:

  make[1]: Entering directory '/tmp/x/openssh.tar_bomb_git/regress'
  run test agent.sh ...
  ok simple agent test
  make[1]: Leaving directory '/tmp/x/openssh.tar_bomb_git/regress'
  all t-exec passed
  BUILDDIR=`pwd`; \
  cd ./regress || exit $?; \
  EGREP='/usr/bin/grep -E'
  ...interop-tests
  ...test_sshbuf: ......^Cmake[1]:
  *** [Makefile:251: unit] Interrupt
  make: *** [Makefile:713: unit] Interrupt

So i hope i finally made my homework and can now stop making noise.

Ciao.
And a nice weekend everybody.

diff --git a/regress/agent.sh b/regress/agent.sh
index f187b67572..2544f932eb 100644
--- a/regress/agent.sh
+++ b/regress/agent.sh
@@ -157,30 +157,42 @@ done
 
 ## Deletion tests.
 
+delete_cycle() {
+	# make sure they're gone
+	${SSHADD} -l > /dev/null 2>&1
+	r=$?
+	if [ $r -ne 1 ]; then
+		fail "ssh-add -l returned unexpected exit code: $r"
+	fi
+	trace "readd keys"
+	# re-add keys/certs to agent
+	for t in ${SSH_KEYTYPES}; do
+		${SSHADD} $OBJ/$t-agent-private >/dev/null 2>&1 || \
+			fail "ssh-add failed exit code $?"
+	done
+	# make sure they are there
+	${SSHADD} -l > /dev/null 2>&1
+	r=$?
+	if [ $r -ne 0 ]; then
+		fail "ssh-add -l failed: exit code $r"
+	fi
+}
+
 trace "delete all agent keys"
 ${SSHADD} -D > /dev/null 2>&1
 r=$?
 if [ $r -ne 0 ]; then
 	fail "ssh-add -D failed: exit code $r"
 fi
-# make sure they're gone
-${SSHADD} -l > /dev/null 2>&1
-r=$?
-if [ $r -ne 1 ]; then
-	fail "ssh-add -l returned unexpected exit code: $r"
-fi
-trace "readd keys"
-# re-add keys/certs to agent
-for t in ${SSH_KEYTYPES}; do
-	${SSHADD} $OBJ/$t-agent-private >/dev/null 2>&1 || \
-		fail "ssh-add failed exit code $?"
-done
-# make sure they are there
-${SSHADD} -l > /dev/null 2>&1
+delete_cycle
+
+trace "delete all agent keys via SIGUSR1"
+kill -USR1 $SSH_AGENT_PID >/dev/null 2>&1
 r=$?
 if [ $r -ne 0 ]; then
-	fail "ssh-add -l failed: exit code $r"
+	fail "kill -USR1: exit code $r"
 fi
+delete_cycle
 
 check_key_absent() {
 	${SSHADD} -L | grep "^$1 " >/dev/null

--steffen
|
|Der Kragenbaer,                The moon bear,
|der holt sich munter           he cheerfully and one by one
|einen nach dem anderen runter  wa.ks himself off
|(By Robert Gernhardt)

Peter Stuge

2021-Sep-11 16:47 UTC

head link

ssh-agent: perform AGENTC_REMOVE_ALL_IDENTITIES on SIGUSR1

Hi Steffen,

Steffen Nurpmeso wrote:> So with these two there is the security guarantee you were asking
> for, that the removal happens instantly.
Sorry but no, as long as the trigger is a signal there's no guarantee.
See below.

> It is also portable.
Nice!

> |with a signal the sender can not know whether removal completed
> |successfully, failed or is ongoing. They can merel hope for the best.
> |That's a very weak security promise.
> 
> But why so Peter?  Signals are used for communication on Unix since ever?
In your stated use case of suspend on laptop lid close there is a race
condition between the actual suspend and SIGUSR1 being handled in agents.

The ACPI script runs various commands to prepare for suspend, one of
them would be pkill or killall. That command would find all agent
processes and call kill(2) to send SIGUSR1 to each one.

There's a race because signal handling is asynchronous.

kill(2) returns as soon as the signal has been *generated*, not when the
signal handler in the receiving process has completed, in any case quite
possibly before the signal handler has even started.

After generating signals the pkill/killall command exits and from there
execution of the suspend script (which continues towards suspending) and
execution of signal handlers in agents (clearing keys) are decoupled
without any way to synchronize - the suspend script can't get feedback
from the agents.

Consider as an exaggerated example the pkill/killall command being the
very last command in the ACPI script before echo mem > /sys/power/state
and a system with two CPU threads and 100+ running agents.

I wouldn't be surprised if the kernel will have suspended the machine
before all 100+ signal handlers have finished clearing keys. Maybe some
or many kernels will reliably deliver all pending signals before suspending,
but that doesn't mean much beyond scheduling the signal handlers, right?

In particular I don't expect any kernel to wait for all processes to be
idle or otherwise wait for signal handlers to return.

I also wouldn't be surprised if the kernel would suspend the machine
at least sometimes before keys were removed with just one agent running.

That may be less likely, but the point is that with a signal there's no
way to know *for sure* that signal handlers have completed before suspend.

> ignored the signal (if it can), the action behind happens.
Yes. The signal handler will run at some point, but maybe not until
after resume, leaving keys in the agent while the machine is suspended,
ie. silently failing to do what was intended.

>  |I for one would expect suspend to wait for successful removal and
>  |that the suspend is cancelled with an error message should removal
>  |fail, to know the state of agents.
> 
> What should fail when zeroing memory?
Zeroing can't fail - later patch iterations have improved in this regard!

In earlier iterations at least socket() and connect() could have failed.

> There is no status for this operation anyhow, is it?
There is; anything but successful exit of ssh-add -D indicates that the
intended operation was not (or not yet) completed as intended.

The operation is logically simple, but technically complex enough that
there can be many reasons for failure. Rather than trying to explicitly
check them all I like that if ssh-add -D exits with success then, and
only then, I do know that the intended operation has completed.

>  |>|direct interaction with the agents look a lot better, with the
>  |> 
>  |> How do you do this, Mister, from an ACPI script running as root?
>  |
>  |for SSH_AUTH_SOCK in /tmp/ssh-*/agent.*; do ssh-add -D; done # maybe?
> 
> This is overly funny.
Why? Did you test it and/or spot a problem? It does work for me.

If this is run in the suspend script then there's no race condition.

With added error checking of ssh-add exit status the machine could
be made to not suspend unless *after* all keys had been removed.

>  |>|significant additional advantage of needing zero new agent code.
>  |> 
>  |> Oh nooo!  Spoiler!!!
>  |
>  |When in doubt, keep the trusted codebase small.
> 
> Done.
I disagree; every addition grows the codebase.

> I always hate it to look in codebases that i do not know.
I can understand that!

I see that a bit differently: I find it interesting to look in code that
I don't know, because I may learn from it.

You're right that it requires effort, but so does working with my own code.
:)

Kind regards

//Peter

openssh unix dev - Sep 2021 - ssh-agent: perform AGENTC_REMOVE_ALL_IDENTITIES on SIGUSR1

ssh-agent: perform AGENTC_REMOVE_ALL_IDENTITIES on SIGUSR1

ssh-agent: perform AGENTC_REMOVE_ALL_IDENTITIES on SIGUSR1

ssh-agent: perform AGENTC_REMOVE_ALL_IDENTITIES on SIGUSR1