bugzilla-daemon at bugzilla.mindrot.org
2016-Jul-15 19:23 UTC
[Bug 2598] New: ssh-agent very occasionally won't remove keys or certs despite now() >= lifetime
https://bugzilla.mindrot.org/show_bug.cgi?id=2598 Bug ID: 2598 Summary: ssh-agent very occasionally won't remove keys or certs despite now() >= lifetime Product: Portable OpenSSH Version: 6.9p1 Hardware: amd64 OS: Mac OS X Status: NEW Severity: minor Priority: P5 Component: ssh-agent Assignee: unassigned-bugs at mindrot.org Reporter: mindrot at hda3.com apologies for the vagueness of this report. I add these ssh certs (and keys) to the ssh-agent with a lifetime set to when the cert will expire, eg. 72k seconds. Very occasionally, an ssh-agent process won't actually remove the cert when the timer expires. These are exclusively laptops so my first thought was that maybe the laptop was asleep when the timer expired, but I've had a look through the ssh-agent code and it looks like reaper() checks now >death for every entry. I've also been able to run 'ssh-add -l' (which looks like it forces a call to reaper, presumably expiring all keys with now >= death), and the certs still aren't removed. Is my assumption wrong about reaper() being called every time 'ssh-add -l' is invoked? If it is called every time, is there anyway short of id->death getting set to 0 that a key could dodge removal? I guess it's possible that my ca is actually adding a lifetime that's much longer than I think it is, but I suspect I'd see a lot more if this if that were the case. I'm totally confused. :-/ -- You are receiving this mail because: You are watching the assignee of the bug.
bugzilla-daemon at bugzilla.mindrot.org
2016-Jul-16 06:15 UTC
[Bug 2598] ssh-agent very occasionally won't remove keys or certs despite now() >= lifetime
https://bugzilla.mindrot.org/show_bug.cgi?id=2598 Darren Tucker <dtucker at zip.com.au> changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |dtucker at zip.com.au --- Comment #1 from Darren Tucker <dtucker at zip.com.au> --- WRT suspends: we changed the timer to CLOCK_MONOTIME in this commit: https://anongit.mindrot.org/openssh.git/patch/?id=b759c9c2 so it'll do the right thing over clock steps. We added CLOCK_BOOTTIME in https://anongit.mindrot.org/openssh.git/patch/?id=795b8631 but that was added in 6.7, so it shouldn't be a problem with the 6.9 version you're using. reaper is being called when you run ssh-add -l, but after the request is process and before it runs select() again. I could imagine a timer expiring while the system is asleep and the key not being removed until after the next time it processes a request. If that's the case, the key would only show up the first time you ran ssh-add -l. Failing that, if we add some debugging could you capture the logs for one of the problem cases? -- You are receiving this mail because: You are watching the assignee of the bug. You are watching someone on the CC list of the bug.
bugzilla-daemon at bugzilla.mindrot.org
2016-Jul-16 06:22 UTC
[Bug 2598] ssh-agent very occasionally won't remove keys or certs despite now() >= lifetime
https://bugzilla.mindrot.org/show_bug.cgi?id=2598 --- Comment #2 from Darren Tucker <dtucker at zip.com.au> --- oh, mac os. Does that have CLOCK_BOOTTIME? -- You are receiving this mail because: You are watching someone on the CC list of the bug. You are watching the assignee of the bug.
bugzilla-daemon at bugzilla.mindrot.org
2016-Jul-16 16:59 UTC
[Bug 2598] ssh-agent very occasionally won't remove keys or certs despite now() >= lifetime
https://bugzilla.mindrot.org/show_bug.cgi?id=2598 --- Comment #3 from Peter Moody <mindrot at hda3.com> --- osx doesn't appear to have clock_gettime(2), and I can't find any references to CLOCK_BOOTTIME in /usr/include. if monotime() is just using time(2), does that help isolate the issue? If you give me a patch and an idea of what steps you think might tickle this bug (eg. multiple keys, some constrained and some not, keys expiring when the laptop is sleeping, etc), I'd be happy to test. I do have one user that this has happened to twice in the last week and I might be able get him to replace his ssh-agent and see if something about his regular workflow just tickles this bug. -- You are receiving this mail because: You are watching someone on the CC list of the bug. You are watching the assignee of the bug.
bugzilla-daemon at bugzilla.mindrot.org
2016-Jul-16 17:03 UTC
[Bug 2598] ssh-agent very occasionally won't remove keys or certs despite now() >= lifetime
https://bugzilla.mindrot.org/show_bug.cgi?id=2598 --- Comment #4 from Peter Moody <mindrot at hda3.com> --- one other thing. If I were to run ssh-agent -d and then suspend and resume the process, does that make ssh-agent behave in a similar manner to if the machine hibernated? Or do you think there might be something special about being flushed to disk, etc? I'm just trying to think of ways to test this on a machine which so far hasn't had this happen. Cheers, peter -- You are receiving this mail because: You are watching the assignee of the bug. You are watching someone on the CC list of the bug.
bugzilla-daemon at bugzilla.mindrot.org
2016-Jul-17 08:49 UTC
[Bug 2598] ssh-agent very occasionally won't remove keys or certs despite now() >= lifetime
https://bugzilla.mindrot.org/show_bug.cgi?id=2598 --- Comment #5 from Darren Tucker <dtucker at zip.com.au> --- (In reply to Peter Moody from comment #3)> osx doesn't appear to have clock_gettime(2), and I can't find any > references to CLOCK_BOOTTIME in /usr/include. if monotime() is just > using time(2), does that help isolate the issue?maybe, I'll have a think about it.> If you give me a patch and an idea of what steps you think might > tickle this bug (eg. multiple keys, some constrained and some not, > keys expiring when the laptop is sleeping, etc), I'd be happy to > test. > > I do have one user that this has happened to twice in the last week > and I might be able get him to replace his ssh-agent and see if > something about his regular workflow just tickles this bug.(In reply to Peter Moody from comment #4)> one other thing. If I were to run ssh-agent -d and then suspend and > resume the process, does that make ssh-agent behave in a similar > manner to if the machine hibernated?It's worth a try but my guess is that it won't.> Or do you think there might be > something special about being flushed to disk, etc? I'm just trying > to think of ways to test this on a machine which so far hasn't had > this happen.I doubt it's disk flushes have anything to do with it (I don't think ssh-agent even includes stderr in the descriptors it selects on. A couple of questions: - when it happens, if you run ssh-add -l twice are the keys present in both? - is there anything else going on with clocks, eg ntpd? if so, are there any clock steps logged? -- You are receiving this mail because: You are watching someone on the CC list of the bug. You are watching the assignee of the bug.
bugzilla-daemon at bugzilla.mindrot.org
2016-Jul-17 09:49 UTC
[Bug 2598] ssh-agent very occasionally won't remove keys or certs despite now() >= lifetime
https://bugzilla.mindrot.org/show_bug.cgi?id=2598 --- Comment #6 from Darren Tucker <dtucker at zip.com.au> --- Created attachment 2852 --> https://bugzilla.mindrot.org/attachment.cgi?id=2852&action=edit add debugging to ssh-agent This patch against 7.2p2 adds some more debugging to ssh-agent, including outputting the current time() and monotime() values on every log line. I'm not sure what to look for so I'm not sure if it'll be enough. -- You are receiving this mail because: You are watching someone on the CC list of the bug. You are watching the assignee of the bug.
bugzilla-daemon at bugzilla.mindrot.org
2016-Jul-18 17:30 UTC
[Bug 2598] ssh-agent very occasionally won't remove keys or certs despite now() >= lifetime
https://bugzilla.mindrot.org/show_bug.cgi?id=2598 --- Comment #7 from Peter Moody <mindrot at hda3.com> --- Thanks, Darren. I'm running the patched ssh-agent now. fwiw, I apparently *can* repro this on my machine (I'd only gotten reports of this from other people before). this is on my system-provided agent. $ usshcertstatus ussh cert good for -50h-16m waiting for the cert/key to expire on the patched version now $ env SSH_AUTH_SOCK=/tmp/ssh.sock usshcertstatus ussh cert good for 19h56m> - when it happens, if you run ssh-add -l twice are the keys present in both?yes $ ssh-add -l 2048 SHA256:xXX0cRWdec7IA43C0cSF+Y9JrKul2JBzgXk28NMLfEU [Valid until Sat 16 Jul 2016 15:01 UTC, Version 2] (RSA-CERT) 2048 SHA256:xXX0cRWdec7IA43C0cSF+Y9JrKul2JBzgXk28NMLfEU [Valid until Sat 16 Jul 2016 15:01 UTC, Version 2] (RSA) $ ssh-add -l 2048 SHA256:xXX0cRWdec7IA43C0cSF+Y9JrKul2JBzgXk28NMLfEU [Valid until Sat 16 Jul 2016 15:01 UTC, Version 2] (RSA-CERT) 2048 SHA256:xXX0cRWdec7IA43C0cSF+Y9JrKul2JBzgXk28NMLfEU [Valid until Sat 16 Jul 2016 15:01 UTC, Version 2] (RSA) $ ssh-add -l 2048 SHA256:xXX0cRWdec7IA43C0cSF+Y9JrKul2JBzgXk28NMLfEU [Valid until Sat 16 Jul 2016 15:01 UTC, Version 2] (RSA-CERT) 2048 SHA256:xXX0cRWdec7IA43C0cSF+Y9JrKul2JBzgXk28NMLfEU [Valid until Sat 16 Jul 2016 15:01 UTC, Version 2] (RSA) $ ssh-add -l 2048 SHA256:xXX0cRWdec7IA43C0cSF+Y9JrKul2JBzgXk28NMLfEU [Valid until Sat 16 Jul 2016 15:01 UTC, Version 2] (RSA-CERT) 2048 SHA256:xXX0cRWdec7IA43C0cSF+Y9JrKul2JBzgXk28NMLfEU [Valid until Sat 16 Jul 2016 15:01 UTC, Version 2] (RSA) $ usshcertstatus ussh cert good for -50h-21m> - is there anything else going on with clocks, eg ntpd? if so, are there any clock steps logged?I don't see any likely ntp errors in the logs. The only slight weirdness with my setup here is that I believe my laptop was asleep for most of the weekend. it looks like this key/cert pair should've been removed on 16 July at ~08.25 my latop diagnostic logs go from 16 July at 4.57 to 16 July 9.44. time(NULL) couldn't wrapping around, could it .. ? Anyway, I'll let you know the results from the instrumented ssh-agent. Thanks! -- You are receiving this mail because: You are watching someone on the CC list of the bug. You are watching the assignee of the bug.
bugzilla-daemon at bugzilla.mindrot.org
2016-Jul-18 18:39 UTC
[Bug 2598] ssh-agent very occasionally won't remove keys or certs despite now() >= lifetime
https://bugzilla.mindrot.org/show_bug.cgi?id=2598 --- Comment #8 from Peter Moody <mindrot at hda3.com> --- adding a key with a 60 second lifetime and suspending for 40 minutes didn't work. debug1[1468864204,1468864204]: type 25 debug3[1468864204,1468864204]: constrain lifetime key '/Users/pmoody/.ssh/uber_rsa' seconds 60 death time 1468864264 debug3[1468864204,1468864204]: key '/Users/pmoody/.ssh/uber_rsa' death time 1468864264 debug3[1468864204,7237125641618659020]: reaper deadline 60 debug3[1468864204,140734662252236]: key '/Users/pmoody/.ssh/uber_rsa' death time 1468864264 debug3[1468864204,140734662252236]: key '/Users/pmoody/.ssh/uber_rsa' death time 1468864264 debug3[1468864204,3419202321690465996]: reaper deadline 60 debug3[1468864204,140734662252236]: key '/Users/pmoody/.ssh/uber_rsa' death time 1468864264 debug1[1468864204,3419202321690465996]: XXX shrink: 3 < 4 debug3[1468864204,140734662252236]: key '/Users/pmoody/.ssh/uber_rsa' death time 1468864264 debug3[1468864204,3419202321690465996]: reaper deadline 60 './ssh-agent -d -a /tmp/ssh.sock' has stopped $ fg Send job 1, './ssh-agent -d -a /tmp/ssh.sock' to foreground debug1[1468866660,140734662254692]: expiring key '/Users/pmoody/.ssh/uber_rsa' debug3[1468866660,8747515637889048676]: reaper deadline 0 I'll let you know what the long lived key does tomorrow. -- You are receiving this mail because: You are watching someone on the CC list of the bug. You are watching the assignee of the bug.
bugzilla-daemon at bugzilla.mindrot.org
2016-Jul-19 03:47 UTC
[Bug 2598] ssh-agent very occasionally won't remove keys or certs despite now() >= lifetime
https://bugzilla.mindrot.org/show_bug.cgi?id=2598 --- Comment #9 from Darren Tucker <dtucker at zip.com.au> --- (In reply to Peter Moody from comment #7) [...]> this is on my system-provided agent.Note that if your problem occurs only with the ssh-agent supplied with the OS, Apple makes a number of changes to ssh-agent to make it work with launchd and the keychain. The sources are apparently here: http://opensource.apple.com//source/OpenSSH/ although I don't know how to map what you might be running to the corresponding tree.> $ usshcertstatus > ussh cert good for -50h-16mwhat's "usshcertatus" ? [...]> Anyway, I'll let you know the results from the instrumented > ssh-agent. Thanks!Good luck. -- You are receiving this mail because: You are watching someone on the CC list of the bug. You are watching the assignee of the bug.
bugzilla-daemon at bugzilla.mindrot.org
2016-Jul-19 04:00 UTC
[Bug 2598] ssh-agent very occasionally won't remove keys or certs despite now() >= lifetime
https://bugzilla.mindrot.org/show_bug.cgi?id=2598 --- Comment #10 from Darren Tucker <dtucker at zip.com.au> --- (In reply to Peter Moody from comment #8)> debug3[1468864204,7237125641618659020]: reaper deadline 60Sigh. I missed a cast so the second time value is bogus. I think from the earlier ones the values should always be equal so it's not worth stopping your experiment for, but if you run another you might want to use this updated diff. -- You are receiving this mail because: You are watching the assignee of the bug. You are watching someone on the CC list of the bug.
bugzilla-daemon at bugzilla.mindrot.org
2016-Jul-19 04:02 UTC
[Bug 2598] ssh-agent very occasionally won't remove keys or certs despite now() >= lifetime
https://bugzilla.mindrot.org/show_bug.cgi?id=2598 Darren Tucker <dtucker at zip.com.au> changed: What |Removed |Added ---------------------------------------------------------------------------- Attachment #2852|0 |1 is obsolete| | --- Comment #11 from Darren Tucker <dtucker at zip.com.au> --- Created attachment 2854 --> https://bugzilla.mindrot.org/attachment.cgi?id=2854&action=edit add debugging to ssh-agent -- You are receiving this mail because: You are watching someone on the CC list of the bug. You are watching the assignee of the bug.
bugzilla-daemon at bugzilla.mindrot.org
2016-Jul-19 15:20 UTC
[Bug 2598] ssh-agent very occasionally won't remove keys or certs despite now() >= lifetime
https://bugzilla.mindrot.org/show_bug.cgi?id=2598 --- Comment #12 from Peter Moody <mindrot at hda3.com> --- The key didn't expire this morning with vanilla 6.9p1. attaching the agent.log now. I'll try your new patch against HEAD. $ env SSH_AUTH_SOCK=/tmp/ssh.sock ssh-add -l 2048 SHA256:6kWrXTlSBCyHqfiPJHm5teVHIP0wH+RDWrVFk3FB4DE [Valid until Tue 19 Jul 2016 13:17 UTC, Version 2] (RSA-CERT) 2048 SHA256:6kWrXTlSBCyHqfiPJHm5teVHIP0wH+RDWrVFk3FB4DE [Valid until Tue 19 Jul 2016 13:17 UTC, Version 2] (RSA) $ env SSH_AUTH_SOCK=/tmp/ssh.sock ssh-add -l 2048 SHA256:6kWrXTlSBCyHqfiPJHm5teVHIP0wH+RDWrVFk3FB4DE [Valid until Tue 19 Jul 2016 13:17 UTC, Version 2] (RSA-CERT) 2048 SHA256:6kWrXTlSBCyHqfiPJHm5teVHIP0wH+RDWrVFk3FB4DE [Valid until Tue 19 Jul 2016 13:17 UTC, Version 2] (RSA) $ date -u Tue Jul 19 14:20:00 UTC 2016 # about 30 minutes later $ env SSH_AUTH_SOCK=/tmp/ssh.sock usshcertstatus ussh cert good for -1h-56m> what's "usshcertatus" ?we call the ssh cert stuff, "ussh" (uber-ssh). usshcertstatus prints ssh-keygen -L -f type information for any certs signed by our CA. sort of like prodcertstatus for prodaccess. -- You are receiving this mail because: You are watching the assignee of the bug. You are watching someone on the CC list of the bug.
bugzilla-daemon at bugzilla.mindrot.org
2016-Jul-19 15:21 UTC
[Bug 2598] ssh-agent very occasionally won't remove keys or certs despite now() >= lifetime
https://bugzilla.mindrot.org/show_bug.cgi?id=2598 --- Comment #13 from Peter Moody <mindrot at hda3.com> --- Created attachment 2855 --> https://bugzilla.mindrot.org/attachment.cgi?id=2855&action=edit ssh-agent log -- You are receiving this mail because: You are watching the assignee of the bug. You are watching someone on the CC list of the bug.
bugzilla-daemon at bugzilla.mindrot.org
2016-Jul-19 15:43 UTC
[Bug 2598] ssh-agent very occasionally won't remove keys or certs despite now() >= lifetime
https://bugzilla.mindrot.org/show_bug.cgi?id=2598 --- Comment #14 from Peter Moody <mindrot at hda3.com> --- gah, this could all be PEBKAC. it looks like my non-expiring keys are all being added as unconstrained (type 17) for some reason. -- You are receiving this mail because: You are watching someone on the CC list of the bug. You are watching the assignee of the bug.
bugzilla-daemon at bugzilla.mindrot.org
2016-Jul-19 16:05 UTC
[Bug 2598] ssh-agent very occasionally won't remove keys or certs despite now() >= lifetime
https://bugzilla.mindrot.org/show_bug.cgi?id=2598 Peter Moody <mindrot at hda3.com> changed: What |Removed |Added ---------------------------------------------------------------------------- Resolution|--- |INVALID Status|NEW |RESOLVED --- Comment #15 from Peter Moody <mindrot at hda3.com> --- Alright, this can be closed. the bug is somewhere I my code :( sorry! -- You are receiving this mail because: You are watching someone on the CC list of the bug. You are watching the assignee of the bug.
bugzilla-daemon at bugzilla.mindrot.org
2016-Aug-02 00:41 UTC
[Bug 2598] ssh-agent very occasionally won't remove keys or certs despite now() >= lifetime
https://bugzilla.mindrot.org/show_bug.cgi?id=2598 Damien Miller <djm at mindrot.org> changed: What |Removed |Added ---------------------------------------------------------------------------- Status|RESOLVED |CLOSED --- Comment #16 from Damien Miller <djm at mindrot.org> --- Close all resolved bugs after 7.3p1 release -- You are receiving this mail because: You are watching the assignee of the bug. You are watching someone on the CC list of the bug.