bugzilla-daemon at bugzilla.mindrot.org
2019-Aug-01 19:21 UTC
[Bug 3049] New: ssh startup time is linear to _SC_OPEN_MAX
https://bugzilla.mindrot.org/show_bug.cgi?id=3049
Bug ID: 3049
Summary: ssh startup time is linear to _SC_OPEN_MAX
Product: Portable OpenSSH
Version: -current
Hardware: All
OS: Mac OS X
Status: NEW
Severity: normal
Priority: P5
Component: ssh
Assignee: unassigned-bugs at mindrot.org
Reporter: likan_999.student at sina.com
Created attachment 3305
--> https://bugzilla.mindrot.org/attachment.cgi?id=3305&action=edit
Patch against https://github.com/openssh/openssh-portable
In our company, on our MacBook we had to raise the _SC_OPEN_MAX to a
high value (currently 10M), ever since the ssh becomes very slow. By
digging into this, we found it spends 5 seconds just to close all file
descriptors. On Linux, it uses /proc/fd so it is relatively fast. On
OSX, there is no proc fs, therefore it tries all file descriptors from
3 to _SC_OPEN_MAX.
I have a patch, attached below, for portable-ssh, that adds the
particular implementation for OSX, to utilize proc_fdinfo to enumerate
all file descriiptors of the process. Please review it to see if it is
appropriate to be merged upstream. Thanks.
--
You are receiving this mail because:
You are watching the assignee of the bug.
bugzilla-daemon at bugzilla.mindrot.org
2019-Aug-01 20:01 UTC
[Bug 3049] ssh startup time on OSX is linear to _SC_OPEN_MAX
https://bugzilla.mindrot.org/show_bug.cgi?id=3049
likan_999.student at sina.com changed:
What |Removed |Added
----------------------------------------------------------------------------
Summary|ssh startup time is linear |ssh startup time on OSX is
|to _SC_OPEN_MAX |linear to _SC_OPEN_MAX
--
You are receiving this mail because:
You are watching the assignee of the bug.
bugzilla-daemon at bugzilla.mindrot.org
2019-Aug-01 23:03 UTC
[Bug 3049] ssh startup time on OSX is linear to _SC_OPEN_MAX
https://bugzilla.mindrot.org/show_bug.cgi?id=3049
Darren Tucker <dtucker at dtucker.net> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |dtucker at dtucker.net
Attachment #3305|application/octet-stream |text/plain
mime type| |
Attachment #3305|0 |1
is patch| |
--
You are receiving this mail because:
You are watching someone on the CC list of the bug.
You are watching the assignee of the bug.
bugzilla-daemon at bugzilla.mindrot.org
2019-Aug-02 00:21 UTC
[Bug 3049] ssh startup time on OSX is linear to _SC_OPEN_MAX
https://bugzilla.mindrot.org/show_bug.cgi?id=3049
Damien Miller <djm at mindrot.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Blocks| |2988
CC| |djm at mindrot.org
Referenced Bugs:
https://bugzilla.mindrot.org/show_bug.cgi?id=2988
[Bug 2988] Tracking bug for 8.1 release
--
You are receiving this mail because:
You are watching someone on the CC list of the bug.
You are watching the assignee of the bug.
bugzilla-daemon at bugzilla.mindrot.org
2019-Aug-02 01:01 UTC
[Bug 3049] ssh startup time on OSX is linear to _SC_OPEN_MAX
https://bugzilla.mindrot.org/show_bug.cgi?id=3049
Damien Miller <djm at mindrot.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Attachment #3305|0 |1
is obsolete| |
Status|NEW |ASSIGNED
Assignee|unassigned-bugs at mindrot.org |djm at mindrot.org
Attachment #3306| |ok?(dtucker at dtucker.net)
Flags| |
--- Comment #1 from Damien Miller <djm at mindrot.org> ---
Created attachment 3306
--> https://bugzilla.mindrot.org/attachment.cgi?id=3306&action=edit
refactor, add fallback
Thanks for the patch.
I think there were two problems in your original implementation:
1) if the fd table size changed between the size-query proc_pidinfo()
and the call to obtain the fd map then the code would just bail out
without closing anything
2) likewise, for other errors (e.g. malloc failure) - it's better to
fall back to brute-force closing all fds than doing nothing.
I've refactored the closefrom implementation to do both these, with a
closefrom_fallback() doing the brute-force closure, and sharing this
with the /proc/pid/fd strategy. I'm not in from of an OSX machine right
now, so please test.
--
You are receiving this mail because:
You are watching the assignee of the bug.
You are watching someone on the CC list of the bug.
bugzilla-daemon at bugzilla.mindrot.org
2019-Aug-02 01:32 UTC
[Bug 3049] ssh startup time on OSX is linear to _SC_OPEN_MAX
https://bugzilla.mindrot.org/show_bug.cgi?id=3049 --- Comment #2 from Darren Tucker <dtucker at dtucker.net> --- Comment on attachment 3306 --> https://bugzilla.mindrot.org/attachment.cgi?id=3306 refactor, add fallback>+ AC_CHECK_HEADERS([libproc.h], [ >+ AC_CHECK_DECLS([proc_pidinfo], [], [], [#include <libproc.h>])this only checks that the function is declared, not that it's actually present. conceivably this could activate this code but fail at link time. otherwise looks ok. -- You are receiving this mail because: You are watching the assignee of the bug. You are watching someone on the CC list of the bug.
bugzilla-daemon at bugzilla.mindrot.org
2019-Aug-02 05:54 UTC
[Bug 3049] ssh startup time on OSX is linear to _SC_OPEN_MAX
https://bugzilla.mindrot.org/show_bug.cgi?id=3049 --- Comment #3 from likan_999.student at sina.com --- Thanks for reviewing it. Good catch, the error case should fallback to original implementation. For detecting table size change, I was previously thinking the function will be called in single thread situation (such as in ssh, at beginning) thus the fd table is not possible to change. If we want to make it robust enough to handle the case where another thread opens/closes the files between the two calls to proc_fdinfo (do we have such case for openssh codebase?), it is incorrect the way the refactored patch detects the table change. Here is the source code of proc_pidinfo: https://opensource.apple.com/source/xnu/xnu-2050.7.9/bsd/kern/proc_info.c>From the code (function proc_pidfdlist), we can see if bufsize is notlarge enough to hold all file descriptors, it only returns as many as possible to be held in the buffer, without reporting any error, thus won't be detected by comparing return value with NULL (actually, it is not a good style to compare buffer size with NULL). If we really want to detect the change of file descriptors, we need to compare with the previous buffer size, if it is the same as previous value, then it is possible new ones are added, thus we need to retry. But IMHO, is this really needed? If yes, we also need to retry in the fallback method. -- You are receiving this mail because: You are watching someone on the CC list of the bug. You are watching the assignee of the bug.
bugzilla-daemon at bugzilla.mindrot.org
2019-Aug-02 06:00 UTC
[Bug 3049] ssh startup time on OSX is linear to _SC_OPEN_MAX
https://bugzilla.mindrot.org/show_bug.cgi?id=3049 --- Comment #4 from likan_999.student at sina.com --- Comment on attachment 3306 --> https://bugzilla.mindrot.org/attachment.cgi?id=3306 refactor, add fallback Also, the line 129, after closefrom_fallback is called, it should return, otherwise fdinfo_buf==NULL but dereferenced at line 132, which will crash the process. After line 123, I guess you miss a 'continue'? Otherwise it won't ever retry, right? -- You are receiving this mail because: You are watching the assignee of the bug. You are watching someone on the CC list of the bug.
bugzilla-daemon at bugzilla.mindrot.org
2019-Aug-02 06:42 UTC
[Bug 3049] ssh startup time on OSX is linear to _SC_OPEN_MAX
https://bugzilla.mindrot.org/show_bug.cgi?id=3049
Damien Miller <djm at mindrot.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Attachment #3306|ok?(dtucker at dtucker.net) |
Flags| |
Attachment #3306|0 |1
is obsolete| |
--- Comment #5 from Damien Miller <djm at mindrot.org> ---
Created attachment 3307
--> https://bugzilla.mindrot.org/attachment.cgi?id=3307&action=edit
attempt #2
revised patch
--
You are receiving this mail because:
You are watching the assignee of the bug.
You are watching someone on the CC list of the bug.
bugzilla-daemon at bugzilla.mindrot.org
2019-Aug-02 07:49 UTC
[Bug 3049] ssh startup time on OSX is linear to _SC_OPEN_MAX
https://bugzilla.mindrot.org/show_bug.cgi?id=3049 --- Comment #6 from likan_999.student at sina.com --- Comment on attachment 3307 --> https://bugzilla.mindrot.org/attachment.cgi?id=3307 attempt #2 Online 129, you miss a semicolon after return, which makes it fail to compile. Also, we need to have "got < need" instead of "got <= need". Firstly, got == need means the buffer given to proc_pidinfo is full, so it is possible there are more file descriptors added, thus we need to retry in this case. Secondly, proc_pidinfo, if given NULL buffer, will return 20 more entries than the actual file table size, so in normal cases, the 'got < need' code path will be executed. In case got == need, ideally we should close file descriptors in the buffer returned, before we retry. In this way, the next retry will have far less number of elements to work on. Otherwise, if another thread keep adding new file descriptors, the next retry will have more file descriptors than previous retry, causing the proc_pidinfo syscall as well as malloc to be even slower, making it even lower chance that there is no file table changed between two proc_pidinfo syscalls. Again, I'd like to ask, in which case this closefrom is called while another thread is opening new file descriptors? The retry logic is complicated now, do we really need to introduce such a complicated logic to handle such a corner case? My understanding is, neither F_CLOSEM nor fallback implementation guarantees anything if there is another thread keep opening new file descriptors, therefore this closefrom is just on best effort basis. -- You are receiving this mail because: You are watching the assignee of the bug. You are watching someone on the CC list of the bug.
bugzilla-daemon at bugzilla.mindrot.org
2019-Aug-02 08:24 UTC
[Bug 3049] ssh startup time on OSX is linear to _SC_OPEN_MAX
https://bugzilla.mindrot.org/show_bug.cgi?id=3049
--- Comment #7 from Darren Tucker <dtucker at dtucker.net> ---
I don't think the retries are worth it in this context either. I'm not
sure it's even possible for an FD to change under it, but if so this
could be handled by the fallback.
Also noticed this warning this because "got" gets promoted:
bsd-closefrom.c:132:16: warning: comparison of integers of different
signs:
'int' and 'unsigned long' [-Wsign-compare]
for (i = 0; i < got / PROC_PIDLISTFD_SIZE; i++) {
~ ^ ~~~~~~~~~~~~~~~~~~~~~~~~~
$ grep -r PROC_PIDLISTFD_SIZE /usr/include
/usr/include//sys/proc_info.h:#define PROC_PIDLISTFD_SIZE
(sizeof(struct proc_fdinfo))
--
You are receiving this mail because:
You are watching the assignee of the bug.
You are watching someone on the CC list of the bug.
bugzilla-daemon at bugzilla.mindrot.org
2019-Aug-09 06:27 UTC
[Bug 3049] ssh startup time on OSX is linear to _SC_OPEN_MAX
https://bugzilla.mindrot.org/show_bug.cgi?id=3049
Damien Miller <djm at mindrot.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Attachment #3307|0 |1
is obsolete| |
Attachment #3309| |ok?(dtucker at dtucker.net)
Flags| |
--- Comment #8 from Damien Miller <djm at mindrot.org> ---
Created attachment 3309
--> https://bugzilla.mindrot.org/attachment.cgi?id=3309&action=edit
revised again
ok, I've got my mac running again. Here's a revision that removes the
retry loop and just goes straight to the fallback when anything goes
wrong. It seems to close the right stuff when inspected with dtruss.
--
You are receiving this mail because:
You are watching the assignee of the bug.
You are watching someone on the CC list of the bug.
bugzilla-daemon at bugzilla.mindrot.org
2019-Aug-09 06:56 UTC
[Bug 3049] ssh startup time on OSX is linear to _SC_OPEN_MAX
https://bugzilla.mindrot.org/show_bug.cgi?id=3049
Darren Tucker <dtucker at dtucker.net> changed:
What |Removed |Added
----------------------------------------------------------------------------
Attachment #3309|ok?(dtucker at dtucker.net) |ok+
Flags| |
--
You are receiving this mail because:
You are watching someone on the CC list of the bug.
You are watching the assignee of the bug.
bugzilla-daemon at bugzilla.mindrot.org
2019-Aug-30 03:24 UTC
[Bug 3049] ssh startup time on OSX is linear to _SC_OPEN_MAX
https://bugzilla.mindrot.org/show_bug.cgi?id=3049
Damien Miller <djm at mindrot.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|ASSIGNED |RESOLVED
Resolution|--- |FIXED
--- Comment #9 from Damien Miller <djm at mindrot.org> ---
patch applied and will be in OpenSSH 8.1 - thanks!
--
You are receiving this mail because:
You are watching the assignee of the bug.
You are watching someone on the CC list of the bug.
bugzilla-daemon at bugzilla.mindrot.org
2019-Aug-30 22:26 UTC
[Bug 3049] ssh startup time on OSX is linear to _SC_OPEN_MAX
https://bugzilla.mindrot.org/show_bug.cgi?id=3049
likan_999.student at sina.com changed:
What |Removed |Added
----------------------------------------------------------------------------
Resolution|FIXED |---
Status|RESOLVED |REOPENED
--- Comment #10 from likan_999.student at sina.com ---
(In reply to Damien Miller from comment #9)> patch applied and will be in OpenSSH 8.1 - thanks!
Hi Damien, thanks for the patch. Sorry I didn't reply earlier, in the
patch, there are a few problems I can observe here:
`if (r < 0 || r >= sz)`: it should be `if (r < 0 || r > sz)` because
`r==sz` is still valid.
`sz / (int)PROC_PIDLISTFD_SIZE` should be changed to `r /
(int)PROC_PIDLISTFD_SIZE` because entries between r and sz are invalid.
These are not anything urgent but it's nice to fix them.
--
You are receiving this mail because:
You are watching someone on the CC list of the bug.
You are watching the assignee of the bug.
bugzilla-daemon at bugzilla.mindrot.org
2019-Sep-02 00:29 UTC
[Bug 3049] ssh startup time on OSX is linear to _SC_OPEN_MAX
https://bugzilla.mindrot.org/show_bug.cgi?id=3049
Damien Miller <djm at mindrot.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Resolution|--- |FIXED
Status|REOPENED |RESOLVED
--- Comment #11 from Damien Miller <djm at mindrot.org> ---
applied - thanks
--
You are receiving this mail because:
You are watching the assignee of the bug.
You are watching someone on the CC list of the bug.