bugzilla-daemon at bugzilla.mindrot.org
2019-Aug-01 19:21 UTC
[Bug 3049] New: ssh startup time is linear to _SC_OPEN_MAX
https://bugzilla.mindrot.org/show_bug.cgi?id=3049 Bug ID: 3049 Summary: ssh startup time is linear to _SC_OPEN_MAX Product: Portable OpenSSH Version: -current Hardware: All OS: Mac OS X Status: NEW Severity: normal Priority: P5 Component: ssh Assignee: unassigned-bugs at mindrot.org Reporter: likan_999.student at sina.com Created attachment 3305 --> https://bugzilla.mindrot.org/attachment.cgi?id=3305&action=edit Patch against https://github.com/openssh/openssh-portable In our company, on our MacBook we had to raise the _SC_OPEN_MAX to a high value (currently 10M), ever since the ssh becomes very slow. By digging into this, we found it spends 5 seconds just to close all file descriptors. On Linux, it uses /proc/fd so it is relatively fast. On OSX, there is no proc fs, therefore it tries all file descriptors from 3 to _SC_OPEN_MAX. I have a patch, attached below, for portable-ssh, that adds the particular implementation for OSX, to utilize proc_fdinfo to enumerate all file descriiptors of the process. Please review it to see if it is appropriate to be merged upstream. Thanks. -- You are receiving this mail because: You are watching the assignee of the bug.
bugzilla-daemon at bugzilla.mindrot.org
2019-Aug-01 20:01 UTC
[Bug 3049] ssh startup time on OSX is linear to _SC_OPEN_MAX
https://bugzilla.mindrot.org/show_bug.cgi?id=3049 likan_999.student at sina.com changed: What |Removed |Added ---------------------------------------------------------------------------- Summary|ssh startup time is linear |ssh startup time on OSX is |to _SC_OPEN_MAX |linear to _SC_OPEN_MAX -- You are receiving this mail because: You are watching the assignee of the bug.
bugzilla-daemon at bugzilla.mindrot.org
2019-Aug-01 23:03 UTC
[Bug 3049] ssh startup time on OSX is linear to _SC_OPEN_MAX
https://bugzilla.mindrot.org/show_bug.cgi?id=3049 Darren Tucker <dtucker at dtucker.net> changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |dtucker at dtucker.net Attachment #3305|application/octet-stream |text/plain mime type| | Attachment #3305|0 |1 is patch| | -- You are receiving this mail because: You are watching someone on the CC list of the bug. You are watching the assignee of the bug.
bugzilla-daemon at bugzilla.mindrot.org
2019-Aug-02 00:21 UTC
[Bug 3049] ssh startup time on OSX is linear to _SC_OPEN_MAX
https://bugzilla.mindrot.org/show_bug.cgi?id=3049 Damien Miller <djm at mindrot.org> changed: What |Removed |Added ---------------------------------------------------------------------------- Blocks| |2988 CC| |djm at mindrot.org Referenced Bugs: https://bugzilla.mindrot.org/show_bug.cgi?id=2988 [Bug 2988] Tracking bug for 8.1 release -- You are receiving this mail because: You are watching someone on the CC list of the bug. You are watching the assignee of the bug.
bugzilla-daemon at bugzilla.mindrot.org
2019-Aug-02 01:01 UTC
[Bug 3049] ssh startup time on OSX is linear to _SC_OPEN_MAX
https://bugzilla.mindrot.org/show_bug.cgi?id=3049 Damien Miller <djm at mindrot.org> changed: What |Removed |Added ---------------------------------------------------------------------------- Attachment #3305|0 |1 is obsolete| | Status|NEW |ASSIGNED Assignee|unassigned-bugs at mindrot.org |djm at mindrot.org Attachment #3306| |ok?(dtucker at dtucker.net) Flags| | --- Comment #1 from Damien Miller <djm at mindrot.org> --- Created attachment 3306 --> https://bugzilla.mindrot.org/attachment.cgi?id=3306&action=edit refactor, add fallback Thanks for the patch. I think there were two problems in your original implementation: 1) if the fd table size changed between the size-query proc_pidinfo() and the call to obtain the fd map then the code would just bail out without closing anything 2) likewise, for other errors (e.g. malloc failure) - it's better to fall back to brute-force closing all fds than doing nothing. I've refactored the closefrom implementation to do both these, with a closefrom_fallback() doing the brute-force closure, and sharing this with the /proc/pid/fd strategy. I'm not in from of an OSX machine right now, so please test. -- You are receiving this mail because: You are watching the assignee of the bug. You are watching someone on the CC list of the bug.
bugzilla-daemon at bugzilla.mindrot.org
2019-Aug-02 01:32 UTC
[Bug 3049] ssh startup time on OSX is linear to _SC_OPEN_MAX
https://bugzilla.mindrot.org/show_bug.cgi?id=3049 --- Comment #2 from Darren Tucker <dtucker at dtucker.net> --- Comment on attachment 3306 --> https://bugzilla.mindrot.org/attachment.cgi?id=3306 refactor, add fallback>+ AC_CHECK_HEADERS([libproc.h], [ >+ AC_CHECK_DECLS([proc_pidinfo], [], [], [#include <libproc.h>])this only checks that the function is declared, not that it's actually present. conceivably this could activate this code but fail at link time. otherwise looks ok. -- You are receiving this mail because: You are watching the assignee of the bug. You are watching someone on the CC list of the bug.
bugzilla-daemon at bugzilla.mindrot.org
2019-Aug-02 05:54 UTC
[Bug 3049] ssh startup time on OSX is linear to _SC_OPEN_MAX
https://bugzilla.mindrot.org/show_bug.cgi?id=3049 --- Comment #3 from likan_999.student at sina.com --- Thanks for reviewing it. Good catch, the error case should fallback to original implementation. For detecting table size change, I was previously thinking the function will be called in single thread situation (such as in ssh, at beginning) thus the fd table is not possible to change. If we want to make it robust enough to handle the case where another thread opens/closes the files between the two calls to proc_fdinfo (do we have such case for openssh codebase?), it is incorrect the way the refactored patch detects the table change. Here is the source code of proc_pidinfo: https://opensource.apple.com/source/xnu/xnu-2050.7.9/bsd/kern/proc_info.c>From the code (function proc_pidfdlist), we can see if bufsize is notlarge enough to hold all file descriptors, it only returns as many as possible to be held in the buffer, without reporting any error, thus won't be detected by comparing return value with NULL (actually, it is not a good style to compare buffer size with NULL). If we really want to detect the change of file descriptors, we need to compare with the previous buffer size, if it is the same as previous value, then it is possible new ones are added, thus we need to retry. But IMHO, is this really needed? If yes, we also need to retry in the fallback method. -- You are receiving this mail because: You are watching someone on the CC list of the bug. You are watching the assignee of the bug.
bugzilla-daemon at bugzilla.mindrot.org
2019-Aug-02 06:00 UTC
[Bug 3049] ssh startup time on OSX is linear to _SC_OPEN_MAX
https://bugzilla.mindrot.org/show_bug.cgi?id=3049 --- Comment #4 from likan_999.student at sina.com --- Comment on attachment 3306 --> https://bugzilla.mindrot.org/attachment.cgi?id=3306 refactor, add fallback Also, the line 129, after closefrom_fallback is called, it should return, otherwise fdinfo_buf==NULL but dereferenced at line 132, which will crash the process. After line 123, I guess you miss a 'continue'? Otherwise it won't ever retry, right? -- You are receiving this mail because: You are watching the assignee of the bug. You are watching someone on the CC list of the bug.
bugzilla-daemon at bugzilla.mindrot.org
2019-Aug-02 06:42 UTC
[Bug 3049] ssh startup time on OSX is linear to _SC_OPEN_MAX
https://bugzilla.mindrot.org/show_bug.cgi?id=3049 Damien Miller <djm at mindrot.org> changed: What |Removed |Added ---------------------------------------------------------------------------- Attachment #3306|ok?(dtucker at dtucker.net) | Flags| | Attachment #3306|0 |1 is obsolete| | --- Comment #5 from Damien Miller <djm at mindrot.org> --- Created attachment 3307 --> https://bugzilla.mindrot.org/attachment.cgi?id=3307&action=edit attempt #2 revised patch -- You are receiving this mail because: You are watching the assignee of the bug. You are watching someone on the CC list of the bug.
bugzilla-daemon at bugzilla.mindrot.org
2019-Aug-02 07:49 UTC
[Bug 3049] ssh startup time on OSX is linear to _SC_OPEN_MAX
https://bugzilla.mindrot.org/show_bug.cgi?id=3049 --- Comment #6 from likan_999.student at sina.com --- Comment on attachment 3307 --> https://bugzilla.mindrot.org/attachment.cgi?id=3307 attempt #2 Online 129, you miss a semicolon after return, which makes it fail to compile. Also, we need to have "got < need" instead of "got <= need". Firstly, got == need means the buffer given to proc_pidinfo is full, so it is possible there are more file descriptors added, thus we need to retry in this case. Secondly, proc_pidinfo, if given NULL buffer, will return 20 more entries than the actual file table size, so in normal cases, the 'got < need' code path will be executed. In case got == need, ideally we should close file descriptors in the buffer returned, before we retry. In this way, the next retry will have far less number of elements to work on. Otherwise, if another thread keep adding new file descriptors, the next retry will have more file descriptors than previous retry, causing the proc_pidinfo syscall as well as malloc to be even slower, making it even lower chance that there is no file table changed between two proc_pidinfo syscalls. Again, I'd like to ask, in which case this closefrom is called while another thread is opening new file descriptors? The retry logic is complicated now, do we really need to introduce such a complicated logic to handle such a corner case? My understanding is, neither F_CLOSEM nor fallback implementation guarantees anything if there is another thread keep opening new file descriptors, therefore this closefrom is just on best effort basis. -- You are receiving this mail because: You are watching the assignee of the bug. You are watching someone on the CC list of the bug.
bugzilla-daemon at bugzilla.mindrot.org
2019-Aug-02 08:24 UTC
[Bug 3049] ssh startup time on OSX is linear to _SC_OPEN_MAX
https://bugzilla.mindrot.org/show_bug.cgi?id=3049 --- Comment #7 from Darren Tucker <dtucker at dtucker.net> --- I don't think the retries are worth it in this context either. I'm not sure it's even possible for an FD to change under it, but if so this could be handled by the fallback. Also noticed this warning this because "got" gets promoted: bsd-closefrom.c:132:16: warning: comparison of integers of different signs: 'int' and 'unsigned long' [-Wsign-compare] for (i = 0; i < got / PROC_PIDLISTFD_SIZE; i++) { ~ ^ ~~~~~~~~~~~~~~~~~~~~~~~~~ $ grep -r PROC_PIDLISTFD_SIZE /usr/include /usr/include//sys/proc_info.h:#define PROC_PIDLISTFD_SIZE (sizeof(struct proc_fdinfo)) -- You are receiving this mail because: You are watching the assignee of the bug. You are watching someone on the CC list of the bug.
bugzilla-daemon at bugzilla.mindrot.org
2019-Aug-09 06:27 UTC
[Bug 3049] ssh startup time on OSX is linear to _SC_OPEN_MAX
https://bugzilla.mindrot.org/show_bug.cgi?id=3049 Damien Miller <djm at mindrot.org> changed: What |Removed |Added ---------------------------------------------------------------------------- Attachment #3307|0 |1 is obsolete| | Attachment #3309| |ok?(dtucker at dtucker.net) Flags| | --- Comment #8 from Damien Miller <djm at mindrot.org> --- Created attachment 3309 --> https://bugzilla.mindrot.org/attachment.cgi?id=3309&action=edit revised again ok, I've got my mac running again. Here's a revision that removes the retry loop and just goes straight to the fallback when anything goes wrong. It seems to close the right stuff when inspected with dtruss. -- You are receiving this mail because: You are watching the assignee of the bug. You are watching someone on the CC list of the bug.
bugzilla-daemon at bugzilla.mindrot.org
2019-Aug-09 06:56 UTC
[Bug 3049] ssh startup time on OSX is linear to _SC_OPEN_MAX
https://bugzilla.mindrot.org/show_bug.cgi?id=3049 Darren Tucker <dtucker at dtucker.net> changed: What |Removed |Added ---------------------------------------------------------------------------- Attachment #3309|ok?(dtucker at dtucker.net) |ok+ Flags| | -- You are receiving this mail because: You are watching someone on the CC list of the bug. You are watching the assignee of the bug.
bugzilla-daemon at bugzilla.mindrot.org
2019-Aug-30 03:24 UTC
[Bug 3049] ssh startup time on OSX is linear to _SC_OPEN_MAX
https://bugzilla.mindrot.org/show_bug.cgi?id=3049 Damien Miller <djm at mindrot.org> changed: What |Removed |Added ---------------------------------------------------------------------------- Status|ASSIGNED |RESOLVED Resolution|--- |FIXED --- Comment #9 from Damien Miller <djm at mindrot.org> --- patch applied and will be in OpenSSH 8.1 - thanks! -- You are receiving this mail because: You are watching the assignee of the bug. You are watching someone on the CC list of the bug.
bugzilla-daemon at bugzilla.mindrot.org
2019-Aug-30 22:26 UTC
[Bug 3049] ssh startup time on OSX is linear to _SC_OPEN_MAX
https://bugzilla.mindrot.org/show_bug.cgi?id=3049 likan_999.student at sina.com changed: What |Removed |Added ---------------------------------------------------------------------------- Resolution|FIXED |--- Status|RESOLVED |REOPENED --- Comment #10 from likan_999.student at sina.com --- (In reply to Damien Miller from comment #9)> patch applied and will be in OpenSSH 8.1 - thanks!Hi Damien, thanks for the patch. Sorry I didn't reply earlier, in the patch, there are a few problems I can observe here: `if (r < 0 || r >= sz)`: it should be `if (r < 0 || r > sz)` because `r==sz` is still valid. `sz / (int)PROC_PIDLISTFD_SIZE` should be changed to `r / (int)PROC_PIDLISTFD_SIZE` because entries between r and sz are invalid. These are not anything urgent but it's nice to fix them. -- You are receiving this mail because: You are watching someone on the CC list of the bug. You are watching the assignee of the bug.
bugzilla-daemon at bugzilla.mindrot.org
2019-Sep-02 00:29 UTC
[Bug 3049] ssh startup time on OSX is linear to _SC_OPEN_MAX
https://bugzilla.mindrot.org/show_bug.cgi?id=3049 Damien Miller <djm at mindrot.org> changed: What |Removed |Added ---------------------------------------------------------------------------- Resolution|--- |FIXED Status|REOPENED |RESOLVED --- Comment #11 from Damien Miller <djm at mindrot.org> --- applied - thanks -- You are receiving this mail because: You are watching the assignee of the bug. You are watching someone on the CC list of the bug.