Tris Mabbs
2013-Aug-12 22:07 UTC
[Samba] Odd Samba 4 ("4.2.0pre1-GIT-b505111"; actually only using client) behaviour #2 - "accept: Software caused connection abort".
Good day oh technical ones . I was running Samba 4 (client only, not using it as a DC so effectively running Samba 3 code from the Samba 4 tree) and, other than a little "Gotcha!" regarding decoding Kerberos PACs, it was all working perfectly. Then recently I had to upgrade, to "4.2.0pre1-GIT-b505111" (I had to upgrade the OS on the server running Samba - 'twas "OpenSolaris" and is now "Solaris 11.1") so I recompiled it all up and installed afresh (so no ".tdb"s from the previous installation or anything). But here's a funny thing (#2). The log file gets absolutely ridiculous numbers of messages thus: Aug 12 22:45:01 Gateway smbd[16327]: [ID 702911 daemon.error] [2013/08/12 22:45:01.731562, 0] ../source3/smbd/server.c:556(smbd_accept_connection) Aug 12 22:45:01 Gateway smbd[16327]: [ID 702911 daemon.error] accept: Software caused connection abort Aug 12 22:45:03 Gateway smbd[16327]: [ID 702911 daemon.error] [2013/08/12 22:45:03.556423, 0] ../source3/smbd/server.c:556(smbd_accept_connection) Aug 12 22:45:03 Gateway smbd[16327]: [ID 702911 daemon.error] accept: Software caused connection abort Aug 12 22:45:03 Gateway smbd[16327]: [ID 702911 daemon.error] [2013/08/12 22:45:03.556688, 0] ../source3/smbd/server.c:556(smbd_accept_connection) Aug 12 22:45:03 Gateway smbd[16327]: [ID 702911 daemon.error] accept: Software caused connection abort And so on. These will come in spurts; there won't be any such messages for several minutes then a whole load will come along all at once. Rather like busses . It doesn't seem to be affecting the operation so far as any client is concerned. Or rather it evidently will be having some effect, but it's not a noticeable one. However it is really irritating having the system log getting filled up with all these messages! Murphy's law, of course, states that trying to catch one of these messages being created, so I can include a suitable system call trace in this message, will be impossible - there will be no such messages logged until the instant I click "Send" (at which point probably about half-a-dozen will be logged all at once). That does indeed seem to be the case - I've now been trying to persuade one of these, normally very regularly occurring, messages to be logged for about 20 minutes and still, stubbornly, nothing continues to happen. I will catch "smbd" in the act at some point though, and when I do I'll follow-up with a system call trace to show exactly what is happening when this message gets triggered. It will, of course, be something bizarrely Solaris specific (you didn't set the "SO_DONT_RANDOMLY_ABORT_CONNECTIONS" socket() option, did you? Tsk tsk tsk .). Cheers, Tris.
Tris Mabbs
2013-Aug-13 11:54 UTC
[Samba] Odd Samba 4 ("4.2.0pre1-GIT-b505111"; actually only using client) behaviour #2 - "accept: Software caused connection abort".
Caught it in the act! As predicted, almost immediately after sending my original message (!) . However unfortunately, it's not particularly useful . ... 16327: write(7, "\0", 1) = 1 16327: setcontext(0xFEFFD690) 16327: waitid(P_ALL, 0, 0xFEFFDD70, WEXITED|WTRAPPED|WNOHANG) = 0 16327: siginfo: SIGCLD CLD_EXITED pid=26937 status=0x0000 16327: waitid(P_ALL, 0, 0xFEFFDD70, WEXITED|WTRAPPED|WNOHANG) = 0 16327: siginfo: SIG#0 16327: pollsys(0x0809B4D0, 8, 0xFEFFDF18, 0x00000000) = 1 16327: fd=39 ev=POLLIN|POLLHUP rev=0 16327: fd=38 ev=POLLIN|POLLHUP rev=0 16327: fd=34 ev=POLLIN|POLLHUP rev=0 16327: fd=36 ev=POLLIN|POLLHUP rev=0 16327: fd=37 ev=POLLIN|POLLHUP rev=0 16327: fd=35 ev=POLLIN|POLLHUP rev=0 16327: fd=33 ev=POLLIN|POLLHUP rev=0 16327: fd=6 ev=POLLIN|POLLHUP rev=POLLIN 16327: timeout: 59.999000000 sec 16327: read(6, "\0", 16) = 1 16327: pollsys(0x0809B4D0, 8, 0xFEFFDF18, 0x00000000) (sleeping...) 16327: fd=39 ev=POLLIN|POLLHUP rev=0 16327: fd=38 ev=POLLIN|POLLHUP rev=0 16327: fd=34 ev=POLLIN|POLLHUP rev=0 16327: fd=36 ev=POLLIN|POLLHUP rev=0 16327: fd=37 ev=POLLIN|POLLHUP rev=0 16327: fd=35 ev=POLLIN|POLLHUP rev=0 16327: fd=33 ev=POLLIN|POLLHUP rev=0 16327: fd=6 ev=POLLIN|POLLHUP rev=0 16327: timeout: 59.999000000 sec 16327: pollsys(0x0809B4D0, 8, 0xFEFFDF18, 0x00000000) = 1 16327: fd=39 ev=POLLIN|POLLHUP rev=0 16327: fd=38 ev=POLLIN|POLLHUP rev=0 16327: fd=34 ev=POLLIN|POLLHUP rev=0 16327: fd=36 ev=POLLIN|POLLHUP rev=0 16327: fd=37 ev=POLLIN|POLLHUP rev=POLLIN 16327: fd=35 ev=POLLIN|POLLHUP rev=0 16327: fd=33 ev=POLLIN|POLLHUP rev=0 16327: fd=6 ev=POLLIN|POLLHUP rev=0 16327: timeout: 59.999000000 sec 16327: accept(37, 0xFEFFDDCC, 0xFEFFDDB8, SOV_DEFAULT) = 41 16327: AF_INET name = X.X.X.X port = 28986 16327: forkx(0) = 26942 16327: lwp_sigmask(SIG_SETMASK, 0x00011080, 0x00000000, 0x00000000, 0x00000000) = 0xFFBFFEFF [0xFFFFFFFF] 16327: close(41) = 0 16327: pollsys(0x0809B4D0, 8, 0xFEFFDF18, 0x00000000) = 1 16327: fd=39 ev=POLLIN|POLLHUP rev=0 16327: fd=38 ev=POLLIN|POLLHUP rev=0 16327: fd=34 ev=POLLIN|POLLHUP rev=0 16327: fd=36 ev=POLLIN|POLLHUP rev=0 16327: fd=35 ev=POLLIN|POLLHUP rev=POLLIN 16327: fd=33 ev=POLLIN|POLLHUP rev=0 16327: fd=6 ev=POLLIN|POLLHUP rev=0 16327: fd=37 ev=POLLIN|POLLHUP rev=0 16327: timeout: 44.696000000 sec 16327: accept(35, 0xFEFFDDCC, 0xFEFFDDB8, SOV_DEFAULT) Err#130 ECONNABORTED 16327: fstat(40, 0xFEFFD1A0) = 0 16327: d=0x08880000 i=22079 m=0020000 l=0 u=0 g=0 rdev=0x02480248 16327: at = Aug 12 17:30:31 BST 2013 [ 1376325031.000000000 ] 16327: mt = Aug 12 17:30:31 BST 2013 [ 1376325031.000000000 ] 16327: ct = Aug 12 17:30:31 BST 2013 [ 1376325031.000000000 ] 16327: bsz=8192 blks=0 fs=BADVFS 16327: time() = 1376345330 16327: getpid() = 16327 [1] 16327: putmsg(40, 0xFEFFDB80, 0xFEFFDB90, 0) = 0 16327: ctl: maxlen=24 len=24 buf=0xFEFFD230: "98 0 tFE\0\010\0".. 16327: 98 0 tFE\0\010\0 XD2FFFE \CF iFE HFA\b\b1B\0\0\0 16327: dat: maxlen=1280 len=151 buf=0xFEFFD250: " A u g 1 2 2".. 16327: A u g 1 2 2 3 : 0 8 : 5 0 s m b d [ 1 6 3 2 7 ] : [ I D 16327: 7 0 2 9 1 1 F A C I L I T Y _ A N D _ P R I O R I T Y ] [ 16327: 2 0 1 3 / 0 8 / 1 2 2 3 : 0 8 : 5 0 . 0 4 3 5 8 2 , 0 ] 16327: . . / s o u r c e 3 / s m b d / s e r v e r . c : 5 5 6 ( s m b 16327: d _ a c c e p t _ c o n n e c t i o n )\n\n\0 ... However hopefully that might be of use to someone - unfortunately it does just look like a Solaris specific nasty. There should be no reason for the abort; everything is directly connected on uncongested gigabit Ethernet, so it's unlikely to be TCP timeout related and I can't see a client aborting the connection before even it's established (that might happen if the clients, or possibly the server were desperately short of IP resources but that isn't going to be the case either). It doesn't appear to be related to the number of connections either, so I can't immediately think of any sensible reason for the connection to abort. What might also be interesting to investigate (though completely unrelated) is why there's an "fstat(40, 0xFEFFD1A0)" when writing the message - descriptor 40 is the socket for the "syslog()" connection, so performing an "fstat()" on it possibly isn't particularly useful (except mebbee to obtain the block size, but that's probably not going to be relevant on a socket anyway) . That's doubtless why the stat structure contains "fs=BADVFS". Not any sort of problem, but curious minds wonder why the code is bothering to "fstat()" the logging socket . Cheers, Tris. From: Tris Mabbs [mailto:TM-Samba201302 at Firstgrade.Co.UK] Sent: 12 August 2013 23:08 To: 'samba at lists.samba.org' Subject: Odd Samba 4 ("4.2.0pre1-GIT-b505111"; actually only using client) behaviour #2 - "accept: Software caused connection abort". Good day oh technical ones . I was running Samba 4 (client only, not using it as a DC so effectively running Samba 3 code from the Samba 4 tree) and, other than a little "Gotcha!" regarding decoding Kerberos PACs, it was all working perfectly. Then recently I had to upgrade, to "4.2.0pre1-GIT-b505111" (I had to upgrade the OS on the server running Samba - 'twas "OpenSolaris" and is now "Solaris 11.1") so I recompiled it all up and installed afresh (so no ".tdb"s from the previous installation or anything). But here's a funny thing (#2). The log file gets absolutely ridiculous numbers of messages thus: Aug 12 22:45:01 Gateway smbd[16327]: [ID 702911 daemon.error] [2013/08/12 22:45:01.731562, 0] ../source3/smbd/server.c:556(smbd_accept_connection) Aug 12 22:45:01 Gateway smbd[16327]: [ID 702911 daemon.error] accept: Software caused connection abort Aug 12 22:45:03 Gateway smbd[16327]: [ID 702911 daemon.error] [2013/08/12 22:45:03.556423, 0] ../source3/smbd/server.c:556(smbd_accept_connection) Aug 12 22:45:03 Gateway smbd[16327]: [ID 702911 daemon.error] accept: Software caused connection abort Aug 12 22:45:03 Gateway smbd[16327]: [ID 702911 daemon.error] [2013/08/12 22:45:03.556688, 0] ../source3/smbd/server.c:556(smbd_accept_connection) Aug 12 22:45:03 Gateway smbd[16327]: [ID 702911 daemon.error] accept: Software caused connection abort And so on. These will come in spurts; there won't be any such messages for several minutes then a whole load will come along all at once. Rather like busses . It doesn't seem to be affecting the operation so far as any client is concerned. Or rather it evidently will be having some effect, but it's not a noticeable one. However it is really irritating having the system log getting filled up with all these messages! Murphy's law, of course, states that trying to catch one of these messages being created, so I can include a suitable system call trace in this message, will be impossible - there will be no such messages logged until the instant I click "Send" (at which point probably about half-a-dozen will be logged all at once). That does indeed seem to be the case - I've now been trying to persuade one of these, normally very regularly occurring, messages to be logged for about 20 minutes and still, stubbornly, nothing continues to happen. I will catch "smbd" in the act at some point though, and when I do I'll follow-up with a system call trace to show exactly what is happening when this message gets triggered. It will, of course, be something bizarrely Solaris specific (you didn't set the "SO_DONT_RANDOMLY_ABORT_CONNECTIONS" socket() option, did you? Tsk tsk tsk .). Cheers, Tris.
Tris Mabbs
2013-Aug-25 17:50 UTC
[Samba] Odd Samba 4 ("4.2.0pre1-GIT-b505111"; actually only using client) behaviour #2 - "accept: Software caused connection abort".
Probably should have posted this to "samba-technical" in the first place, so re-posting in case anyone has any useful ideas . From: Tris Mabbs Sent: 12 August 2013 23:08 To: 'samba at lists.samba.org' Subject: Odd Samba 4 ("4.2.0pre1-GIT-b505111"; actually only using client) behaviour #2 - "accept: Software caused connection abort". Good day oh technical ones . I was running Samba 4 (client only, not using it as a DC so effectively running Samba 3 code from the Samba 4 tree) and, other than a little "Gotcha!" regarding decoding Kerberos PACs, it was all working perfectly. Then recently I had to upgrade, to "4.2.0pre1-GIT-b505111" (I had to upgrade the OS on the server running Samba - 'twas "OpenSolaris" and is now "Solaris 11.1") so I recompiled it all up and installed afresh (so no ".tdb"s from the previous installation or anything). But here's a funny thing (#2). The log file gets absolutely ridiculous numbers of messages thus: Aug 12 22:45:01 Gateway smbd[16327]: [ID 702911 daemon.error] [2013/08/12 22:45:01.731562, 0] ../source3/smbd/server.c:556(smbd_accept_connection) Aug 12 22:45:01 Gateway smbd[16327]: [ID 702911 daemon.error] accept: Software caused connection abort Aug 12 22:45:03 Gateway smbd[16327]: [ID 702911 daemon.error] [2013/08/12 22:45:03.556423, 0] ../source3/smbd/server.c:556(smbd_accept_connection) Aug 12 22:45:03 Gateway smbd[16327]: [ID 702911 daemon.error] accept: Software caused connection abort Aug 12 22:45:03 Gateway smbd[16327]: [ID 702911 daemon.error] [2013/08/12 22:45:03.556688, 0] ../source3/smbd/server.c:556(smbd_accept_connection) Aug 12 22:45:03 Gateway smbd[16327]: [ID 702911 daemon.error] accept: Software caused connection abort And so on. These will come in spurts; there won't be any such messages for several minutes then a whole load will come along all at once. Rather like busses . It doesn't seem to be affecting the operation so far as any client is concerned. Or rather it evidently will be having some effect, but it's not a noticeable one. However it is really irritating having the system log getting filled up with all these messages! Murphy's law, of course, states that trying to catch one of these messages being created, so I can include a suitable system call trace in this message, will be impossible - there will be no such messages logged until the instant I click "Send" (at which point probably about half-a-dozen will be logged all at once). That does indeed seem to be the case - I've now been trying to persuade one of these, normally very regularly occurring, messages to be logged for about 20 minutes and still, stubbornly, nothing continues to happen. I will catch "smbd" in the act at some point though, and when I do I'll follow-up with a system call trace to show exactly what is happening when this message gets triggered. It will, of course, be something bizarrely Solaris specific (you didn't set the "SO_DONT_RANDOMLY_ABORT_CONNECTIONS" socket() option, did you? Tsk tsk tsk .). Cheers, Tris.
Apparently Analagous Threads
- Odd Samba 4 ("4.2.0pre1-GIT-b505111"; actually only using client) behaviour #1 - "Could not fetch trust account password for domain ...".
- 1.0.rc23 released
- 1.0.rc23 released
- samba4 ldap high load and port queue overflow
- FreeBSD - dovecot: We couldn't drop root group privileges