jaroslav at thinline.cz
2022-Mar-05 12:25 UTC
doveadm quota get hangs for 60 seconds (some analysis done)
Hello, I encountered somewhat strange behaviour when running doveadm quota get with wildcard username (all usernames are in user at domain format): doveadm -f pager quota get -u *@example.domain The command hangs for 60 seconds before returning its result. I tried to dig into the issue a bit using strace and found out that there seems to be some kind of communication mismatch between Dovecot processes. Here is what I found - sorry for the wall of text, strace outputs tend to be large, I tried to reduce them as much as possible. I also tried to draw some conclusions below. This is from strace in the doveadm process, it connects to auth-userdb and requests list of users 1646473083.263021 connect(10, {sa_family=AF_UNIX, sun_path="/run/dovecot/auth-userdb"}, 110) = 0 1646473083.263627 write(10, "VERSION\t1\t0\nLIST\t1\tuser=*@example.domain\n", 47) = 47 1646473083.263807 read(10, "VERSION\t1\t1\nSPID\t30012\n", 8192) = 23 1646473084.188212 read(10, "LIST\t1\tuser1 at notthat.domain\nLIST\t1\tuser2 at another.domain\nLIST\t1\tuser3 at also.not\nLIST\t1\t.... This list goes on, the server has over 10000 user accounts. (As a side note, this behaviour struck me as strange - I would expect that the result would be filtered by the server side of the connection, only listing users that match the requested wildcard. I don't know anything about Dovecot internals though, so this might be perfectly normal.) At some point the returned data contain users doveadm is interested in (ie. users that match the wildcard) and requests their information from other Dovecot services, this is an excerpt for a dict service: 1646473084.224608 connect(13, {sa_family=AF_UNIX, sun_path="/run/dovecot/dict"}, 110) = 0 1646473084.225617 write(13, "H2\t2\t0\trelevantuser at example.domain\tsqldomainquota\n", 50) = 50 No data are read from FD 10 during this time. When all users are queried, reading from FD 10 resumes 1646473084.274405 read(10, "ifth.domain\nLIST\t1\tuser987 at domain.six\nLIST\t1\t... At this point though, the process gets stuck: 1646473084.274514 epoll_wait(11, [], 1, 0) = 0 1646473084.274555 read(10, 0x5620076d79dd, 6035) = -1 EAGAIN (Resource temporarily unavailable) 1646473084.274601 epoll_wait(11, [{EPOLLIN, {u32=124338208, u64=94695563280416}}], 1, 155000) = 1 It resumes after 60 seconds, reads some more users and then receives string "DONE" 1646473144.287149 read(10, "LIST\t1\tuser456 at domain.seven\nLIST\t1\tuser123 at domain.e8\nLIST\t1\tuser7865 at dom.nine\n", 6035) = 99 1646473144.287149 read(10, "LIST\t1\t ... ", 5936) = 82 ... 1646473144.294373 epoll_wait(11, [{EPOLLIN, {u32=124338208, u64=94695563280416}}], 1, 155000) = 1 1646473144.307846 read(10, "DONE\t1\t\n", 1611) = 8 Note that at this point the read syscall is no longer trucated. As opposed to reads above, which amounted to 8kB blocks, these are reading few addresses at a time. After the "DONE" string is received, there is no more communication with other Dovecot processes, doveadm prints out results to the console and terminates. Next I investigated the auth process. 1646473083.263121 accept(11, {sa_family=AF_UNIX}, [28->2]) = 23 1646473083.263181 getsockname(23, {sa_family=AF_UNIX, sun_path="/run/dovecot/auth-userdb"}, [28->27]) = 0 1646473083.263690 read(23, "VERSION\t1\t0\nLIST\t1\tuser=*@example.domain\n", 1024) = 47 1646473083.263939 connect(24, {sa_family=AF_UNIX, sun_path="auth-worker"}, 110) = 0 1646473083.264228 write(24, "VERSION\tauth-worker\t1\t0\nDBHASH\t1234567890abcdef1234567890abcdef\tfedcba0987654321fedcba0987654321\n", 97) = 97 1646473083.264271 writev(24, [{iov_base="1\t", iov_len=2}, {iov_base="LIST\t1\tuser=*@example.domain\tservice=\toriginal-username", iov_len=61}, {iov_base="\n", iov_len=1}], 3) = 64 1646473084.187567 read(24, "1\t*\tuser1 at notthat.domain\n1\t*\t... 1646473084.187977 write(23, "LIST\t1\tuser1 at notthat.domain\nLIST\t1\t Reads and writes alternate until these syscalls: 1646473084.238938 read(24, " ... \n1\tOK\n", 8167) = 5354 1646473084.239108 write(23, "LIST\t1\t ... , 51224) = -1 EAGAIN (Resource temporarily unavailable) 1646473084.239536 write(23, "LIST\t1\t ... , 51224) = -1 EAGAIN (Resource temporarily unavailable) From the timing this looks like a kernel buffer for the connection is filled because the other side is not reading the data. This matches strace from doveadm which is at this point reading information about quotas from dict service (timestamp 1646473084.225617.) Auth process seems to react to this by putting the FD on epoll list and then it goes to handle other connections. FD 24 is removed from the epoll list at the same time: 1646473084.239926 epoll_ctl(14, EPOLL_CTL_ADD, 23, {EPOLLOUT|EPOLLERR|EPOLLHUP, {u32=109861328, u64=94566699784656}}) = 0 1646473084.239980 epoll_ctl(14, EPOLL_CTL_DEL, 24, 0x7ffca3a9ae2c) = 0 FD 23 becomes writable relatively quickly, pending data is written to it and read from FD 24 is attempted with no data available: 1646473084.270224 epoll_ctl(14, EPOLL_CTL_DEL, 23, 0x7ffca3a9ae5c) = 0 1646473084.270271 write(23, "LIST\t1\t ... , 51224) = 51224 1646473084.270620 epoll_ctl(14, EPOLL_CTL_ADD, 24, {EPOLLIN|EPOLLPRI|EPOLLERR|EPOLLHUP, {u32=109860944, u64=94566699784272}}) = 0 1646473084.270712 read(24, 0x5602069198e3, 2813) = -1 EAGAIN (Resource temporarily unavailable) Again, nothing else happens for 60 seconds until FD 24 signals what I presume is a connection close attempt. Note that there was no data read from FD 24 but when the "SHUTDOWN" string arrives from it, auth process starts writing items to FD 23, one by one. 1646473144.286936 read(24, "SHUTDOWN\n", 2813) = 9 1646473144.287003 write(23, "LIST\t1\tuser456 at domain.seven\n", 29) = 29 1646473144.287059 write(23, "LIST\t1\tuser123 at domain.e8\n", 31) = 31 1646473144.287103 write(23, "LIST\t1\tuser7865 at dom.nine\n", 39) = 39 There are multiple user databases, I assume the following is a lookup from the others: 1646473144.294256 writev(24, [{iov_base="2\t", iov_len=2}, {iov_base="LIST\t2\tuser=*@example.domain\tservice=\toriginal-username", iov_len=61}, {iov_base="\n", iov_len=1}], 3) = 64 1646473144.294379 read(24, 0x5602069198ec, 2804) = -1 EAGAIN (Resource temporarily unavailable) 1646473144.300490 read(24, "2\tOK\n", 2804) = 5 1646473144.300556 writev(24, [{iov_base="3\t", iov_len=2}, {iov_base="LIST\t3\tuser=*@example.domain\tservice=\toriginal-username", iov_len=61}, {iov_base="\n", iov_len=1}], 3) = 64 1646473144.300653 read(24, 0x5602069198f1, 2799) = -1 EAGAIN (Resource temporarily unavailable) 1646473144.307687 read(24, "3\tOK\n", 2799) = 5 1646473144.307751 write(23, "DONE\t1\t\n", 8) = 8 1646473144.307944 close(24) = 0 1646473144.308092 read(23, "", 977) = 0 1646473144.308185 close(23) = 0 I also looked into the auth-userdb (auth -w) service, it seems to mirror what happens in the auth process: [pid 22387] 1646473083.407660 accept(7, <unfinished ...> [pid 22387] 1646473083.407767 <... accept resumed>{sa_family=AF_UNIX}, [28->2]) = 17 [pid 22387] 1646473084.187096 write(17, "1\t*\tuser1 at notthat.domain\n1\t*\t [pid 22387] 1646473084.233120 write(17, "1\tOK\n", 5 <unfinished ...> [pid 22387] 1646473084.233577 epoll_ctl(9, EPOLL_CTL_ADD, 17, {EPOLLIN|EPOLLPRI|EPOLLERR|EPOLLHUP, {u32=659542480, u64=94356796068304}} <unfinished ...> [pid 22387] 1646473084.234006 read(17, <unfinished ...> [pid 22387] 1646473084.234254 <... read resumed>0x55d12753b7d1, 8031) = -1 EAGAIN (Resource temporarily unavailable) And after 60 seconds: [pid 22387] 1646473144.286843 write(17, "SHUTDOWN\n", 9 <unfinished ...> At which point the other two requests come in and are replied to. As I said above, I don't know anything about Dovecot internals but from I can gather from these outputs, all data are exchanged correctly and nothing gets stuck reading them from a permanent storage or alike. This bug doesn't manifest every time but seems to be fairly deterministic in that it always triggers for some domains but not for others. From what I've noticed, this affected domain receives all its users in single 8k read in the doveadm process, which in turns triggers (relatively) lengthy lookups and forces the auth process to put more writes on hold (at timestamp 1646473084.239108) Is it possible that when this happens, the "OK" string received from auth-userdb process (timestamp 1646473084.238938) is overlooked and auth process is waiting for more data from auth-userdb while auth-userdb waits for another request? The server itself runs Dovecot 1:2.3.13+dfsg1-2 from Debian repository on amd64. Output from dovecot --version is 2.3.13 (89f716dc2). As far as I can see, nothing indicates this is a performance problem related to insufficient hardware - server runs at 25% CPU utilization average, Pressure Stall Information reports average of 0.5% for CPU, 1% for I/O. System memory used is 10 out of 64GB. As far as I can see, no Dovecot process becomes CPU-bound when doveadm is running. I tried some web search but found nothing related to this. For our purposes, this issue has a relatively simple workaround - using doveadm quota get on specific usernames instead of the wildcard one. Since this might be an issue with a relatively easy fix for someone who knows where to look, I decided to post this bug report anyway. If needed, I should be able to test patches that apply cleanly to Debian sources and compile with dpkg-buildpackage. Thanks