One of our dovecot backend servers ran into a problem with its auth
process a few days ago. This doesn't appear to be the error logged when
dovecot hits its internal limit, so I'm not sure what is going on here.

auth: Error: malloc: 58012: Cannot allocate memory
auth: Error: Unable to allocate memory for mutexes from the region
auth: Error: PANIC: Cannot allocate memory
auth: passwd(test,1.1.1.1,<8HTlNHzNIQBAjhKC>): unknown user
pop3: Error: Authenticated user not found from userdb, auth lookup id=2509111297 (client-pid=4781 client-id=1)
pop3-login: Internal login failure (pid=4781 id=1) (internal failure, 1 succesful auths): user=<test>...

There was at least 10 GB of free RAM on the server and no indication of a
system-level issue at the same time. The server is running 2.1.9.
There were about 3,200 active sessions, with something like 12 new
sessions/sec. The other identical servers are/were handling virtually
identical load with the same service uptime and haven't had any issues
so far. (The crash happened 7 days ago.)

--
Kelsey Cummings - kgc at corp.sonic.net
System Architect
707.522.1000
sonic.net, inc.
2260 Apollo Way
Santa Rosa, CA 95407
On 9.11.2012, at 2.49, Kelsey Cummings wrote:

> One of our dovecot backend servers ran into a problem with its auth
> process a few days ago. This doesn't appear to be the error logged when
> dovecot hits its internal limit, so I'm not sure what is going on here.
>
> auth: Error: malloc: 58012: Cannot allocate memory
> auth: Error: Unable to allocate memory for mutexes from the region
> auth: Error: PANIC: Cannot allocate memory
> auth: passwd(test,1.1.1.1,<8HTlNHzNIQBAjhKC>): unknown user

It would have been nicer if libc had just crashed the process instead of
silently converting the failure into an "unknown user" error. That's
probably actually a bug, since the getpwuid_r() that Dovecot uses would
have been able to return an error message.

> pop3: Error: Authenticated user not found from userdb, auth lookup id=2509111297 (client-pid=4781 client-id=1)
> pop3-login: Internal login failure (pid=4781 id=1) (internal failure, 1 succesful auths): user=<test>...
>
> There was at least 10 GB of free RAM on the server and no indication of a
> system-level issue at the same time. The server is running 2.1.9.
> There were about 3,200 active sessions, with something like 12 new
> sessions/sec. The other identical servers are/were handling virtually
> identical load with the same service uptime and haven't had any issues
> so far. (The crash happened 7 days ago.)

Memory leak, maybe? In any case, the service auth { vsz_limit } was
reached (default 256 MB), which is why the allocations failed despite the
free RAM on the server.
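
For reference, the limit mentioned above lives in the auth service block of
dovecot.conf. The fragment below is only a sketch of how it could be raised:
the 512M value is an illustrative assumption, not a figure from this thread,
and if the auth process really is leaking, a higher ceiling would only delay
the failure rather than fix it.

```
service auth {
  # Virtual memory size limit for the auth process.
  # The default is 256 MB per the reply above; 512M is a hypothetical value.
  vsz_limit = 512M
}
```

After a change like this, running doveconf -n shows whether the non-default
value was picked up.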