Apologies first for using two addresses, but I can't currently read my email at distal.com. :-)

I was previously running dovecot2-2.2.29.1_2 on FreeBSD 11 on sparc64. Trying to debug a problem I was having with one of my clients, I upgraded to dovecot-2.2.33.2_4 on that same server. However, I cannot connect now; the log shows:

Feb 20 16:55:00 westeros dovecot: master: Dovecot v2.2.33.2 (d6601f4ec) starting up for imap, pop3, lmtp
Feb 20 16:55:31 westeros dovecot: auth: Fatal: master: service(auth): child 25395 killed with signal 11 (core dumped)
Feb 20 16:55:31 westeros dovecot: master: Error: service(auth): command startup failed, throttling for 2 secs
Feb 20 16:55:31 westeros dovecot: imap-login: Disconnected: Auth process broken (disconnected before auth was ready, waited 0 secs): user=<>, rip=2001::xxx, lip=2001:470:e24c:200::ae25, TLS handshaking, session=<ASDFSAFSADFSAD>
Feb 20 16:55:33 westeros dovecot: auth: Fatal: master: service(auth): child 25398 killed with signal 11 (core dumped)
Feb 20 16:55:33 westeros dovecot: master: Error: service(auth): command startup failed, throttling for 4 secs
Feb 20 16:55:33 westeros dovecot: imap-login: Disconnected: Auth process broken (disconnected before auth was ready, waited 2 secs): user=<>, rip=2001::xxx, lip=2001:470:e24c:200::ae25, session=<d46tyesdy5dsyd>
Feb 20 16:55:37 westeros dovecot: master: Error: service(auth): command startup failed, throttling for 8 secs
Feb 20 16:55:37 westeros dovecot: auth: Fatal: master: service(auth): child 25400 killed with signal 11 (core dumped)

Loading the core file, as described at dovecot.org/bugreport.html, shows the error in libc somewhere:

(gdb) bt full
#0  __unaligned_load (
    p=0x617070656e640e6d <Address 0x617070656e640e6d out of bounds>, size=4)
    at /usr/src/release-11.1.0/lib/libc/sparc64/sys/__sparc_utrap_align.c:45
        val = 0
        i = 0
#1  0x00000000109f9f6c in __unaligned_fixup (uf=0x7fdffffee40)
    at /usr/src/release-11.1.0/lib/libc/sparc64/sys/__sparc_utrap_align.c:78
        addr = <value optimized out>
        val = <value optimized out>
        insn = 3254807616
        sig = <value optimized out>
#2  0x00000000109f9d50 in __sparc_utrap (uf=0x7fdffffee40)
    at /usr/src/release-11.1.0/lib/libc/sparc64/sys/__sparc_utrap.c:100
        sig = 272013984
#3  0x000000001094a10c in __sparc_utrap_gen () from /lib/libc.so.7
No symbol table info available.
#4  0x000000001094a10c in __sparc_utrap_gen () from /lib/libc.so.7
No symbol table info available.
Previous frame identical to this frame (corrupt stack?)
(gdb)

As this is a sparc64, with 8-byte alignment requirements, I'm guessing that's the issue. Many a piece of software has failed to respect that and crashed. But I'm not sure.

Does anyone have any suggestions? I've built it locally (via ports), so if there are compiler options I can/should try, I certainly can try.

Thanks.

- Chris
Your core dump looks a bit broken. Since it seems to die instantly, can you try gdb /path/to/auth and just run it?

Aki

On 21.02.2018 02:08, Chris Ross wrote:
> Apologies first for using two addresses, but I can't currently read my email at distal.com. :-)
>
> I was previously running dovecot2-2.2.29.1_2 on FreeBSD 11 on sparc64. Trying to debug a problem I was having with one of my clients, I upgraded to dovecot-2.2.33.2_4 on that same server. However, I cannot connect now; the log shows:
>
> Feb 20 16:55:00 westeros dovecot: master: Dovecot v2.2.33.2 (d6601f4ec) starting up for imap, pop3, lmtp
> Feb 20 16:55:31 westeros dovecot: auth: Fatal: master: service(auth): child 25395 killed with signal 11 (core dumped)
> Feb 20 16:55:31 westeros dovecot: master: Error: service(auth): command startup failed, throttling for 2 secs
> Feb 20 16:55:31 westeros dovecot: imap-login: Disconnected: Auth process broken (disconnected before auth was ready, waited 0 secs): user=<>, rip=2001::xxx, lip=2001:470:e24c:200::ae25, TLS handshaking, session=<ASDFSAFSADFSAD>
> Feb 20 16:55:33 westeros dovecot: auth: Fatal: master: service(auth): child 25398 killed with signal 11 (core dumped)
> Feb 20 16:55:33 westeros dovecot: master: Error: service(auth): command startup failed, throttling for 4 secs
> Feb 20 16:55:33 westeros dovecot: imap-login: Disconnected: Auth process broken (disconnected before auth was ready, waited 2 secs): user=<>, rip=2001::xxx, lip=2001:470:e24c:200::ae25, session=<d46tyesdy5dsyd>
> Feb 20 16:55:37 westeros dovecot: master: Error: service(auth): command startup failed, throttling for 8 secs
> Feb 20 16:55:37 westeros dovecot: auth: Fatal: master: service(auth): child 25400 killed with signal 11 (core dumped)
>
> Loading the core file, as described at dovecot.org/bugreport.html, shows the error in libc somewhere:
>
> (gdb) bt full
> #0  __unaligned_load (
>     p=0x617070656e640e6d <Address 0x617070656e640e6d out of bounds>, size=4)
>     at /usr/src/release-11.1.0/lib/libc/sparc64/sys/__sparc_utrap_align.c:45
>         val = 0
>         i = 0
> #1  0x00000000109f9f6c in __unaligned_fixup (uf=0x7fdffffee40)
>     at /usr/src/release-11.1.0/lib/libc/sparc64/sys/__sparc_utrap_align.c:78
>         addr = <value optimized out>
>         val = <value optimized out>
>         insn = 3254807616
>         sig = <value optimized out>
> #2  0x00000000109f9d50 in __sparc_utrap (uf=0x7fdffffee40)
>     at /usr/src/release-11.1.0/lib/libc/sparc64/sys/__sparc_utrap.c:100
>         sig = 272013984
> #3  0x000000001094a10c in __sparc_utrap_gen () from /lib/libc.so.7
> No symbol table info available.
> #4  0x000000001094a10c in __sparc_utrap_gen () from /lib/libc.so.7
> No symbol table info available.
> Previous frame identical to this frame (corrupt stack?)
> (gdb)
>
> As this is a sparc64, with 8-byte alignment requirements, I'm guessing that's the issue. Many a piece of software has failed to respect that and crashed. But I'm not sure.
>
> Does anyone have any suggestions? I've built it locally (via ports), so if there are compiler options I can/should try, I certainly can try.
>
> Thanks.
>
> - Chris
Sadly, that doesn't help either. Over the past day, I've built and installed a different branch of the OS (stable/11, instead of release/11.1), to see if a new compiler/libc might change things. Sadly, it does not. In the same situation now, auth fails immediately with signal 11. Running gdb on auth (from build dir, compiled -g -O2) shows something similar.

- Chris

# gdb work/dovecot-2.2.33.2/src/auth/.libs/auth
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "sparc64-marcel-freebsd"...
(gdb) list
372             /* ask auth master to disconnect us */
373             auth_worker_client_send_shutdown();
374         }
375     }
376
377     int main(int argc, char *argv[])
378     {
379         int c;
380
381         master_service = master_service_init("auth", 0, &argc, &argv, "w");
(gdb) run
Starting program: /usr/ports/mail/dovecot/work/dovecot-2.2.33.2/src/auth/.libs/auth

Program received signal SIGTRAP, Trace/breakpoint trap.
Cannot remove breakpoints because program is no longer writable.
It might be running in another process.
Further execution is probably impossible.
0x000000004022a380 in ?? ()
(gdb) bt
#0  0x000000004022a380 in ?? ()
#1  0x0000000000000008 in ?? ()
Previous frame identical to this frame (corrupt stack?)
(gdb)

> On Feb 21, 2018, at 02:01, Aki Tuomi <aki.tuomi at dovecot.fi> wrote:
>
> Your core dump looks a bit broken. Since it seems to die instantly, can
> you try gdb /path/to/auth and just run it?
>
> Aki
> As this is a sparc64, with 8-byte alignment requirements, I'm guessing that's the issue. Many a piece of software has failed to respect that and crashed. But I'm not sure.
>
> Does anyone have any suggestions? I've built it locally (via ports), so if there are compiler options I can/should try, I certainly can try.
>
> Thanks.

On what specific hardware are you running FreeBSD/sparc64?

I have some old Sun desktops lying around with UltraSPARC-III and UltraSPARC-IIIi processors. Maybe I need to power them up again so that we can run some tests on a big-endian machine ourselves.

Sami
On Tue, Feb 20, 2018 at 19:08:27 -0500, Chris Ross wrote:
>
> Apologies first for using two addresses, but I can't currently read my email at distal.com. :-)
>
> I was previously running dovecot2-2.2.29.1_2 on FreeBSD 11 on sparc64.
> Trying to debug a problem I was having with one of my clients, I
> upgraded to dovecot-2.2.33.2_4 on that same server. However, I cannot
> connect now; the log shows:
> ...
> Loading the core file, as described at
> dovecot.org/bugreport.html, shows the error in libc
> somewhere:

I read your other mails in this thread; can you run things as before and do a 'bt full' on the core file with the debug-symbol-enabled libdovecot? gdb seems to be catching the SIGTRAPs, which is making things a bit confusing.

> (gdb) bt full
> #0  __unaligned_load (
>     p=0x617070656e640e6d <Address 0x617070656e640e6d out of bounds>, size=4)

This address looks like ASCII - "append\x0em", so my theory at the moment is:

(1) something clobbers a pointer
(2) the CPU attempts to execute a load from the address
(3) a utrap is generated to handle unaligned load
(4) the utrap code attempts to emulate the unaligned load
(5) the CPU fails to access the address since it is bogus, and a SIGSEGV is generated

Now, I have no idea why it'd first try to work around the alignment requirement before doing a quick sanity check and generating SIGSEGV to begin with, but that's my theory based on the info available so far. Hopefully, a stack trace from a core file will help.

Thanks,

Jeff.

>     at /usr/src/release-11.1.0/lib/libc/sparc64/sys/__sparc_utrap_align.c:45
>         val = 0
>         i = 0
> #1  0x00000000109f9f6c in __unaligned_fixup (uf=0x7fdffffee40)
>     at /usr/src/release-11.1.0/lib/libc/sparc64/sys/__sparc_utrap_align.c:78
>         addr = <value optimized out>
>         val = <value optimized out>
>         insn = 3254807616
>         sig = <value optimized out>
> #2  0x00000000109f9d50 in __sparc_utrap (uf=0x7fdffffee40)
>     at /usr/src/release-11.1.0/lib/libc/sparc64/sys/__sparc_utrap.c:100
>         sig = 272013984
> #3  0x000000001094a10c in __sparc_utrap_gen () from /lib/libc.so.7
> No symbol table info available.
> #4  0x000000001094a10c in __sparc_utrap_gen () from /lib/libc.so.7
> No symbol table info available.
> Previous frame identical to this frame (corrupt stack?)
> (gdb)
>
> As this is a sparc64, with 8-byte alignment requirements, I'm guessing that's the issue. Many a piece of software has failed to respect that and crashed. But I'm not sure.
>
> Does anyone have any suggestions? I've built it locally (via ports), so if there are compiler options I can/should try, I certainly can try.
>
> Thanks.
>
> - Chris

--
If I have trouble installing Linux, something is wrong. Very wrong.
    - Linus Torvalds
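The observation above, that the faulting pointer is really text, can be checked mechanically. The following is only a minimal sketch in C: `pointer_as_text` is a hypothetical helper name, not part of Dovecot, gdb, or libc. It assumes the pointer value should be read most significant byte first, as it would sit in memory on a big-endian machine such as sparc64.

```c
#include <assert.h>
#include <ctype.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper: render a 64-bit value as the 8 bytes it would occupy
 * in big-endian memory (most significant byte first). Printable bytes are
 * copied as-is; every other byte is written as a \xNN escape.
 * `out` must have room for at least 33 bytes (8 * 4 characters plus NUL).
 * Returns the number of printable bytes, a rough "looks like text" score.
 */
static int pointer_as_text(uint64_t value, char *out)
{
    int printable = 0;

    assert(out != NULL);
    for (int shift = 56; shift >= 0; shift -= 8) {
        unsigned char byte = (unsigned char)(value >> shift);

        if (isprint(byte)) {
            *out++ = (char)byte;
            printable++;
        } else {
            /* sprintf returns the number of characters written (4 here) */
            out += sprintf(out, "\\x%02x", (unsigned)byte);
        }
    }
    *out = '\0';
    return printable;
}
```

Calling it with the value from the backtrace, 0x617070656e640e6d, fills the buffer with the same string read off by hand above, append\x0em, and reports 7 printable bytes out of 8. A genuine heap or stack address would normally score much lower.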
> On Feb 22, 2018, at 15:21, Josef 'Jeff' Sipek <jeff.sipek at dovecot.fi> wrote:
>
>> Loading the core file, as described at
>> dovecot.org/bugreport.html, shows the error in libc
>> somewhere:
>
> I read your other mails in this thread; can you run things as before and
> do a 'bt full' on the core file with the debug-symbol-enabled libdovecot?
> gdb seems to be catching the SIGTRAPs, which is making things a bit confusing.
>
>> (gdb) bt full
>> #0  __unaligned_load (
>>     p=0x617070656e640e6d <Address 0x617070656e640e6d out of bounds>, size=4)

No difference there. I changed the install process to not strip things, and manually copied all of the libs into /usr/local/lib/dovecot again, unstripped. (I think libtool stripped them; I just rejiggered the makefiles and install-sh.) Loading a core from a SEGV shows:

Loaded symbols for /libexec/ld-elf.so.1
#0  __unaligned_load (
    p=0x706172736572690a <Address 0x706172736572690a out of bounds>, size=4)
    at /usr/src/lib/libc/sparc64/sys/__sparc_utrap_align.c:45
45              val = (val << 8) | p[i];
(gdb) bt full
#0  __unaligned_load (
    p=0x706172736572690a <Address 0x706172736572690a out of bounds>, size=4)
    at /usr/src/lib/libc/sparc64/sys/__sparc_utrap_align.c:45
        val = 0
        i = 0
#1  0x0000000040adb7cc in __unaligned_fixup (uf=0x7fdfffff110)
    at /usr/src/lib/libc/sparc64/sys/__sparc_utrap_align.c:78
        addr = <value optimized out>
        val = <value optimized out>
        insn = 3254806592
        sig = <value optimized out>
#2  0x0000000040adb5b0 in __sparc_utrap (uf=0x7fdfffff110)
    at /usr/src/lib/libc/sparc64/sys/__sparc_utrap.c:100
        sig = 16
#3  0x0000000040a2c1cc in __sparc_utrap_gen () from /lib/libc.so.7
No symbol table info available.
#4  0x0000000040a2c1cc in __sparc_utrap_gen () from /lib/libc.so.7
No symbol table info available.
Previous frame identical to this frame (corrupt stack?)
(gdb)

(As with the address you note below, this one is actually ASCII too: "parseri\n".)

> This address looks like ASCII - "append\x0em", so my theory at the moment
> is:
>
> (1) something clobbers a pointer
> (2) the CPU attempts to execute a load from the address
> (3) a utrap is generated to handle unaligned load
> (4) the utrap code attempts to emulate the unaligned load
> (5) the CPU fails to access the address since it is bogus, and a SIGSEGV is
>     generated
>
> Now, I have no idea why it'd first try to work around the alignment
> requirement before doing a quick sanity check and generating SIGSEGV to
> begin with, but that's my theory based on the info available so far.
> Hopefully, a stack trace from a core file will help.

Unfortunately, it seems not to have. But good catch on the pointer value there being ASCII data.

Let me know if you have any other ideas.

- Chris
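Steps (4) and (5) of the theory quoted above fit the one libc source line gdb printed, `val = (val << 8) | p[i];`. The following is a rough sketch in C of what a byte-by-byte emulation of an unaligned big-endian load looks like. It is modeled only on that single line: `load_big_endian` is a hypothetical name, and this is not the actual FreeBSD `__unaligned_load` implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical sketch: assemble a `size`-byte big-endian integer one byte
 * at a time, so that `p` does not need to be aligned.
 * The loop still dereferences p[i]. If `p` itself is garbage, for example
 * a pointer overwritten with string data, the very first p[i] access
 * faults, and the crash is reported inside the emulation code rather than
 * at the instruction that originally trapped.
 */
static uint64_t load_big_endian(const unsigned char *p, size_t size)
{
    uint64_t val = 0;

    assert(p != NULL && size <= sizeof(val));
    for (size_t i = 0; i < size; i++)
        val = (val << 8) | p[i];   /* same shape as the line gdb showed */
    return val;
}
```

Pointing it at an odd offset into a valid byte buffer, say `buf + 1` with a size of 4, returns the four bytes starting there as a single integer without any alignment trap. That would be consistent with both backtraces stopping at line 45 with `val = 0` and `i = 0`: under this reading, the emulation had not yet managed to read a single byte.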