Mikhail T.
2013-Mar-29 18:19 UTC
smbfs: panic on the second attempt to reach unavailable server
Hello!

I have my FreeBSD server dump nightly backups onto an entertainment device running embedded Linux. The device has no NFS server, but does run Samba (3.0.30). It allows access to its internal hard drive, which my server mounts as:

    //dune/hdd750_..._32 /dune smbfs rw,noauto,-N,-Ekoi8-u:utf-8

There are two nightly cron jobs using dump(8), xz(1), and ccrypt(1) to dump two "important" filesystems (/var/spool/imap and /home). The imap one kicks off at 3:11am and the home one at 3:31am.

This normally works perfectly fine every night, except when somebody accidentally sits on the remote control of the entertainment device in the living room -- or somehow else manages to turn the box off. When this happens, the first dump simply fails, as one would expect:

    cannot create /dune/backups/narawntapu.imap.1.Tuesday.dump.xz.cpt: No such file or directory
    DUMP: Date of this level 1 dump: Tue Mar 12 03:11:07 2013
    DUMP: Date of last level 0 dump: Wed Mar 6 01:31:07 2013
    DUMP: Dumping snapshot of /dev/da0a (/var/spool/imap) to standard output
    DUMP: mapping (Pass I) [regular files]
    DUMP: Cache 16 MB, blocksize = 65536
    DUMP: mapping (Pass II) [directories]
    DUMP: estimated 169895 tape blocks.
    DUMP: dumping (Pass III) [directories]
    DUMP: Broken pipe
    DUMP: The ENTIRE dump is aborted.

However, when the second job tries to do the same twenty minutes later, the machine panics. This morning I was able to get a kernel coredump:

    ...
    #6 0xffffffff80750f2f in calltrap () at /cache/src/sys/amd64/amd64/exception.S:228
    No locals.
    #7 0xffffffff805a46ca in turnstile_broadcast (ts=0x0, queue=0) at /cache/src/sys/kern/subr_turnstile.c:838
        _tid = <value optimized out>
        ts1 = <value optimized out>
        td = <value optimized out>
    #8 0xffffffff80550e52 in _mtx_unlock_sleep (m=0xfffffe0105ecd8f0, opts=<value optimized out>, file=<value optimized out>, line=<value optimized out>) at /cache/src/sys/kern/kern_mutex.c:715
        ts = (struct turnstile *) 0x0
    #9 0xffffffff8101a0cd in smb_iod_invrq (iod=<value optimized out>) at /cache/src/sys/modules/smbfs/../../netsmb/smb_iod.c:91
        rqp = (struct smb_rq *) 0xfffffe0105ecd800
    #10 0xffffffff8101b172 in smb_iod_addrq (rqp=0xfffffe0105ecd800) at /cache/src/sys/modules/smbfs/../../netsmb/smb_iod.c:418
        vcp = <value optimized out>
        iod = (struct smbiod *) 0xfffffe009483b800
        error = <value optimized out>
        __func__ = "u?", '\220' <repeats 12 times>
    #11 0xffffffff81017da2 in smb_rq_simple (rqp=0xfffffe0105ecd800) at /cache/src/sys/modules/smbfs/../../netsmb/smb_rq.c:168
        vcp = (struct smb_vc *) 0xfffffe011f957000
        error = <value optimized out>
        i = 0
    #12 0xffffffff81016202 in smb_smb_treeconnect (ssp=0xfffffe015f069200, scred=0xfffffe009483b868) at /cache/src/sys/modules/smbfs/../../netsmb/smb_smb.c:574
        vcp = (struct smb_vc *) 0xfffffe011f957000
        rq = {sr_state = 1720810032, sr_vc = 0xfffffe0002a8c490, sr_share = 0xffffff8366917a90, sr_mid = 40352, sr_seqno = 4294967295, sr_rseqno = 1720810112, sr_rq = {mb_top = 0xffffffff80574fea, mb_cur = 0x100000001, mb_mleft = 1458488464, mb_count = -512, mb_copy = 0xffffff8366917a80, mb_udata = 0xffffffff80755149}, sr_rqflags = 0 '\0', sr_rqflags2 = 0, sr_wcount = 0x0, sr_bcount = 0xffffff8366917ac0, sr_rp = {md_top = 0xffffffff8057546d, md_cur = 0x0, md_pos = 0xfffffe0056eec490 "\2005?\200????"}, sr_rpgen = -1803307004, sr_rplast = -512, sr_flags = 1458488464, sr_rpsize = -512, sr_cred = 0xfffffe009483b804, sr_timo = 1458488464, sr_rexmit = -512, sr_sendcnt = 1720810208, sr_timesent = {tv_sec = 582, tv_nsec = -2196531595260}, sr_lerror = 0, sr_rqsig = 0xffffff8366917b10 "\200{\221f\203???\206?V\200????\200{\221f\203???\035?\001\201?\a", sr_rqtid = 0xffffffff805a0e97, sr_rquid = 0xffffff8366917b10, sr_errclass = 1 '\001', sr_serror = 0, sr_error = 0, sr_rpflags = 208 '?', sr_rpflags2 = 0, sr_rptid = 0, sr_rppid = 0, sr_rpuid = 0, sr_rpmid = 0, sr_slock = {lock_object = {lo_name = 0xffffff8366917b80 "?{\221f\203???\032?\001\201?????{\221f\203???\230?\203\224", lo_flags = 2153163654, lo_data = 4294967295, lo_witness = 0xffffff8366917b80}, mtx_lock = 8592098960413}, sr_t2 = 0xffffffff8102517c, sr_link = {tqe_next = 0x9483b820, tqe_prev = 0x0}}
        rqp = (struct smb_rq *) 0xfffffe0105ecd800
        mbp = (struct mbchain *) 0xfffffe0105ecd828
        pp = <value optimized out>
        pbuf = 0x0
        encpass = 0x0
        error = <value optimized out>
        plen = 1
        upper = 0
    #13 0xffffffff8101ad1a in smb_iod_thread (arg=<value optimized out>) at /cache/src/sys/modules/smbfs/../../netsmb/smb_iod.c:206
        iod = (struct smbiod *) 0xfffffe009483b800
    #14 0xffffffff805365df in fork_exit (callout=0xffffffff8101aa83 <smb_iod_thread>, arg=0xfffffe009483b800, frame=0xffffff8366917c40) at /cache/src/sys/kern/kern_fork.c:992
        p = (struct proc *) 0xfffffe0181104000
        td = (struct thread *) 0xfffffe0056eec490
    #15 0xffffffff8075145e in fork_trampoline () at /cache/src/sys/amd64/amd64/exception.S:602

Looking inside smb_iod_invrq (smb_iod.c:91), I wonder whether an attempt is made to invalidate/release something twice, causing turnstile_broadcast() to be invoked with ts being NULL the second time. That would explain why the first attempt to use the absent server errors out as normal, and only the second attempt panics.

My kernel is 9.1-PRERELEASE as of Dec 19.

Any ideas? Thanks!

Yours,

    -mi
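
P.S. To make the "invalidated twice" guess concrete, here is a small userspace sketch of the pattern I suspect. It is only a model under my own assumptions, not the netsmb code: every toy_* name is hypothetical, the lock_alive flag merely stands in for a mutex that has been initialized and not yet destroyed, and the idea that a failure path leaves a finished request linked on the queue is speculation on my part, not something I have confirmed in the source.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a queued request; NOT the real struct smb_rq. */
struct toy_rq {
    bool lock_alive;          /* models "mutex initialized, not yet destroyed" */
    int error;                /* error code handed to the waiter */
    struct toy_rq *next;      /* link on the toy request queue */
};

/* Hypothetical stand-in for the per-connection I/O daemon state. */
struct toy_iod {
    struct toy_rq *head;
};

/* Queue a new request (models adding a request to the iod list). */
static void toy_addrq(struct toy_iod *iod, struct toy_rq *rq)
{
    rq->lock_alive = true;
    rq->error = 0;
    rq->next = iod->head;
    iod->head = rq;
}

/*
 * Finish a request and destroy its lock. With unlink == false this models
 * the ASSUMED bug: the request stays on the queue after its lock is gone.
 */
static void toy_donerq(struct toy_iod *iod, struct toy_rq *rq, bool unlink)
{
    if (unlink) {
        struct toy_rq **pp = &iod->head;
        while (*pp != NULL && *pp != rq)
            pp = &(*pp)->next;
        if (*pp != NULL)
            *pp = rq->next;
    }
    rq->lock_alive = false;
}

/*
 * Invalidate every queued request. Returns how many entries had a dead
 * lock, i.e. how many times a real kernel would have touched a destroyed
 * mutex. The model only counts them instead of crashing.
 */
static int toy_invrq(struct toy_iod *iod, int error)
{
    int stale = 0;
    for (struct toy_rq *rq = iod->head; rq != NULL; rq = rq->next) {
        if (!rq->lock_alive) {
            stale++;
            continue;
        }
        rq->error = error;
    }
    return stale;
}

/* Two connection attempts against an unreachable server, 20 minutes apart. */
static int toy_scenario(bool unlink_on_failure)
{
    struct toy_iod iod = { NULL };
    struct toy_rq first, second;

    toy_addrq(&iod, &first);
    (void)toy_invrq(&iod, ENOTCONN);      /* first attempt fails cleanly */
    toy_donerq(&iod, &first, unlink_on_failure);

    toy_addrq(&iod, &second);
    return toy_invrq(&iod, ENOTCONN);     /* second attempt walks the queue */
}
```

In this model, toy_scenario(false) finds one stale entry on the second pass, while toy_scenario(true) finds none. That would match what I see (the first failure is harmless, the second one trips over leftovers from the first), but it says nothing about whether the real code actually leaves a request queued.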