Harry Schmalzbauer
2018-Jul-05 16:17 UTC
ctld(8) 11.2-release lockup with w2k16 [Was: Re: ctld(8), multiple 'portal-group' on same socket (individual 'discovery-auth-group' restrictions)]
On 21.10.2014 at 12:43, Edward Tomasz Napierała wrote:
> On 1020T1035, Harald Schmalzbauer wrote:
>> Hello,
>>
>> I'm trying to move from istgt(1) to ctld(8), but it seems my setup isn't
>> possible with ctld.
>> Besides missing support for virtual-DVDs ('UnitType DVD' in istgt) and
>> real ODD-devices ('UnitType pass' in istgt),
> Yup, we don't implement virtual DVDs and passthrough. Especially the
> latter would be a nice feature to have.

Hello Edward,

my current problem is unrelated. But this old mail illustrates the timeframe I've been happily using ctld(8) without problems :-) Thanks!

Recently, I discovered that WindowsServerBackup fails with Win2k16 (never used 2k12). Old initiators running 2008R2 (or ESXi 5.5) are still able to use ctld(8) ZVOL targets for WindowsServerBackup on 11.2-release without problems.

I haven't had time to do much analysis, and I lack the skills and equipment to dig into this at debugger level, but I wanted to ask whether you're aware of problems with Windows Server 2016 as an initiator against ctld(8).

The symptoms: The system locks up for about 30-60 seconds under iSCSI load from w2k16. When the lockup happens, systat(1) shows 25% intr usage (which is one core), and not even the login session is responsive anymore: it neither updates userland output nor reacts to input. The input is queued, though, and gets processed after the lockup releases.
The lockup vanishes as soon as the iSCSI session is reset:

Jun 28 06:14:09 bansta kernel: WARNING: 172.24.32.172 (iqn.1991-05.com.microsoft:dafus.mgn.mo1.psw-online.de): no ping reply (NOP-Out) after 5 seconds; dropping connection
Jun 28 06:14:09 bansta kernel: WARNING: 172.24.32.172 (iqn.1991-05.com.microsoft:dafus.mgn.mo1.psw-online.de): waiting for CTL to terminate 94 tasks
Jun 28 06:14:09 bansta kernel: WARNING: 172.24.32.172 (iqn.1991-05.com.microsoft:dafus.mgn.mo1.psw-online.de): tasks terminated

Sometimes it's possible to transfer 30GB before the lockup happens; sometimes even an NTFS quick format leads to the lockup.

Yesterday I used istgt(1) instead of ctld(8) to export exactly the same ZVOL over exactly the same network backend, with exactly the same initiator. The lockup hasn't occurred since, and the complete WindowsServerBackup task finishes successfully on the Windows Server 2016 initiator. So I strongly suspect a ctld(8) locking problem.

As mentioned, the target backend is a ZFS volume. I have also used an HDD as the target backend (and observed much better performance, which drops even if I use a UFS vnode backend on the same HDD), but I'm no longer sure whether the lockup occurred there as well...

For now I can't tell you anything more helpful; I can only describe the symptoms and ask whether you have any hints on what to try next to narrow down the problem, or whether this is an already known problem.

Thanks,

-harry
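The session-drop warnings quoted above have a regular shape, so they can be pulled out of a log and lined up against the times at which lockups were observed. The following is a minimal Python sketch under assumptions: the regular expressions are modelled only on the three log lines shown in this message, and the names `parse_drops`, `LINE_RE`, `PING_RE`, `TASKS_RE` and `SAMPLE` are hypothetical helpers, not part of ctld(8) or any FreeBSD tool.

```python
import re

# Assumption: every relevant line looks like the three warnings quoted in the
# message (syslog timestamp, host, "kernel: WARNING:", address, IQN, text).
LINE_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+\d\d:\d\d:\d\d)\s+(?P<host>\S+)\s+kernel:\s+"
    r"WARNING:\s+(?P<addr>\S+)\s+\((?P<iqn>[^)]+)\):\s+(?P<msg>.*)$"
)
PING_RE = re.compile(r"no ping reply \(NOP-Out\) after (\d+) seconds")
TASKS_RE = re.compile(r"waiting for CTL to terminate (\d+) tasks")

# The log excerpt from the message, used here as sample input.
SAMPLE = """\
Jun 28 06:14:09 bansta kernel: WARNING: 172.24.32.172 (iqn.1991-05.com.microsoft:dafus.mgn.mo1.psw-online.de): no ping reply (NOP-Out) after 5 seconds; dropping connection
Jun 28 06:14:09 bansta kernel: WARNING: 172.24.32.172 (iqn.1991-05.com.microsoft:dafus.mgn.mo1.psw-online.de): waiting for CTL to terminate 94 tasks
Jun 28 06:14:09 bansta kernel: WARNING: 172.24.32.172 (iqn.1991-05.com.microsoft:dafus.mgn.mo1.psw-online.de): tasks terminated
"""


def parse_drops(lines):
    """Return one dict per dropped session, with the task count if logged."""
    events = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # not one of the warnings we know about
        msg = m.group("msg")
        ping = PING_RE.search(msg)
        tasks = TASKS_RE.search(msg)
        if ping:
            # A NOP-Out timeout starts a new drop event.
            events.append({
                "stamp": m.group("stamp"),
                "addr": m.group("addr"),
                "iqn": m.group("iqn"),
                "timeout_s": int(ping.group(1)),
                "tasks": None,
            })
        elif tasks and events and events[-1]["iqn"] == m.group("iqn"):
            # Attach the outstanding-task count to the preceding drop.
            events[-1]["tasks"] = int(tasks.group(1))
    return events


if __name__ == "__main__":
    for event in parse_drops(SAMPLE.splitlines()):
        print(event["stamp"], event["addr"], event["timeout_s"], event["tasks"])
```

To run it against a real log instead of the sample, pass the lines of the file that receives kernel messages (on a default FreeBSD setup this is assumed to be /var/log/messages) to `parse_drops`.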
Harry Schmalzbauer
2018-Aug-25 13:04 UTC
ctld(8) 11.2-release lockup with w2k16 [Was: Re: ctld(8), multiple 'portal-group' on same socket (individual 'discovery-auth-group' restrictions)]
On 05.07.2018 at 18:17, Harry Schmalzbauer wrote:
> On 21.10.2014 at 12:43, Edward Tomasz Napierała wrote:
>> On 1020T1035, Harald Schmalzbauer wrote:
>>> Hello,
>>>
>>> I'm trying to move from istgt(1) to ctld(8), but it seems my setup
>>> isn't
>>> possible with ctld.
>>> Besides missing support for virtual-DVDs ('UnitType DVD' in istgt) and
>>> real ODD-devices ('UnitType pass' in istgt),
>> Yup, we don't implement virtual DVDs and passthrough. Especially the
>> latter would be a nice feature to have.
>
>
> Hello Edward,
>
> my current problem is unrelated.
> But this old mail illustrates the timeframe I've been happily using
> ctld(8) without problems :-) Thanks!
>
> Recently, I discovered that WindowsServerBackup fails with Win2k16
> (never used 2k12).
> Old initiators running 2008R2 (or ESXi 5.5) are still able to use
> ctld(8) ZVOL targets for WindowsServerBackup on 11.2-release without
> problems.

Unfortunately, ESXi 6.5 initiators are also not working well with ctld(8) anymore. Read performance is incredibly slow. I have a 2x3z1 pool with 6 SAS 10k rpm spindles. Local ZFS performance doesn't show anything unexpected. But reading from a ctld(8) ZVOL-backed target under ESXi 6.5 seems to cause an interrupt deadlock (not completely dead, but almost).

gstat(8) tells me that all 6 HDDs are idle. top(1) shows no thread consuming CPU cycles, with one exception (besides idle):

   12 root         38 -56    -     0K   608K WAIT   -1 569:02 482.78% intr

systat(1) shows the NICs almost idle (<100 irqs/s) and a permanent 25% INTR load (one of 4 cores).

This is with 11.2-release. It's an ESXi guest, which I used for several years with previous FreeBSD versions without such massive iSCSI performance problems. Using the same /dev/zvol with istgt(1) on the same 11.2-release VM also solves the performance issue.

Is anybody using ctld(8) in production post 10.x? If so, without observing a similar regression?

Thanks,

-harry