thr3ads.net - freebsd stable - ctld(8) 11.2-release lockup with w2k16 [Was: Re: ctld(8), multiple 'portal-group' on same socket (individual 'discovery-auth-group' restrictions)] [Aug 2018]

Harry Schmalzbauer

2018-Jul-05 16:17 UTC

ctld(8) 11.2-release lockup with w2k16 [Was: Re: ctld(8), multiple 'portal-group' on same socket (individual 'discovery-auth-group' restrictions)]

On 21.10.2014 at 12:43, Edward Tomasz Napierała wrote:
> On 1020T1035, Harald Schmalzbauer wrote:
>> Hello,
>>
>> I'm trying to move from istgt(1) to ctld(8), but it seems my setup isn't
>> possible with ctld.
>> Besides missing support for virtual-DVDs ('UnitType DVD' in istgt) and
>> real ODD-devices ('UnitType pass' in istgt),
> Yup, we don't implement virtual DVDs and passthrough.  Especially the
> latter would be a nice feature to have.

Hello Edward,

my current problem is unrelated.
But this old mail illustrates the timeframe I've been happily using 
ctld(8) without problems :-) Thanks!

Recently, I discovered that WindowsServerBackup fails with Win2k16 
(never used 2k12).
Old initiators running 2008R2 (or ESXi 5.5) are still able to use 
ctld(8) ZVOL targets for WindowsServerBackup on 11.2-release without 
problems.

I haven't had time to do much analysis, and I lack the skills/equipment
to take it down to debugger level, but I wanted to ask if you're aware
of any problems with Windows Server 2016 as a ctld(8) initiator.

The Symptoms:

The system locks up for about 30-60 seconds under iSCSI load from w2k16.
When the lockup happens, systat(1) shows 25% intr usage (which is one
core) and not even the login session is responsive anymore: it neither
updates userland output nor reacts to input.
The input is queued, though, and gets processed once the lockup releases.
The lockup vanishes as soon as the iSCSI session is reset:
Jun 28 06:14:09 bansta kernel: WARNING: 172.24.32.172 (iqn.1991-05.com.microsoft:dafus.mgn.mo1.psw-online.de): no ping reply (NOP-Out) after 5 seconds; dropping connection
Jun 28 06:14:09 bansta kernel: WARNING: 172.24.32.172 (iqn.1991-05.com.microsoft:dafus.mgn.mo1.psw-online.de): waiting for CTL to terminate 94 tasks
Jun 28 06:14:09 bansta kernel: WARNING: 172.24.32.172 (iqn.1991-05.com.microsoft:dafus.mgn.mo1.psw-online.de): tasks terminated

Sometimes it's possible to transfer 30GB before the lockup happens;
sometimes even an NTFS quick format leads to the lockup.

Yesterday I used istgt(1) instead of ctld(8) to export exactly the same
ZVOL using exactly the same network backend, with exactly the same
initiator.
The lockup no longer occurred, and the complete WindowsServerBackup task
finished successfully on the Windows Server 2016 initiator. So I
strongly suspect a ctld(8) locking problem.
As mentioned, the target backend is a ZFS volume. I have already used an
HDD as target backend (and observed much better performance, which drops
even if I use a UFS vnode backend on the same HDD), but I'm no longer
sure whether the lockup also occurred there...

For now I can't tell you anything helpful; I can only describe the
symptoms and ask if you have any hints on what to try next to narrow
down the problem, or whether this is an already known problem.

Thanks,

-harry

Harry Schmalzbauer

2018-Aug-25 13:04 UTC

ctld(8) 11.2-release lockup with w2k16 [Was: Re: ctld(8), multiple 'portal-group' on same socket (individual 'discovery-auth-group' restrictions)]

On 05.07.2018 at 18:17, Harry Schmalzbauer wrote:
> On 21.10.2014 at 12:43, Edward Tomasz Napierała wrote:
>> On 1020T1035, Harald Schmalzbauer wrote:
>>> Hello,
>>>
>>> I'm trying to move from istgt(1) to ctld(8), but it seems my setup isn't
>>> possible with ctld.
>>> Besides missing support for virtual-DVDs ('UnitType DVD' in istgt) and
>>> real ODD-devices ('UnitType pass' in istgt),
>> Yup, we don't implement virtual DVDs and passthrough. Especially the
>> latter would be a nice feature to have.
>
>
> Hello Edward,
>
> my current problem is unrelated.
> But this old mail illustrates the timeframe I've been happily using 
> ctld(8) without problems :-) Thanks!
>
> Recently, I discovered that WindowsServerBackup fails with Win2k16 
> (never used 2k12).
> Old initiators running 2008R2 (or ESXi 5.5) are still able to use 
> ctld(8) ZVOL targets for WindowsServerBackup on 11.2-release without 
> problems.
Unfortunately, ESXi 6.5 initiators are not working well with ctld(8)
anymore either.
Read performance is incredibly slow.
I have a 2x3 raidz1 pool with six 10k rpm SAS spindles.
Local ZFS performance doesn't show anything unexpected.
But reading from a ctld(8) ZVOL-backed target under ESXi 6.5 seems to
cause an interrupt deadlock: not completely dead, but almost.
gstat(8) tells me that all 6 HDDs are idle.
top(1) shows no thread consuming CPU cycles, with one exception (besides 
idle):
   12 root         38 -56    -     0K   608K WAIT   -1 569:02 482.78% intr
systat(1) shows NICs almost idle (<100irqs/s) and permanent 25% INTR 
load (one of 4 cores).

This is with 11.2-release.
It's an ESXi guest, which I have used for several years with previous
FreeBSD versions without such massive iSCSI performance problems.

Using the same /dev/zvol with istgt(1) on the same 11.2-release VM also 
solves the performance issue.

Is anybody using ctld(8) in production post 10.x? If so, without 
observing a similar regression?

Thanks,

-harry
