On 07/05/2015 10:50, Slawa Olhovchenkov wrote:
> On Thu, May 07, 2015 at 09:41:43AM +0100, Steven Hartland wrote:
>
>> On 07/05/2015 09:07, Slawa Olhovchenkov wrote:
>>> I have a zpool of 12 vdevs (zmirrors).
>>> One disk in one vdev went out of service and stopped serving requests:
>>>
>>> dT: 1.036s  w: 1.000s
>>>  L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w   %busy Name
>>>     0      0      0      0    0.0      0      0    0.0    0.0| ada0
>>>     0      0      0      0    0.0      0      0    0.0    0.0| ada1
>>>     1      0      0      0    0.0      0      0    0.0    0.0| ada2
>>>     0      0      0      0    0.0      0      0    0.0    0.0| ada3
>>>     0      0      0      0    0.0      0      0    0.0    0.0| da0
>>>     0      0      0      0    0.0      0      0    0.0    0.0| da1
>>>     0      0      0      0    0.0      0      0    0.0    0.0| da2
>>>     0      0      0      0    0.0      0      0    0.0    0.0| da3
>>>     0      0      0      0    0.0      0      0    0.0    0.0| da4
>>>     0      0      0      0    0.0      0      0    0.0    0.0| da5
>>>     0      0      0      0    0.0      0      0    0.0    0.0| da6
>>>     0      0      0      0    0.0      0      0    0.0    0.0| da7
>>>     0      0      0      0    0.0      0      0    0.0    0.0| da8
>>>     0      0      0      0    0.0      0      0    0.0    0.0| da9
>>>     0      0      0      0    0.0      0      0    0.0    0.0| da10
>>>     0      0      0      0    0.0      0      0    0.0    0.0| da11
>>>     0      0      0      0    0.0      0      0    0.0    0.0| da12
>>>     0      0      0      0    0.0      0      0    0.0    0.0| da13
>>>     0      0      0      0    0.0      0      0    0.0    0.0| da14
>>>     0      0      0      0    0.0      0      0    0.0    0.0| da15
>>>     0      0      0      0    0.0      0      0    0.0    0.0| da16
>>>     0      0      0      0    0.0      0      0    0.0    0.0| da17
>>>     0      0      0      0    0.0      0      0    0.0    0.0| da18
>>>    24      0      0      0    0.0      0      0    0.0    0.0| da19
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>>     0      0      0      0    0.0      0      0    0.0    0.0| da20
>>>     0      0      0      0    0.0      0      0    0.0    0.0| da21
>>>     0      0      0      0    0.0      0      0    0.0    0.0| da22
>>>     0      0      0      0    0.0      0      0    0.0    0.0| da23
>>>     0      0      0      0    0.0      0      0    0.0    0.0| da24
>>>     0      0      0      0    0.0      0      0    0.0    0.0| da25
>>>     0      0      0      0    0.0      0      0    0.0    0.0| da26
>>>     0      0      0      0    0.0      0      0    0.0    0.0| da27
>>>
>>> As a result, ZFS operations on this pool have stopped too.
>>> `zpool list -v` doesn't work.
>>> `zpool detach tank da19` doesn't work.
>>> Applications using this pool are stuck in the `zfs` wchan and
>>> cannot be killed.
>>>
>>> # camcontrol tags da19 -v
>>> (pass19:isci0:0:3:0): dev_openings 7
>>> (pass19:isci0:0:3:0): dev_active 25
>>> (pass19:isci0:0:3:0): allocated 25
>>> (pass19:isci0:0:3:0): queued 0
>>> (pass19:isci0:0:3:0): held 0
>>> (pass19:isci0:0:3:0): mintags 2
>>> (pass19:isci0:0:3:0): maxtags 255
>>>
>>> How can I cancel these 24 requests?
>>> Why don't these requests time out (3 hours already)?
>>> How can I force-detach this disk? (I already tried `camcontrol
>>> reset` and `camcontrol rescan`.)
>>> Why doesn't ZFS (or GEOM) time out the requests and reroute them
>>> to da18?
>>>
>> If they are in mirrors, in theory you can just pull the disk; isci will
>> report to cam and cam will report to ZFS, which should all recover.
> Yes, a zmirror with da18.
> I am surprised that ZFS doesn't use da18. The whole zpool is stuck.
A single low-level request can only be handled by one device; if that
device returns an error then ZFS will use the other device, but not
until then.
>
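To illustrate the point (a hypothetical sketch, not ZFS code -- `side_a`
and `side_b` are stand-ins for da19 and da18): the fallback to the other
mirror side is driven by an *error return*, so a request that simply never
completes keeps the whole pipeline waiting:

```shell
# Sketch of the mirror read path described above (assumed names, not ZFS).
side_a() { return 1; }                 # the bad side returns an error
side_b() { echo "data from da18"; }    # the healthy mirror side
read_mirror() {
    side_a || side_b    # fallback runs only when side_a *fails*;
}                       # a request that hangs forever never triggers it
read_mirror             # prints "data from da18"
```

In your situation da19 is doing the third thing: neither succeeding nor
erroring, so the `||` branch is never reached and the pool just waits.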
>> With regards to not timing out this could be a default issue, but
>> having
> I understand, there is no universally acceptable timeout for all cases:
> a good disk, a good saturated disk, tape, a tape library, a failed disk,
> etc.
> In my case it is a failed disk. This model has already failed before
> (another specimen, with the same symptoms).
>
> Maybe there is some trick to cancel/abort all the requests in the queue
> and remove the disk from the system?
Unlikely tbh; pulling the disk, however, should work.
>
>> a very quick look that's not obvious in the code as
>> isci_io_request_construct etc do indeed set a timeout when
>> CAM_TIME_INFINITY hasn't been requested.
>>
>> The sysctl hw.isci.debug_level may be able to provide more information,
>> but be aware this can be spammy.
> I have already had this situation; which commands are of interest after
> setting hw.isci.debug_level?
I'm afraid I'm not familiar with isci; possibly someone else who is
can chime in.
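As an aside, a stuck device like da19 stands out in batch gstat output by
its non-zero L(q) (first column). A rough sketch of filtering for that --
the sample lines here are trimmed from your paste, and the exact column
layout is whatever your gstat emits:

```shell
# Print the device name (last field) of any gstat row whose queue
# length L(q) -- the first field -- is non-zero.
sample='   0      0      0      0    0.0      0      0    0.0    0.0| da18
  24      0      0      0    0.0      0      0    0.0    0.0| da19'
printf '%s\n' "$sample" | awk '$1 > 0 { print $NF }'   # prints "da19"
```

Handy for spotting which member of a large pool is holding requests
without eyeballing all 30-odd rows.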
Regards
Steve