On Thu, May 07, 2015 at 09:41:43AM +0100, Steven Hartland wrote:
> On 07/05/2015 09:07, Slawa Olhovchenkov wrote:
> > I have a zpool of 12 vdevs (ZFS mirrors).
> > One disk in one vdev has gone out of service and stopped serving requests:
> >
> > dT: 1.036s w: 1.000s
> > L(q) ops/s r/s kBps ms/r w/s kBps ms/w %busy Name
> > 0 0 0 0 0.0 0 0 0.0 0.0| ada0
> > 0 0 0 0 0.0 0 0 0.0 0.0| ada1
> > 1 0 0 0 0.0 0 0 0.0 0.0| ada2
> > 0 0 0 0 0.0 0 0 0.0 0.0| ada3
> > 0 0 0 0 0.0 0 0 0.0 0.0| da0
> > 0 0 0 0 0.0 0 0 0.0 0.0| da1
> > 0 0 0 0 0.0 0 0 0.0 0.0| da2
> > 0 0 0 0 0.0 0 0 0.0 0.0| da3
> > 0 0 0 0 0.0 0 0 0.0 0.0| da4
> > 0 0 0 0 0.0 0 0 0.0 0.0| da5
> > 0 0 0 0 0.0 0 0 0.0 0.0| da6
> > 0 0 0 0 0.0 0 0 0.0 0.0| da7
> > 0 0 0 0 0.0 0 0 0.0 0.0| da8
> > 0 0 0 0 0.0 0 0 0.0 0.0| da9
> > 0 0 0 0 0.0 0 0 0.0 0.0| da10
> > 0 0 0 0 0.0 0 0 0.0 0.0| da11
> > 0 0 0 0 0.0 0 0 0.0 0.0| da12
> > 0 0 0 0 0.0 0 0 0.0 0.0| da13
> > 0 0 0 0 0.0 0 0 0.0 0.0| da14
> > 0 0 0 0 0.0 0 0 0.0 0.0| da15
> > 0 0 0 0 0.0 0 0 0.0 0.0| da16
> > 0 0 0 0 0.0 0 0 0.0 0.0| da17
> > 0 0 0 0 0.0 0 0 0.0 0.0| da18
> > 24 0 0 0 0.0 0 0 0.0 0.0| da19
> > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> > 0 0 0 0 0.0 0 0 0.0 0.0| da20
> > 0 0 0 0 0.0 0 0 0.0 0.0| da21
> > 0 0 0 0 0.0 0 0 0.0 0.0| da22
> > 0 0 0 0 0.0 0 0 0.0 0.0| da23
> > 0 0 0 0 0.0 0 0 0.0 0.0| da24
> > 0 0 0 0 0.0 0 0 0.0 0.0| da25
> > 0 0 0 0 0.0 0 0 0.0 0.0| da26
> > 0 0 0 0 0.0 0 0 0.0 0.0| da27
> >
> > As a result, ZFS operations on this pool have stopped too.
> > `zpool list -v` does not work.
> > `zpool detach tank da19` does not work.
> > Applications using this pool are stuck in the `zfs` wchan and
> > cannot be killed.
> >
> > # camcontrol tags da19 -v
> > (pass19:isci0:0:3:0): dev_openings 7
> > (pass19:isci0:0:3:0): dev_active 25
> > (pass19:isci0:0:3:0): allocated 25
> > (pass19:isci0:0:3:0): queued 0
> > (pass19:isci0:0:3:0): held 0
> > (pass19:isci0:0:3:0): mintags 2
> > (pass19:isci0:0:3:0): maxtags 255
> >
> > How can I cancel these 24 requests?
> > Why don't these requests time out (3 hours already)?
> > How can I force-detach this disk? (I have already tried
> > `camcontrol reset` and `camcontrol rescan`.)
> > Why doesn't ZFS (or GEOM) time out the requests and reroute them
> > to da18?
> >
> If they are in mirrors, in theory you can just pull the disk, isci will
> report to cam and cam will report to ZFS which should all recover.
Yes, it is mirrored with da18.
I am surprised that ZFS doesn't use da18. The whole zpool is fully stuck.
> With regards to not timing out this could be a default issue, but having
I understand that there is no universally acceptable timeout for all
cases: good disk, good saturated disk, tape, tape library, failed disk, etc.
In my case it is a failed disk. This model has already failed before
(another specimen) with the same symptoms.
Maybe there is some trick for cancelling/aborting all requests in the
queue and removing the disk from the system?
> a very quick look that's not obvious in the code as
> isci_io_request_construct etc do indeed set a timeout when
> CAM_TIME_INFINITY hasn't been requested.
>
> The sysctl hw.isci.debug_level may be able to provide more information,
> but be aware this can be spammy.
I am in this situation right now; which commands would be of interest
after setting hw.isci.debug_level?
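
For reference, the symptom in the gstat listing at the top of this thread (a non-zero L(q) with zero ops/s, as on da19) can be picked out mechanically. The following is only a minimal sketch: the function name `stuck_devices` and its threshold argument are made up for illustration, and it assumes the column layout shown above, with L(q) first, ops/s second, and the device name last.

```sh
#!/bin/sh
# Hypothetical helper: print devices that have requests queued but are
# completing none. Reads gstat-style rows on stdin.
# Assumed column layout (as in the listing quoted in this thread):
#   L(q) ops/s r/s kBps ms/r w/s kBps ms/w %busy Name
stuck_devices() {
    min_q="${1:-1}"    # minimum queue length to report (default 1)
    awk -v min_q="$min_q" '
        # Skip the "dT:" line, the header, and anything that is not a data row.
        $1 !~ /^[0-9]+$/ { next }
        # $1 is L(q), $2 is ops/s, the last field is the device name.
        $1 >= min_q && $2 == 0 { print $NF }
    '
}
```

Fed from gstat's batch mode (something like `gstat -b | stuck_devices 8`), this would list only da19 for the output above; with the default threshold of 1 it would also list ada2. A single one-second sample can catch a healthy disk with a momentary queue, so a device should only be considered hung if it shows up repeatedly.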