Tuomas Leikola
2011-Apr-17 08:55 UTC
[zfs-discuss] zfs over iscsi not recovering from timeouts
Hei,

I'm crossposting this to zfs as I'm not sure which bit is to blame here.

I've been having an issue that I cannot really fix myself: I have an OI 148 server which hosts a lot of disks on SATA controllers. It is now full and needs some data-moving work done, so I've acquired another box which runs Linux and has several SATA enclosures. I'm using the Solaris iSCSI initiator in static-config mode to connect the device.

Normally, when everything is fine, there are no problems. I can even restart the iet daemon and there's just a short hiccup in the I/O stream.

Things go bad when I turn the iSCSI target off for a longer period (reboot, etc.). The Solaris iSCSI initiator times out and reports the timeouts as errors to ZFS. ZFS increments its error counts (possibly losing writes) and eventually marks all devices as failed, and the array halts (failmode=wait).

Once in this state, there is no way to return to a running state. The failed condition doesn't clear itself after the target comes back online. I've tried zpool clear, but it still reports data errors and devices as faulted. zpool export hangs.

How I see this problem:
a) the iSCSI initiator reports timeouts as permanent
b) ZFS handles them as such
c) there is no "never" timeout to choose, as far as I can see

What I would like is a mode equivalent to an NFS hard mount - wait forever for the device to become available (but with the ability to kick the array from the command line if it is really dead).

Any clues?

--
- Tuomas
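For reference, the failing sequence described above maps onto roughly these commands (the pool name "tank" is a placeholder, not the actual pool name from the report):

  # zpool get failmode tank    <-- set to "wait", so the pool suspends I/O instead of panicking
  # zpool status -x tank       <-- the iSCSI-backed devices show as FAULTED after the timeouts
  # zpool clear tank           <-- returns, but the data errors and faulted devices remain
  # zpool export tank          <-- hangs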
Bing Zhao
2011-Apr-18 02:20 UTC
[zfs-discuss] [storage-discuss] zfs over iscsi not recovering from timeouts
Hi Tuomas:

Before you run "zpool clear", please make sure that the OS device name exists in the output of 'iscsiadm list target -S'.

# iscsiadm list target -S
Target: iqn.1986-03.com.sun:02:test
        Alias: -
        TPGT: 1
        ISID: 4000002a0000
        Connections: 1
        LUN: 0
             Vendor:  SUN
             Product: COMSTAR
             OS Device Name: /dev/rdsk/c0t600144F008002797ACDE4CCBC0480001d0s2   <======== OS device name

Regards,
Bing

On 04/17/11 16:55, Tuomas Leikola wrote:
> I've tried zpool clear but it still reports data errors and
> devices as faulted. zpool export hangs.
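A minimal sketch of that check-then-recover order, using "tank" as a placeholder pool name (the device path is the one from the listing above):

  # iscsiadm list target -S | grep "OS Device Name"   <-- confirm the LUN is visible again after the target comes back
  # zpool status tank                                 <-- the same c0t600144...d0 device should appear under the pool
  # zpool clear tank                                  <-- only then try clearing the error counters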