Tuomas Leikola
2011-Apr-17 08:55 UTC
[zfs-discuss] zfs over iscsi not recovering from timeouts
Hei,

I'm crossposting this to zfs as I'm not sure which bit is to blame here.

I've been having an issue that I cannot really fix myself: I have an OI 148 server which hosts a lot of disks on SATA controllers. It is now full and needs some data-moving work done, so I've acquired another box which runs Linux and has several SATA enclosures. I'm using the Solaris iSCSI initiator in static-config mode to connect the device.

Normally, when everything is fine, there are no problems. I can even restart the iet daemon and there's just a short hiccup in the I/O stream.

Things go bad when I turn the iSCSI target off for a longer period (reboot, etc.). The Solaris iSCSI initiator times out and reports the timeouts as errors to ZFS. ZFS increments its error counts (possibly losing writes) and eventually marks all devices as failed, and the array halts (failmode=wait).

Once in this state, there is no way to return to a running state. The failed condition doesn't clear itself after the target comes back online. I've tried zpool clear, but it still reports data errors and devices as faulted. zpool export hangs.

How I see this problem:
a) the iSCSI initiator reports timeouts as permanent
b) ZFS handles them as such
c) there is no "never" timeout to choose, as far as I can see

What I would like is a mode equivalent to an NFS hard mount - wait forever for the device to become available (but with the ability to kick the array from the command line if it is really dead).

Any clues?

--
- Tuomas
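For reference, the failing sequence described above maps onto roughly these commands (the pool name "tank" is a placeholder, not the actual pool name from the report):

  # zpool get failmode tank    <-- set to "wait", so the pool suspends I/O instead of panicking
  # zpool status -x tank       <-- the iSCSI-backed devices show as FAULTED after the timeouts
  # zpool clear tank           <-- returns, but the data errors and faulted devices remain
  # zpool export tank          <-- hangs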
Bing Zhao
2011-Apr-18 02:20 UTC
[zfs-discuss] [storage-discuss] zfs over iscsi not recovering from timeouts
Hi Tuomas:

Before you run "zpool clear", please make sure that the OS device name exists in the output of 'iscsiadm list target -S'.

# iscsiadm list target -S
Target: iqn.1986-03.com.sun:02:test
        Alias: -
        TPGT: 1
        ISID: 4000002a0000
        Connections: 1
        LUN: 0
             Vendor:  SUN
             Product: COMSTAR
             OS Device Name: /dev/rdsk/c0t600144F008002797ACDE4CCBC0480001d0s2   <======== OS device name

Regards,
Bing

On 04/17/11 16:55, Tuomas Leikola wrote:
> I've tried zpool clear but it still reports data errors and
> devices as faulted. zpool export hangs.
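A minimal sketch of that check-then-recover order, using "tank" as a placeholder pool name (the device path is the one from the listing above):

  # iscsiadm list target -S | grep "OS Device Name"   <-- confirm the LUN is visible again after the target comes back
  # zpool status tank                                 <-- the same c0t600144...d0 device should appear under the pool
  # zpool clear tank                                  <-- only then try clearing the error counters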