I'm trying to work out the cause of and a remedy for a very sick iSCSI pool on a Solaris 11 host.

The volume is exported from an Oracle storage appliance and there are no errors reported there. The host has no entries in its logs relating to the network connections.

Any zfs or zpool commands that change the state of the pool (such as zfs mount or zpool export) hang and can't be killed.

fmadm faulty reports:

Jun 27 14:04:24 536fb2ad-1fca-c8b2-fc7d-f5a4a94c165d  ZFS-8000-FD    Major

Host        : taitaklsc01
Platform    : SUN-FIRE-X4170-M2-SERVER   Chassis_id : 1142FMM02N
Product_sn  : 1142FMM02N

Fault class : fault.fs.zfs.vdev.io
Affects     : zfs://pool=fileserver/vdev=68c1bdefa6f97db8
              faulted but still in service
Problem in  : zfs://pool=fileserver/vdev=68c1bdefa6f97db8
              faulted but still in service

Description : The number of I/O errors associated with a ZFS device exceeded
              acceptable levels.  Refer to http://sun.com/msg/ZFS-8000-FD
              for more information.

The zpool status paints a very gloomy picture:

  pool: fileserver
 state: ONLINE
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Fri Jun 29 11:59:59 2012
    858K scanned out of 15.7T at 43/s, (scan is slow, no estimated time)
    567K resilvered, 0.00% done
config:

        NAME                                     STATE     READ WRITE CKSUM
        fileserver                               ONLINE       0 1.16M     0
          c0t600144F096C94AC700004ECD96F20001d0  ONLINE       0 1.16M     0  (resilvering)

errors: 1557164 data errors, use '-v' for a list

Any ideas how to determine the cause of the problem and remedy it?

--
Ian.
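[A quick initiator-side check that complements the fmadm and zpool output above: the Solaris initiator can report its own view of the session and the per-device error counters directly, which is usually faster than hunting through logs. This is only a generic sketch, not something suggested in the thread:

    # iscsiadm list target -v
    # iostat -En

The first command shows whether the session to the appliance is logged in and how many connections it has; the second shows hard/soft/transport error counts for each device, including the iSCSI LUN.]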
Hi Ian,

Chapter 7 of the DTrace book has some examples of how to look at iSCSI target
and initiator behaviour.
 -- richard

On Jun 28, 2012, at 10:47 PM, Ian Collins wrote:

> I'm trying to work out the cause of and a remedy for a very sick iSCSI pool
> on a Solaris 11 host.
> [...]
> Any ideas how to determine the cause of the problem and remedy it?

--
ZFS Performance and Training
Richard.Elling at RichardElling.com
+1-760-896-4422
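[Before reaching for the iSCSI-specific scripts in the book, a rough latency check with the stable io provider on the initiator can tell you whether I/O to the LUN is completing at all. This is only a sketch, not one of the Chapter 7 scripts; it times each block I/O from start to done and breaks the latency distribution down per device, so the LUN shown in the zpool status above should stand out if the transport is the problem:

    # dtrace -n '
    io:::start { start[arg0] = timestamp; }
    io:::done /start[arg0]/
    {
            @lat[args[1]->dev_statname] =
                quantize((timestamp - start[arg0]) / 1000);
            start[arg0] = 0;
    }'

Let it run for a minute or so while the pool is busy, then Ctrl-C to print a per-device latency histogram in microseconds.]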
On 06/30/12 03:01 AM, Richard Elling wrote:
> Hi Ian,
> Chapter 7 of the DTrace book has some examples of how to look at iSCSI
> target and initiator behaviour.

Thanks Richard, I'll have a look.

I'm assuming the pool is hosed?

--
Ian.
On Sun, Jul 1, 2012 at 4:18 AM, Ian Collins <ian at ianshome.com> wrote:
> Thanks Richard, I'll have a look.
>
> I'm assuming the pool is hosed?

Before making that assumption, I'd try something simple first:
- read from the imported iSCSI disk (e.g. with dd) to make sure it's not an
  iSCSI-related problem
- import the disk on another host and try to read it again, to make sure it's
  not a client-specific problem
- possibly restart the iSCSI server, just to make sure

I suspect the problem is with your Oracle storage appliance. But since you say
there are no errors there, these simple tests should show whether it's a
client, disk, or zfs problem.

--
Fajar
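[The dd read test in Fajar's first point might look like this on the initiator; a sketch only. The device is the one from the zpool status, and the s0 slice assumes the usual whole-disk EFI label ZFS puts on a pool disk:

    # dd if=/dev/rdsk/c0t600144F096C94AC700004ECD96F20001d0s0 \
         of=/dev/null bs=1024k count=1024

If this hangs or throws I/O errors, the problem is below ZFS (iSCSI transport or the appliance); if it streams cleanly, suspicion moves back up to the pool itself.]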
On 07/ 1/12 10:20 AM, Fajar A. Nugraha wrote:
> Before making that assumption, I'd try something simple first:
> - read from the imported iSCSI disk (e.g. with dd) to make sure it's not an
>   iSCSI-related problem
> - import the disk on another host and try to read it again, to make sure it's
>   not a client-specific problem
> - possibly restart the iSCSI server, just to make sure

Booting the initiator host from a live DVD image and attempting to import the
pool gives the same error report.

> I suspect the problem is with your Oracle storage appliance. But since you
> say there are no errors there, these simple tests should show whether it's a
> client, disk, or zfs problem.

So did I. I'll get the admin for that system to dig a little deeper and export
a new volume to see if I can create a new pool.

--
Ian.
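[If the admin does export a fresh volume, the smoke test for it is straightforward. A sketch; the device name below is a placeholder and should be whatever the new LUN appears as in format:

    # devfsadm -i iscsi
    # echo | format                          # find the new LUN's cXtYdZ name
    # zpool create testpool c0tXXXXXXXXXXXXXXXXd0
    # dd if=/dev/urandom of=/testpool/junk bs=1024k count=1024
    # zpool scrub testpool
    # zpool status -v testpool

If a brand-new pool on a brand-new LUN also racks up write/checksum errors, that points squarely at the appliance or the path to it rather than at the existing pool.]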
On 07/ 1/12 08:57 PM, Ian Collins wrote:
> Booting the initiator host from a live DVD image and attempting to import
> the pool gives the same error report.

The pool's data appears to be recoverable when I import it read only.

The storage appliance is so full they can't delete files from it! Now that
shouldn't have caused problems with a fixed-size volume, but who knows?

--
Ian.
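[For the archives, the read-only import that made the data readable again is typically done like this on Solaris 11 (pool name as in this thread):

    # zpool import -o readonly=on fileserver
    # zfs list -r fileserver

Note that a read-only pool can zfs send snapshots that already exist, but new snapshots can't be taken on it, so file-level copies (rsync, cpio) are the fallback for getting unsnapshotted data off.]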
On Tue, Jul 3, 2012 at 11:08 AM, Ian Collins <ian at ianshome.com> wrote:
> The pool's data appears to be recoverable when I import it read only.

That's good.

> The storage appliance is so full they can't delete files from it!

Hahaha :D

> Now that shouldn't have caused problems with a fixed-size volume, but who
> knows?

AFAIK you'll always need space, e.g. to replay/roll back transactions during
pool import.

The best way is, of course, to fix the appliance. Sometimes something simple
like deleting snapshots/datasets will do the trick.

--
Fajar
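[On a plain Solaris/ZFS box the usual way out of the "pool full, rm fails" corner is to reclaim space from snapshots rather than files, along the lines Fajar describes. The appliance's own BUI/CLI has its own equivalents; the commands below are the generic ZFS ones, and the pool/dataset names are placeholders:

    # zfs list -t snapshot -o name,used -s used
    # zfs destroy tank/somefs@old-snap

Destroying the snapshot that holds the most unique space is usually enough to get ordinary deletes working again.]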