I look after a remote server that has two iSCSI pools. The volumes for
each pool are sparse volumes, and a while back the target's storage
became full, causing weird and wonderful corruption issues until they
managed to free some space.

Since then, one pool has been reasonably OK, but the other has terrible
performance receiving snapshots. Despite both iSCSI devices using the
same IP connection, iostat shows one with reasonable service times while
the other shows really high (up to 9 seconds) service times and 100%
busy. This kills performance for snapshots with many random file
removals and additions.

I'm currently zero filling the bad pool to recover space on the target
storage to see if that improves matters.

Has anyone else seen similar behaviour with previously degraded iSCSI
pools?

--
Ian.
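For context, the zero fill Ian describes usually amounts to writing a large
file of zeros into the pool from the initiator side and then deleting it, so
the thin-provisioned backing store on the target receives zero-filled blocks
rather than stale data. A minimal sketch, assuming the bad pool is mounted at
/badpool (the path and block size are illustrative, not taken from the thread):

   # Fill the pool's free space with zeros, then remove the file.
   # Note: if compression is enabled on the pool, the zeros never reach
   # the target, so this only makes sense with compression off.
   dd if=/dev/zero of=/badpool/zerofill bs=1M
   rm /badpool/zerofill
   sync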
On 11/22/12 10:15, Ian Collins wrote:
> I'm currently zero filling the bad pool to recover space on the target
> storage to see if that improves matters.

As a data point, both pools are being zero filled with dd. A 30 second
iostat sample shows one device getting more than double the write
throughput of the other:

   r/s   w/s  Mr/s  Mw/s  wait  actv  wsvc_t  asvc_t  %w  %b  device
   0.2  64.0   0.0  50.1   0.0   5.6     0.7    87.9   4  64  c0t600144F096C94AC700004ECD96F20001d0
   5.6  44.9   0.0  18.2   0.0   5.8     0.3   115.7   2  76  c0t600144F096C94AC700004FF354B00002d0

--
Ian.
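The exact iostat invocation isn't shown, but extended device statistics in
megabytes per second with per-device service times, as above, would typically
come from something like the following (the flags are assumed, not confirmed
by the thread):

   # Extended device statistics (-x), descriptive device names (-n),
   # throughput in MB/s (-M), sampled every 30 seconds; the first sample
   # reports the average since boot and is usually discarded.
   iostat -xnM 30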
Edward Ned Harvey (opensolarisisdeadlongliveopensolaris)
2012-Nov-22 13:06 UTC
[zfs-discuss] Woeful performance from an iSCSI pool
> From: zfs-discuss-bounces at opensolaris.org [mailto:zfs-discuss-
> bounces at opensolaris.org] On Behalf Of Ian Collins
>
> I look after a remote server that has two iSCSI pools. The volumes for
> each pool are sparse volumes, and a while back the target's storage
> became full, causing weird and wonderful corruption issues until they
> managed to free some space.
>
> Since then, one pool has been reasonably OK, but the other has terrible
> performance receiving snapshots. Despite both iSCSI devices using the
> same IP connection, iostat shows one with reasonable service times while
> the other shows really high (up to 9 seconds) service times and 100%
> busy. This kills performance for snapshots with many random file
> removals and additions.
>
> I'm currently zero filling the bad pool to recover space on the target
> storage to see if that improves matters.
>
> Has anyone else seen similar behaviour with previously degraded iSCSI
> pools?

This sounds exactly like the behaviour I was seeing with my attempt at
two machines zpool mirroring each other via iSCSI. In my case, I had two
machines that are both targets and initiators. I made the initiator
service dependent on the target service, the zpool mount dependent on
the initiator service, and the VirtualBox guest start dependent on the
zpool mount.

Everything seemed fine for a while, including some reboots. But then one
reboot, one of my systems stayed down too long, and when it finally came
back up, both machines started choking. So far I haven't found any root
cause, and so far the only solution I've found was to reinstall the OS.
I tried everything I know in terms of removing, forgetting, and
recreating the targets, initiators, and pool, but somehow none of that
was sufficient. I recently (yesterday) got budgetary approval to dig
into this more, so hopefully I'll have some insight before too long, but
don't hold your breath. I could fail, and even if I don't, it's likely
to be weeks or months.

What I want to know from you is: which machines are your Solaris
machines? Just the targets? Just the initiators? All of them?

You say you're having problems just with snapshots. Are you sure you're
not having trouble with all sorts of IO, and not just snapshots? What
about import/export?

In my case, I found I was able to zfs send, zfs receive, and zpool
status, all fine. But when I launched a guest VM, there would be a
massive delay - you said up to 9 seconds; I was sometimes seeing over
30 seconds - sometimes crashing the host system. The guest OS was acting
like it was getting IO errors without actually displaying any message
indicating an IO error. I would attempt, and sometimes fail, to power
off the guest VM (kill -KILL VirtualBox). After the failure began, zpool
status still works (and reports no errors), but if I try to do things
like export/import, they hang indefinitely and I need to power cycle the
host. While in the failure mode, I can run zpool iostat, and I sometimes
see 0 transactions with nonzero bandwidth, which defies my
understanding.

Did you ever see the iSCSI targets "offline" or "degraded" in any way?
Did you do anything like "online" or "clear"?

My systems are OpenIndiana - the latest, I forget if that's 151a5 or a6.
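The service dependency chain Edward describes would normally be wired up
through SMF. As a rough sketch only, assuming the stock OpenIndiana iSCSI
service FMRIs and a made-up property group name (one way such a dependency
could be added, not Edward's actual configuration):

   # Make the iSCSI initiator service wait for the local iSCSI target service.
   svccfg -s svc:/network/iscsi/initiator:default addpg local_target dependency
   svccfg -s svc:/network/iscsi/initiator:default setprop local_target/grouping = astring: require_all
   svccfg -s svc:/network/iscsi/initiator:default setprop local_target/restart_on = astring: restart
   svccfg -s svc:/network/iscsi/initiator:default setprop local_target/type = astring: service
   svccfg -s svc:/network/iscsi/initiator:default setprop local_target/entities = fmri: svc:/network/iscsi/target:default
   svcadm refresh svc:/network/iscsi/initiator:default

The zpool mount and VirtualBox guest dependencies would be layered on in the
same way, each service declaring a dependency on the one before it.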
Ian Collins wrote:
> I'm currently zero filling the bad pool to recover space on the target
> storage to see if that improves matters.

It did. Maybe the volume's free space had become very fragmented.

There are a couple of lessons here:

1) When using a thin provisioned volume for an iSCSI target, don't let
the volume's pool become full!

2) If the pool using the iSCSI target has a lot of churn, consider zero
filling the pool to flush out the free blocks.

--
Ian.
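To make lesson 1 concrete: a sparse zvol reserves no space up front, so
nothing stops the hosting pool from filling underneath the iSCSI clients. A
hedged sketch on the target side, with made-up pool and volume names and the
rest of the COMSTAR setup (itadm target creation, stmfadm views) omitted:

   # Create a 500 GB thin-provisioned zvol (-s skips the refreservation).
   zfs create -s -V 500g tank/iscsivol01

   # Register it as a COMSTAR logical unit.
   sbdadm create-lu /dev/zvol/rdsk/tank/iscsivol01

   # Watch actual consumption against the pool's free space.
   zfs list -o name,volsize,used,refreservation tank/iscsivol01
   zpool list tank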