I have been running a pair of X4540's for almost 2 years now, the
usual spec (Quad core, 64GB RAM, 48x 1TB).
I have a pair of mirrored drives for rpool, and a raidz set with 5-6
disks in each vdev for the rest of the disks.
I am running snv_132 on both systems.

I noticed an oddity on one particular system: when running a scrub
or a zfs list -t snapshot, the results take forever. Mind you, these
are identical systems in hardware and software. The primary system
replicates all data sets to the secondary nightly, so there isn't
much of a discrepancy in space used.

Primary system:
# time zfs list -t snapshot | wc -l
979

real    1m23.995s
user    0m0.360s
sys     0m4.911s

Secondary system:
# time zfs list -t snapshot | wc -l
979

real    0m1.534s
user    0m0.223s
sys     0m0.663s

At the time of running both of those, no other activity was
happening, load average of .05 or so. Subsequent runs on the primary
take just as long; no matter how many times I run it, it takes about
1 minute and 25 seconds each time, with very little drift (+/- 1
second, if that).

Both systems are at about 77% used space on the storage pool, with no
other distinguishing factors that I can discern. Upon a reboot,
performance is respectable for a little while, but within days it
sinks back to those levels. I suspect a memory leak, but both systems
run the same software versions and packages, so I can't envision
that.

Would anyone have any ideas what may cause this?

--
Brent Jones
brent at servuhome.net
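For reference, this is a minimal sketch of the memory checks I can
compare between the two boxes (standard OpenSolaris observability
tools, run as root; the kstat names assume the usual zfs:0:arcstats
layout):

# echo "::memstat" | mdb -k                       (kernel vs. ZFS file data vs. free page breakdown)
# kstat -p zfs:0:arcstats:size zfs:0:arcstats:c   (current ARC size and ARC target)
# echo "::kmastat" | mdb -k | less                (per-cache kernel memory usage, to spot steady growth)

Comparing these right after a reboot and again once zfs list has
slowed down should show whether kernel memory really is being eaten
over time.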
Giovanni Tirloni
2010-Jul-13 19:03 UTC
[zfs-discuss] [osol-help] ZFS list snapshots incurs large delay
On Tue, Jul 13, 2010 at 2:44 PM, Brent Jones <brent at servuhome.net> wrote:
> I have been running a pair of X4540's for almost 2 years now, the
> usual spec (Quad core, 64GB RAM, 48x 1TB).
> I have a pair of mirrored drives for rpool, and a raidz set with 5-6
> disks in each vdev for the rest of the disks.
> I am running snv_132 on both systems.
>
> I noticed an oddity on one particular system: when running a scrub
> or a zfs list -t snapshot, the results take forever. Mind you, these
> are identical systems in hardware and software. The primary system
> replicates all data sets to the secondary nightly, so there isn't
> much of a discrepancy in space used.
>
> Primary system:
> # time zfs list -t snapshot | wc -l
> 979
>
> real    1m23.995s
> user    0m0.360s
> sys     0m4.911s
>
> Secondary system:
> # time zfs list -t snapshot | wc -l
> 979
>
> real    0m1.534s
> user    0m0.223s
> sys     0m0.663s
>
> At the time of running both of those, no other activity was
> happening, load average of .05 or so. Subsequent runs on the primary
> take just as long; no matter how many times I run it, it takes about
> 1 minute and 25 seconds each time, with very little drift (+/- 1
> second, if that).
>
> Both systems are at about 77% used space on the storage pool, with no
> other distinguishing factors that I can discern. Upon a reboot,
> performance is respectable for a little while, but within days it
> sinks back to those levels. I suspect a memory leak, but both systems
> run the same software versions and packages, so I can't envision
> that.
>
> Would anyone have any ideas what may cause this?

It could be a disk failing and dragging I/O down with it.

Try to check for high asvc_t with `iostat -xCn 1` and errors in `iostat -En`.

Any timeouts or retries in /var/adm/messages?

--
Giovanni Tirloni
gtirloni at sysdroid.com
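Concretely, something along these lines should show a lagging disk or
climbing error counters (a sketch; run on the slow box while the scrub
or zfs list is going):

# iostat -xCn 1 10                          (watch for one device whose asvc_t sits far above its peers)
# iostat -En | egrep -i "errors|illegal"    (soft/hard/transport error counters per device)
# egrep -i "timeout|retry" /var/adm/messages

A single slow-but-not-yet-failed disk will usually stand out in the
asvc_t and %b columns long before it logs a hard error.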
Brent Jones
2010-Jul-13 19:15 UTC
[zfs-discuss] [osol-help] ZFS list snapshots incurs large delay
>
> It could be a disk failing and dragging I/O down with it.
>
> Try to check for high asvc_t with `iostat -xCn 1` and errors in `iostat -En`.
>
> Any timeouts or retries in /var/adm/messages?
>
> --
> Giovanni Tirloni
> gtirloni at sysdroid.com
>

I checked for high service times during a scrub, and all disks are
pretty equal. During a scrub, each disk peaks at about 350 reads/sec,
with an asvc time of up to 30 during those read spikes (I assume that
means 30ms, which isn't terrible for a highly loaded SATA disk).
No errors are reported by smartctl, iostat, or /var/adm/messages.

I opened a case on Sunsolve, but I fear that since I am running a dev
build I will be out of luck. I cannot run 2009.06 due to CIFS
segfaults and problems with zfs send/recv hanging pools
(well-documented issues). I'd run Solaris proper, but not having
in-kernel CIFS or COMSTAR would be a major setback for me.

--
Brent Jones
brent at servuhome.net
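Since the disks look clean, it may be worth profiling where the time
actually goes on the slow box. A rough sketch (standard tools in
snv_132; run as root while the system is otherwise idle):

# truss -c zfs list -t snapshot > /dev/null     (per-syscall counts and times for the zfs command itself)
# lockstat -kIW -D 20 sleep 60                  (kernel profile; run the zfs list in another window during the 60s)

If most of the wall-clock time shows up in ioctl calls on the zfs side
and in ZFS/ARC routines in the kernel profile, that would point at
snapshot metadata being re-read from disk (or an ARC that has shrunk)
rather than at the disks themselves.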