On Thu, May 10, 2012 at 5:37 AM, Ian Collins <ian at ianshome.com> wrote:
> I have an application I have been using to manage data replication for a
> number of years. Recently we started using a new machine as a staging
> server (not that new, an x4540) running Solaris 11 with a single pool built
> from 7x6 drive raidz. No dedup and no reported errors.
>
> On that box, and nowhere else, I see empty snapshots taking 17 or 18 seconds
> to write. Everywhere else they return in under a second.
>
> Using truss and the last published source code, it looks like the pause is
> between a printf and the call to zfs_ioctl, and there aren't any other
> function calls between them:
For each snapshot in a stream, there is one zfs_ioctl() call. During
that time, the kernel will read the entire substream (that is, for one
snapshot) from the input file descriptor.
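If you want to see where that time is going, a rough DTrace sketch along
these lines (untested here, and assuming the receive side is the zfs
command itself) would time every ioctl the receiving process makes, so
the long ZFS_IOC_RECV call should stand out in the aggregation:

  # Time each ioctl(2) issued by processes named "zfs"; the slow
  # ZFS_IOC_RECV call should dominate the latency distribution.
  dtrace -n '
  syscall::ioctl:entry /execname == "zfs"/ { self->ts = timestamp; }
  syscall::ioctl:return /self->ts/ {
          @["ioctl latency (ns)"] = quantize(timestamp - self->ts);
          self->ts = 0;
  }'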
>
> 100.5124   0.0004  open("/dev/zfs", O_RDWR|O_EXCL)             = 10
> 100.7582   0.0001  read(7, "\0\0\0\0\0\0\0\0ACCBBAF5".., 312)  = 312
> 100.7586   0.0000  read(7, 0x080464F8, 0)                      = 0
> 100.7591   0.0000  time()                                      = 1336628656
> 100.7653   0.0035  ioctl(8, ZFS_IOC_OBJSET_STATS, 0x08040CF0)  = 0
> 100.7699   0.0022  ioctl(8, ZFS_IOC_OBJSET_STATS, 0x08040900)  = 0
> 100.7740   0.0016  ioctl(8, ZFS_IOC_OBJSET_STATS, 0x08040580)  = 0
> 100.7787   0.0026  ioctl(8, ZFS_IOC_OBJSET_STATS, 0x080405B0)  = 0
> 100.7794   0.0001  write(1, " r e c e i v i n g   i n".., 75)  = 75
> 118.3551   0.6927  ioctl(8, ZFS_IOC_RECV, 0x08042570)          = 0
> 118.3596   0.0010  ioctl(8, ZFS_IOC_OBJSET_STATS, 0x08040900)  = 0
> 118.3598   0.0000  time()                                      = 1336628673
> 118.3600   0.0000  write(1, " r e c e i v e d   3 1 2".., 45)  = 45
>
> zpool iostat (1 second interval) for the period is:
>
> tank        12.5T  6.58T    175      0   271K      0
> tank        12.5T  6.58T    176      0   299K      0
> tank        12.5T  6.58T    189      0   259K      0
> tank        12.5T  6.58T    156      0   231K      0
> tank        12.5T  6.58T    170      0   243K      0
> tank        12.5T  6.58T    252      0   295K      0
> tank        12.5T  6.58T    179      0   200K      0
> tank        12.5T  6.58T    214      0   258K      0
> tank        12.5T  6.58T    165      0   210K      0
> tank        12.5T  6.58T    154      0   178K      0
> tank        12.5T  6.58T    186      0   221K      0
> tank        12.5T  6.58T    184      0   215K      0
> tank        12.5T  6.58T    218      0   248K      0
> tank        12.5T  6.58T    175      0   228K      0
> tank        12.5T  6.58T    146      0   194K      0
> tank        12.5T  6.58T     99    258   209K  1.50M
> tank        12.5T  6.58T    196    296   294K  1.31M
> tank        12.5T  6.58T    188    130   229K   776K
>
> Can anyone offer any insight or further debugging tips?
I have yet to see a time when zpool iostat tells me something useful.
I'd take a look at "iostat -xzn 1" or similar output. It could point
to imbalanced I/O or a particular disk that has abnormally high
service times.
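Something like this is what I'd run (just a sketch; the columns named in
the comment are the standard Solaris iostat ones):

  # Per-device statistics at a one second interval, skipping idle devices.
  # Watch asvc_t (average service time, ms) and %b (percent busy): one
  # disk sitting well above its raidz siblings is usually the culprit.
  iostat -xzn 1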
Have you installed any SRUs? If not, you could be seeing:
7060894 zfs recv is excruciatingly slow
which is fixed in Solaris 11 SRU 5.
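A quick way to check which SRU you are on (assuming the image still has
the standard "entire" incorporation installed):

  # The branch portion of the entire incorporation's version string
  # reflects the installed SRU level.
  pkg list entire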
If you are using zones and are using any https pkg(5) origins (such as
https://pkg.oracle.com/solaris/support), I suggest reading
https://forums.oracle.com/forums/thread.jspa?threadID=2380689&tstart=15
before updating to SRU 6 (SRU 5 is fine, however). The fix for the
problem mentioned in that forums thread should show up in an upcoming
SRU via CR 7157313.
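If you're not sure whether any https origins are configured, a quick check
(nothing clever, just listing what pkg(5) already knows) is:

  # Lists each publisher with its origin URIs; any https:// origin is
  # the case the forum thread above is talking about.
  pkg publisher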
--
Mike Gerdts
http://mgerdts.blogspot.com/