I have an application I have been using to manage data replication for a
number of years. Recently we started using a new machine as a staging
server (not that new, an x4540) running Solaris 11 with a single pool
built from 7x6 drive raidz. No dedup and no reported errors.
On that box and nowhere else I see empty snapshots taking 17 or 18
seconds to write. Everywhere else they return in under a second.
Using truss and the last published source code, it looks like the pause
is between a printf and the call to zfs_ioctl, and there aren't any
other function calls between them:
100.5124 0.0004 open("/dev/zfs", O_RDWR|O_EXCL) = 10
100.7582 0.0001 read(7, "\0\0\0\0\0\0\0\0ACCBBAF5".., 312) = 312
100.7586 0.0000 read(7, 0x080464F8, 0) = 0
100.7591 0.0000 time() = 1336628656
100.7653 0.0035 ioctl(8, ZFS_IOC_OBJSET_STATS, 0x08040CF0) = 0
100.7699 0.0022 ioctl(8, ZFS_IOC_OBJSET_STATS, 0x08040900) = 0
100.7740 0.0016 ioctl(8, ZFS_IOC_OBJSET_STATS, 0x08040580) = 0
100.7787 0.0026 ioctl(8, ZFS_IOC_OBJSET_STATS, 0x080405B0) = 0
100.7794 0.0001 write(1, " r e c e i v i n g   i n".., 75) = 75
118.3551 0.6927 ioctl(8, ZFS_IOC_RECV, 0x08042570) = 0
118.3596 0.0010 ioctl(8, ZFS_IOC_OBJSET_STATS, 0x08040900) = 0
118.3598 0.0000 time() = 1336628673
118.3600 0.0000 write(1, " r e c e i v e d   3 1 2".., 45) = 45
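As an aside, the stall is easy to pull out of a truss -d log mechanically: the leading column is seconds since the start of the trace, so the pause shows up as the largest gap between consecutive timestamps. A small sketch (the parsing and the sample lines are mine, not part of the trace tooling):

```python
import re

# Match the leading "seconds since trace start" column that truss -d prints.
TS = re.compile(r"^\s*(\d+\.\d+)")

def largest_gap(lines):
    """Return (gap_seconds, line) for the call preceded by the longest pause."""
    prev, worst = None, (0.0, "")
    for line in lines:
        m = TS.match(line)
        if not m:
            continue  # wrapped continuation lines carry no timestamp
        ts = float(m.group(1))
        if prev is not None and ts - prev > worst[0]:
            worst = (ts - prev, line.strip())
        prev = ts
    return worst

# Three lines transcribed from the trace above.
trace = [
    "100.7794 0.0001 write(1, ...) = 75",
    "118.3551 0.6927 ioctl(8, ZFS_IOC_RECV, 0x08042570) = 0",
    "118.3596 0.0010 ioctl(8, ZFS_IOC_OBJSET_STATS, 0x08040900) = 0",
]
gap, line = largest_gap(trace)
print(f"{gap:.4f}s pause before: {line}")
```

Run against the full trace, this points straight at the ZFS_IOC_RECV ioctl, which sat for about 17.6 seconds before returning.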
zpool iostat (1 second interval) for the period is:

               capacity     operations    bandwidth
pool        alloc   free   read  write   read  write
tank 12.5T 6.58T 175 0 271K 0
tank 12.5T 6.58T 176 0 299K 0
tank 12.5T 6.58T 189 0 259K 0
tank 12.5T 6.58T 156 0 231K 0
tank 12.5T 6.58T 170 0 243K 0
tank 12.5T 6.58T 252 0 295K 0
tank 12.5T 6.58T 179 0 200K 0
tank 12.5T 6.58T 214 0 258K 0
tank 12.5T 6.58T 165 0 210K 0
tank 12.5T 6.58T 154 0 178K 0
tank 12.5T 6.58T 186 0 221K 0
tank 12.5T 6.58T 184 0 215K 0
tank 12.5T 6.58T 218 0 248K 0
tank 12.5T 6.58T 175 0 228K 0
tank 12.5T 6.58T 146 0 194K 0
tank 12.5T 6.58T 99 258 209K 1.50M
tank 12.5T 6.58T 196 296 294K 1.31M
tank 12.5T 6.58T 188 130 229K 776K
Can anyone offer any insight or further debugging tips?
Thanks.
--
Ian.
On Thu, May 10, 2012 at 5:37 AM, Ian Collins <ian at ianshome.com> wrote:
> I have an application I have been using to manage data replication for a
> number of years. Recently we started using a new machine as a staging
> server (not that new, an x4540) running Solaris 11 with a single pool built
> from 7x6 drive raidz. No dedup and no reported errors.
>
> On that box and nowhere else I see empty snapshots taking 17 or 18 seconds
> to write. Everywhere else they return in under a second.
>
> Using truss and the last published source code, it looks like the pause is
> between a printf and the call to zfs_ioctl and there aren't any other
> function calls between them:

For each snapshot in a stream, there is one zfs_ioctl() call. During
that time, the kernel will read the entire substream (that is, for one
snapshot) from the input file descriptor.

> [truss and zpool iostat output quoted above snipped]
>
> Can anyone offer any insight or further debugging tips?

I have yet to see a time when zpool iostat tells me something useful.
I'd take a look at "iostat -xzn 1" or similar output. It could point
to imbalanced I/O or a particular disk that has abnormally high
service times.

Have you installed any SRUs? If not, you could be seeing:

7060894 zfs recv is excruciatingly slow

which is fixed in Solaris 11 SRU 5.

If you are using zones and are using any https pkg(5) origins (such as
https://pkg.oracle.com/solaris/support), I suggest reading
https://forums.oracle.com/forums/thread.jspa?threadID=2380689&tstart=15
before updating to SRU 6 (SRU 5 is fine, however). The fix for the
problem mentioned in that forums thread should show up in an upcoming
SRU via CR 7157313.

--
Mike Gerdts
http://mgerdts.blogspot.com/
On 05/11/12 02:01 AM, Mike Gerdts wrote:
> On Thu, May 10, 2012 at 5:37 AM, Ian Collins <ian at ianshome.com> wrote:
>> I have an application I have been using to manage data replication for a
>> number of years. Recently we started using a new machine as a staging
>> server (not that new, an x4540) running Solaris 11 with a single pool built
>> from 7x6 drive raidz. No dedup and no reported errors.
>>
>> On that box and nowhere else I see empty snapshots taking 17 or 18 seconds
>> to write. Everywhere else they return in under a second.
>
> Have you installed any SRUs? If not, you could be seeing:

The machine was at SRU 3.

> 7060894 zfs recv is excruciatingly slow
>
> which is fixed in Solaris 11 SRU 5.

Thanks Mike, that appears to be it. Updating to SRU 6 fixed the issue.

> If you are using zones and are using any https pkg(5) origins (such as
> https://pkg.oracle.com/solaris/support), I suggest reading
> https://forums.oracle.com/forums/thread.jspa?threadID=2380689&tstart=15
> before updating to SRU 6 (SRU 5 is fine, however). The fix for the
> problem mentioned in that forums thread should show up in an upcoming
> SRU via CR 7157313.

Luckily I have a local repository!

--
Ian.