Lachlan Mulcahy
2011-Dec-05 18:47 UTC
[zfs-discuss] zfs receive slowness - lots of systime spent in genunix`list_next ?
Hi All,

We are currently doing a zfs send/recv with mbuffer to send incremental changes across, and it seems to be running quite slowly, with zfs receive the apparent bottleneck.

The process itself seems to be using almost 100% of a single CPU in "sys" time.

Wondering if anyone has any ideas whether this is normal, or if it is just going to run forever and never finish...

Details: two machines connected via Gigabit Ethernet on the same LAN.

Sending server:

zfs send -i 20111201_1 data@20111205_1 | mbuffer -s 128k -m 1G -O tdp03r-int:9090

Receiving server:

mbuffer -s 128k -m 1G -I 9090 | zfs receive -vF tank/db/data

mbuffer showing:

in @ 256 KiB/s, out @ 256 KiB/s, 306 GiB total, buffer 100% ful

My debug:

DTraceToolkit hotkernel reports:

zfs`lzjb_decompress                          10   0.0%
unix`page_nextn                              31   0.0%
genunix`fsflush_do_pages                     37   0.0%
zfs`dbuf_free_range                         183   0.1%
genunix`list_next                          5822   3.7%
unix`mach_cpu_idle                       150261  96.1%

Top shows:

   PID USERNAME NLWP PRI NICE  SIZE   RES STATE    TIME    CPU COMMAND
 22945 root        1  60    0   13M 3004K cpu/6  144:21  3.79% zfs
   550 root       28  59    0   39M   22M sleep   10:19  0.06% fmd

I'd say the 3.7% here looks so low because hotkernel is reporting aggregate CPU usage, not per-CPU usage. mpstat seems to show the real story.

mpstat 1 shows output much like this each second:

CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
  0    0   0    0   329  108   83    0   17    3    0     0    0   0   0 100
  1    0   0    0   100    1   94    0   23    1    0     0    0   0   0 100
  2    0   0    0    32    0   28    0    5    1    0     0    0   0   0 100
  3    0   0    0    18    0   11    0    0    0    0     0    0   0   0 100
  4    0   0    0    16    6   10    0    2    0    0     0    0   0   0 100
  5    0   0    0     6    0    2    0    0    0    0     0    0   0   0 100
  6    0   0    0     2    0    0    0    0    0    0     0    0   0   0 100
  7    0   0    0     9    0    4    0    0    0    0    16    0   0   0 100
  8    0   0    0     6    0    3    0    0    0    0     0    0   3   0  97
  9    0   0    0     3    1    0    0    0    0    0     0    0   0   0 100
 10    0   0    0    22    2   35    0    1    1    0     0    0  89   0  11
 11    0   0    0     2    0    0    0    0    0    0     0    0   0   0 100
 12    0   0    0     3    0    2    0    1    0    0     2    0   0   0 100
 13    0   0    0     2    0    0    0    0    0    0     0    0   0   0 100
 14    0   0    0    24   17    6    0    0    2    0    61    0   0   0 100
 15    0   0    0    14    0   24    0    0    1    0     2    0   0   0 100
 16    0   0    0     2    0    0    0    0    0    0     0    0   0   0 100
 17    0   0    0    10    2    8    0    0    5    0    78    0   1   0  99
 18    0   0    0     2    0    0    0    0    0    0     0    0   0   0 100
 19    0   0    0     5    1    2    0    0    0    0    10    0   0   0 100
 20    0   0    0     2    0    0    0    0    0    0     0    0   0   0 100
 21    0   0    0     9    2    4    0    0    0    0     4    0   0   0 100
 22    0   0    0     4    0    0    0    0    0    0     0    0   0   0 100
 23    0   0    0     2    0    0    0    0    0    0     0    0   0   0 100

So I'm led to believe that zfs receive is spending almost 100% of a single CPU's time doing a lot of genunix`list_next ...

Any ideas what is going on here?

Best Regards,
--
Lachlan Mulcahy
Senior DBA,
Marin Software Inc.
San Francisco, USA

AU Mobile: +61 458 448 721
US Mobile: +1 (415) 867 2839
Office : +1 (415) 671 6080
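For reference, the two halves of the pipeline above are restated below with the role of each flag noted in comments. This is only an annotated restatement of the commands already shown: the hostnames, port and snapshot names are the ones from the post, and the flag descriptions are a sketch to be checked against mbuffer(1) and zfs(1M) on your build.

  # Sending side: stream the incremental between the two snapshots into mbuffer,
  # which collects it in 128 KiB blocks (-s) in a 1 GiB memory buffer (-m) and
  # ships it over TCP (-O host:port) to the receiver.
  zfs send -i 20111201_1 data@20111205_1 | \
      mbuffer -s 128k -m 1G -O tdp03r-int:9090

  # Receiving side: listen on TCP port 9090 (-I), buffer with the same block and
  # memory sizes, and feed the stream into a verbose (-v), forced (-F) receive.
  mbuffer -s 128k -m 1G -I 9090 | \
      zfs receive -vF tank/db/data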
Bob Friesenhahn
2011-Dec-05 19:19 UTC
[zfs-discuss] zfs receive slowness - lots of systime spent in genunix`list_next ?
On Mon, 5 Dec 2011, Lachlan Mulcahy wrote:

> genunix`list_next                          5822   3.7%
> unix`mach_cpu_idle                       150261  96.1%

Rather idle.

> Top shows:
>
>    PID USERNAME NLWP PRI NICE  SIZE   RES STATE    TIME    CPU COMMAND
>  22945 root        1  60    0   13M 3004K cpu/6  144:21  3.79% zfs
>    550 root       28  59    0   39M   22M sleep   10:19  0.06% fmd

Having 'fmd' (the fault manager daemon) show up in top at all is rather ominous, since it implies that faults are actively being reported. You might want to look into what is making it active.

> So I'm led to believe that zfs receive is spending almost 100% of a single
> CPU's time doing a lot of genunix`list_next ...

Or maybe it is only doing 3.7% of this. There seems to be a whole lot of nothing going on.

> Any ideas what is going on here?

Definitely check if low-level faults are being reported to fmd.

Bob
--
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
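A quick way to act on Bob's suggestion is to watch whether the fault manager is actually receiving telemetry; the sketch below uses the stock fmstat utility, and the interpretation of the ev_recv column is an assumption to be checked against fmstat(1M) on your release.

  # Per-module fault manager statistics, refreshed every second. A module whose
  # ev_recv (events received) counter keeps climbing while the zfs receive runs
  # suggests telemetry is actively arriving and worth a closer look.
  fmstat 1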
Bill Sommerfeld
2011-Dec-05 19:27 UTC
[zfs-discuss] zfs receive slowness - lots of systime spent in genunix`list_next ?
On 12/05/11 10:47, Lachlan Mulcahy wrote:

> zfs`lzjb_decompress                          10   0.0%
> unix`page_nextn                              31   0.0%
> genunix`fsflush_do_pages                     37   0.0%
> zfs`dbuf_free_range                         183   0.1%
> genunix`list_next                          5822   3.7%
> unix`mach_cpu_idle                       150261  96.1%

Your best bet in a situation like this -- where there's a lot of CPU time spent in a generic routine -- is to use an alternate profiling method that shows complete stack traces rather than just the top function on the stack. Often the names of functions two or three or four deep in the stack will point at what's really responsible.

Something as simple as:

  dtrace -n 'profile-1001 { @[stack()] = count(); }'

(let it run for a bit, then interrupt it) should show who's calling list_next() so much.

    - Bill
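Building on Bill's one-liner, a slightly narrower variant can filter out the idle samples by only recording stacks taken while kernel code is on CPU in the zfs receive process. This is just a sketch of the same technique: 22945 is the zfs PID from the earlier top output, and the 30-second window and top-20 truncation are arbitrary choices.

  # Sample kernel stacks at 1001 Hz, keeping only samples where a kernel PC was
  # on CPU (arg0 != 0) in the zfs receive process, then print the 20 hottest
  # stacks after 30 seconds. Substitute the real PID for 22945.
  dtrace -n '
    profile-1001
    /arg0 && pid == 22945/
    {
            @[stack()] = count();
    }

    tick-30s
    {
            trunc(@, 20);
            exit(0);
    }'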
Lachlan Mulcahy
2011-Dec-05 20:08 UTC
[zfs-discuss] zfs receive slowness - lots of systime spent in genunix`list_next ?
Hi Bob,

On Mon, Dec 5, 2011 at 11:19 AM, Bob Friesenhahn <bfriesen at simple.dallas.tx.us> wrote:

> On Mon, 5 Dec 2011, Lachlan Mulcahy wrote:
>
>> genunix`list_next                          5822   3.7%
>> unix`mach_cpu_idle                       150261  96.1%
>
> Rather idle.

Actually this is a bit misleading, since it is averaged over all the cores in the system. This system has 24 cores, so the roughly 90% of a single core being spent in system time shows up as only about 3.75% in aggregate:

90 / 24 = 3.75

Similarly, 23 of the 24 cores are close to 100% idle.

>> Top shows:
>>
>>    PID USERNAME NLWP PRI NICE  SIZE   RES STATE    TIME    CPU COMMAND
>>  22945 root        1  60    0   13M 3004K cpu/6  144:21  3.79% zfs
>>    550 root       28  59    0   39M   22M sleep   10:19  0.06% fmd
>
> Having 'fmd' (the fault manager daemon) show up in top at all is rather
> ominous, since it implies that faults are actively being reported. You
> might want to look into what is making it active.

Nothing in /var/adm/messages -- these are the last two days' entries:

Dec 4 18:46:20 mslvstdp03r sshd[20926]: [ID 800047 auth.crit] fatal: Read from socket failed: Connection reset by peer
Dec 5 01:40:07 mslvstdp03r sshd[21808]: [ID 800047 auth.crit] fatal: Read from socket failed: Connection reset by peer

Anything else you suggest I should check for faults? (Though I'm sort of doubting it is an issue, I'm happy to be thorough.)

>> So I'm led to believe that zfs receive is spending almost 100% of a single
>> CPU's time doing a lot of genunix`list_next ...
>
> Or maybe it is only doing 3.7% of this. There seems to be a whole lot of
> nothing going on.

See the math above -- also, mpstat shows a single CPU at around 90% system time, and since top reports zfs as the only active process, the circumstantial evidence is fairly indicative.

>> Any ideas what is going on here?
>
> Definitely check if low-level faults are being reported to fmd.

Any logs other than /var/adm/messages I should check? (Apologies, I'm quite new to (Open)Solaris.)

Regards,
--
Lachlan Mulcahy
Senior DBA,
Marin Software Inc.
San Francisco, USA

AU Mobile: +61 458 448 721
US Mobile: +1 (415) 867 2839
Office : +1 (415) 671 6080
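For completeness, per-thread microstate accounting shows the same picture without having to divide by the core count. This is only an illustrative sketch using the PID from the top output above, with the standard Solaris prstat flags (see prstat(1M)).

  # Per-thread (-L) microstate accounting (-m) for the zfs receive process,
  # refreshed every second: a thread pinned in kernel code shows SYS near 100.
  # 22945 is the PID from the top output above -- substitute the real one.
  prstat -mL -p 22945 1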
Bob Friesenhahn
2011-Dec-05 20:31 UTC
[zfs-discuss] zfs receive slowness - lots of systime spent in genunix`list_next ?
On Mon, 5 Dec 2011, Lachlan Mulcahy wrote:

> Anything else you suggest I should check for faults? (Though I'm sort of
> doubting it is an issue, I'm happy to be thorough.)

Try running

  fmdump -ef

and see if new low-level fault events are coming in during the zfs receive.

Bob
--
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
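For readers unfamiliar with the tool: fmdump -e reads the fault manager's error (telemetry) log rather than the fault log, and -f follows it as new events arrive. A sketch of the suggested check, plus a follow-up to get full event detail, is below; option behaviour should be confirmed against fmdump(1M) on your release.

  # Follow new error telemetry live while the zfs receive is running:
  fmdump -ef

  # If events do appear, dump the error log again with full detail (-V) so the
  # device or driver behind each ereport is visible:
  fmdump -eV | less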
Lachlan Mulcahy
2011-Dec-05 21:06 UTC
[zfs-discuss] zfs receive slowness - lots of systime spent in genunix`list_next ?
Hi Bob,

On Mon, Dec 5, 2011 at 12:31 PM, Bob Friesenhahn <bfriesen at simple.dallas.tx.us> wrote:

> On Mon, 5 Dec 2011, Lachlan Mulcahy wrote:
>
>> Anything else you suggest I should check for faults? (Though I'm sort of
>> doubting it is an issue, I'm happy to be thorough.)
>
> Try running
>
>   fmdump -ef
>
> and see if new low-level fault events are coming in during the zfs receive.

Just a bunch of what I would guess are unrelated USB errors:

Dec 05 20:58:09.0246 ereport.io.usb.epse
Dec 05 20:58:50.9207 ereport.io.usb.epse
Dec 05 20:59:36.0242 ereport.io.usb.epse
Dec 05 20:59:39.0230 ereport.io.usb.epse
Dec 05 21:00:21.0223 ereport.io.usb.epse
Dec 05 21:01:06.0215 ereport.io.usb.epse
Dec 05 21:01:09.0314 ereport.io.usb.epse
Dec 05 21:01:50.9213 ereport.io.usb.epse
Dec 05 21:02:36.0299 ereport.io.usb.epse
Dec 05 21:02:39.0298 ereport.io.usb.epse

Regards,
--
Lachlan Mulcahy
Senior DBA,
Marin Software Inc.
San Francisco, USA

AU Mobile: +61 458 448 721
US Mobile: +1 (415) 867 2839
Office : +1 (415) 671 6080
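If anyone did want to chase those ereports down, something along the following lines would show whether they have been diagnosed into an actual fault and which device they point at. The class filter is simply the one seen in the output above; the options are the stock fmadm/fmdump ones, to be verified against the man pages on your release.

  # List any diagnosed faults, as opposed to raw telemetry:
  fmadm faulty

  # Show full detail for just the USB ereport class seen above, so the device
  # path behind ereport.io.usb.epse is visible:
  fmdump -eV -c ereport.io.usb.epse | less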
Lachlan Mulcahy
2011-Dec-06 01:16 UTC
[zfs-discuss] zfs receive slowness - lots of systime spent in genunix`list_next ?
Hi All,

Just a follow-up -- it seems like whatever it was doing, it eventually got done with it and the speed picked back up again. The send/recv finally finished -- I guess I could do with a little patience :)

Lachlan

On Mon, Dec 5, 2011 at 10:47 AM, Lachlan Mulcahy <lmulcahy at marinsoftware.com> wrote:

> Hi All,
>
> We are currently doing a zfs send/recv with mbuffer to send incremental
> changes across, and it seems to be running quite slowly, with zfs receive
> the apparent bottleneck.
>
> The process itself seems to be using almost 100% of a single CPU in "sys"
> time.
>
> Wondering if anyone has any ideas whether this is normal, or if it is just
> going to run forever and never finish...
>
> Details: two machines connected via Gigabit Ethernet on the same LAN.
>
> Sending server:
>
> zfs send -i 20111201_1 data@20111205_1 | mbuffer -s 128k -m 1G -O tdp03r-int:9090
>
> Receiving server:
>
> mbuffer -s 128k -m 1G -I 9090 | zfs receive -vF tank/db/data
>
> mbuffer showing:
>
> in @ 256 KiB/s, out @ 256 KiB/s, 306 GiB total, buffer 100% ful
>
> My debug:
>
> DTraceToolkit hotkernel reports:
>
> zfs`lzjb_decompress                          10   0.0%
> unix`page_nextn                              31   0.0%
> genunix`fsflush_do_pages                     37   0.0%
> zfs`dbuf_free_range                         183   0.1%
> genunix`list_next                          5822   3.7%
> unix`mach_cpu_idle                       150261  96.1%
>
> Top shows:
>
>    PID USERNAME NLWP PRI NICE  SIZE   RES STATE    TIME    CPU COMMAND
>  22945 root        1  60    0   13M 3004K cpu/6  144:21  3.79% zfs
>    550 root       28  59    0   39M   22M sleep   10:19  0.06% fmd
>
> I'd say the 3.7% here looks so low because hotkernel is reporting aggregate
> CPU usage, not per-CPU usage. mpstat seems to show the real story.
>
> mpstat 1 shows output much like this each second:
>
> CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
>   0    0   0    0   329  108   83    0   17    3    0     0    0   0   0 100
>   1    0   0    0   100    1   94    0   23    1    0     0    0   0   0 100
>   2    0   0    0    32    0   28    0    5    1    0     0    0   0   0 100
>   3    0   0    0    18    0   11    0    0    0    0     0    0   0   0 100
>   4    0   0    0    16    6   10    0    2    0    0     0    0   0   0 100
>   5    0   0    0     6    0    2    0    0    0    0     0    0   0   0 100
>   6    0   0    0     2    0    0    0    0    0    0     0    0   0   0 100
>   7    0   0    0     9    0    4    0    0    0    0    16    0   0   0 100
>   8    0   0    0     6    0    3    0    0    0    0     0    0   3   0  97
>   9    0   0    0     3    1    0    0    0    0    0     0    0   0   0 100
>  10    0   0    0    22    2   35    0    1    1    0     0    0  89   0  11
>  11    0   0    0     2    0    0    0    0    0    0     0    0   0   0 100
>  12    0   0    0     3    0    2    0    1    0    0     2    0   0   0 100
>  13    0   0    0     2    0    0    0    0    0    0     0    0   0   0 100
>  14    0   0    0    24   17    6    0    0    2    0    61    0   0   0 100
>  15    0   0    0    14    0   24    0    0    1    0     2    0   0   0 100
>  16    0   0    0     2    0    0    0    0    0    0     0    0   0   0 100
>  17    0   0    0    10    2    8    0    0    5    0    78    0   1   0  99
>  18    0   0    0     2    0    0    0    0    0    0     0    0   0   0 100
>  19    0   0    0     5    1    2    0    0    0    0    10    0   0   0 100
>  20    0   0    0     2    0    0    0    0    0    0     0    0   0   0 100
>  21    0   0    0     9    2    4    0    0    0    0     4    0   0   0 100
>  22    0   0    0     4    0    0    0    0    0    0     0    0   0   0 100
>  23    0   0    0     2    0    0    0    0    0    0     0    0   0   0 100
>
> So I'm led to believe that zfs receive is spending almost 100% of a single
> CPU's time doing a lot of genunix`list_next ...
>
> Any ideas what is going on here?
>
> Best Regards,
> --
> Lachlan Mulcahy
> Senior DBA,
> Marin Software Inc.
> San Francisco, USA
>
> AU Mobile: +61 458 448 721
> US Mobile: +1 (415) 867 2839
> Office : +1 (415) 671 6080

--
Lachlan Mulcahy
Senior DBA,
Marin Software Inc.
San Francisco, USA

AU Mobile: +61 458 448 721
US Mobile: +1 (415) 867 2839
Office : +1 (415) 671 6080