Lachlan Mulcahy
2011-Dec-05 18:47 UTC
[zfs-discuss] zfs receive slowness - lots of systime spent in genunix`list_next ?
Hi All,

We are currently doing a zfs send/recv with mbuffer to send incremental changes across, and it seems to be running quite slowly, with zfs receive the apparent bottleneck.

The process itself seems to be using almost 100% of a single CPU in "sys" time.

Wondering if anyone has any ideas whether this is normal, or if it is just going to run forever and never finish...

Details: two machines connected via Gigabit Ethernet on the same LAN.

Sending server:

zfs send -i 20111201_1 data@20111205_1 | mbuffer -s 128k -m 1G -O tdp03r-int:9090

Receiving server:

mbuffer -s 128k -m 1G -I 9090 | zfs receive -vF tank/db/data

mbuffer showing:

in @ 256 KiB/s, out @ 256 KiB/s, 306 GiB total, buffer 100% ful

My debug:

DTraceToolkit hotkernel reports:

zfs`lzjb_decompress                          10   0.0%
unix`page_nextn                              31   0.0%
genunix`fsflush_do_pages                     37   0.0%
zfs`dbuf_free_range                         183   0.1%
genunix`list_next                          5822   3.7%
unix`mach_cpu_idle                       150261  96.1%

Top shows:

   PID USERNAME NLWP PRI NICE  SIZE   RES STATE    TIME    CPU COMMAND
 22945 root        1  60    0   13M 3004K cpu/6  144:21  3.79% zfs
   550 root       28  59    0   39M   22M sleep   10:19  0.06% fmd

I'd say the 3.7% here looks so low because hotkernel is reporting aggregate CPU usage, not per-CPU usage. mpstat seems to show the real story.

mpstat 1 shows output much like this each second:

CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
  0    0   0    0   329  108   83    0   17    3    0     0    0   0   0 100
  1    0   0    0   100    1   94    0   23    1    0     0    0   0   0 100
  2    0   0    0    32    0   28    0    5    1    0     0    0   0   0 100
  3    0   0    0    18    0   11    0    0    0    0     0    0   0   0 100
  4    0   0    0    16    6   10    0    2    0    0     0    0   0   0 100
  5    0   0    0     6    0    2    0    0    0    0     0    0   0   0 100
  6    0   0    0     2    0    0    0    0    0    0     0    0   0   0 100
  7    0   0    0     9    0    4    0    0    0    0    16    0   0   0 100
  8    0   0    0     6    0    3    0    0    0    0     0    0   3   0  97
  9    0   0    0     3    1    0    0    0    0    0     0    0   0   0 100
 10    0   0    0    22    2   35    0    1    1    0     0    0  89   0  11
 11    0   0    0     2    0    0    0    0    0    0     0    0   0   0 100
 12    0   0    0     3    0    2    0    1    0    0     2    0   0   0 100
 13    0   0    0     2    0    0    0    0    0    0     0    0   0   0 100
 14    0   0    0    24   17    6    0    0    2    0    61    0   0   0 100
 15    0   0    0    14    0   24    0    0    1    0     2    0   0   0 100
 16    0   0    0     2    0    0    0    0    0    0     0    0   0   0 100
 17    0   0    0    10    2    8    0    0    5    0    78    0   1   0  99
 18    0   0    0     2    0    0    0    0    0    0     0    0   0   0 100
 19    0   0    0     5    1    2    0    0    0    0    10    0   0   0 100
 20    0   0    0     2    0    0    0    0    0    0     0    0   0   0 100
 21    0   0    0     9    2    4    0    0    0    0     4    0   0   0 100
 22    0   0    0     4    0    0    0    0    0    0     0    0   0   0 100
 23    0   0    0     2    0    0    0    0    0    0     0    0   0   0 100

So I'm led to believe that zfs receive is spending almost 100% of a single CPU's time doing a lot of genunix`list_next ...

Any ideas what is going on here?

Best Regards,
--
Lachlan Mulcahy
Senior DBA,
Marin Software Inc.
San Francisco, USA

AU Mobile: +61 458 448 721
US Mobile: +1 (415) 867 2839
Office : +1 (415) 671 6080
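For reference, the two halves of the pipeline above are restated below with the role of each flag noted in comments. This is only an annotated restatement of the commands already shown: the hostnames, port and snapshot names are the ones from the post, and the flag descriptions are a sketch to be checked against mbuffer(1) and zfs(1M) on your build.

  # Sending side: stream the incremental between the two snapshots into mbuffer,
  # which collects it in 128 KiB blocks (-s) in a 1 GiB memory buffer (-m) and
  # ships it over TCP (-O host:port) to the receiver.
  zfs send -i 20111201_1 data@20111205_1 | \
      mbuffer -s 128k -m 1G -O tdp03r-int:9090

  # Receiving side: listen on TCP port 9090 (-I), buffer with the same block and
  # memory sizes, and feed the stream into a verbose (-v), forced (-F) receive.
  mbuffer -s 128k -m 1G -I 9090 | \
      zfs receive -vF tank/db/data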
Bob Friesenhahn
2011-Dec-05 19:19 UTC
[zfs-discuss] zfs receive slowness - lots of systime spent in genunix`list_next ?
On Mon, 5 Dec 2011, Lachlan Mulcahy wrote:

> genunix`list_next                          5822   3.7%
> unix`mach_cpu_idle                       150261  96.1%

Rather idle.

> Top shows:
>
>    PID USERNAME NLWP PRI NICE  SIZE   RES STATE    TIME    CPU COMMAND
>  22945 root        1  60    0   13M 3004K cpu/6  144:21  3.79% zfs
>    550 root       28  59    0   39M   22M sleep   10:19  0.06% fmd

Having 'fmd' (the fault manager daemon) show up in top at all is rather ominous, since it implies that faults are actively being reported. You might want to look into what is making it active.

> So I'm led to believe that zfs receive is spending almost 100% of a single
> CPU's time doing a lot of genunix`list_next ...

Or maybe it is only doing 3.7% of this. There seems to be a whole lot of nothing going on.

> Any ideas what is going on here?

Definitely check if low-level faults are being reported to fmd.

Bob
--
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
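A quick way to act on Bob's suggestion is to watch whether the fault manager is actually receiving telemetry; the sketch below uses the stock fmstat utility, and the interpretation of the ev_recv column is an assumption to be checked against fmstat(1M) on your release.

  # Per-module fault manager statistics, refreshed every second. A module whose
  # ev_recv (events received) counter keeps climbing while the zfs receive runs
  # suggests telemetry is actively arriving and worth a closer look.
  fmstat 1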
Bill Sommerfeld
2011-Dec-05 19:27 UTC
[zfs-discuss] zfs receive slowness - lots of systime spent in genunix`list_next ?
On 12/05/11 10:47, Lachlan Mulcahy wrote:

> zfs`lzjb_decompress                          10   0.0%
> unix`page_nextn                              31   0.0%
> genunix`fsflush_do_pages                     37   0.0%
> zfs`dbuf_free_range                         183   0.1%
> genunix`list_next                          5822   3.7%
> unix`mach_cpu_idle                       150261  96.1%

Your best bet in a situation like this -- where there's a lot of CPU time spent in a generic routine -- is to use an alternate profiling method that shows complete stack traces rather than just the top function on the stack. Often the names of functions two or three or four deep in the stack will point at what's really responsible.

Something as simple as:

  dtrace -n 'profile-1001 { @[stack()] = count(); }'

(let it run for a bit, then interrupt it) should show who's calling list_next() so much.

    - Bill
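Building on Bill's one-liner, a slightly narrower variant can filter out the idle samples by only recording stacks taken while kernel code is on CPU in the zfs receive process. This is just a sketch of the same technique: 22945 is the zfs PID from the earlier top output, and the 30-second window and top-20 truncation are arbitrary choices.

  # Sample kernel stacks at 1001 Hz, keeping only samples where a kernel PC was
  # on CPU (arg0 != 0) in the zfs receive process, then print the 20 hottest
  # stacks after 30 seconds. Substitute the real PID for 22945.
  dtrace -n '
    profile-1001
    /arg0 && pid == 22945/
    {
            @[stack()] = count();
    }

    tick-30s
    {
            trunc(@, 20);
            exit(0);
    }'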
Lachlan Mulcahy
2011-Dec-05 20:08 UTC
[zfs-discuss] zfs receive slowness - lots of systime spent in genunix`list_next ?
Hi Bob,

On Mon, Dec 5, 2011 at 11:19 AM, Bob Friesenhahn <bfriesen at simple.dallas.tx.us> wrote:

> On Mon, 5 Dec 2011, Lachlan Mulcahy wrote:
>
>> genunix`list_next                          5822   3.7%
>> unix`mach_cpu_idle                       150261  96.1%
>
> Rather idle.

Actually this is a bit misleading, since it is averaged over all the cores in the system. This system has 24 cores, so the roughly 90% of a single core being spent in system time shows up as only about 3.75% in aggregate:

90 / 24 = 3.75

Similarly, 23 of the 24 cores are close to 100% idle.

>> Top shows:
>>
>>    PID USERNAME NLWP PRI NICE  SIZE   RES STATE    TIME    CPU COMMAND
>>  22945 root        1  60    0   13M 3004K cpu/6  144:21  3.79% zfs
>>    550 root       28  59    0   39M   22M sleep   10:19  0.06% fmd
>
> Having 'fmd' (the fault manager daemon) show up in top at all is rather
> ominous, since it implies that faults are actively being reported. You
> might want to look into what is making it active.

Nothing in /var/adm/messages -- these are the last two days' entries:

Dec 4 18:46:20 mslvstdp03r sshd[20926]: [ID 800047 auth.crit] fatal: Read from socket failed: Connection reset by peer
Dec 5 01:40:07 mslvstdp03r sshd[21808]: [ID 800047 auth.crit] fatal: Read from socket failed: Connection reset by peer

Anything else you suggest I should check for faults? (Though I'm sort of doubting it is an issue, I'm happy to be thorough.)

>> So I'm led to believe that zfs receive is spending almost 100% of a single
>> CPU's time doing a lot of genunix`list_next ...
>
> Or maybe it is only doing 3.7% of this. There seems to be a whole lot of
> nothing going on.

See the math above -- also, mpstat shows a single CPU at around 90% system time, and since top reports zfs as the only active process, the circumstantial evidence is fairly indicative.

>> Any ideas what is going on here?
>
> Definitely check if low-level faults are being reported to fmd.

Any logs other than /var/adm/messages I should check? (Apologies, I'm quite new to (Open)Solaris.)

Regards,
--
Lachlan Mulcahy
Senior DBA,
Marin Software Inc.
San Francisco, USA

AU Mobile: +61 458 448 721
US Mobile: +1 (415) 867 2839
Office : +1 (415) 671 6080
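For completeness, per-thread microstate accounting shows the same picture without having to divide by the core count. This is only an illustrative sketch using the PID from the top output above, with the standard Solaris prstat flags (see prstat(1M)).

  # Per-thread (-L) microstate accounting (-m) for the zfs receive process,
  # refreshed every second: a thread pinned in kernel code shows SYS near 100.
  # 22945 is the PID from the top output above -- substitute the real one.
  prstat -mL -p 22945 1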
Bob Friesenhahn
2011-Dec-05 20:31 UTC
[zfs-discuss] zfs receive slowness - lots of systime spent in genunix`list_next ?
On Mon, 5 Dec 2011, Lachlan Mulcahy wrote:

> Anything else you suggest I should check for faults? (Though I'm sort of
> doubting it is an issue, I'm happy to be thorough.)

Try running

  fmdump -ef

and see if new low-level fault events are coming in during the zfs receive.

Bob
--
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
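For readers unfamiliar with the tool: fmdump -e reads the fault manager's error (telemetry) log rather than the fault log, and -f follows it as new events arrive. A sketch of the suggested check, plus a follow-up to get full event detail, is below; option behaviour should be confirmed against fmdump(1M) on your release.

  # Follow new error telemetry live while the zfs receive is running:
  fmdump -ef

  # If events do appear, dump the error log again with full detail (-V) so the
  # device or driver behind each ereport is visible:
  fmdump -eV | less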
Lachlan Mulcahy
2011-Dec-05 21:06 UTC
[zfs-discuss] zfs receive slowness - lots of systime spent in genunix`list_next ?
Hi Bob,

On Mon, Dec 5, 2011 at 12:31 PM, Bob Friesenhahn <bfriesen at simple.dallas.tx.us> wrote:

> On Mon, 5 Dec 2011, Lachlan Mulcahy wrote:
>
>> Anything else you suggest I should check for faults? (Though I'm sort of
>> doubting it is an issue, I'm happy to be thorough.)
>
> Try running
>
>   fmdump -ef
>
> and see if new low-level fault events are coming in during the zfs receive.

Just a bunch of what I would guess are unrelated USB errors:

Dec 05 20:58:09.0246 ereport.io.usb.epse
Dec 05 20:58:50.9207 ereport.io.usb.epse
Dec 05 20:59:36.0242 ereport.io.usb.epse
Dec 05 20:59:39.0230 ereport.io.usb.epse
Dec 05 21:00:21.0223 ereport.io.usb.epse
Dec 05 21:01:06.0215 ereport.io.usb.epse
Dec 05 21:01:09.0314 ereport.io.usb.epse
Dec 05 21:01:50.9213 ereport.io.usb.epse
Dec 05 21:02:36.0299 ereport.io.usb.epse
Dec 05 21:02:39.0298 ereport.io.usb.epse

Regards,
--
Lachlan Mulcahy
Senior DBA,
Marin Software Inc.
San Francisco, USA

AU Mobile: +61 458 448 721
US Mobile: +1 (415) 867 2839
Office : +1 (415) 671 6080
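If anyone did want to chase those ereports down, something along the following lines would show whether they have been diagnosed into an actual fault and which device they point at. The class filter is simply the one seen in the output above; the options are the stock fmadm/fmdump ones, to be verified against the man pages on your release.

  # List any diagnosed faults, as opposed to raw telemetry:
  fmadm faulty

  # Show full detail for just the USB ereport class seen above, so the device
  # path behind ereport.io.usb.epse is visible:
  fmdump -eV -c ereport.io.usb.epse | less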
Lachlan Mulcahy
2011-Dec-06 01:16 UTC
[zfs-discuss] zfs receive slowness - lots of systime spent in genunix`list_next ?
Hi All,

Just a follow-up -- it seems like whatever it was doing, it eventually got done with it and the speed picked back up again. The send/recv finally finished -- I guess I could do with a little patience :)

Lachlan

On Mon, Dec 5, 2011 at 10:47 AM, Lachlan Mulcahy <lmulcahy at marinsoftware.com> wrote:

> Hi All,
>
> We are currently doing a zfs send/recv with mbuffer to send incremental
> changes across, and it seems to be running quite slowly, with zfs receive
> the apparent bottleneck.
>
> The process itself seems to be using almost 100% of a single CPU in "sys"
> time.
>
> Wondering if anyone has any ideas whether this is normal, or if it is just
> going to run forever and never finish...
>
> Details: two machines connected via Gigabit Ethernet on the same LAN.
>
> Sending server:
>
> zfs send -i 20111201_1 data@20111205_1 | mbuffer -s 128k -m 1G -O tdp03r-int:9090
>
> Receiving server:
>
> mbuffer -s 128k -m 1G -I 9090 | zfs receive -vF tank/db/data
>
> mbuffer showing:
>
> in @ 256 KiB/s, out @ 256 KiB/s, 306 GiB total, buffer 100% ful
>
> My debug:
>
> DTraceToolkit hotkernel reports:
>
> zfs`lzjb_decompress                          10   0.0%
> unix`page_nextn                              31   0.0%
> genunix`fsflush_do_pages                     37   0.0%
> zfs`dbuf_free_range                         183   0.1%
> genunix`list_next                          5822   3.7%
> unix`mach_cpu_idle                       150261  96.1%
>
> Top shows:
>
>    PID USERNAME NLWP PRI NICE  SIZE   RES STATE    TIME    CPU COMMAND
>  22945 root        1  60    0   13M 3004K cpu/6  144:21  3.79% zfs
>    550 root       28  59    0   39M   22M sleep   10:19  0.06% fmd
>
> I'd say the 3.7% here looks so low because hotkernel is reporting aggregate
> CPU usage, not per-CPU usage. mpstat seems to show the real story.
>
> mpstat 1 shows output much like this each second:
>
> CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
>   0    0   0    0   329  108   83    0   17    3    0     0    0   0   0 100
>   1    0   0    0   100    1   94    0   23    1    0     0    0   0   0 100
>   2    0   0    0    32    0   28    0    5    1    0     0    0   0   0 100
>   3    0   0    0    18    0   11    0    0    0    0     0    0   0   0 100
>   4    0   0    0    16    6   10    0    2    0    0     0    0   0   0 100
>   5    0   0    0     6    0    2    0    0    0    0     0    0   0   0 100
>   6    0   0    0     2    0    0    0    0    0    0     0    0   0   0 100
>   7    0   0    0     9    0    4    0    0    0    0    16    0   0   0 100
>   8    0   0    0     6    0    3    0    0    0    0     0    0   3   0  97
>   9    0   0    0     3    1    0    0    0    0    0     0    0   0   0 100
>  10    0   0    0    22    2   35    0    1    1    0     0    0  89   0  11
>  11    0   0    0     2    0    0    0    0    0    0     0    0   0   0 100
>  12    0   0    0     3    0    2    0    1    0    0     2    0   0   0 100
>  13    0   0    0     2    0    0    0    0    0    0     0    0   0   0 100
>  14    0   0    0    24   17    6    0    0    2    0    61    0   0   0 100
>  15    0   0    0    14    0   24    0    0    1    0     2    0   0   0 100
>  16    0   0    0     2    0    0    0    0    0    0     0    0   0   0 100
>  17    0   0    0    10    2    8    0    0    5    0    78    0   1   0  99
>  18    0   0    0     2    0    0    0    0    0    0     0    0   0   0 100
>  19    0   0    0     5    1    2    0    0    0    0    10    0   0   0 100
>  20    0   0    0     2    0    0    0    0    0    0     0    0   0   0 100
>  21    0   0    0     9    2    4    0    0    0    0     4    0   0   0 100
>  22    0   0    0     4    0    0    0    0    0    0     0    0   0   0 100
>  23    0   0    0     2    0    0    0    0    0    0     0    0   0   0 100
>
> So I'm led to believe that zfs receive is spending almost 100% of a single
> CPU's time doing a lot of genunix`list_next ...
>
> Any ideas what is going on here?
>
> Best Regards,
> --
> Lachlan Mulcahy
> Senior DBA,
> Marin Software Inc.
> San Francisco, USA
>
> AU Mobile: +61 458 448 721
> US Mobile: +1 (415) 867 2839
> Office : +1 (415) 671 6080

--
Lachlan Mulcahy
Senior DBA,
Marin Software Inc.
San Francisco, USA

AU Mobile: +61 458 448 721
US Mobile: +1 (415) 867 2839
Office : +1 (415) 671 6080