It seems we are hitting a boundary with zfs send/receive over a network link (10Gb/s). We can see peak values of up to 150MB/s, but on average about 40-50MB/s are replicated. This is far from the bandwidth that a 10Gb link can offer.

Is it possible that ZFS is giving replication too low a priority, or throttling it too much?
> From: zfs-discuss-bounces at opensolaris.org [mailto:zfs-discuss-bounces at opensolaris.org] On Behalf Of Mika Borner
>
> It seems we are hitting a boundary with zfs send/receive over a network
> link (10Gb/s). We can see peak values of up to 150MB/s, but on average
> about 40-50MB/s are replicated. This is far from the bandwidth that
> a 10Gb link can offer.
>
> Is it possible that ZFS is giving replication too low a priority, or
> throttling it too much?

I don't think this is usually called "replication," so be careful about terminology. zfs send can go as fast as your hardware is able to read. If you'd like to know how fast your hardware is, try this:

  zfs send somefilesystem | pv -i 30 > /dev/null

(You might want to install pv from opencsw or blastwave.) I think, in your case, you'll see something around 40-50MB/s.

I will also add this much: if you send the original snapshot of your complete filesystem, it will probably go very fast (much faster than 40-50 MB/s), because all those blocks are essentially sequential blocks on disk. When you're sending incrementals, the blocks are essentially more fragmented, so the total throughput is lower: the disks have to perform a greater percentage of random I/O.

I have a very fast server, and my zfs send is about half as fast as yours. In both cases, it's enormously faster than some other backup tool, like tar or rsync.
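One way to extend that test is to benchmark each leg of the transfer separately, so you can tell whether the disks or the network is the limit. A minimal sketch, assuming a hypothetical snapshot tank/data@snap, a receiver called backuphost, and that pv and nc are installed on both ends:

  # How fast can the sender read the stream from disk, with no network involved?
  zfs send tank/data@snap | pv -i 30 > /dev/null

  # How fast can the network move bulk data, with no ZFS involved?
  # (first start a listener on backuphost, e.g. "nc -l 9090 > /dev/null";
  #  the exact flags depend on which netcat you have installed)
  dd if=/dev/zero bs=1024k count=10000 | nc backuphost 9090

  # The lower of the two rates is the likely ceiling for the combined
  # zfs send | ssh | zfs receive pipeline.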
Thomas Maier-Komor
2010-Jun-25 13:25 UTC
[zfs-discuss] Maximum zfs send/receive throughput
On 25.06.2010 14:32, Mika Borner wrote:
> It seems we are hitting a boundary with zfs send/receive over a network
> link (10Gb/s). We can see peak values of up to 150MB/s, but on average
> about 40-50MB/s are replicated. This is far from the bandwidth that
> a 10Gb link can offer.
>
> Is it possible that ZFS is giving replication too low a priority, or
> throttling it too much?

You can probably improve overall performance by using mbuffer [1] to stream the data over the network. At least some people have reported increased performance. mbuffer will buffer the data stream and decouple zfs send operations from network latencies.

Get it here:
original source: http://www.maier-komor.de/mbuffer.html
binary package: http://www.opencsw.org/packages/CSWmbuffer/

- Thomas
> From: zfs-discuss-bounces at opensolaris.org [mailto:zfs-discuss-bounces at opensolaris.org] On Behalf Of Thomas Maier-Komor
>
> You can probably improve overall performance by using mbuffer [1] to
> stream the data over the network. At least some people have reported
> increased performance. mbuffer will buffer the data stream and decouple
> zfs send operations from network latencies.
>
> Get it here:
> original source: http://www.maier-komor.de/mbuffer.html
> binary package: http://www.opencsw.org/packages/CSWmbuffer/

mbuffer is also available in opencsw / blastwave. IMHO, that is easier, faster, and better than building things from source, most of the time.
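For reference, a typical mbuffer pipeline looks something like the sketch below. The pool, snapshot, host name, and port are placeholders, and the block/buffer sizes (-s/-m) are only starting points to tune for your link:

  # On the receiving host: listen on TCP port 9090, buffer up to 1 GB in RAM,
  # and feed the stream into zfs receive.
  mbuffer -s 128k -m 1G -I 9090 | zfs receive -d tank

  # On the sending host: stream the incremental send through mbuffer to the receiver.
  zfs send -i tank/data@snap1 tank/data@snap2 | mbuffer -s 128k -m 1G -O backuphost:9090

The point of the buffer is that short stalls on either side (disk seeks on the sender, transaction-group commits on the receiver) no longer stall the other side, so the link stays busier.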
I have been looking at why a zfs receive operation is terribly slow, and one observation that seems directly linked to the slowness is that at any one time one of the CPUs is pegged at 100% sys while the other five (in my case) are relatively quiet. I haven't dug any deeper than that, but I was curious whether anyone else has observed the same behavior?
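In case it helps anyone compare notes, this is roughly how that pattern shows up with the stock tools (nothing here is specific to the bug, and the 5-second interval is arbitrary):

  # Per-CPU statistics: look for one CPU whose sys column sits near 100
  # while the others stay mostly idle.
  mpstat 5

  # Microstate accounting per thread of the running zfs process: a single
  # LWP spending nearly all of its time in SYS matches the same symptom.
  prstat -mL -p `pgrep -d, -x zfs` 5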
Just an update: I had a ticket open with Sun regarding this, and it looks like they have a CR for what I was seeing (6975124).
Jim Barker wrote:
> Just an update: I had a ticket open with Sun regarding this, and it looks
> like they have a CR for what I was seeing (6975124).

That would seem to describe a zfs receive which has stopped for 12 hours. You described yours as slow, which is not the term I personally would use for one which has stopped. However, you haven't given anything like enough detail here about your situation and what's happening for me to make any worthwhile guesses.

-- Andrew Gabriel
Andrew,

Correct. The reason I initially opened the case was that I could essentially hang a "zfs receive" operation, and any further zfs commands issued on the box would never come back. Just today, one of my "slow" receives came to a screeching halt: where I previously saw one CPU spiking constantly, it is now exhibiting the same behavior as the hang (absolutely no activity, quiet as a mouse). I guess I didn't wait long enough for the "slow" process to finally hang. It is hung now and will stay that way until the end of time. I thought I had found a way to get around the freeze, but I guess I just delayed it a little longer.

I provided Oracle some explorer output and a crash dump to analyze, and that is the data they used to provide the information I passed on.

Jim Barker
Does anyone know the current state of bug #6975124? Has there been any progress since August?

I currently have an OpenSolaris 2009.06 snv_111b system (entire 0.5.11-0.111.14) which *repeatedly* gets stuck after a couple of minutes during a large (xxx GB) incremental zfs receive operation. The process does not crash, it simply keeps sleeping and there is no progress at all.

   PID USERNAME NLWP PRI NICE   SIZE    RES STATE    TIME  CPU COMMAND
   641 root        1  60    0  7660K  2624K sleep    2:16 0.00% zfs

Both truss and mdb are unable to show *any* activity or status of the zfs receive process:

# truss -p 641
*hangs*

I'm not very familiar with mdb. I've tried this:

# mdb -p 641
mdb: failed to initialize //lib/libc_db.so.1: libthread_db call failed unexpectedly
mdb: warning: debugger will only be able to examine raw LWPs
Loading modules: [ ld.so.1 libumem.so.1 libavl.so.1 libnvpair.so.1 ]
> ::stack
> ::stackregs
> ::status
debugging PID 641 (32-bit)
file: /sbin/zfs
threading model: raw lwps
status: process is running, debugger stop directive pending

I'm wondering if #6975124 could be the cause of my problem, too.
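As a quick sanity check in this situation, watching write activity on the destination pool shows whether the receive is truly stalled rather than crawling; a sketch, with "datapool" standing in for whatever the destination pool is called:

  # Per-vdev I/O statistics every 5 seconds; a stalled receive shows
  # essentially zero write operations and bandwidth for minutes at a time.
  zpool iostat -v datapool 5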
> I'm wondering if #6975124 could be the cause of my problem, too.

There are several zfs send (and receive) related issues in 111b. You might seriously want to consider upgrading to a more recent OpenSolaris build (134) or to OpenIndiana.

Yours
Markus Kovero
> I'm not very familiar with mdb. I've tried this:

Ah, this looks much better:

root 641 0.0 0.0 7660 2624 ? S Nov 08 2:16 /sbin/zfs receive -dF datapool/share/ (...)

# echo "0t641::pid2proc|::walk thread|::findstack -v" | mdb -k
stack pointer for thread ffffff09236198e0: ffffff003d9b5670
[ ffffff003d9b5670 _resume_from_idle+0xf1() ]
  ffffff003d9b56a0 swtch+0x147()
  ffffff003d9b56d0 cv_wait+0x61(ffffff0a4fbd4228, ffffff0a4fbd40e8)
  ffffff003d9b5710 dmu_tx_wait+0x80(ffffff0948aa4600)
  ffffff003d9b5750 dmu_tx_assign+0x4b(ffffff0948aa4600, 1)
  ffffff003d9b57e0 dmu_free_long_range_impl+0x12a(ffffff0911456d60, ffffff0a4fbd4028, 0, ffffffffffffffff, 0)
  ffffff003d9b5840 dmu_free_long_range+0x5b(ffffff0911456d60, 53e34, 0, ffffffffffffffff)
  ffffff003d9b58d0 dmu_object_reclaim+0x112(ffffff0911456d60, 53e34, 13, 1e00, 11, 108)
  ffffff003d9b5930 restore_object+0xff(ffffff003d9b5950, ffffff0911456d60, ffffff003d9b59c0)
  ffffff003d9b5a90 dmu_recv_stream+0x48d(ffffff003d9b5be0, ffffff094d089440, ffffff003d9b5ad8)
  ffffff003d9b5c40 zfs_ioc_recv+0x2c0(ffffff092492b000)
  ffffff003d9b5cc0 zfsdev_ioctl+0x10b(b600000000, 5a1c, 8044e50, 100003, ffffff0948b60e50, ffffff003d9b5de4)
  ffffff003d9b5d00 cdev_ioctl+0x45(b600000000, 5a1c, 8044e50, 100003, ffffff0948b60e50, ffffff003d9b5de4)
  ffffff003d9b5d40 spec_ioctl+0x83(ffffff0921e54640, 5a1c, 8044e50, 100003, ffffff0948b60e50, ffffff003d9b5de4, 0)
  ffffff003d9b5dc0 fop_ioctl+0x7b(ffffff0921e54640, 5a1c, 8044e50, 100003, ffffff0948b60e50, ffffff003d9b5de4, 0)
  ffffff003d9b5ec0 ioctl+0x18e(3, 5a1c, 8044e50)
  ffffff003d9b5f10 sys_syscall32+0x101()

Does this maybe ring a bell with someone?
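For anyone chasing a similar hang, the same kind of stack can be found without knowing the PID in advance by asking kernel mdb to group thread stacks; a sketch using the standard ::stacks dcmd (the function and module filters below are just the ones relevant to this particular trace):

  # Summarize all kernel thread stacks that pass through dmu_tx_wait().
  echo "::stacks -c dmu_tx_wait" | mdb -k

  # Or group every thread stack that involves the zfs module.
  echo "::stacks -m zfs" | mdb -k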
> Does this maybe ring a bell with someone?

Update: the cause of the problem was OpenSolaris bug 6826836, "Deadlock possible in dmu_object_reclaim()":
http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6826836

It could be fixed by upgrading the OpenSolaris 2009.06 system to 0.5.11-0.111.17 (via the non-free official "support" repository).
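For anyone else on 2009.06, the rough upgrade procedure looks like the sketch below. The repository URL and the key/certificate paths are assumptions (they come with a support contract), and the exact package versions may differ:

  # Attach the support repository to the opensolaris.org publisher
  # (requires the key and certificate issued with the support contract).
  pkg set-publisher -k /path/to/support.key.pem -c /path/to/support.cert.pem \
      -O https://pkg.sun.com/opensolaris/support/ opensolaris.org

  # Update the whole image to the patched build, then boot into the
  # new boot environment that image-update creates.
  pkg image-update -v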