On Wed 07/01/09 20:31, Carsten Aulbert <carsten.aulbert at aei.mpg.de> sent:
> Brent Jones wrote:
>> Using mbuffer can speed it up dramatically, but this seems like a hack
>> without addressing a real problem with zfs send/recv.
>> Trying to send any meaningful sized snapshots from say an X4540 takes
>> up to 24 hours, for as little as 300GB change rate.
>
> I have not found a solution yet either. But it seems to depend highly on
> the distribution of file sizes, number of files per directory, or
> whatever. The last tests I made still showed more than 50 hours for 700
> GB and ~45 hours for 5 TB (both were null tests where zfs send wrote to
> /dev/null).

Send/receive speeds appear to be very data dependent. I have several different filesystems containing differing data types. The slowest to replicate is mail, and my guess is that it's the changes to the index files that take the time. Similarly sized filesystems with similar deltas, where files are mainly added or deleted, appear to replicate faster.

Cutting out ssh (I use direct socket connections and large circular buffers) makes more of a difference for full sends (it doubled the throughput between Thumpers). The improvement to incrementals varies. Data gets sent over the wire quickly, but the receive can still take a long time.

Ian.
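For anyone wanting to repeat the kind of null test Carsten describes, a minimal sketch looks like the commands below. The pool, dataset, and snapshot names are placeholders, not anything from this thread:

    # Time a full send with the network and receive side taken out of the picture
    time zfs send tank/data@snap1 > /dev/null

    # Same idea for an incremental, to isolate the send-side cost of the delta
    time zfs send -i tank/data@snap1 tank/data@snap2 > /dev/null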
On Thu 08/01/09 08:08, "Brent Jones" <brent at servuhome.net> sent:
> I have yet to devise a script that starts an mbuffer zfs recv on the
> receiving side with proper parameters, then starts an mbuffer zfs send
> on the sending side, but I may work on one later this week.
> I'd like the snapshots to be sent every 15 minutes, just to keep the
> amount of change that needs to be sent as low as possible.

It probably won't make much of an impact. I have run tests with my direct connection application using buffers bigger than the incremental size, and send/receive processing times still dominate.

--
Ian.
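For reference, the mbuffer pairing Brent describes typically looks something like the sketch below. Hostnames, dataset names, the port, block size, and buffer size are all placeholders, and the options should be checked against the installed mbuffer version:

    # Receiving side (start this first): listen on a TCP port, feed zfs receive
    mbuffer -s 128k -m 1G -I 9090 | zfs receive -F tank/backup

    # Sending side: pipe the stream into mbuffer, which connects to the receiver
    zfs send tank/data@snap1 | mbuffer -s 128k -m 1G -O recvhost:9090

The buffer (-m) only helps smooth out bursts; as Ian notes above, if send/receive processing dominates, a bigger buffer will not change the overall time much.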
Ian Collins wrote:
> Send/receive speeds appear to be very data dependent. I have several
> different filesystems containing differing data types. The slowest to
> replicate is mail, and my guess is that it's the changes to the index
> files that take the time. Similarly sized filesystems with similar
> deltas, where files are mainly added or deleted, appear to replicate
> faster.

Has anyone investigated this? I have been replicating a server today, and the difference in incremental processing is huge, for example:

filesystem A:

received 1.19Gb stream in 52 seconds (23.4Mb/sec)

filesystem B:

received 729Mb stream in 4564 seconds (164Kb/sec)

I can delve further into the content if anyone is interested.

--
Ian.
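Per-stream figures like the ones Ian quotes are the sort of output zfs receive prints when run with -v. A sketch of an incremental transfer that produces them, with dataset, snapshot, and host names as placeholders:

    zfs send -i tank/fsA@monday tank/fsA@tuesday | \
        ssh backuphost zfs receive -v tank/fsA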
On Fri, Jan 9, 2009 at 7:53 PM, Ian Collins <ian at ianshome.com> wrote:
> Ian Collins wrote:
>> Send/receive speeds appear to be very data dependent. [...]
>
> Has anyone investigated this? I have been replicating a server today,
> and the difference in incremental processing is huge, for example:
>
> filesystem A:
>
> received 1.19Gb stream in 52 seconds (23.4Mb/sec)
>
> filesystem B:
>
> received 729Mb stream in 4564 seconds (164Kb/sec)
>
> I can delve further into the content if anyone is interested.
>
> --
> Ian.

What hardware, to/from, is this?

How are those filesystems laid out, and what is their total size, used space, and guessable file count / file size distribution?

I'm also trying to put together the puzzle to provide more detail to a case I opened with Sun regarding this.

--
Brent Jones
brent at servuhome.net
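A rough sketch of commands that gather the kind of detail Brent is asking about. Pool, dataset, and mount point names are placeholders:

    # Dataset sizes, used and referenced space
    zfs list -o name,used,available,refer,mountpoint -r tank

    # Properties that often matter for send/recv behaviour
    zfs get recordsize,compression,compressratio tank/somefs

    # Very rough file count and size distribution for one filesystem
    find /tank/somefs -type f | wc -l
    find /tank/somefs -type f -size +2048 | wc -l   # files larger than ~1 MB (512-byte blocks)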
On Fri, Jan 9, 2009 at 11:41 PM, Brent Jones <brent at servuhome.net> wrote:
> On Fri, Jan 9, 2009 at 7:53 PM, Ian Collins <ian at ianshome.com> wrote:
>> Has anyone investigated this? I have been replicating a server today,
>> and the difference in incremental processing is huge, for example:
>> [...]
>
> What hardware, to/from, is this?
>
> How are those filesystems laid out, and what is their total size, used
> space, and guessable file count / file size distribution?
>
> I'm also trying to put together the puzzle to provide more detail to a
> case I opened with Sun regarding this.

Just to update this, hope no one is tired of hearing about it. I just image-updated to snv_105 to obtain the patch for CR 6418042, at the recommendation of a Sun support technician.

My results are much improved, on the order of 5-100 times faster (either over mbuffer or SSH). Not only do snapshots begin sending right away (no longer requiring several minutes of reads before sending any data), the actual send will sustain about 35-50MB/sec over SSH, and up to 100MB/s via mbuffer (on a single Gbit link, I am network limited now; something I never thought I would say I love to see!).

Previously, I was lucky if the snapshot would begin sending any data after about 10 minutes, and once it did begin sending, it would usually peak at about 1MB/sec via SSH, and up to 20MB/sec over mbuffer.

mbuffer seems to play a much larger role now, as SSH appears to be single threaded for compression/encryption, maxing out a single CPU. mbuffer's raw network performance saturates my Gigabit link, making me consider link bonding or something to see how fast it -really- can go, now that the taps are open!

So, my issue appears pretty much resolved. Although snv_105 is in the /dev branch, things appear stable for the most part.

Please let me know if you have any questions, or want additional info on my setup and testing.

Regards,

--
Brent Jones
brent at servuhome.net
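As an aside on the SSH bottleneck Brent mentions, a common workaround is to pick a cheaper cipher and make sure ssh-level compression is off. This is only a sketch: cipher availability depends on the OpenSSH build, and the dataset and host names are placeholders:

    zfs send -i tank/data@snap1 tank/data@snap2 | \
        ssh -c arcfour -o Compression=no recvhost zfs receive -vF tank/backup

This only shifts where the single-threaded CPU ceiling sits; it does not parallelise the encryption the way mbuffer sidesteps it entirely.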
Brent Jones wrote:
> Just to update this, hope no one is tired of hearing about it. I just
> image-updated to snv_105 to obtain the patch for CR 6418042, at the
> recommendation of a Sun support technician.
>
> My results are much improved, on the order of 5-100 times faster
> (either over mbuffer or SSH). Not only do snapshots begin sending
> right away (no longer requiring several minutes of reads before
> sending any data), the actual send will sustain about 35-50MB/sec over
> SSH, and up to 100MB/s via mbuffer (on a single Gbit link, I am
> network limited now; something I never thought I would say I love to
> see!).

Thanks for the heads up, Brent. I'll have to sweet talk one of my former clients into running OpenSolaris on their X4540s. Anyone know if NetVault is supported on OpenSolaris?

Do any of the Sun folks know if these updates will be back-ported to Solaris 10 in a patch or update release?

--
Ian.
Brent Jones:
> My results are much improved, on the order of 5-100 times faster
> (either over mbuffer or SSH).

This is good news - although not quite soon enough for my current 5TB zfs send ;-)

Have you tested whether this also improves the performance of incremental sends?

- river.
It definitely does. I made some tests today comparing b101 with b105 while doing 'zfs send -R -I A B >/dev/null' with several dozen snapshots between A and B. Well, b105 is almost 5x faster in my case - that's pretty good.

--
Robert Milkowski
http://milek.blogspot.com
On Mon, Feb 2, 2009 at 6:55 AM, Robert Milkowski <milek at task.gda.pl> wrote:
> It definitely does. I made some tests today comparing b101 with b105
> while doing 'zfs send -R -I A B >/dev/null' with several dozen
> snapshots between A and B. Well, b105 is almost 5x faster in my case -
> that's pretty good.

Sad to report that I am seeing the slow zfs recv issue cropping up again while running b105 :(

Not sure what has triggered the change, but I am seeing the same behavior again: massive amounts of reads on the receiving side, while only receiving tiny bursts of data amounting to a mere megabyte a second.

It doesn't seem to happen every single time, though, which is odd, but I can provoke it by destroying a snapshot from the pool I am sending, then taking another snapshot and re-sending it. That seems to cause the receiving side to go into this "read storm" before any data is transferred.

I'm going to open a case in the morning, and see if I can't get an engineer to look at this.

--
Brent Jones
brent at servuhome.net
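A rough sketch of the sequence Brent describes, plus a way to watch for the read storm on the receiver. Pool, dataset, snapshot, and host names are placeholders; the destroy happens on the sending pool:

    # Sending side: drop an old snapshot, take a fresh one, send the increment
    zfs destroy tank/data@old
    zfs snapshot tank/data@new
    zfs send -i tank/data@prev tank/data@new | ssh recvhost zfs receive -vF tank/backup

    # Receiving side: watch read activity while the stream trickles in
    zpool iostat -v tank 5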
Hello Brent,

Friday, February 13, 2009, 8:15:55 AM, you wrote:

BJ> Sad to report that I am seeing the slow zfs recv issue cropping up
BJ> again while running b105 :(

BJ> Not sure what has triggered the change, but I am seeing the same
BJ> behavior again: massive amounts of reads on the receiving side, while
BJ> only receiving tiny bursts of data amounting to a mere megabyte a
BJ> second.

BJ> It doesn't seem to happen every single time, though, which is odd, but I
BJ> can provoke it by destroying a snapshot from the pool I am sending,
BJ> then taking another snapshot and re-sending it. That seems to cause the
BJ> receiving side to go into this "read storm" before any data is
BJ> transferred.

IIRC the bug which was fixed in b105 was about the sending side, not the receiving one.

BJ> I'm going to open a case in the morning, and see if I can't get an
BJ> engineer to look at this.

Please let us know the outcome.

--
Best regards,
Robert Milkowski
http://milek.blogspot.com