Hi folks,

I'm having (as the title suggests) a problem with zfs send/receive. The command line is like this:

  pfexec zfs send -Rp tank/tsm@snapshot | ssh remotehost pfexec zfs recv -v -F -d tank

This works like a charm as long as the snapshot is small enough.

When it gets too big (meaning somewhere between 17G and 900G), I get ssh errors (can't read from remote host).

I tried various encryption options (the fastest in my case being arcfour) with no better results.
I tried to set up a script that inserts dd on the sending and receiving sides to buffer the flow; still read errors.
I tried with mbuffer (which gives better performance); it didn't get better.
Today I tried with netcat (and mbuffer) and got better throughput, but it failed at 269GB transferred.

The two machines are connected to the switch with 2x1GbE (Intel) joined together with LACP.
The switch logs show no errors on the ports.
kstat -p | grep e1000g shows one recv error on the sending side.

I can't find anything in the logs which could give me a clue about what's happening.

I'm running build 131.

If anyone has the slightest clue of where I could look or what I could do to pinpoint/solve the problem, I'd be very grateful if (s)he could share it with me.

Thanks and have a nice evening.

Arnaud
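For reference, a buffered transfer of the kind mentioned above (mbuffer on both ends) is usually wired up roughly like this. This is only a sketch; mbuffer can speak TCP itself, and the port number and buffer sizes here are illustrative rather than the exact values used:

  # receiving host: listen on a TCP port, buffer, feed zfs receive
  mbuffer -I 8023 -s 128k -m 512M | pfexec zfs recv -v -F -d tank

  # sending host: send the stream through mbuffer to the receiver
  pfexec zfs send -Rp tank/tsm@snapshot | mbuffer -s 128k -m 512M -O remotehost:8023

(-I listens on a port, -O connects to host:port, -s sets the block size and -m the size of the in-memory buffer.)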
On Feb 2, 2010, at 12:05 PM, Arnaud Brand wrote:
> Hi folks,
>
> I'm having (as the title suggests) a problem with zfs send/receive. The command line is like this:
> pfexec zfs send -Rp tank/tsm@snapshot | ssh remotehost pfexec zfs recv -v -F -d tank
> [...]
> The two machines are connected to the switch with 2x1GbE (Intel) joined together with LACP.

LACP is spawned from the devil to plague mankind. It won't help your ssh transfer at all. It will cause your hair to turn grey and get pulled out by the roots. Try turning it off or using a separate network for your transfer.
 -- richard
On Tue, Feb 2, 2010 at 3:25 PM, Richard Elling <richard.elling at gmail.com> wrote:
> On Feb 2, 2010, at 12:05 PM, Arnaud Brand wrote:
> > [...]
> > The two machines are connected to the switch with 2x1GbE (Intel) joined together with LACP.
>
> LACP is spawned from the devil to plague mankind. It won't help your ssh transfer at all. It will cause your hair to turn grey and get pulled out by the roots. Try turning it off or using a separate network for your transfer.
> -- richard

That's a bit harsh :)

To further what Richard said, though: LACP isn't going to help with your issue. LACP is NOT round-robin load balancing; think of it more like source/destination hashing. You need multiple connections going out to different source/destination MAC/IP/whatever addresses. Typically it works great for something like a fileserver with 50 clients hitting it: those clients get balanced across the multiple links. When you've got one server talking to one other server, it isn't going to buy you much of anything 99% of the time.

Also, depending on your switch, it can actually hamper you quite a bit. If you've got a good Cisco/HP/Brocade/Extreme Networks/Force10/etc. switch, it's fine. If you've got a $50 SOHO Netgear, you typically are going to get what you paid for :)

--Tim
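If you want to see how the aggregation is configured, or change its hashing policy before ripping it out entirely, dladm is the place to look. A rough sketch follows; the aggregation name "aggr1" and the policy values are just examples, adjust them to the actual setup:

  # show the aggregation, its ports and current load-balancing policy
  dladm show-aggr
  # hash on L3/L4 headers instead of just MAC addresses
  pfexec dladm modify-aggr -P L3,L4 aggr1
  # or drop the aggregation altogether once its IP interface is unplumbed
  pfexec dladm delete-aggr aggr1

Note that with a single sender/receiver pair, even the best policy still pins the one TCP connection to one physical link, as described above.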
On 02/02/10 22:49, Tim Cook wrote:
> On Tue, Feb 2, 2010 at 3:25 PM, Richard Elling <richard.elling@gmail.com> wrote:
> > LACP is spawned from the devil to plague mankind. It won't help your ssh transfer at all. [...] Try turning it off or using a separate network for your transfer.
> > -- richard
>
> To further what Richard said, though: LACP isn't going to help with your issue. LACP is NOT round-robin load balancing; think of it more like source/destination hashing. [...]
> --Tim

I'll remove LACP when I get back to work tomorrow (that's in a few hours). I already knew its principles (it doesn't hurt to repeat them, though), but since we have at least two machines connecting simultaneously to this server, plus occasional clients, plus the replication stream, I thought I could win some bandwidth. I think I should have stuck to the rule: first make it work, then make it fast.

In the meantime, I've launched the same command with a dd to a local file instead of a zfs recv, i.e. something along the lines of:

  pfexec zfs send -Rp tank/tsm@snapshot | ssh remotehost dd of=/tank/repl.zfs bs=128k

I hope I'm not running into the issues related to e1000g problems under load (zfs recv eats up all the CPU when it flushes, and then the transfer almost stalls for a second or two).

As for the switch, it's an HP4208 with reasonably up-to-date firmware (less than 6 months old; the next update of our switches is scheduled for Feb 20th).

The strange thing is that the connection is lost on the sending side, but the receiving side still shows it as "established" (in netstat -an).

I could also try changing the network cables; maybe one of them has a problem.

Thanks,
Arnaud
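If the e1000g side is suspect, the link error counters are cheap to check on both boxes before and after a failed transfer. A quick sketch (e1000g is the driver module name from the original post; no particular instance is assumed):

  # per-driver kstat counters, errors only
  kstat -p e1000g | grep -i err
  # or per-link statistics (ierrors/oerrors) via dladm
  dladm show-link -s

If those counters stay flat across a failure, the drop is more likely happening above the NIC.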
On Tue, Feb 2, 2010 at 12:05 PM, Arnaud Brand <tib at tib.cc> wrote:
> Hi folks,
>
> I'm having (as the title suggests) a problem with zfs send/receive.
> [...]
> When it gets too big (meaning somewhere between 17G and 900G), I get ssh errors (can't read from remote host).
> [...]
> I'm running build 131.
>
> If anyone has the slightest clue of where I could look or what I could do to pinpoint/solve the problem, I'd be very grateful if (s)he could share it with me.

This issue seems to have started after snv_129 for me. I get "connection reset by peer", or transfers (of any kind) simply time out.

Smaller transfers succeed most of the time, while larger ones usually fail. Rolling back to snv_127 (my last build) does not exhibit the issue. I haven't had time to narrow down any causes, but I did find one bug report about TCP test scenarios failing during one of the builds; I'm unable to find that CR at this time.

--
Brent Jones
brent at servuhome.net
On Tue, Feb 2, 2010 at 7:41 PM, Brent Jones <brent at servuhome.net> wrote:
> On Tue, Feb 2, 2010 at 12:05 PM, Arnaud Brand <tib at tib.cc> wrote:
> > [...]
>
> This issue seems to have started after snv_129 for me. I get "connection reset by peer", or transfers (of any kind) simply time out.
>
> Smaller transfers succeed most of the time, while larger ones usually fail. Rolling back to snv_127 (my last build) does not exhibit the issue. I haven't had time to narrow down any causes, but I did find one bug report about TCP test scenarios failing during one of the builds; I'm unable to find that CR at this time.

Ah, I found the CR that seemed to describe the situation (broken pipe/connection reset by peer):

http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6905510

--
Brent Jones
brent at servuhome.net
On 03/02/2010 04:44, Brent Jones wrote:
> On Tue, Feb 2, 2010 at 7:41 PM, Brent Jones <brent at servuhome.net> wrote:
> > [...]
>
> Ah, I found the CR that seemed to describe the situation (broken pipe/connection reset by peer):
>
> http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6905510

This CR is marked as fixed in b131 and only relates to loopback, or am I missing something?

The transfer I started yesterday night finished with no errors:

  /usr/bin/nc -l -p 8023 | /usr/local/bin/mbuffer -s1024k -m512M -P40 | dd of=/tank/repl.zfs bs=128k
  summary: 1992 GByte in 7 h 10 min 78.9 MB/s, 8472x empty
  2139334374612 bytes (2,1 TB) copied, 25860,5 s, 82,7 MB/s

So this seems to be linked somehow to a high CPU load.

I'll change the network cables and, as Richard suggested, remove LACP. Then I'll launch another transfer while at the same time zfs receiving the file I transferred tonight. If that transfer fails, I guess it will be related to e1000g problems under load, not zfs, so a better place to post would be opensolaris-discuss.

Thanks for your help,
Arnaud
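For reference, the matching sender side of that pipeline, and the later replay of the saved stream into the pool, would look roughly like this. It is a sketch of the obvious counterpart commands (remotehost, the snapshot name and port 8023 are taken from the commands above), not a transcript of what was actually run:

  # sending host, feeding the nc/mbuffer/dd receiver above
  pfexec zfs send -Rp tank/tsm@snapshot | /usr/local/bin/mbuffer -s1024k -m512M | /usr/bin/nc remotehost 8023

  # receiving host, later: replay the saved stream into the pool
  pfexec zfs recv -v -F -d tank < /tank/repl.zfs

zfs receive reads the stream from stdin, so replaying from the file exercises the same receive path without the network in the middle.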
On 03/02/2010 09:25, Arnaud Brand wrote:
> [...]
> The transfer I started yesterday night finished with no errors:
> /usr/bin/nc -l -p 8023 | /usr/local/bin/mbuffer -s1024k -m512M -P40 | dd of=/tank/repl.zfs bs=128k
> summary: 1992 GByte in 7 h 10 min 78.9 MB/s, 8472x empty
> 2139334374612 bytes (2,1 TB) copied, 25860,5 s, 82,7 MB/s
>
> So this seems to be linked somehow to a high CPU load.
> I'll change the network cables and, as Richard suggested, remove LACP. Then I'll launch another transfer while at the same time zfs receiving the file I transferred tonight.
> If that transfer fails, I guess it will be related to e1000g problems under load, not zfs, so a better place to post would be opensolaris-discuss.
>
> Thanks for your help,
> Arnaud

It seems to be network related: the transfer failed after 129GB even without LACP and with other network cables.

I'll post to networking-discuss.

Arnaud
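If networking-discuss asks for more detail, a packet capture on both ends around the failure usually shows which side sends the reset or stops acknowledging. A sketch, assuming e1000g0 is the interface carrying the transfer and "remotehost" is the peer (both are placeholders):

  # capture headers only (-s 128) so the capture file stays manageable
  pfexec snoop -d e1000g0 -s 128 -o /var/tmp/repl.snoop host remotehost
  # after the failure, inspect the tail of the capture
  snoop -i /var/tmp/repl.snoop | tail -100

Correlating the last packets with the kstat error counters on each side should help narrow down whether it is the NIC, the driver, or something in between.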