Hello,
I'm cross posting this from the OpenSwan mailing list, in case someone here
can help.
We have two sites connected via OpenSwan 2.6.32-9 on CentOS 5, sharing 6
/24 subnets each (so 12 in total).
The problem we're having is that, completely at random, whether in the
middle of the day or the middle of the night (so I don't believe it's
traffic related), certain (and sometimes all) routes will drop. They usually
recover after a few minutes, but that's still long enough for our monitoring
to detect downtime.
The configuration we have on each device is:
conn site-a
        keyingtries=0
        keylife=1h
        ikelifetime=8h
        left=1.1.1.1
        right=2.2.2.2
        leftsubnets={x.x.x.x/24,x.x.x.x/24,x.x.x.x/24,x.x.x.x/24,x.x.x.x/24,x.x.x.x/24}
        rightsubnets={x.x.x.x/24,x.x.x.x/24,x.x.x.x/24,x.x.x.x/24,x.x.x.x/24,x.x.x.x/24}
        pfs=yes
        auto=start
        authby=secret
        dpddelay=30
        dpdtimeout=120
        dpdaction=hold
        phase2alg=aes256-sha1;modp1536
        phase2=esp
        ike=aes256-sha1;modp1536
It's mirrored exactly the same on the other side.
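For a sense of scale: with leftsubnets= and rightsubnets= both set, every left/right combination becomes its own tunnel instance, and the instance names in the logs ("site-a/5x5", "site-b/6x6") suggest a conn/NxM naming scheme. A minimal Python sketch of that expansion follows. The subnets are hypothetical placeholders (the real ones are redacted above), and the 1-based numbering is an assumption inferred from those log names.

```python
from itertools import product


def tunnel_instances(conn, left_subnets, right_subnets):
    """Return (name, left, right) for every left/right subnet combination."""
    return [
        (f"{conn}/{i}x{j}", left, right)
        for (i, left), (j, right) in product(
            enumerate(left_subnets, start=1),
            enumerate(right_subnets, start=1),
        )
    ]


# Hypothetical placeholder subnets; the original post redacts the real ones.
left = [f"10.1.{n}.0/24" for n in range(1, 7)]
right = [f"10.2.{n}.0/24" for n in range(1, 7)]

instances = tunnel_instances("site-a", left, right)
print(len(instances))                      # number of tunnel instances
print(instances[0][0], instances[-1][0])   # first and last instance names
```

Six subnets on each side therefore means 36 separate IPsec SAs to negotiate and rekey, not 12.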
I have tried changing the dead peer detection timeout to something high (5
minutes), and removing it completely (which I believe defaults it to 30
seconds), neither of which made any difference.
I can't see any obvious errors in the logs. However, the most recent
dropout produced the following messages around the same time:
Feb 10 00:53:09 site-b-vpn pluto[30584]: "site-a/5x5" #39: max number of retransmissions (2) reached STATE_QUICK_I1
Feb 10 00:53:09 site-b-vpn pluto[30584]: "site-a/5x5" #39: starting keying attempt 2 of an unlimited number
Feb 10 00:53:09 site-b-vpn pluto[30584]: "site-a/5x5" #95: initiating Quick Mode PSK+ENCRYPT+TUNNEL+PFS+UP+IKEv2ALLOW+SAREFTRACK to replace #39 {using isakmp#52 msgid:119495de proposal=AES(12)_256-SHA1(2)_160 pfsgroup=OAKLEY_GROUP_MODP1536}
and also
Feb 10 00:52:25 site-a-vpn pluto[2414]: "site-b/6x6" #1: ignoring Delete SA payload: PROTO_IPSEC_ESP SA(0xde58eea3) not found (maybe expired)
Feb 10 00:52:25 site-a-vpn pluto[2414]: "site-b/6x6" #1: received and ignored informational message
Feb 10 00:52:25 site-a-vpn pluto[2414]: "site-b/6x6" #1: ignoring Delete SA payload: PROTO_IPSEC_ESP SA(0xa5298d7d) not found (maybe expired)
Feb 10 00:52:25 site-a-vpn pluto[2414]: "site-b/6x6" #1: received and ignored informational message
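Messages like these are easier to correlate with monitoring alerts once they are tallied per tunnel instance. The following is a rough Python sketch, not an Openswan tool: the regular expression is an assumption based only on the line format shown above, it expects each syslog entry on a single line (as in the log file itself), and the function names are made up for illustration.

```python
import re
from collections import Counter

# Assumed layout: "<Mon DD HH:MM:SS> <host> pluto[<pid>]: "<conn>" #<state>: <message>"
LINE_RE = re.compile(
    r'^(?P<ts>\w{3}\s+\d+ \d{2}:\d{2}:\d{2}) (?P<host>\S+) pluto\[(?P<pid>\d+)\]: '
    r'"(?P<conn>[^"]+)" #(?P<state>\d+): (?P<msg>.*)$'
)


def parse_pluto_line(line):
    """Split one pluto syslog line into named fields, or return None."""
    match = LINE_RE.match(line.strip())
    return match.groupdict() if match else None


def count_retransmit_timeouts(lines):
    """Count 'max number of retransmissions' events per connection instance."""
    counts = Counter()
    for line in lines:
        entry = parse_pluto_line(line)
        if entry and "max number of retransmissions" in entry["msg"]:
            counts[entry["conn"]] += 1
    return counts


sample = [
    'Feb 10 00:53:09 site-b-vpn pluto[30584]: "site-a/5x5" #39: '
    'max number of retransmissions (2) reached STATE_QUICK_I1',
    'Feb 10 00:53:09 site-b-vpn pluto[30584]: "site-a/5x5" #39: '
    'starting keying attempt 2 of an unlimited number',
    'Feb 10 00:52:25 site-a-vpn pluto[2414]: "site-b/6x6" #1: '
    'received and ignored informational message',
]

print(count_retransmit_timeouts(sample))
```

Running it over the full log from both gateways would show whether the timeouts cluster on particular subnet pairs or hit all of them.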
Before we move to another solution, does anyone have any suggestions on
what the problem might be? Running a constant ping between the two hosts
doesn't drop *any* packets (even when the IPSec connection itself drops
out).
Thanks in advance.
Try setting a lower key expiry time on the other endpoint.

--
Eero

2016-02-09 17:04 GMT+02:00 John Cenile <jcenile1983 at gmail.com>:
> Before we move to another solution, does anyone have any suggestions on
> what the problem might be?
Thanks, I've updated the config with the following:
        keylife=20m
        ikelifetime=2h
I'll see how that goes.
In the meantime, any other suggestions would be greatly appreciated.
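One side effect of shorter lifetimes is more frequent Quick Mode negotiation. A back-of-the-envelope Python sketch, assuming 36 tunnel instances (6 x 6 subnet combinations) and one rekey per keylife per tunnel; it ignores rekeymargin and rekeyfuzz, which make real rekeys happen earlier, so treat the numbers as a rough lower bound.

```python
def rekeys_per_hour(tunnels, keylife_minutes):
    """Approximate IPsec SA rekeys per hour, one rekey per keylife per tunnel."""
    return tunnels * 60 / keylife_minutes


TUNNELS = 36  # assumption: every left/right subnet pair is its own tunnel

for keylife in (60, 20):  # keylife=1h (original) and keylife=20m (updated)
    rate = rekeys_per_hour(TUNNELS, keylife)
    print(f"keylife={keylife}m: about {rate:.0f} rekeys per hour")
```

If the dropouts coincide with rekeying, a shorter keylife may change how often they appear, which is worth watching for in the logs.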
On 10 February 2016 at 02:14, Eero Volotinen <eero.volotinen at iki.fi>
wrote:
> Try setting lower keyexpiry time on other endpoint.
On 02/09/2016 07:04 AM, John Cenile wrote:
> does anyone have any suggestions on what the problem might be?

Not off the top of my head, but if I were you, I'd enable debugging of
"control" and "dpd". See man ipsec.conf (/plutodebug) and man ipsec_pluto.
CentOS 5 is also a rather old OS. Is it possible to use a newer version
(such as CentOS 6 or CentOS 7)?

Eero

2016-02-09 19:52 GMT+02:00 Gordon Messmer <gordon.messmer at gmail.com>:
> Not off the top of my head, but if I were you, I'd enable debugging of
> "control" and "dpd". See man ipsec.conf (/plutodebug) and man ipsec_pluto.