Hello,

I'm cross-posting this from the OpenSwan mailing list, in case someone here can help.

We have two sites connected via OpenSwan 2.6.32-9 on CentOS 5, sharing 6 /24 subnets each (so 12 in total).

The problem is that, at seemingly random times, whether in the middle of the day or the middle of the night (so I don't believe it's traffic related), certain (and sometimes all) routes will drop. They usually recover after a few minutes, but that's still long enough for our monitoring to detect downtime.

The configuration we have on each device is:

    conn site-a
        keyingtries=0
        keylife=1h
        ikelifetime=8h
        left=1.1.1.1
        right=2.2.2.2
        leftsubnets={x.x.x.x/24,x.x.x.x/24,x.x.x.x/24,x.x.x.x/24,x.x.x.x/24,x.x.x.x/24}
        rightsubnets={x.x.x.x/24,x.x.x.x/24,x.x.x.x/24,x.x.x.x/24,x.x.x.x/24,x.x.x.x/24}
        pfs=yes
        auto=start
        authby=secret
        dpddelay=30
        dpdtimeout=120
        dpdaction=hold
        phase2alg=aes256-sha1;modp1536
        phase2=esp
        ike=aes256-sha1;modp1536

It's mirrored exactly the same on the other side.

I have tried changing the dead peer detection timeout to something high (5 minutes), and removing it completely (which I believe defaults it to 30 seconds), neither of which made any difference.

I can't see any obvious errors in the logs; however, the most recent dropout produced the following messages around the same time:

    Feb 10 00:53:09 site-b-vpn pluto[30584]: "site-a/5x5" #39: max number of retransmissions (2) reached STATE_QUICK_I1
    Feb 10 00:53:09 site-b-vpn pluto[30584]: "site-a/5x5" #39: starting keying attempt 2 of an unlimited number
    Feb 10 00:53:09 site-b-vpn pluto[30584]: "site-a/5x5" #95: initiating Quick Mode PSK+ENCRYPT+TUNNEL+PFS+UP+IKEv2ALLOW+SAREFTRACK to replace #39 {using isakmp#52 msgid:119495de proposal=AES(12)_256-SHA1(2)_160 pfsgroup=OAKLEY_GROUP_MODP1536}

and also

    Feb 10 00:52:25 site-a-vpn pluto[2414]: "site-b/6x6" #1: ignoring Delete SA payload: PROTO_IPSEC_ESP SA(0xde58eea3) not found (maybe expired)
    Feb 10 00:52:25 site-a-vpn pluto[2414]: "site-b/6x6" #1: received and ignored informational message
    Feb 10 00:52:25 site-a-vpn pluto[2414]: "site-b/6x6" #1: ignoring Delete SA payload: PROTO_IPSEC_ESP SA(0xa5298d7d) not found (maybe expired)
    Feb 10 00:52:25 site-a-vpn pluto[2414]: "site-b/6x6" #1: received and ignored informational message

Before we move to another solution, does anyone have any suggestions on what the problem might be? Running a constant ping between the two hosts doesn't drop *any* packets (even when the IPSec connection itself drops out).

Thanks in advance.

Try setting a lower key expiry time on the other endpoint.

--
Eero

2016-02-09 17:04 GMT+02:00 John Cenile <jcenile1983 at gmail.com>:

> The configuration we have on each device is:
>
> conn site-a
> keyingtries=0
> keylife=1h
> ikelifetime=8h

Thanks, I've updated the config with the following:

    keylife=20m
    ikelifetime=2h

I'll see how that goes. In the meantime, any other suggestions would be greatly appreciated.

On 10 February 2016 at 02:14, Eero Volotinen <eero.volotinen at iki.fi> wrote:

> Try setting lower keyexpiry time on other endpoint.
>
> --
> Eero

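For context, here is a sketch of how the top of the stanza from the original post would look with those two values changed. The thread does not say whether the change went onto one endpoint only (one reading of "on other endpoint") or onto both, so treat that choice as an assumption to test rather than a confirmed fix:

```
conn site-a
    keyingtries=0
    # Phase 2 (IPsec SA) lifetime, lowered from 1h
    keylife=20m
    # Phase 1 (ISAKMP SA) lifetime, lowered from 8h
    ikelifetime=2h
    left=1.1.1.1
    right=2.2.2.2
    # remaining options unchanged from the original post
```
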
On 02/09/2016 07:04 AM, John Cenile wrote:

> does anyone have any suggestions on what the problem might be?

Not off the top of my head, but if I were you, I'd enable debugging of "control" and "dpd". See man ipsec.conf (/plutodebug) and man ipsec_pluto.

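As a rough sketch of that suggestion, assuming the usual /etc/ipsec.conf layout with a "config setup" section (the file location and the need for a restart are assumptions, not something stated in this thread), the two debug classes could be enabled like this:

```
# /etc/ipsec.conf (assumed location)
config setup
    # Log pluto's control-flow and dead-peer-detection debugging.
    # The names are the ipsec_pluto debug options without the --debug- prefix.
    plutodebug="control dpd"
```

Pluto most likely needs a restart to pick this up, which will itself bounce the tunnels. The extra output can be noisy, so it is probably worth setting plutodebug back to "none" once a drop has been captured.
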
CentOS 5 is also a rather old OS. Is it possible to use a newer version (such as CentOS 6 or CentOS 7)?

Eero

2016-02-09 19:52 GMT+02:00 Gordon Messmer <gordon.messmer at gmail.com>:

> Not off the top of my head, but if I were you, I'd enable debugging of
> "control" and "dpd". See man ipsec.conf (/plutodebug) and man ipsec_pluto.