Eirik Øverby
2012-Jan-15 18:17 UTC
Random 'Connection reset' issues between jails on same host
Hi all,

We're trying to implement our puppet infrastructure, and have discovered something strange about TCP connections between jails on the same host. As our jails haven't generally been making many connections to each other, this issue hasn't popped up before.

We have two identical host systems, on FreeBSD 8.2-RELEASE-p4. These are 8-core Intel systems, with 16GB RAM each. I have just upgraded one of the two systems to 9.0-RELEASE, and it shows the same problem.

When the puppetmaster jail is running on the same host as the jail running puppet agent, connections from the puppet agent randomly fail with 'Connection reset by peer'. This happens at random stages of configuration sync. If either of the jails is moved to another system (jail stop, zfs snapshot, zfs send/recv, jail start) on the same physical network, there are no such problems. It is not a hardware issue, as this happens no matter which of the two hosts we use. If both puppetmaster and puppet agent reside on the same physical box, the errors show up.

There used to be a somewhat similar problem with FTP between jails on the same host, but this was taken care of some time after 8.0-RELEASE IIRC. That problem manifested itself as a combination of random connection failures (had to try 2-3 times to establish a connection) and very slow transfer rates (at most 150 kB/s between jails on the same host, but >50 MB/s between jails on different hosts on the same network).

Has anyone seen this before? Is there anything I have missed, sysctls I should set/adjust?

The /etc/rc.conf settings for the jails are very simple. The following differ from the defaults:

    jail_sysvipc_allow="YES"
    jail_mount_enable="YES"
    jail_devfs_enable="YES"

/etc/sysctl.conf contains the following jail-related settings:

    security.jail.enforce_statfs=0
    security.jail.mount_allowed=1
    security.jail.allow_raw_sockets=1

Thanks,
/Eirik
Eirik Øverby
2012-Jan-15 18:36 UTC
Random 'Connection reset' issues between jails on same host
On Jan 15, 2012, at 18:44, Eirik Øverby wrote:

> Hi all,
>
> We're trying to implement our puppet infrastructure, and have discovered something strange about TCP connections between jails on the same host. As our jails haven't generally been making many connections to each other, this issue hasn't popped up before.
>
> We have two identical host systems, on FreeBSD 8.2-RELEASE-p4. These are 8-core Intel systems, with 16GB RAM each. I have just upgraded one of the two systems to 9.0-RELEASE, and it shows the same problem.
>
> When the puppetmaster jail is running on the same host as the jail running puppet agent, connections from the puppet agent randomly fail with 'Connection reset by peer'. This happens at random stages of configuration sync. If either of the jails is moved to another system (jail stop, zfs snapshot, zfs send/recv, jail start) on the same physical network, there are no such problems. It is not a hardware issue, as this happens no matter which of the two hosts we use. If both puppetmaster and puppet agent reside on the same physical box, the errors show up.

Replying to myself here:

Assigning a cpuset with a single CPU to the jail with puppetmaster seems to cure the symptom. I've made a few thousand connects now and no failures so far. Repeatable on 8 and 9.

This is obviously only a workaround, but it may give some hints as to where the problem is.

/Eirik
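For anyone who wants to repeat the "few thousand connects" check with and without the workaround, here is a minimal sketch of a connect loop that counts resets. It is not the test that was actually run in this thread; the function name `count_connection_errors`, the payload, and the timeout are all assumptions made for illustration. It only exercises TCP connection setup, one send, one receive, and teardown, so it does not speak the puppet protocol. The single-CPU pinning itself would be done on the host with cpuset(1), presumably something like `cpuset -l 0 -j <jid>`; check the man page on your release.

```python
import socket


def count_connection_errors(host, port, attempts, payload=b"ping\n", timeout=5.0):
    """Open `attempts` TCP connections one after another.

    Returns a tuple (ok, resets, other):
      ok     - connections that completed connect/send/recv without error
      resets - connections that failed with 'Connection reset by peer'
      other  - connections that failed with any other socket error
               (refused, timed out, unreachable, ...)
    """
    ok = resets = other = 0
    for _ in range(attempts):
        try:
            # create_connection() resolves the address and connects;
            # the 'with' block closes the socket on exit.
            with socket.create_connection((host, port), timeout=timeout) as sock:
                sock.sendall(payload)
                sock.recv(4096)  # the reply content is ignored; b"" is fine too
            ok += 1
        except ConnectionResetError:
            # ECONNRESET: the error reported in this thread
            resets += 1
        except OSError:
            # any other socket-level failure, including timeouts
            other += 1
    return ok, resets, other
```

Run from inside the agent jail against the puppetmaster jail's address, e.g. `count_connection_errors("192.0.2.10", 8140, 2000)`, where 192.0.2.10 is a placeholder address and 8140 is the usual puppetmaster port. Comparing the `resets` count before and after pinning the puppetmaster jail to one CPU should show whether the workaround holds on a given system.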
Eirik Øverby
2012-Jan-15 18:57 UTC
Random 'Connection reset' issues between jails on same host
Hi all,

We're trying to implement our puppet infrastructure, and have discovered something strange about TCP connections between jails on the same host. As our jails haven't generally been making many connections to each other, this issue hasn't popped up before.

We have two identical host systems, on FreeBSD 8.2-RELEASE-p4. These are 8-core Intel systems, with 16GB RAM each.

When the puppetmaster jail is running on the same host as the jail running puppet agent, connections from the puppet agent randomly fail with 'Connection reset by peer'. This happens at random stages of configuration sync. If either of the jails is moved to another system (jail stop, zfs snapshot, zfs send/recv, jail start) on the same physical network, there are no such problems. It is not a hardware issue, as this happens no matter which of the two hosts we use. If both puppetmaster and puppet agent reside on the same physical box, the errors show up.

There used to be a somewhat similar problem with FTP between jails on the same host, but this was taken care of some time after 8.0-RELEASE IIRC. That problem manifested itself as a combination of random connection failures (had to try 2-3 times to establish a connection) and very slow transfer rates (at most 150 kB/s between jails on the same host, but >50 MB/s between jails on different hosts on the same network).

I am going to try to repeat this on 9.0-RELEASE - but in the meantime, has anyone seen this before? Is there anything I have missed, sysctls I should set/adjust?

The /etc/rc.conf settings for the jails are very simple. The following differ from the defaults:

    jail_sysvipc_allow="YES"
    jail_mount_enable="YES"
    jail_devfs_enable="YES"

/etc/sysctl.conf contains the following jail-related settings:

    security.jail.enforce_statfs=0
    security.jail.mount_allowed=1
    security.jail.allow_raw_sockets=1

Thanks,
/Eirik