Robert Watson
2007-Dec-03 15:38 UTC
Attention 7.x and 8.x ptmx/pts users (read if you set kern.pts.enable=1)
(If you aren't interested in the details of our ptmx/pty/pts driver, skip to the paragraph that begins "Why the long-winded story?")

Dear all:

The current ptmx/pts implementation makes use of devfs(4) cloning: a user process wanting to allocate a pty/pts pair opens /dev/ptmx, which returns a reference to a new pty master. An ioctl is then performed to query which pts number was returned, and the pts device is then opened. Internally, the lookup of /dev/ptmx causes the driver to instantiate the pty, and then when the pty is opened, the pts is created. The pty and pts nodes are both destroyed on last close, cleaning up automatically when the last process attached to the pair exits. Sounds good. :-)

Unfortunately, the current implementation is subject to a potential resource leak: the pty is created when the lookup occurs, but if the open never takes place, then the pty is leaked. In principle, we have facilities to GC unused device nodes "eventually", but we have no race-free way to determine that a node really is unused, even assuming that we implemented such a sweep. This leak interacts particularly poorly with our resource limits on pty/pts pairs: both the administrative limit imposed by sysctl and the functional limit on the number of entries in /etc/ttys. It's possible to imagine various, sometimes messy, techniques for performing this garbage collection.

Instead, what I'd like to do is modify the ptmx code to use a race-free protocol, in which the eventual termination of the processes referencing the node results in the nodes being freed. On some systems, ptmx performs a "bait-and-switch", in which the file descriptor of the pty node is silently substituted for the file descriptor of the ptmx node. This is similar to our model, but with no window between lookup and open; however, it is not easily supported by our current VFS. Another possibility is to introduce a new system call and bypass ptmx entirely, similar to pipe(), socketpair(), etc.
The change that seemed least disruptive, and which I have implemented, turns ptmx into a true device node (not a devfs clone) and adds an ioctl that allocates the pty and pts pair; the pair is also added to a garbage collection list. If the ptmx node is closed *before* the pty is opened, then the nodes are garbage collected. It turns out this also isn't easily implementable in our VFS. We neither offer a per-file-descriptor opaque pointer for use by device drivers, nor pass the file descriptor pointer to the device driver (as, say, Linux does). At some point this functionality will turn up, as there has been consistent interest in it over time. What I've done instead is implement an approximation of that model: an "open counter" for ptmx, which, when it hits zero across all references, triggers a garbage collection sweep. If/when we can use per-file-descriptor state, it is easily modified to sweep on close of a specific descriptor.

--> start reading here if you were bored by the above

Why the long-winded story? Well, this turns out to change the convention by which libc communicates with the kernel. Instead of a simple open of ptmx followed by an ioctl to find the pts, we now open ptmx, perform an ioctl to allocate the pair, and then open both the pty and pts nodes explicitly. Thus, libc requires modification: libcs that know how to speak to the old ptmx don't know how to speak to the new one, and, in effect, vice versa. This doesn't meet our ABI requirements for a stable branch. What I plan to do is withdraw the ptmx/pts implementation from 7.0 before the release by disabling it in the kernel and libc. This will keep us from nailing down the ABI now, and we'll instead merge the revised protocol for 7.1.
This change will, however, affect users of the 8-CURRENT branch. During an upgrade cycle, libc and kernel are likely to be out of sync. If pts support is enabled (via the kern.pts.enable sysctl), pty devices will then not be available, which might crimp the style of anyone performing a remote upgrade via, say, ssh.

So, this is notice of two upcoming changes:

(1) kern.pts.enable will be removed in 7.x, for reintroduction in 7.1. If kern.pts.enable was set, your system will silently revert to using old-style ptys, and attempts to set the sysctl will return an error.

(2) I will merge the revised ptmx implementation to 7.x, potentially disrupting use of pty/pts devices for users who have kern.pts.enable explicitly set to a non-zero value.

Hopefully this will resolve the known resource leaks in the ptmx code. It should also put us on track to enable it by default in 8.x in the near future, and at least to offer it as a production feature in 7.x.

Thanks,

Robert N M Watson
Computer Laboratory
University of Cambridge
Ed Schouten
2007-Dec-04 02:23 UTC
Attention 7.x and 8.x ptmx/pts users (read if you set kern.pts.enable=1)
* Robert Watson <rwatson@FreeBSD.org> wrote:
> Unfortunately, the current implementation is subject to a potential
> resource leak: the pty is created when the lookup occurs, but if the open
> never takes place, then the pty is leaked. In principle, we have
> facilities to GC unused device nodes "eventually", although not a race-free
> way to determine that no race occurs, assuming that we implemented that.
> This leakage turns out to interact particularly poorly with our resource
> limits on pty/pts pairs -- both the administrative limit imposed by sysctl
> and also the functional limit on the number of entries in /etc/ttys. It's
> possible to imagine various sometimes messy techniques of performing this
> garbage collection.

So this is the same issue I raised on arch@ some time ago: /dev/ptmx already returns a reference to a new pty as soon as you stat(2) it (for example, by running `ls -l /dev/ptmx')?

> Instead, what I'd like to do is modify the ptmx code to have a race-free
> protocol, in which eventual termination of processes referencing the node
> results in freeing of the nodes. On some systems, ptmx performs a
> "bait-and-switch", in which the file descriptor of the pty node is silently
> substituted for the file descriptor of the ptmx code--similar to our model,
> only no window between lookup and open, but also not easily supported in
> our current VFS. Another possibility is to introduce a new system call and
> bypass ptmx entirely -- similar to pipe(), socketpair(), etc.

I actually think that this sounds pretty nice. You mean something like an in-kernel implementation of openpty()?

Another thing that would make the TTY code a little cleaner, in my opinion, is removing the PRIV_TTY_PRISON check and making something generic inside devfs. If we have proper garbage collection of TTYs, then we can just change make_dev_cred() to bind the new device node to a certain jail.
That way you could even choose to hide nodes in /dev that don't belong to the jail in question.

--
Ed Schouten <ed@fxq.nl>
WWW: http://g-rave.nl/