On Mon, Jun 08, 2015 at 01:42:21AM +0900, kikuchan wrote:> From my curiosity, is my patch a technically bad? > Is there race condition in it? Or, enabling key_t separation for jail > could trigger race condition, perhaps? >I only briefly looked at the patch. The fact that you perform outside of ipcperm looks suspicious but may be harmless, so at best it's a bad style. If you need ipc mechanism-specifc functions, make them call ipcperm instead. The jail check is too simplistic. Jails higher in the hierarchy should be able to access whatever lower jails produced.> > > I do see great benefit in having jail-aware ipcs. > > > > I do not believe the way to achieve it is to add jail-aware permission > > checks. Support in question should provide support for separate > > namespaces. The are several upsides, including lack of conflict between > > jails and plugged infoleaks. > > Sorry but I might misunderstand what your "separate namespaces" means. > What namespace are you going to separate? key_t, shmid, kernel > structure of shm, or others? > What features do your "jail-aware ipcs" provide? >Well, as I said in my first mail the idea is to make ipc code look at structures assigned to given jail, so that we can have multiple jails with only their own objects. No "well, this id is used by other jail", unless the namespace is explicitly shared. I did have a patch with a meh implementation doing this, but I lost it along the way. It is easy to implement it for "private purposes" (i.e. disregarding possible attacks with jailing processes). The real work is making the whole business safe. For instance back then I could not find any reliable mechanism to tell me whether given process has a shared address space. There is only a vm_refcnt counter in vmspace which is modified on various occasions, thus is not suitable. Adding a separate counter sucks and adding a "once set, never cleared flag" sucks as well. Maybe there is a good method. -- Mateusz Guzik <mjguzik gmail.com>
kikuchan at uranus.dti.ne.jp
2015-Jun-09 16:43 UTC
[patch] separate SysV IPC namespace for jail
Hi Mateusz, Thank you for your reply. (and sorry about my ISP mail service jammed...) I still want to resolve sysv ipc confilicts between jails somehow. (It's welcome if someone else do, but I couldn't find any working example yet...)> I only briefly looked at the patch. The fact that you perform outside of > ipcperm looks suspicious but may be harmless, so at best it's a bad > style. If you need ipc mechanism-specifc functions, make them call > ipcperm instead.Sorry, I guess EACCES misled you. I should have chosen other value and/or concealed information for each jail completely. I intended to demonstrate it's enought to achieve IPC key_t space separation (to PostgreSQL work) for each jails without having shmid_kernel struct for each jails.> The jail check is too simplistic. Jails higher in the hierarchy should be > able to access whatever lower jails produced.Yes I thought so, however, I realized that a single key_t space per jail is enough, otherwise user get confused. Hmm, but yes, probably you're right. I can't find any technical reason to forbid the behavior you mentioned. (I think it needs to modify ipcs(1) so that can display JID to avoid confusing)> Well, as I said in my first mail the idea is to make ipc code look at > structures assigned to given jail, so that we can have multiple jails > with only their own objects. No "well, this id is used by other jail", > unless the namespace is explicitly shared.Ok, now I've understood what the idea is, and maybe it's done by Nick once before on https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=48471 I'm reading the code (shm so far) and kernel deeply now, and I'm so sorry about confusing you due to less knowledge of kernel code. There are two identifier, ID and KEY (shmid and shmkey for SHM) on SysV IPC object. The KEY space must be separated for each jails I think, but about the ID, is determined by kernel and userland users don't care its value, do you really think it should be managed separately in kernel? I agree that "multiple jails with only their own objects" is good design basically, but especially, if you want to support hierarchical jails, the objects will be referenced by multiple jails. If ID space separation is not so important, separating internal namespace for each jail is too complicate for simple KEY space separation, I think. I really should have implemented to conceal information instead of returning EACCES, sorry. ;/ Before jumping to the conclusion, I want to know that *current* code relative to SHM whether have any problems on sharing underlaying vm page between processes that jailed to different jails each other, especially on fork and jail_attach. multithreaded process perhaps? I'm also trying to port Nick's code to 10/stable, then I've found potentially dangerous behavior happens if a process do jail_attach to other jail, due to the separate namespaces have separate ID space for each jail. I guess it was not happned on 4.8 because lack of jail_attach. For example, a process attached to shmid = 65535 on jid=1, then the process changes its jail to jid=2, and if shmid = 65535 exists on jid=2, the process refers wrong vm mapping unless maintain shmmap_state data for the process every jail_attach. Maybe this behavior is something relative to the race that you mentioned before?> For instance back then I could not find any reliable mechanism to tell > me whether given process has a shared address space. There is only a > vm_refcnt counter in vmspace which is modified on various occasions,Hmm, sorry I can't understand what the problem is here... I'm not good at kernel internals yet, so I don't know details of when the processes share the address space, and I have no idea why you want to know whether the process has a shared address space or not...> It is easy to implement it for "private purposes" (i.e. > disregarding possible attacks with jailing processes). The real work is > making the whole business safe.I agree. Is there any project ongoing for this sysvipc issue? If any, what is needed to be done? P.S. I'm really sorry if you feel bad, due to my bad English skill... Regards, Kikuchan On Mon, 8 Jun 2015 19:17:03 +0200, Mateusz Guzik <mjguzik at gmail.com> wrote:> On Mon, Jun 08, 2015 at 01:42:21AM +0900, kikuchan wrote: >> From my curiosity, is my patch a technically bad? >> Is there race condition in it? Or, enabling key_t separation for jail >> could trigger race condition, perhaps? >> > > I only briefly looked at the patch. The fact that you perform outside of > ipcperm looks suspicious but may be harmless, so at best it's a bad > style. If you need ipc mechanism-specifc functions, make them call > ipcperm instead. > > The jail check is too simplistic. Jails higher in the hierarchy should be > able to access whatever lower jails produced. > >> >> > I do see great benefit in having jail-aware ipcs. >> > >> > I do not believe the way to achieve it is to add jail-aware permission >> > checks. Support in question should provide support for separate >> > namespaces. The are several upsides, including lack of conflict between >> > jails and plugged infoleaks. >> >> Sorry but I might misunderstand what your "separate namespaces" means. >> What namespace are you going to separate? key_t, shmid, kernel >> structure of shm, or others? >> What features do your "jail-aware ipcs" provide? >> > > Well, as I said in my first mail the idea is to make ipc code look at > structures assigned to given jail, so that we can have multiple jails > with only their own objects. No "well, this id is used by other jail", > unless the namespace is explicitly shared. > > I did have a patch with a meh implementation doing this, but I lost it > along the way. It is easy to implement it for "private purposes" (i.e. > disregarding possible attacks with jailing processes). The real work is > making the whole business safe. > > For instance back then I could not find any reliable mechanism to tell > me whether given process has a shared address space. There is only a > vm_refcnt counter in vmspace which is modified on various occasions, > thus is not suitable. Adding a separate counter sucks and adding a "once > set, never cleared flag" sucks as well. Maybe there is a good method.