Hi Mateusz,
Thanks for your reply!
First of all, I intend to *jail* SysV IPC user completely.
(unless user really want to interact with each other between jails)
I think SysV IPC is simple but obsolete, so you can design whatever
you want for jail system.
Also, I want keep everything simple.
My design (to be sure):
- Each entry of the list (shown in ipcs) belongs to a jail.
- Any operation to SHM/SEM/MSG attempted from another jail, will just
fail with EACCES.
> "Address space can be shared between multiple PROCESSES, what happens
if
> such a pair ends up in different jails? Preferably such a scenario would
> be prohibited to avoid future accidents."
>
> However, sysvipc namespace sharing is an ok feature esp. with
> multi-level jails. In the simplest scenario upon jail creation you
> decide whether it gets its own namespace or inherits it.
>
> > > What about existing sysvshm mappings when jailing?
> >
> > Real (not jailed) environment is treated as a jail with jid=0 in
kernel.
> > If you create sysvshm memory segment before entering a jail, the
> > segment simply owned by jid=0.
> >
>
> The point is you get a process with sysvshm segments from 2 different
> jails. Looks like solid trouble protential.
Ok, I think I've got what you'd concerned.
In my design, setting up such processes would be difficult.
This wouldn't be happend normally, because shared memory segments
should be obtained BEFORE entering a jail;
1. Create a segment on jid=0 with shmget()
2. shmat() to attach (get void *ptr)
3. fork()
4. A child process entering to jid=1 with jail_attach()
5. The child process and the parent process can share the address
space (via *ptr).
6. If the child process do shmat() on the same ID again, it simply
failed with EACCES.
It means, there is NO way to obtain a segment created in other jail
AFTER jailed (even if you're root or obtaining the segment created on
jid=0).
This looks like file descriptors.
It's also a fool-proof design, because most applications don't use
FreeBSD's unique and special jail system directly.
At this point, the patch breaks backward-compatibility, so we might
need an option in kernel configuration.
It's enough by configuration, because I think the current FreeBSD's
SysV IPC behavior of jail is a BUG, not a feature.
# Who using jail wants this dangerous behavior?!
> The feature in question is definitely desirable, but your patch is hack,
> with the "hack" part visible to userspace.
> As mentioned earlier there are some things to do before any kind of
> jail-aware ipcs land in the tree.
Thanks. These should be solved.
> As a minimum this is singlethreading
> when jailing, prevention of jailing processes with shared virtual address
> spaces and ones with existing sysvshm mappings. All this is to reduce
> amount of bugs one would have to deal with.
Virtual memory allocation and related stuff are protected and done by
kernel already, because it's an IPC (Inter Process Communication).
Moreover, you cannot change an owner of the IPC entry after creation,
so we don't need an additional protection in kernel.
Regards,
Kikuchan
On Sun, Jun 7, 2015 at 10:39 AM, Mateusz Guzik <mjguzik at gmail.com>
wrote:> On Sun, Jun 07, 2015 at 12:04:17AM +0900, kikuchan wrote:
>> Sorry for cross-post to freebsd-stable, but I want to get more
>> feedback for my patch.
>> (The patch is;
http://lists.freebsd.org/pipermail/freebsd-jail/attachments/20150606/7736309b/attachment.bin)
>>
>>
>> I believe this patch FIXES current SysV IPC for jail WITHOUT changing
>> current kernel architecture.
>> (so I hope it will be merged into stable/10)
>>
>> Let me explain what happens currently, with and without my patch,
>> since it's little confusing.
>>
>>
>> I use SysV IPC shared memory (SYSVSHM) as an example here, because
>> it's easy to understand.
>> Remember shmget / shmat / shmdt / shmctl, are syscalls of SYSVSHM.
>>
>> All normal processes have its own virtual memory space, it is done by
kernel.
>> A backend component of virtual memory is a page, is on real memory or
>> on swap devices.
>>
>> SYSVSHM provides a way to share memory segments on the page between
>> processes on userland.
>> A process can load the page into its own virtual memory space with
>> shmat syscall.
>> Once the page is loaded into the virtual memory space, the page is
>> accessible until further shmdt syscall or exit of process.
>>
>> Another process can obtain the exact same page, by calling shmat
syscall.
>> So, permission of shmat syscall is very important.
>>
>>
>> > Address space can be shared between multiple jails
>>
>
> This was a typo. Let me quote fixed version:
>
> "Address space can be shared between multiple PROCESSES, what happens
if
> such a pair ends up in different jails? Preferably such a scenario would
> be prohibited to avoid future accidents."
>
> However, sysvipc namespace sharing is an ok feature esp. with
> multi-level jails. In the simplest scenario upon jail creation you
> decide whether it gets its own namespace or inherits it.
>
>> > What about existing sysvshm mappings when jailing?
>>
>> Real (not jailed) environment is treated as a jail with jid=0 in
kernel.
>> If you create sysvshm memory segment before entering a jail, the
>> segment simply owned by jid=0.
>>
>
> The point is you get a process with sysvshm segments from 2 different
> jails. Looks like solid trouble protential.
>
>>
>> > Extending struct prison with relevant pointers and updating the
code to
>>
>> You don't need to extend the struct to separate IPC namespaces.
>> The word "namespaces" means a key (key_t) of IPC syscall,
here.
>>
>> Whether the struct should be extended or not, depends on how we want
>> to control IPC resources for each jail.
>> If you want to control SysV IPC resources by changing sysctl
>> parameters from inside of jail for each jail,
>> then it might be yes.
>> But I think per-jail resource control should be done with RACCT, and
>> it might be applied to my implementation too.
>>
>>
>> The one missing feature is how to export information to userland.
>> This should be discuss separately, even if my patch is rejected.
>> (If visibility control is needed for ipcs, maybe it should use similar
>> technique to ps or netstat?)
>>
>>
>> Conclusion;
>> I think my patch is better than broken. (SysV IPC + jail is buggy over
>> 10 years!)
>>
>
> The feature in question is definitely desirable, but your patch is hack,
> with the "hack" part visible to userspace.
>
> As mentioned earlier there are some things to do before any kind of
> jail-aware ipcs land in the tree. As a minimum this is singlethreading
> when jailing, prevention of jailing processes with shared virtual address
> spaces and ones with existing sysvshm mappings. All this is to reduce
> amount of bugs one would have to deal with.
>
> After the work is completed there is no problem whatsoever with
> providing per-jail sysvipcs. This avoids information leaks (no id list
> to look at) and conflicts.
>
> Exporting is not a problem either - a dedicated sysctl grabs JID and
> dumps its ipcs. It also gets a 'recursive' flag to know whether
ipcs
> for its own jails should be dumped as well (if different).
>
> --
> Mateusz Guzik <mjguzik gmail.com>