GILLIBERT, Andre
2021-Aug-25 21:00 UTC
[Rd] Is it a good choice to increase the NCONNECTION value?
Hello,

The soft limit on the number of file descriptors is 1024 on GNU/Linux, but the default hard limit is 1048576 or 524288 on major modern distributions: Ubuntu, Fedora, Debian.

I do not have access to a Macintosh, but it looks like the soft limit there is 256 and the hard limit is "unlimited", although the real hard limit has been reported as 10240 (https://developer.r-project.org/Blog/public/2018/03/23/maximum-number-of-dlls/index.html).

Therefore, R should easily be able to raise the limit without superuser privileges, with a call to setrlimit().

This should make file descriptor exhaustion very unlikely, except for buggy programs that leak file descriptors.

The simplest approach would be to set the soft limit to the value of the hard limit. To be nicer, R could set it to 10000 (or to the hard limit if that is lower), which should be enough for intensive use without consuming too many system resources if file descriptors leak.

To make R work reliably on more esoteric operating systems or on poorly configured systems (e.g. systems with a hard limit of 1024), a second safeguard could be added: a request for a new connection would be denied if the actual number of open file descriptors (or connections, if that is easier to compute) is too close to the hard limit. A fixed amount (e.g. 128) or a proportion (e.g. 25%) of file descriptors would be reserved for "other uses", such as shared libraries.

This discussion reminds me of the fixed number of file descriptors of MS-DOS, defined at boot time in config.sys (e.g. files=20).

It is incredible that 64-bit computers in 2021 with gigabytes of RAM still have similar limits, and that R has a hard-coded limit of 128.

--
Sincerely
André GILLIBERT

________________________________
From: qweytr1 at mail.ustc.edu.cn <qweytr1 at mail.ustc.edu.cn>
Sent: Wednesday, 25 August 2021 06:15:59
To: Simon Urbanek
Cc: Martin Maechler; GILLIBERT, Andre; R-devel
Subject: [SPAM] Re: [Rd] Is it a good choice to increase the NCONNECTION value?

Simon,

What about using dynamically allocated connections and a modifiable MAX_NCONNECTIONS limit?
ulimit can be modified by root users; NCONNECTION, at least for now, cannot.

I tried changing the code to allocate memory with malloc and realloc. Because I am unfamiliar with `.Internal` calls, I could not provide a function that modifies MAX_NCONNECTIONS (but it is possible).
The test and the changes are shown below. I would appreciate it if you could tell me whether there could be a bug.

(a demo that may change MAX_NCONNECTIONS, not tested.)

    static int SetMaxNconnections(int now) /* returns the new value of MAX_NCONNECTIONS */
    {
        if (now < 3)
            error(_("Could not shrink the MAX_NCONNECTIONS less than 3"));
        /* Setting MAX_NCONNECTIONS to a really large value is safe, since the
           allocation is not done immediately. Thus this is only a warning. */
        if (now > 65536)
            warning(_("Setting MAX_NCONNECTIONS=%d, larger than 65536, may be crazy. Use at your own risk."), now);
        /* The table currently has NCONNECTIONS <= now slots, so only the cap
           changes and this is safe. */
        if (now >= NCONNECTIONS)
            return MAX_NCONNECTIONS = now;
        R_gc(); /* Try to reclaim unused connections */
        /* Shrink MAX_NCONNECTIONS and NCONNECTIONS, but keep every slot that is
           still in use. The scan starts at the last valid index,
           NCONNECTIONS - 1 (starting at NCONNECTIONS, as in the first draft,
           reads one past the end), and stops at the highest used slot.
           now >= 3 here, thus no underflow occurs. */
        for (int i = NCONNECTIONS - 1; i >= now; --i) {
            if (Connections[i]) { now = i + 1; break; }
        }
        /* We could call realloc here, but since *Connections only takes a few
           kilobytes it seems pointless. A real realloc is triggered when
           NCONNECTIONS < MAX_NCONNECTIONS and NextConnection is called while
           all connections are in use. */
        return MAX_NCONNECTIONS = NCONNECTIONS = now;
    }

test result:

    $ LC_ALL=C R-4.1.1/bin/R -q -e 'library(doParallel);cl=makeForkCluster(128);max(sapply(clusterCall(cl,function()runif(10)),"+"))'
    WARNING: ignoring environment value of R_HOME
    > library(doParallel);cl=makeForkCluster(128);max(sapply(clusterCall(cl,function()runif(10)),"+"))
    Loading required package: foreach
    Loading required package: iterators
    Loading required package: parallel
    Warning messages:
    1: In socketAccept(socket = socket, blocking = TRUE, open = "a+b",  :
      increase max connections from 16 to 32
    2: In socketAccept(socket = socket, blocking = TRUE, open = "a+b",  :
      increase max connections from 32 to 64
    3: In socketAccept(socket = socket, blocking = TRUE, open = "a+b",  :
      increase max connections from 64 to 128
    4: In socketAccept(socket = socket, blocking = TRUE, open = "a+b",  :
      increase max connections from 128 to 256
    [1] 0.9975836
    >
    >

tested changes:

~line 127

    static int NCONNECTIONS=16; /* need one per cluster node; 16 is the
                                   initial value, which grows dynamically */
    static int MAX_NCONNECTIONS=8192; /* increasing it only affects the speed of
                                   finding a free connection; if you have a
                                   machine with more than 4096 threads, you
                                   could submit an issue or modify this value
                                   manually */
    #define NSINKS 21

    static Rconnection *Connections=NULL; /* we will allocate it later */
    ...

~line 146

    static int NextConnection(void)
    {
        int i;
        for(i = 3; i < NCONNECTIONS; i++)
            if(!Connections[i]) break;
        if(i >= NCONNECTIONS) {
            R_gc(); /* Try to reclaim unused ones */
            for(i = 3; i < NCONNECTIONS; i++)
                if(!Connections[i]) break;
            if(i >= NCONNECTIONS) {
                if(i >= MAX_NCONNECTIONS)
                    error(_("all connections are in use"));
                int new_connections=NCONNECTIONS*2; /* try dynamic alloc */
                if(new_connections > MAX_NCONNECTIONS)
                    new_connections = MAX_NCONNECTIONS;
                Rconnection *ptr = realloc(Connections,new_connections*sizeof(Rconnection));
                if (ptr==NULL)
                    error(_("alloc extra connections failed"));
                warning(_("increase max connections from %d to %d\n"),NCONNECTIONS,new_connections);
                Connections = ptr;
                NCONNECTIONS = new_connections;
                /* i is the old table size: clear only the newly added slots */
                for(int j = i; j < NCONNECTIONS; j++) Connections[j] = NULL;
            }
        }
        return i;
    }
    ...

~line 5265

    void attribute_hidden InitConnections()
    {
        int i;
        Connections=malloc(NCONNECTIONS*sizeof(Rconnection));
        if(Connections == NULL) {
            error(_("Cannot alloc connections."));
            abort();
        }
        ...

> -----Original Message-----
> From: "Simon Urbanek" <simon.urbanek at R-project.org>
> Sent: 2021-08-25 08:25:47 (Wednesday)
> To: "Martin Maechler" <maechler at stat.math.ethz.ch>
> Cc: "GILLIBERT, Andre" <Andre.Gillibert at chu-rouen.fr>, "qweytr1 at mail.ustc.edu.cn" <qweytr1 at mail.ustc.edu.cn>, R-devel <R-devel at r-project.org>
> Subject: [SPAM] Re: [Rd] Is it a good choice to increase the NCONNECTION value?
>
> Martin,
>
> I don't think a static connection limit is sensible. Recall that connections can be anything, not necessarily sockets or file descriptors, so they are not linked to the system fd limit. For example, if you use a codec then you will need twice as many connections as fds. To be honest, the connection limit is one of the main reasons why in our big data applications we have always avoided R connections and used C-level sockets instead (another was the lack of control over the socket flags, but that has been addressed in the last release). So I'd vote for at the very least increasing the limit significantly (at least 1k if not more) and, ideally, making it dynamic if memory footprint is an issue.
>
> Cheers,
> Simon
>
>
> > On Aug 25, 2021, at 8:53 AM, Martin Maechler <maechler at stat.math.ethz.ch> wrote:
> >
> >>>>>> GILLIBERT, Andre
> >>>>>>     on Tue, 24 Aug 2021 09:49:52 +0000 writes:
> >
> >> Rconnection is a pointer to a Rconn structure. The Rconn
> >> structure must be allocated independently (e.g. by
> >> malloc() in R_new_custom_connection). Therefore,
> >> increasing NCONNECTION to 1024 should only use 8
> >> kilobytes on 64-bit platforms and 4 kilobytes on 32-bit
> >> platforms.
> >
> > You are right indeed, and I was wrong.
> >
> >> Ideally, it should be dynamically allocated: either as
> >> a linked list or as a dynamic array
> >> (malloc/realloc). However, a simple change of
> >> NCONNECTION to 1024 should be enough for most uses.
> >
> > There is one important other problem I've been made aware of
> > (similarly to the number of open DLL libraries, an issue 1-2
> > years ago):
> >
> > The OS itself has limits on the number of open files
> > (yes, I know that there are other connections than files) and
> > these limits may differ quite a bit from platform to platform.
> >
> > On my Linux laptop, in a shell, I see
> >
> >   $ ulimit -n
> >   1024
> >
> > which is barely conformant with your proposed 1024 NCONNECTION.
> >
> > Now if NCONNECTION is larger than the max allowed number of
> > open files and if R opens more files than the OS allows, the
> > user may get quite unpleasant behavior, e.g. R being terminated brutally
> > (or behaving crazily) without good R-level warning / error messages.
> >
> > It's also not at all sufficient to check for the open files
> > limit at compile time; it has to be checked at R process startup time.
> >
> > So this may need considerably more work than you / we have
> > hoped, and it's probably hard to find a safe number that is
> > considerably larger than 128 and less than the smallest of all
> > non-crazy platforms' {number of open files limit}.
> >
> >> Sincerely
> >> André GILLIBERT
> >
> > [............]
> >
> > ______________________________________________
> > R-devel at r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-devel
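
The proposal at the top of this message (raise the soft RLIMIT_NOFILE to 10000, or to the hard limit if that is lower, and keep some descriptors back for other uses) could be sketched in C roughly as below. This is only an illustration under assumptions, not code from R: the names raise_nofile_soft_limit, connection_budget, FD_TARGET and FD_RESERVE are hypothetical, and the "too close to the limit" check is simplified to a fixed budget computed from the soft limit.

```c
#include <sys/resource.h>

#define FD_TARGET  10000  /* "nice" soft limit suggested in the message */
#define FD_RESERVE 128    /* descriptors kept back for "other uses" (assumed value) */

/* Raise the soft RLIMIT_NOFILE to min(FD_TARGET, hard limit).
   Stores the resulting soft limit in *soft.
   Returns 0 on success, -1 if getrlimit() or setrlimit() fails. */
static int raise_nofile_soft_limit(rlim_t *soft)
{
    struct rlimit rl;
    if (getrlimit(RLIMIT_NOFILE, &rl) != 0)
        return -1;

    rlim_t target = FD_TARGET;
    if (rl.rlim_max != RLIM_INFINITY && rl.rlim_max < target)
        target = rl.rlim_max;      /* never ask for more than the hard limit */

    if (rl.rlim_cur != RLIM_INFINITY && rl.rlim_cur < target) {
        rl.rlim_cur = target;      /* only ever raise, never lower */
        if (setrlimit(RLIMIT_NOFILE, &rl) != 0)
            return -1;
    }
    *soft = rl.rlim_cur;
    return 0;
}

/* Second safeguard: number of connections that may be open under the given
   soft limit while FD_RESERVE descriptors stay free for other uses. */
static rlim_t connection_budget(rlim_t soft)
{
    return soft > FD_RESERVE ? soft - FD_RESERVE : 0;
}
```

A caller would invoke raise_nofile_soft_limit() once at process startup and refuse to open a new connection once the number of open connections reaches connection_budget(soft). No superuser privileges are needed, because the soft limit is never raised above the hard limit.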
Simon Urbanek
2021-Aug-25 23:27 UTC
[Rd] Is it a good choice to increase the NCONNECTION value?
Andre,

as stated earlier, R already uses setrlimit() to raise the limit (see my earlier reply). As for "special" connections, that is not feasible (without some serious rewrite), since a connection doesn't know what it is used for, and connections are not the only way descriptors may be used.

Anyway, I think the takeaway was that the best way forward is likely to make the limit configurable at startup time, with a possible option to check that value against the feasibility of open connections.

Cheers,
Simon

> On Aug 26, 2021, at 9:00 AM, GILLIBERT, Andre <Andre.Gillibert at chu-rouen.fr> wrote:
>
> [............]
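
The takeaway above (a limit that is configurable at startup, optionally checked against what the system can actually open) might look roughly like the following C sketch. Everything here is an assumption made for illustration, not R's implementation: the function names, the FD_HEADROOM value, and the idea of passing the requested limit as a string (for example from a command-line option or environment variable) are all hypothetical. The descriptor check is deliberately optional and approximate, since connections are not tied one-to-one to file descriptors.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>
#include <sys/resource.h>

#define DEFAULT_MAX_CONN 128  /* the hard-coded limit discussed in the thread */
#define MIN_MAX_CONN     3    /* slots 0-2 are stdin, stdout and stderr */
#define FD_HEADROOM      64   /* assumed reserve for DLLs, plain files, ... */

/* Soft limit on open descriptors, or -1 if it is unknown or unlimited. */
static long soft_fd_limit(void)
{
    struct rlimit rl;
    if (getrlimit(RLIMIT_NOFILE, &rl) != 0 || rl.rlim_cur == RLIM_INFINITY)
        return -1;
    if (rl.rlim_cur > (rlim_t) LONG_MAX)
        return -1;                     /* effectively unlimited */
    return (long) rl.rlim_cur;
}

/* Parse the requested limit (may be NULL) and clamp it to what fd_limit
   allows. A negative fd_limit means "no known descriptor limit", in which
   case the feasibility check is skipped. */
static long resolve_max_connections(const char *requested, long fd_limit)
{
    long n = DEFAULT_MAX_CONN;

    if (requested && *requested) {
        char *end;
        errno = 0;
        long v = strtol(requested, &end, 10);
        if (errno == 0 && *end == '\0' && v >= MIN_MAX_CONN)
            n = v;                     /* invalid input keeps the default */
    }

    if (fd_limit >= 0) {
        long feasible = fd_limit - FD_HEADROOM;
        if (feasible < MIN_MAX_CONN)
            feasible = MIN_MAX_CONN;
        if (n > feasible)
            n = feasible;
    }
    return n;
}
```

At startup the two pieces would be combined as resolve_max_connections(user_value, soft_fd_limit()), and the result used to size the connection table once. Doing this at process start rather than at compile time matches the earlier point in the thread that the open-files limit differs between platforms and sessions.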