Paul Harvey
2012-Dec-06 23:27 UTC
1000 Domains: Not able to access Domu via xm console from Dom0
Hi all,

I am running Xen 4.1.2 with an Ubuntu Dom0. I have essentially got 1000 modified Mini-OS DomUs running at the same time.

When I try to access the console of the 1000th domain:

xm console DOM1000
xenconsole: could not read tty from store: No such file or directory

The domain is alive and running according to xentop, and has been for some time. I can successfully access the first 338 domains with xm console, but the rest (I tried a sampling of them) give the above error.

Any help, or is this a limitation of Xen?

Thanks
Paul

_______________________________________________
Xen-users mailing list
Xen-users@lists.xen.org
http://lists.xen.org/xen-users
Ian Campbell
2012-Dec-07 10:03 UTC
Re: 1000 Domains: Not able to access Domu via xm console from Dom0
On Thu, 2012-12-06 at 23:27 +0000, Paul Harvey wrote:
> Any help, or is this a limitation of Xen?

One limit you might be hitting is the number of event channels which dom0 can handle. The maximum is currently 1024 for a 32-bit domain and 4096 for a 64-bit domain (that's per domain, not the total in the system). Depending on the configuration of the mini-os domains (e.g. number of devices etc.) you might be hitting this -- "lsevtchn 0" might give a clue if this is happening (that tool is in /usr/lib/xen somewhere).

Work has just started on expanding these limits to ~32k and ~512k for 32- and 64-bit domains respectively; the hope is that this will be done in time for 4.3. Look for posts from Wei Liu on xen-devel this week.

If you aren't hitting the evtchn limits then maybe you are hitting some dom0 OS-level limitation, e.g. a ulimit on the number of open file descriptors which xenconsoled can have, or some limit on the number of ptys.

Ian.
Paul Harvey
2012-Dec-11 22:07 UTC
Re: 1000 Domains: Not able to access Domu via xm console from Dom0
On 7 December 2012 10:03, Ian Campbell <Ian.Campbell@citrix.com> wrote:
> [...]
> If you aren't hitting the evtchn limits then maybe you are hitting some
> dom0 OS level limitation, i.e. a ulimit on the number of open file
> descriptors which xenconsoled can have or some limit on the number of
> pty's.

Hi Ian,

Thanks for the quick reply!

I have looked into your suggestions and:

* It is NOT the number of evtchns; this is much less than the limits you mention.
* It is NOT the number of allowable ptys; the number used is much less than the limit.
* The number of per-process file descriptors was set to 1024, but I have increased this to thousands:

ulimit -n
10240

To hammer this point home, I built a wee C file to allocate ptys. Before I changed the limit I got problems past 1024; now it works fine as root, or as any user.

But, when I create ~350 domains:

cat /proc/<xenconsoled>/fd | wc -l
1024

The count only ever goes as high as 1024, and does not increase for subsequently added domains.

Any other ideas?

Also, as a side note, any idea why the domain creation time grows quadratically?

Thanks
Paul
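The pty-allocation test described above can be approximated as follows. This is a minimal Python sketch, not Paul's C program: `count_openable_ptys` is a hypothetical helper name, and the sketch assumes that each pty costs the calling process two descriptors (master and slave), so with the limits quoted in this thread the per-process open-file limit would be reached before the system-wide pty limit.

```python
import os


def count_openable_ptys(max_attempts):
    """Open up to max_attempts ptys and return how many could be opened.

    Hypothetical helper for illustration only. Every pty that was
    opened is closed again before the function returns.
    """
    opened = []
    try:
        for _ in range(max_attempts):
            try:
                # os.openpty() returns a (master_fd, slave_fd) pair.
                opened.append(os.openpty())
            except OSError:
                # Typically EMFILE (per-process fd limit) or a pty limit.
                break
        return len(opened)
    finally:
        for master_fd, slave_fd in opened:
            os.close(master_fd)
            os.close(slave_fd)


if __name__ == "__main__":
    print("ptys opened:", count_openable_ptys(2000))
```

A result lower than the requested count only says that some limit applies to the process running the test; it says nothing about the limits of a daemon that was started earlier.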
Paul Harvey
2012-Dec-12 12:02 UTC
Re: 1000 Domains: Not able to access Domu via xm console from Dom0
Further to that last email, looking in the xenstore confirms that the tty (pty) is not being assigned to the domains above 338:

root@desktop:~# xenstore-ls /local/domain/339/console
ring-ref = "750902"
port = "2"
limit = "1048576"
type = "xenconsoled"

Whereas for 338 we get:

root@desktop:~# xenstore-ls /local/domain/338/console
ring-ref = "737537"
port = "2"
limit = "1048576"
type = "xenconsoled"
tty = "/dev/pts/342"

On 11 December 2012 22:07, Paul Harvey <jhebus@googlemail.com> wrote:
> [...]
Paul Harvey
2012-Dec-13 12:24 UTC
Re: 1000 Domains: Not able to access Domu via xm console from Dom0
So, I attached strace to xenconsoled to see if I could find what was going on, and I got this:

ioctl(1023, SNDCTL_TMR_TIMEBASE or TCGETS, {B38400 opost isig icanon echo ...}) = 0
ioctl(1023, TIOCGPTN, [345]) = 0
stat("/dev/pts/345", {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 345), ...}) = 0
open("/dev/pts/345", O_RDWR|O_NOCTTY) = -1 EMFILE (Too many open files)
close(1023) = 0
write(2, "Failed to create tty for domain-"..., 70) = 70
open("/etc/localtime", O_RDONLY|O_CLOEXEC) = 1023
fstat(1023, {st_mode=S_IFREG|0644, st_size=3661, ...}) = 0
fstat(1023, {st_mode=S_IFREG|0644, st_size=3661, ...}) = 0

So this is definitely a problem with file limits, but I don't understand why, as the current limit on files per process is 65000.

On 12 December 2012 12:02, Paul Harvey <jhebus@googlemail.com> wrote:
> [...]
Ian Campbell
2012-Dec-13 12:36 UTC
Re: 1000 Domains: Not able to access Domu via xm console from Dom0
On Thu, 2012-12-13 at 12:24 +0000, Paul Harvey wrote:
> So, i attached strace to xenconsoled to see i could find what was
> going on and i got this
> [...]
> So this is definitely a problem with file limits, but i don't
> understand as the current limit on files per process is 65000

I wrote the following yesterday and although I see it in my sent box I can't see it in the list archives and you don't seem to have received it either. I've no idea where it got to...

On Tue, 2012-12-11 at 22:07 +0000, Paul Harvey wrote:
> [...]
> * It is NOT the number of evntchns, this is much less that the limits
> you mention

OOI how many event channels do your 1000 domains require?

> * It is NOT the number of allowable PTY's, the number used is much
> less than the limit

Again OOI how many?

> * The number of per process file descriptors was set to 1024, but i
> have increased this to thousands :
> ulimit -n
> 10240

Did you apply this to the xenconsoled and other daemon processes too? Setting ulimit only affects the current process and its children.

> To hammer this point home, i built a wee C file to allocate pty's.
> Before i changed the limit i got problems past 1024, now it work fine
> as root, or any user.
>
> But, when i create ~350 domains:
>
> cat /proc/<xenconsoled>/fd | wc -l
> 1024
>
> only ever goes as high as 1024, and does not increase for subsequently
> added domains.

I suspect you haven't actually increased the ulimit for this process. What does /proc/<xenconsoled>/limits contain?

There may also be sysctls which limit the number of fds a process can have.

> Any other ideas?
>
> Also, as a side note, any idea why the domain creation time grows
> quadratically?

Grows with the number of running domains you mean?

There were some memory allocator optimisations discussed on xen-devel recently, but I don't recall the details enough to know if they are relevant here; it could be that though. Other than that I'm afraid I've no ideas.

Ian.
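Ian's point that a ulimit change only affects the current process and its children can be illustrated with a small experiment. This is a minimal Python sketch, assuming a POSIX system; `child_nofile_limit` is a hypothetical helper name, and the value 256 is arbitrary. The limit is lowered rather than raised because lowering a soft limit never needs privileges.

```python
import resource
import subprocess
import sys


def child_nofile_limit(soft_limit):
    """Start a child with its own soft RLIMIT_NOFILE and return what it sees.

    Hypothetical helper for illustration only. The limit is changed in
    the child between fork and exec, so the parent is never modified.
    """
    def set_limit():
        _soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
        resource.setrlimit(resource.RLIMIT_NOFILE, (soft_limit, hard))

    output = subprocess.check_output(
        [sys.executable, "-c",
         "import resource;"
         "print(resource.getrlimit(resource.RLIMIT_NOFILE)[0])"],
        preexec_fn=set_limit,
    )
    return int(output.decode().strip())


if __name__ == "__main__":
    print("parent:", resource.getrlimit(resource.RLIMIT_NOFILE)[0])
    print("child: ", child_nofile_limit(256))
    print("parent:", resource.getrlimit(resource.RLIMIT_NOFILE)[0])
```

The same reasoning applies in the other direction: a daemon that is already running keeps the limit it was started with, whatever a later shell sets for itself.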
Paul Harvey
2012-Dec-13 15:28 UTC
Re: 1000 Domains: Not able to access Domu via xm console from Dom0
On 13 December 2012 14:58, Paul Harvey <jhebus@googlemail.com> wrote:
> On 13 December 2012 12:36, Ian Campbell <Ian.Campbell@citrix.com> wrote:
>> [...]
>> I suspect you haven't actually increased the ulimit for this process.
>> What does /proc/<xenconsoled>/limits contain?
>> [...]

Hi Ian,

Thanks for getting back to me :)

So:

./lsevntchn 1000
1: VCPU 0: Interdomain (Connected) - Remote Domain 0, Port 72
2: VCPU 0: Interdomain (Connected) - Remote Domain 0, Port 73

cat /proc/sys/kernel/pty/max
4096

# with 338 domains; there were 9 system ones before starting
cat /proc/sys/kernel/pty/nr
347

I changed the configuration file /etc/security/limits.config, rebooted the machine, and assumed that this would have applied the new limits to the daemons, but you were right:

cat /proc/5388/limits
Limit                     Soft Limit           Hard Limit           Units
Max cpu time              unlimited            unlimited            seconds
Max file size             unlimited            unlimited            bytes
Max data size             unlimited            unlimited            bytes
Max stack size            8388608              unlimited            bytes
Max core file size        0                    unlimited            bytes
Max resident set          unlimited            unlimited            bytes
Max processes             87439                87439                processes
Max open files            1024                 1024                 files
Max locked memory         65536                65536                bytes
Max address space         unlimited            unlimited            bytes
Max file locks            unlimited            unlimited            locks
Max pending signals       87439                87439                signals
Max msgqueue size         819200               819200               bytes
Max nice priority         0                    0
Max realtime priority     0                    0
Max realtime timeout      unlimited            unlimited            us

I killed all the domains and restarted xenconsoled. This applies the new limits:

cat /proc/27677/limits
Limit                     Soft Limit           Hard Limit           Units
Max cpu time              unlimited            unlimited            seconds
Max file size             unlimited            unlimited            bytes
Max data size             unlimited            unlimited            bytes
Max stack size            8388608              unlimited            bytes
Max core file size        0                    unlimited            bytes
Max resident set          unlimited            unlimited            bytes
Max processes             87439                87439                processes
Max open files            65000                65000                files
Max locked memory         65536                65536                bytes
Max address space         unlimited            unlimited            bytes
Max file locks            unlimited            unlimited            locks
Max pending signals       87439                87439                signals
Max msgqueue size         819200               819200               bytes
Max nice priority         0                    0
Max realtime priority     0                    0
Max realtime timeout      unlimited            unlimited            us

BUT:

There is now a buffer overflow happening somewhere which is crashing the daemon when creating the 340th domain, as shown by strace:

write(4, "\v\0\0\0\0\0\0\0\0\0\0\0+\0\0\0", 16) = 16
write(4, "/local/domain/1020/console/tty\0", 31) = 31
write(4, "/dev/pts/345", 12) = 12
futex(0xd95124, FUTEX_WAIT_PRIVATE, 14161, NULL) = 0
futex(0xd950f8, FUTEX_WAKE_PRIVATE, 1) = 0
rt_sigaction(SIGPIPE, {SIG_DFL, [], SA_RESTORER, 0x7fb5d50284a0}, NULL, 8) = 0
fcntl(1026, F_SETFL, O_RDONLY|O_NONBLOCK) = 0
open("/dev/tty", O_RDWR|O_NOCTTY|O_NONBLOCK) = -1 ENXIO (No such device or address)
writev(2, [{"*** ", 4}, {"buffer overflow detected", 24}, {" ***: ", 6}, {"/usr/lib/xen-4.1/bin/xenconsoled", 32}, {" terminated\n", 12}], 5) = 78
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fb5d5eb3000
open("/usr/lib/xen-4.1/bin/../lib/libgcc_s.so.1", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 1028
fstat(1028, {st_mode=S_IFREG|0644, st_size=85812, ...}) = 0
mmap(NULL, 85812, PROT_READ, MAP_PRIVATE, 1028, 0) = 0x7fb5d5e9e000
close(1028) = 0
access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
open("/lib/x86_64-linux-gnu/libgcc_s.so.1", O_RDONLY|O_CLOEXEC) = 1028
read(1028, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\320(\0\0\0\0\0\0"..., 832) = 832
fstat(1028, {st_mode=S_IFREG|0644, st_size=88384, ...}) = 0
mmap(NULL, 2184216, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 1028, 0) = 0x7fb5cf9d1000
mprotect(0x7fb5cf9e6000, 2093056, PROT_NONE) = 0
mmap(0x7fb5cfbe5000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 1028, 0x14000) = 0x7fb5cfbe5000
close(1028) = 0
mprotect(0x7fb5cfbe5000, 4096, PROT_READ) = 0
munmap(0x7fb5d5e9e000, 85812) = 0
futex(0x7fb5d53aedf0, FUTEX_WAKE_PRIVATE, 2147483647) = 0
futex(0x7fb5cfbe61a4, FUTEX_WAKE_PRIVATE, 2147483647) = 0
write(2, "======= Backtrace: =========\n", 29) = 29
writev(2, [{"/lib/x86_64-linux-gnu/libc.so.6", 31}, {"(", 1}, {"__fortify_fail", 14}, {"+0x", 3}, {"37", 2}, {")", 1}, {"[0x", 3}, {"7fb5d50fc807", 12}, {"]\n", 2}], 9) = 69
writev(2, [{"/lib/x86_64-linux-gnu/libc.so.6", 31}, {"(", 1}, {"+0x", 3}, {"109700", 6}, {")", 1}, {"[0x", 3}, {"7fb5d50fb700", 12}, {"]\n", 2}], 8) = 59
writev(2, [{"/lib/x86_64-linux-gnu/libc.so.6", 31}, {"(", 1}, {"+0x", 3}, {"10a7be", 6}, {")", 1}, {"[0x", 3}, {"7fb5d50fc7be", 12}, {"]\n", 2}], 8) = 59
writev(2, [{"/usr/lib/xen-4.1/bin/xenconsoled", 32}, {"[0x", 3}, {"403cb8", 6}, {"]\n", 2}], 4) = 43
writev(2, [{"/usr/lib/xen-4.1/bin/xenconsoled", 32}, {"[0x", 3}, {"4021d5", 6}, {"]\n", 2}], 4) = 43
writev(2, [{"/lib/x86_64-linux-gnu/libc.so.6", 31}, {"(", 1}, {"__libc_start_main", 17}, {"+0x", 3}, {"ed", 2}, {")", 1}, {"[0x", 3}, {"7fb5d501376d", 12}, {"]\n", 2}], 9) = 72
writev(2, [{"/usr/lib/xen-4.1/bin/xenconsoled", 32}, {"[0x", 3}, {"4022ad", 6}, {"]\n", 2}], 4) = 43
write(2, "======= Memory map: ========\n", 29) = 29

On 13 December 2012 15:27, Paul Harvey <jhebus@googlemail.com> wrote:
> Sorry, thought that i pressed reply all
>
> On 13 December 2012 15:19, Ian Campbell <Ian.Campbell@citrix.com> wrote:
>> Please can you keep this conversation on the mailing list.
>>
>> On Thu, 2012-12-13 at 15:12 +0000, Paul Harvey wrote:
>> [...]
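Checking the limits of an already-running daemon, as done above with /proc/<pid>/limits, can also be scripted. This is a minimal Python sketch, assuming the Linux /proc text layout shown in this thread; `parse_limits` and `SAMPLE` are hypothetical names, and the sample rows are copied from the first listing above.

```python
import re

# Three rows copied from the /proc/5388/limits output quoted in this thread.
SAMPLE = """\
Limit                     Soft Limit           Hard Limit           Units
Max processes             87439                87439                processes
Max open files            1024                 1024                 files
Max nice priority         0                    0
"""


def parse_limits(text):
    """Parse /proc/<pid>/limits text into {name: (soft, hard)}.

    Hypothetical helper for illustration only. Values are kept as
    strings because they can be numbers or the word "unlimited".
    """
    limits = {}
    for line in text.splitlines()[1:]:  # skip the header row
        # Limit names contain single spaces, so split on runs of 2+ spaces.
        fields = re.split(r"\s{2,}", line.strip())
        if len(fields) >= 3:
            limits[fields[0]] = (fields[1], fields[2])
    return limits


if __name__ == "__main__":
    print(parse_limits(SAMPLE)["Max open files"])
```

For a live process the text would come from reading `/proc/<pid>/limits`, with the pid of the daemon in question substituted.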
Ian Campbell
2012-Dec-13 15:35 UTC
Re: [Xen-users] 1000 Domains: Not able to access Domu via xm console from Dom0
On Thu, 2012-12-13 at 15:28 +0000, Paul Harvey wrote:
> ./lsevntchn 1000
> 1: VCPU 0: Interdomain (Connected) - Remote Domain 0, Port 72
> 2: VCPU 0: Interdomain (Connected) - Remote Domain 0, Port 73

When I mentioned evtchn limitations I meant in dom0, IOW the other end of all these. At two evtchns per mini-os domain you'd expect to hit issues around 512 domains on a 32-bit domain 0.

> I have changed the configuration file /etc/security/limits.config and
> rebooted the machines and assumed that this would have applied the
> new limits to the deamons, but you were right and

I don't have this file on Debian, so I guess it is particular to whichever distro you use -- perhaps there is a dependency issue between the xencommons initscript and whatever initscript applies the settings from /etc/security/limits.config?

> I killed all the domains and restarted the xenconsoled. This applies
> the new limits:

Great!

> BUT:
>
> There is now a buffer overflow happening somewhere which is crashing
> the deamon when creating the 340th domain,

Not Great! :-/ I've added xen-devel@.

> as shown by strace:

Unfortunately strace doesn't give the sort of information needed to diagnose this. Can you run the daemon under gdb? When it crashes you can type "bt" to get a backtrace. If there are debuginfo packages available in your distro, installing the ones for the Xen packages would improve the output of this too.

If you could figure out where (if anywhere) the daemon's stderr (AKA fd 2) was going then that would be useful too. It may be enough to run it in the foreground.

Ian.
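The event-channel arithmetic in Ian's reply can be written out as a worked example. This is a minimal Python sketch using only the figures given in the thread (1024 and 4096 channels, two per mini-os domain); `domains_before_evtchn_limit` is a hypothetical name, and the `baseline` parameter for channels that dom0 uses for other purposes is an assumption that defaults to zero.

```python
def domains_before_evtchn_limit(dom0_limit, per_domain, baseline=0):
    """Estimate how many guests fit before dom0 runs out of event channels.

    Hypothetical helper for illustration only.
    dom0_limit -- event channels available to dom0
    per_domain -- channels each guest binds in dom0
    baseline   -- channels dom0 already uses for anything else (assumed)
    """
    return (dom0_limit - baseline) // per_domain


if __name__ == "__main__":
    print(domains_before_evtchn_limit(1024, 2))  # 32-bit dom0 -> 512
    print(domains_before_evtchn_limit(4096, 2))  # 64-bit dom0 -> 2048
```

With a non-zero baseline the estimate drops accordingly, which is why the real ceiling would be somewhat below these round figures.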
Paul Harvey
2012-Dec-14 13:06 UTC
Re: 1000 Domains: Not able to access Domu via xm console from Dom0
So:

# with 341 domains
./lsevntchn 0 | wc -l
724

Attaching gdb to xenconsoled:

Program received signal SIGABRT, Aborted.
0x00007fe588ca8425 in raise () from /lib/x86_64-linux-gnu/libc.so.6
(gdb) bt
#0  0x00007fe588ca8425 in raise () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x00007fe588cabb8b in abort () from /lib/x86_64-linux-gnu/libc.so.6
#2  0x00007fe588ce639e in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#3  0x00007fe588d7c807 in __fortify_fail () from /lib/x86_64-linux-gnu/libc.so.6
#4  0x00007fe588d7b700 in __chk_fail () from /lib/x86_64-linux-gnu/libc.so.6
#5  0x00007fe588d7c7be in __fdelt_warn () from /lib/x86_64-linux-gnu/libc.so.6
#6  0x0000000000403ca8 in handle_io () at daemon/io.c:1059
#7  0x00000000004021c5 in main (argc=2, argv=0x7fff58691d48) at daemon/main.c:166

> Unfortunately strace doesn't give the sort of information needed to
> diagnose this. Can you run the daemon under gdb? When it crashes you
> can type "bt" to get a backtrace. If there are debuginfo packages
> available in your distro installing the ones for the Xen packages
> would improve the output of this too.

I don't really know how to enable the debugging info for these libraries. I can't see anything on Google about debuginfo packages for Ubuntu 12.04. Incidentally, I just grabbed the Xen version in their repo following this: https://help.ubuntu.com/community/Xen

I did grab a copy of the Xen 4.1.2 source and compiled it with debug enabled in the tools, so that is why I can see proper output for the first two.

Paul
Wei Liu
2012-Dec-14 14:57 UTC
Re: [Xen-users] 1000 Domains: Not able to access Domu via xm console from Dom0
On Fri, 2012-12-14 at 13:06 +0000, Paul Harvey wrote:
> #with 341 domains
> ./lsevntchn 0 | wc -l
> 724
>
> Attaching gdb to xenconsoled,
>
> Program received signal SIGABRT, Aborted.
> [...]
> #3 0x00007fe588d7c807 in __fortify_fail ()
> from /lib/x86_64-linux-gnu/libc.so.6
> #4 0x00007fe588d7b700 in __chk_fail ()
> from /lib/x86_64-linux-gnu/libc.so.6
> #5 0x00007fe588d7c7be in __fdelt_warn ()
> from /lib/x86_64-linux-gnu/libc.so.6
> #6 0x0000000000403ca8 in handle_io () at daemon/io.c:1059
> #7 0x00000000004021c5 in main (argc=2, argv=0x7fff58691d48) at
> daemon/main.c:166

libc raises an exception when it detects a memory violation.

You can probably try to use valgrind to identify a memory leak in xenconsoled.

Wei.
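One reading of frame #5 in the backtrace above: in glibc, `__fdelt_warn` is the fortify check behind the FD_SET family of macros, and it aborts when a descriptor number is at or above FD_SETSIZE (1024 on Linux with glibc). That would fit a select()-style loop in handle_io meeting descriptors above 1023 once the ulimit was raised, but the thread does not confirm this, so treat it as an assumption. The limit itself can be shown with a minimal Python sketch, in which `select_rejects_high_fd` is a hypothetical helper name; CPython reports the problem as a ValueError instead of aborting.

```python
import fcntl
import os
import select

FD_SETSIZE = 1024  # value on Linux with glibc; an assumption elsewhere


def select_rejects_high_fd():
    """Return True if select() refuses a descriptor >= FD_SETSIZE.

    Hypothetical helper for illustration only. Returns None when the
    process limit is too low to create such a descriptor at all.
    """
    read_end, write_end = os.pipe()
    high_fd = None
    try:
        try:
            # Duplicate onto the lowest free descriptor numbered >= 1024.
            high_fd = fcntl.fcntl(read_end, fcntl.F_DUPFD, FD_SETSIZE)
        except OSError:
            return None  # soft RLIMIT_NOFILE is 1024 or lower
        try:
            select.select([high_fd], [], [], 0)
            rejected = False
        except ValueError:
            rejected = True  # descriptor does not fit in an fd_set
        # poll() takes descriptor numbers directly and has no such cap.
        poller = select.poll()
        poller.register(high_fd, select.POLLIN)
        poller.poll(0)
        return rejected
    finally:
        for fd in (read_end, write_end, high_fd):
            if fd is not None:
                os.close(fd)


if __name__ == "__main__":
    print(select_rejects_high_fd())
```

The poll() call at the end is included only to show that the restriction belongs to the fd_set representation and not to the descriptors themselves.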
Paul Harvey
2012-Dec-14 16:20 UTC
Re: 1000 Domains: Not able to access Domu via xm console from Dom0
On 14 December 2012 14:57, Wei Liu <Wei.Liu2@citrix.com> wrote:
> On Fri, 2012-12-14 at 13:06 +0000, Paul Harvey wrote:
> > SO
> >
> > #with 341 domains
> > ./lsevtchn 0 | wc -l
> > 724
> >
> > Attaching gdb to xenconsoled,
> >
> > Program received signal SIGABRT, Aborted.
> > 0x00007fe588ca8425 in raise () from /lib/x86_64-linux-gnu/libc.so.6
> > (gdb) bt
> > #0 0x00007fe588ca8425 in raise () from /lib/x86_64-linux-gnu/libc.so.6
> > #1 0x00007fe588cabb8b in abort () from /lib/x86_64-linux-gnu/libc.so.6
> > #2 0x00007fe588ce639e in ?? () from /lib/x86_64-linux-gnu/libc.so.6
> > #3 0x00007fe588d7c807 in __fortify_fail () from /lib/x86_64-linux-gnu/libc.so.6
> > #4 0x00007fe588d7b700 in __chk_fail () from /lib/x86_64-linux-gnu/libc.so.6
> > #5 0x00007fe588d7c7be in __fdelt_warn () from /lib/x86_64-linux-gnu/libc.so.6
> > #6 0x0000000000403ca8 in handle_io () at daemon/io.c:1059
> > #7 0x00000000004021c5 in main (argc=2, argv=0x7fff58691d48) at daemon/main.c:166
>
> libc raises exception when it detects memory violation.
>
> You can probably try to use valgrind to identify memory leak in
> xenconsoled.
>
> Wei.

Feeling a little in over my head now. I have run valgrind and attached the file with the output.

As before, xenconsoled crashes, but I am not really sure how to read what I am seeing from valgrind. I cannot tell whether it is reporting errors that happen as the daemon runs, or whether the lost blocks are simply a result of the crash.

Valgrind was run with:

    valgrind --tool=memcheck --leak-check=yes --show-reachable=yes --num-callers=20 --log-file="valgrind_output.txt" --track-fds=yes ./xenconsoled --pid-file=/var/run/xenconsoled.pid

If the attached file doesn't show, could you tell me where it should go?

Paul
Fajar A. Nugraha
2012-Dec-15 10:44 UTC
Re: 1000 Domains: Not able to access Domu via xm console from Dom0
On Wed, Dec 12, 2012 at 5:07 AM, Paul Harvey <jhebus@googlemail.com> wrote:
> increased this to thousands :
> ulimit -n 10240

By running "ulimit" manually? If yes, it's only applied in your current session.

> To hammer this point home, i built a wee C file to allocate pty's. Before i
> changed the limit i got problems past 1024, now it work fine as root, or any
> user.
>
> But, when i create ~350 domains:
>
> cat /proc/<xenconsoled>/fd | wc -l
> 1024
>
> only ever goes as high as 1024, and does not increase for subsequently added
> domains.

By default Ubuntu only allows 1024 open file descriptors for any user. Changing it manually using the "ulimit" command does not change the global limit. This includes root.

Setting it globally is kind of a pain:
- edit /etc/security/limits.conf, read about "nofile", and make the appropriate change
- edit /etc/pam.d/common-session and common-session-noninteractive, include pam_limits.so (see settings for "su" for example)
- reboot

--
Fajar
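The point about sessions follows from how resource limits work: each process carries its own RLIMIT_NOFILE, inherited from its parent, so raising it in one shell does not touch a daemon that is already running. Below is a minimal sketch, not taken from the thread, of how a process can read its own soft and hard limits with getrlimit(2). The helper name read_nofile_limit is made up for this example. On Linux, the limits of an already running process can also be read from /proc/<pid>/limits.

```c
#include <sys/resource.h>

/*
 * Read this process's limit on open file descriptors.
 * Hypothetical helper for illustration; returns 0 on success, -1 on error.
 * The soft limit is the one actually enforced; an unprivileged process may
 * raise it only as far as the hard limit.
 */
int read_nofile_limit(rlim_t *soft, rlim_t *hard)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_NOFILE, &rl) != 0)
        return -1;

    *soft = rl.rlim_cur;
    *hard = rl.rlim_max;
    return 0;
}
```

Calling this from inside the daemon (or checking /proc/<pid>/limits for it) shows the limit the daemon really runs with, which may differ from what "ulimit -n" prints in an interactive shell.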
Ian Campbell
2012-Dec-17 11:56 UTC
Re: 1000 Domains: Not able to access Domu via xm console from Dom0
On Fri, 2012-12-14 at 13:06 +0000, Paul Harvey wrote:
> Program received signal SIGABRT, Aborted.
> 0x00007fe588ca8425 in raise () from /lib/x86_64-linux-gnu/libc.so.6
> (gdb) bt
> #0 0x00007fe588ca8425 in raise () from /lib/x86_64-linux-gnu/libc.so.6
> #1 0x00007fe588cabb8b in abort () from /lib/x86_64-linux-gnu/libc.so.6
> #2 0x00007fe588ce639e in ?? () from /lib/x86_64-linux-gnu/libc.so.6
> #3 0x00007fe588d7c807 in __fortify_fail () from /lib/x86_64-linux-gnu/libc.so.6
> #4 0x00007fe588d7b700 in __chk_fail () from /lib/x86_64-linux-gnu/libc.so.6
> #5 0x00007fe588d7c7be in __fdelt_warn () from /lib/x86_64-linux-gnu/libc.so.6
> #6 0x0000000000403ca8 in handle_io () at daemon/io.c:1059
> #7 0x00000000004021c5 in main (argc=2, argv=0x7fff58691d48) at daemon/main.c:166

daemon/io.c:1059 in 4.1.2 is:

        FD_ISSET(xc_evtchn_fd(d->xce_handle), &readfds))
                handle_ring_read(d);

I rather suspect this is overrunning the readfds array. http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/sys_select.h.html suggests this is sized by FD_SETSIZE. On my system that appears to be statically 1024 (at least strace doesn't show a syscall to determine it in a simple test app although grep /usr/include suggests that might be an option on some systems).

It doesn't seem likely that there will be a simple solution to this. We probably need to switch to something other than select(2). poll(2) seems to handle arbitrary numbers of file descriptors. epoll(7) would be nice (it supposedly scales better than poll) but is Linux specific. Another option might be to fork multiple worker processes (might be a good idea if xenconsole becomes a bottleneck).

It seems likely (based on a quick grep) that xenstore (both the C and ocaml variants) will suffer from the same issue.

I'm not sure why we have an evtchn handle per guest, other than this comment which suggests it was simply expedient rather than a good design:

        /* Opening evtchn independently for each console is a bit
         * wasteful, but that's how the code is structured... */
        dom->xce_handle = xc_evtchn_open(NULL, 0);
        if (dom->xce_handle == NULL) {
                err = errno;
                goto out;
        }

However this is just one open fd which scales with number of domains (the others are the pty related ones) so just fixing this would just buy a bit more time but not fix the underlying issue.

Ian.
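The __fdelt_warn and __chk_fail frames in the backtrace are consistent with glibc's fortified fd_set macros aborting when they are handed a descriptor outside the range 0..FD_SETSIZE-1. The following is a minimal sketch, not code from xenconsoled, of a range check that a select(2)-based loop could apply before touching an fd_set. The helper names fd_fits_in_fd_set and checked_fd_set are hypothetical.

```c
#include <sys/select.h>

/* Returns 1 if fd can be used with FD_SET/FD_ISSET, 0 otherwise. */
int fd_fits_in_fd_set(int fd)
{
    return fd >= 0 && fd < FD_SETSIZE;
}

/*
 * Add fd to set only when it is in range, tracking the highest fd seen
 * (select(2) needs max_fd + 1 as its first argument).
 * Returns 0 on success, -1 if fd cannot be represented in an fd_set.
 */
int checked_fd_set(int fd, fd_set *set, int *max_fd)
{
    if (!fd_fits_in_fd_set(fd))
        return -1;

    FD_SET(fd, set);
    if (fd > *max_fd)
        *max_fd = fd;
    return 0;
}
```

A check like this only turns the abort into a reportable error. It does not lift the limit, since fd_set has a fixed size, which is why the thread goes on to discuss moving away from select(2).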
Wei Liu
2012-Dec-29 16:21 UTC
Re: [Xen-devel] 1000 Domains: Not able to access Domu via xm console from Dom0
On Mon, 2012-12-17 at 11:56 +0000, Ian Campbell wrote:
> On Fri, 2012-12-14 at 13:06 +0000, Paul Harvey wrote:
> > Program received signal SIGABRT, Aborted.
> > 0x00007fe588ca8425 in raise () from /lib/x86_64-linux-gnu/libc.so.6
> > (gdb) bt
> > #0 0x00007fe588ca8425 in raise () from /lib/x86_64-linux-gnu/libc.so.6
> > #1 0x00007fe588cabb8b in abort () from /lib/x86_64-linux-gnu/libc.so.6
> > #2 0x00007fe588ce639e in ?? () from /lib/x86_64-linux-gnu/libc.so.6
> > #3 0x00007fe588d7c807 in __fortify_fail () from /lib/x86_64-linux-gnu/libc.so.6
> > #4 0x00007fe588d7b700 in __chk_fail () from /lib/x86_64-linux-gnu/libc.so.6
> > #5 0x00007fe588d7c7be in __fdelt_warn () from /lib/x86_64-linux-gnu/libc.so.6
> > #6 0x0000000000403ca8 in handle_io () at daemon/io.c:1059
> > #7 0x00000000004021c5 in main (argc=2, argv=0x7fff58691d48) at daemon/main.c:166
>
> daemon/io.c:1059 in 4.1.2 is:
>         FD_ISSET(xc_evtchn_fd(d->xce_handle), &readfds))
>                 handle_ring_read(d);
>
> I rather suspect this is overrunning the readfds array.
> http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/sys_select.h.html suggests this is sized by FD_SETSIZE. On my system that appears to be statically 1024 (at least strace doesn't show a syscall to determine it in a simple test app although grep /usr/include suggests that might be an option on some systems).
>
> It doesn't seem likely that there will be a simple solution to this. We
> probably need to switch to something other than select(2). poll(2) seems
> to handle arbitrary numbers of file descriptors. epoll(7) would be nice
> (it supposedly scales better than poll) but is Linux specific. Another
> option might be to fork multiple worker processes (might be a good idea
> if xenconsole becomes a bottleneck).

libevent wraps the different event APIs and provides a consistent interface across OSes. But I don't know whether adding libevent as a Xen tools dependency is a good idea.

> It seems likely (based on a quick grep) that xenstore (both the C
> and ocaml variants) will suffer from the same issue.

Yes, I ran a test and hit this limit in both Xenstored and Xenconsoled.

> I'm not sure why we have an evtchn handle per guest, other than this
> comment which suggests it was simply expedient rather than a good
> design:
>         /* Opening evtchn independently for each console is a bit
>          * wasteful, but that's how the code is structured... */
>         dom->xce_handle = xc_evtchn_open(NULL, 0);
>         if (dom->xce_handle == NULL) {
>                 err = errno;
>                 goto out;
>         }
> However this is just one open fd which scales with number of domains
> (the others are the pty related ones) so just fixing this would just buy
> a bit more time but not fix the underlying issue.

Even if you work around this problem, you will still hit the Xenstore limit. So the underlying issue has to be fixed.

Wei.
Ian Campbell
2013-Jan-02 13:15 UTC
Re: [Xen-users] 1000 Domains: Not able to access Domu via xm console from Dom0
On Sat, 2012-12-29 at 16:21 +0000, Wei Liu wrote:
> On Mon, 2012-12-17 at 11:56 +0000, Ian Campbell wrote:
> > I rather suspect this is overrunning the readfds array.
> > http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/sys_select.h.html suggests this is sized by FD_SETSIZE. On my system that appears to be statically 1024 (at least strace doesn't show a syscall to determine it in a simple test app although grep /usr/include suggests that might be an option on some systems).
> >
> > It doesn't seem likely that there will be a simple solution to this. We
> > probably need to switch to something other than select(2). poll(2) seems
> > to handle arbitrary numbers of file descriptors. epoll(7) would be nice
> > (it supposedly scales better than poll) but is Linux specific. Another
> > option might be to fork multiple worker processes (might be a good idea
> > if xenconsole becomes a bottleneck).
>
> libevent wraps around different event APIs and provides consistent
> interface across OSes. But I don't know whether adding libevent as Xen
> tools dependency is a good idea.

Using some reasonably widespread library to abstract away the differences between Linux and *BSD here seems like a better idea than rolling our own. I don't know enough about it to say if libevent fits the bill or not.

Based on a cursory glance it seems like libevent implements a complete event loop and requires the application to switch over to using it entirely. This may not be a bad thing but it might be preferable (and/or easier) to see if there is a library which only provides a basic simple abstraction/wrapper over the various polling mechanisms.

Ian.
Ian Jackson
2013-Jan-02 14:42 UTC
Re: [Xen-devel] 1000 Domains: Not able to access Domu via xm console from Dom0
Ian Campbell writes ("Re: [Xen-devel] [Xen-users] 1000 Domains: Not able to access Domu via xm console from Dom0"):
> Using some reasonably widespread library to abstract away the
> differences between Linux and *BSD here seems like a better idea than
> rolling our own. I don't know enough about it to say if libevent fits
> the bill or not.

I don't see why we wouldn't just change the code to use poll() right away. poll is available everywhere and works in (roughly) the same way. It will fix the specific bug here.

poll's downside compared to nonportable approaches is that if you have many hundreds or thousands of fds it can be less efficient. I think we can deal with the efficiency problem later. As Ian writes, it might be better to fork instead.

Ian.
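To make the poll() suggestion concrete, here is a minimal sketch of waiting for readable descriptors with poll(2), which takes an array sized by the caller instead of a fixed-size fd_set. It is an illustration only, not the change that was made to xenconsoled, and the helper name wait_readable is hypothetical.

```c
#include <poll.h>
#include <stdlib.h>

/*
 * Wait up to timeout_ms for any of fds[0..nfds-1] to become readable.
 * On success, stores the indices of the readable entries in ready[]
 * (caller-provided, at least nfds entries) and returns how many there are.
 * Returns -1 on allocation or poll() failure.
 */
int wait_readable(const int *fds, size_t nfds, int timeout_ms, size_t *ready)
{
    struct pollfd *pfds;
    size_t i, count = 0;
    int rc;

    /* The array is as large as the caller needs; no FD_SETSIZE ceiling. */
    pfds = calloc(nfds ? nfds : 1, sizeof(*pfds));
    if (pfds == NULL)
        return -1;

    for (i = 0; i < nfds; i++) {
        pfds[i].fd = fds[i];
        pfds[i].events = POLLIN;
    }

    rc = poll(pfds, (nfds_t)nfds, timeout_ms);
    if (rc < 0) {
        free(pfds);
        return -1;
    }

    for (i = 0; i < nfds; i++) {
        if (pfds[i].revents & POLLIN)
            ready[count++] = i;
    }

    free(pfds);
    return (int)count;
}
```

The kernel still scans the whole array on every call, which is the efficiency cost mentioned above for very large descriptor counts.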