Hi folks,

I have a question about the default number of PIRQs for Domain 0 in Xen 4.x.

I ran into a problem where cciss.ko, the HP Smart Array driver, froze and hung the system at boot time. The server is an HP ProLiant DL385G5p running a CentOS 5.6 dom0 with Xen 4.1.1. Everything works fine with Xen 3.0.3, which CentOS officially integrates. We upgraded to Xen 4.x in order to use Remus.

While debugging, I guessed that an IRQ of the HP RAID controller was going missing, but I could not figure out why. In the end I compiled and tried revisions between 3.4.3 (r19995) and 4.0.0 (r20789), bisecting in about 10 steps, and found that the change between r20142 and r20143 is what triggers the problem. The code changes were:

-static unsigned int extra_dom0_irqs, extra_domU_irqs = 8;
+static unsigned int extra_dom0_irqs = 256, extra_domU_irqs = 32;
 static void __init parse_extra_guest_irqs(const char *s)
 {
     if ( isdigit(*s) )
@@ -253,9 +253,11 @@
     d->is_paused_by_controller = 1;
     atomic_inc(&d->pause_count);

-    d->nr_pirqs = (nr_irqs_gsi +
-                   (domid ? extra_domU_irqs :
-                    extra_dom0_irqs ?: nr_irqs_gsi));
+    if ( domid )
+        d->nr_pirqs = nr_irqs_gsi + extra_domU_irqs;
+    else
+        d->nr_pirqs = nr_irqs_gsi + extra_dom0_irqs;

     d->pirq_to_evtchn = xmalloc_array(u16, d->nr_pirqs);
     d->pirq_mask = xmalloc_array(
         unsigned long, BITS_TO_LONGS(d->nr_pirqs));

In this change I noticed that extra_dom0_irqs, which defaulted to 0 in r20142, is set to 256 in r20143, which makes Dom0's default nr_pirqs exceed 256. Maybe this breaks the IRQ of the HP RAID controller, though I am not sure. After I set it to 32 (the same number as extra_guest_irqs), cciss.ko worked well. Although I can set this value with "extra_guest_irqs=32,32" on the boot command line, there are still some problems:

1. The argument for the dom0 extra IRQs, the one after the comma, is undocumented.
2. What is the reason for the magic numbers 256 for Dom0 and 32 for DomU used by default in Xen 4.x? nr_irqs_gsi is only 16 on x64 arch, but the total nr_pirqs would be more than 256. The magic numbers still exist in the newest code. Hard-coding them like this is bad and may cause very elusive faults for new users; maybe you can find a better solution.

Cheers,

Wu Shu

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
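To make the effect of the r20142 to r20143 change concrete, here is a minimal user-space sketch of the two nr_pirqs calculations from the diff in the message above. The function names nr_pirqs_before and nr_pirqs_after are made up for illustration and are not Xen symbols; the defaults are passed in as parameters rather than read from Xen's static variables.

```c
/*
 * Illustrative sketch only: mirrors the two d->nr_pirqs expressions from
 * the quoted diff. Function names are hypothetical, not Xen symbols.
 */

/* r20142 and earlier: for dom0, an extra_dom0_irqs of 0 falls back to
 * nr_irqs_gsi. The original uses the GNU "a ?: b" shorthand; the portable
 * form "a ? a : b" is used here. */
unsigned int nr_pirqs_before(unsigned int domid, unsigned int nr_irqs_gsi,
                             unsigned int extra_dom0_irqs,
                             unsigned int extra_domU_irqs)
{
    if (domid)
        return nr_irqs_gsi + extra_domU_irqs;
    return nr_irqs_gsi + (extra_dom0_irqs ? extra_dom0_irqs : nr_irqs_gsi);
}

/* r20143 and later: the extras are added unconditionally. */
unsigned int nr_pirqs_after(unsigned int domid, unsigned int nr_irqs_gsi,
                            unsigned int extra_dom0_irqs,
                            unsigned int extra_domU_irqs)
{
    if (domid)
        return nr_irqs_gsi + extra_domU_irqs;
    return nr_irqs_gsi + extra_dom0_irqs;
}
```

Taking the nr_irqs_gsi value of 16 reported above, nr_pirqs_before(0, 16, 0, 8) gives 32 for dom0, while nr_pirqs_after(0, 16, 256, 32) gives 272. With the "extra_guest_irqs=32,32" override, nr_pirqs_after(0, 16, 32, 32) gives 48.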
On 16/11/11 03:40, Shu Wu wrote:
> Hi folks,
>
> I have a question about the default number of PIRQs for Domain 0 in
> Xen 4.x.
>
> I ran into a problem where cciss.ko, the HP Smart Array driver, froze
> and hung the system at boot time. The server is an HP ProLiant
> DL385G5p running a CentOS 5.6 dom0 with Xen 4.1.1. Everything works
> fine with Xen 3.0.3, which CentOS officially integrates. We upgraded
> to Xen 4.x in order to use Remus.
>
> While debugging, I guessed that an IRQ of the HP RAID controller was
> going missing, but I could not figure out why. In the end I compiled
> and tried revisions between 3.4.3 (r19995) and 4.0.0 (r20789),
> bisecting in about 10 steps, and found that the change between r20142
> and r20143 is what triggers the problem. The code changes were:
>
> -static unsigned int extra_dom0_irqs, extra_domU_irqs = 8;
> +static unsigned int extra_dom0_irqs = 256, extra_domU_irqs = 32;
>  static void __init parse_extra_guest_irqs(const char *s)
>  {
>      if ( isdigit(*s) )
> @@ -253,9 +253,11 @@
>      d->is_paused_by_controller = 1;
>      atomic_inc(&d->pause_count);
>
> -    d->nr_pirqs = (nr_irqs_gsi +
> -                   (domid ? extra_domU_irqs :
> -                    extra_dom0_irqs ?: nr_irqs_gsi));
> +    if ( domid )
> +        d->nr_pirqs = nr_irqs_gsi + extra_domU_irqs;
> +    else
> +        d->nr_pirqs = nr_irqs_gsi + extra_dom0_irqs;
>
>      d->pirq_to_evtchn = xmalloc_array(u16, d->nr_pirqs);
>      d->pirq_mask = xmalloc_array(
>          unsigned long, BITS_TO_LONGS(d->nr_pirqs));
>
> In this change I noticed that extra_dom0_irqs, which defaulted to 0 in
> r20142, is set to 256 in r20143, which makes Dom0's default nr_pirqs
> exceed 256. Maybe this breaks the IRQ of the HP RAID controller,
> though I am not sure. After I set it to 32 (the same number as
> extra_guest_irqs), cciss.ko worked well. Although I can set this value
> with "extra_guest_irqs=32,32" on the boot command line, there are
> still some problems:
>
> 1. The argument for the dom0 extra IRQs, the one after the comma, is
>    undocumented.
> 2. What is the reason for the magic numbers 256 for Dom0 and 32 for
>    DomU used by default in Xen 4.x? nr_irqs_gsi is only 16 on x64
>    arch, but the total nr_pirqs would be more than 256. The magic
>    numbers still exist in the newest code. Hard-coding them like this
>    is bad and may cause very elusive faults for new users; maybe you
>    can find a better solution.
>

I doubt that this is relevant. dom0 uses the paravirtualized interface, meaning it does not use real interrupts.

To start with, could you boot on Xen-4.1.1 and use the Xen debug keys on the serial console to dump the interrupt bindings ('i'), MSI state ('M') and PCI devices ('Q')? It would be useful if you could identify which PCI device the HPSA controller is, and depending on where your kernel crashes, /proc/interrupts would be very useful.

> Cheers,
>
> Wu Shu
>

--
Andrew Cooper - Dom0 Kernel Engineer, Citrix XenServer
T: +44 (0)1223 225 900, http://www.citrix.com
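For readers unsure about the two-value form "extra_guest_irqs=32,32" raised in the quoted message, the following is a rough user-space sketch of how such an option could be parsed, with the first value applying to domUs and the value after the comma applying to dom0. The thread only shows the first line of Xen's parse_extra_guest_irqs(), so everything beyond the isdigit() check is an assumption; the struct and function name here are hypothetical, and strtoul() stands in for whatever helper the hypervisor uses.

```c
#include <ctype.h>
#include <stdlib.h>

/* Hypothetical container for the two settings; not a Xen structure. */
struct extra_irqs {
    unsigned int domU;  /* value before the comma */
    unsigned int dom0;  /* value after the comma */
};

/*
 * Sketch of parsing "<domU>[,<dom0>]". A missing value leaves the
 * corresponding field (i.e. its default) untouched.
 */
void parse_extra_guest_irqs_sketch(const char *s, struct extra_irqs *out)
{
    char *end;

    if (isdigit((unsigned char)*s)) {
        out->domU = (unsigned int)strtoul(s, &end, 0);
        s = end;
    }
    /* Step past the comma and only parse if a digit follows it. */
    if (*s == ',' && isdigit((unsigned char)*++s))
        out->dom0 = (unsigned int)strtoul(s, &end, 0);
}
```

Under these assumptions, starting from defaults of 32 (domU) and 256 (dom0), the string "32,32" lowers only the dom0 value, and ",32" would do the same without restating the domU value.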
>>> On 16.11.11 at 04:40, Shu Wu <superwushu@gmail.com> wrote:
> In this change I noticed that extra_dom0_irqs, which defaulted to 0 in
> r20142, is set to 256 in r20143, which makes Dom0's default nr_pirqs
> exceed 256. Maybe this breaks the IRQ of the HP RAID controller,
> though I am not sure. After I set it to 32 (the same number as
> extra_guest_irqs), cciss.ko worked well. Although I can set this value
> with "extra_guest_irqs=32,32" on the boot command line, there are
> still some problems:

That would hint at the IRQ number (presumably an MSI one) getting stored in too narrow a field somewhere in the kernel.

> 1. The argument for the dom0 extra IRQs, the one after the comma, is
>    undocumented.

Feel free to submit a patch to update the respective documentation. But for your purpose you don't even need the second value afaiu.

> 2. What is the reason for the magic numbers 256 for Dom0 and 32 for
>    DomU used by default in Xen 4.x?

They're not magic in any way; if they're found to be too small for a significant portion of systems, they could get bumped (but not lowered).

> nr_irqs_gsi is only 16 on x64 arch, but the total

Here you speak of one particular system. Most that I work with have larger values.

> nr_pirqs would be more than 256. The magic numbers still exist in the
> newest code. Hard-coding them like this is bad and may cause very
> elusive faults for new users; maybe you can find a better solution.

The problem is that we can't determine values here that are reasonable for everyone. As long as they serve a majority, we're fine with requiring the few remaining systems to be run with a command line override.

Speaking of which, one option possible after work that happened over the last couple of months would be to get rid of ->nr_pirqs altogether, using nr_irqs again instead. That would make things only worse for your case though, as you wouldn't then have a way to override the system determined values.

Jan
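As an illustration of the "too narrow a field" hypothesis above, here is a small sketch of what happens when an IRQ number above 255 is stored in an 8-bit field. The struct and function are entirely hypothetical and are not taken from cciss or any kernel source; they only show the kind of silent truncation that would be consistent with the symptom.

```c
#include <stdint.h>

/* Hypothetical per-device record with an 8-bit IRQ field.
 * NOT taken from cciss or any real kernel code. */
struct hypothetical_dev {
    uint8_t irq;
};

/* Stores an IRQ number in the narrow field and returns what is read
 * back. Values above 255 are silently reduced modulo 256. */
unsigned int store_irq_narrow(unsigned int irq)
{
    struct hypothetical_dev dev;

    dev.irq = (uint8_t)irq;
    return dev.irq;
}
```

In this sketch an IRQ number of 47 survives the round trip, whereas 272 reads back as 16, so the driver would end up waiting on the wrong interrupt. Whether anything like this actually happens in the reporter's kernel is not established in the thread.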
On 16/11/11 11:52, Jan Beulich wrote:
>>>> On 16.11.11 at 04:40, Shu Wu <superwushu@gmail.com> wrote:
>> In this change I noticed that extra_dom0_irqs, which defaulted to 0 in
>> r20142, is set to 256 in r20143, which makes Dom0's default nr_pirqs
>> exceed 256. Maybe this breaks the IRQ of the HP RAID controller,
>> though I am not sure. After I set it to 32 (the same number as
>> extra_guest_irqs), cciss.ko worked well. Although I can set this value
>> with "extra_guest_irqs=32,32" on the boot command line, there are
>> still some problems:
> That would hint at the IRQ number (presumably an MSI one) getting
> stored in too narrow a field somewhere in the kernel.

Is your kernel being built with per-cpu IDTs, or is it with one shared IDT?

~Andrew

--
Andrew Cooper - Dom0 Kernel Engineer, Citrix XenServer
T: +44 (0)1223 225 900, http://www.citrix.com
>>> On 16.11.11 at 14:00, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
> On 16/11/11 11:52, Jan Beulich wrote:
>> That would hint at the IRQ number (presumably an MSI one) getting
>> stored in too narrow a field somewhere in the kernel.
>
> Is your kernel being built with per-cpu IDTs, or is it with one shared IDT?

pv kernels don't have any IDT.

Jan