Konrad Rzeszutek Wilk
2011-May-11 00:33 UTC
[Xen-devel] irqbalance seg faults with 2.6.38 or later kernels [patch + solution included] if running under Xen hypervisor
The reason behind it is that irqbalance parses the /proc/interrupts
and whenever it hits something it can't understand:

RES: 191614137 73904910 Rescheduling interrupts

It will count the number of interrupts towards the IRQ 0. That IRQ does
exist when the kernel boots under baremetal:

0: 46 0 IO-APIC-edge timer

but under Xen, the timer interrupts are initialized much later:

272: 41197188 0 xen-percpu-virq timer0

and the first IRQ that is used is not zero, but rather one:

1: 73037 0 0 0 0 0 xen-pirq-ioapic-edge i8042

so when irqbalance tries to account for the IRQ 'RES' to the IRQ 0
it fails and segfaults. The attached patch fixes it for whoever else is
hitting this problem.

I am not sure who the upstream maintainer is for this so
I am sending this patch to the different distros as well.

--- irqbalance-0.56.orig/procinterrupts.c	2010-06-10 10:45:55.000000000 -0400
+++ irqbalance-0.56/procinterrupts.c	2011-05-10 20:22:06.897465003 -0400
@@ -50,7 +50,7 @@ void parse_proc_interrupts(void)
 		int cpunr;
 		int number;
 		uint64_t count;
-		char *c, *c2;
+		char *c, *c2, *err;
 
 		if (getline(&line, &size, file)==0)
 			break;
@@ -64,7 +64,11 @@ void parse_proc_interrupts(void)
 			continue;
 		*c = 0;
 		c++;
-		number = strtoul(line, NULL, 10);
+		number = strtoul(line, &err, 10);
+		/* Man page says that if that happens and number == 0, then it
+		 * failed to parse. */
+		if (err == line && number == 0)
+			continue;
 		count = 0;
 		cpunr = 0;
 

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Jan Beulich
2011-May-11 08:16 UTC
Re: [Xen-devel] irqbalance seg faults with 2.6.38 or later kernels [patch + solution included] if running under Xen hypervisor
>>> On 11.05.11 at 02:33, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:
> The reason behind it is that irqbalance parses the /proc/interrupts
> and whenever it hits something it can't understand:
>
> RES: 191614137 73904910 Rescheduling interrupts
>
> It will count the number of interrupts towards the IRQ 0. That IRQ does
> exist when the kernel boots under baremetal:
>
> 0: 46 0 IO-APIC-edge timer
>
> but under Xen, the timer interrupts are initialized much later:
>
> 272: 41197188 0 xen-percpu-virq timer0
>
> and the first IRQ that is used is not zero, but rather one:
>
> 1: 73037 0 0 0 0 0 xen-pirq-ioapic-edge i8042
>
> so when irqbalance tries to account for the IRQ 'RES' to the IRQ 0
> it fails and segfaults. The attached patch fixes it for whoever else is
> hitting this problem.

In the svn snapshot I have, I see

    /* lines with letters in front are special, like NMI count. Ignore */
    if (!(line[0]==' ' || (line[0]>='0' && line[0]<='9')))
        break;

which I would think should be taking care of your problem (or
I mis-read your description), and which was there already before
0.56. Or are you perhaps having the problem because you have
1000+ interrupts, thus causing even the non-numeric strings to
get space padded on their left? In that case I'd rather think above
check should be either improved or removed (replaced by your
solution).

> I am not sure who the upstream maintainer is for this so
> I am sending this patch to the different distros as well.

Copying Neil and Arjan.
Jan
Konrad Rzeszutek Wilk
2011-May-11 13:10 UTC
Re: [Xen-devel] irqbalance seg faults with 2.6.38 or later kernels [patch + solution included] if running under Xen hypervisor
On Wed, May 11, 2011 at 09:16:53AM +0100, Jan Beulich wrote:
> >>> On 11.05.11 at 02:33, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:
> > The reason behind it is that irqbalance parses the /proc/interrupts
> > and whenever it hits something it can't understand:
> >
> > RES: 191614137 73904910 Rescheduling interrupts
> >
> > It will count the number of interrupts towards the IRQ 0. That IRQ does
> > exist when the kernel boots under baremetal:
> >
> > 0: 46 0 IO-APIC-edge timer
> >
> > but under Xen, the timer interrupts are initialized much later:
> >
> > 272: 41197188 0 xen-percpu-virq timer0
> >
> > and the first IRQ that is used is not zero, but rather one:
> >
> > 1: 73037 0 0 0 0 0 xen-pirq-ioapic-edge i8042
> >
> > so when irqbalance tries to account for the IRQ 'RES' to the IRQ 0
> > it fails and segfaults. The attached patch fixes it for whoever else is
> > hitting this problem.
>
> In the svn snapshot I have, I see
>
>     /* lines with letters in front are special, like NMI count. Ignore */
>     if (!(line[0]==' ' || (line[0]>='0' && line[0]<='9')))
>         break;
>
> which I would think should be taking care of your problem (or
> I mis-read your description), and which was there already before

Not anymore. In kernels 2.6.37:

     CPU0 CPU1 CPU2 CPU3 CPU4 CPU5

.. snip.
NMI: 0 0 0 0 Non-maskable interrupts
LOC: 12413629 12858323 16296183 11098466 Local timer interrupts

In 2.6.38 and later:
     CPU0 CPU1 CPU2 CPU3 CPU4 CPU5

 TRM: 0 0 0 0 0 0 Thermal event interrupts
 THR: 0 0 0 0 0 0 Threshold APIC interrupts
 MCE: 0 0 0 0 0 0 Machine check exceptions

They added in a space before the name. The check you mentioned
above could be augmented for this of course, as another solution
for this.

> 0.56. Or are you perhaps having the problem because you have
> 1000+ interrupts, thus causing even the non-numeric strings to
> get space padded on their left? In that case I'd rather think above
> check should be either improved or removed (replaced by your
> solution).
Jan Beulich
2011-May-11 13:41 UTC
Re: [Xen-devel] irqbalance seg faults with 2.6.38 or later kernels [patch + solution included] if running under Xen hypervisor
>>> On 11.05.11 at 15:10, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:
> On Wed, May 11, 2011 at 09:16:53AM +0100, Jan Beulich wrote:
>> >>> On 11.05.11 at 02:33, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:
>> > The reason behind it is that irqbalance parses the /proc/interrupts
>> > and whenever it hits something it can't understand:
>> >
>> > RES: 191614137 73904910 Rescheduling interrupts
>> >
>> > It will count the number of interrupts towards the IRQ 0. That IRQ does
>> > exist when the kernel boots under baremetal:
>> >
>> > 0: 46 0 IO-APIC-edge timer
>> >
>> > but under Xen, the timer interrupts are initialized much later:
>> >
>> > 272: 41197188 0 xen-percpu-virq timer0
>> >
>> > and the first IRQ that is used is not zero, but rather one:
>> >
>> > 1: 73037 0 0 0 0 0 xen-pirq-ioapic-edge i8042
>> >
>> > so when irqbalance tries to account for the IRQ 'RES' to the IRQ 0
>> > it fails and segfaults. The attached patch fixes it for whoever else is
>> > hitting this problem.
>>
>> In the svn snapshot I have, I see
>>
>>     /* lines with letters in front are special, like NMI count. Ignore */
>>     if (!(line[0]==' ' || (line[0]>='0' && line[0]<='9')))
>>         break;
>>
>> which I would think should be taking care of your problem (or
>> I mis-read your description), and which was there already before
>
> Not anymore. In kernels 2.6.37:
>
>      CPU0 CPU1 CPU2 CPU3 CPU4 CPU5
>
> .. snip.
> NMI: 0 0 0 0 Non-maskable interrupts
> LOC: 12413629 12858323 16296183 11098466 Local timer interrupts
>
> In 2.6.38 and later:
>      CPU0 CPU1 CPU2 CPU3 CPU4 CPU5
>
>  TRM: 0 0 0 0 0 0 Thermal event interrupts
>  THR: 0 0 0 0 0 0 Threshold APIC interrupts
>  MCE: 0 0 0 0 0 0 Machine check exceptions
>
> They added in a space before the name. The check you mentioned
> above could be augmented for this of course, as another solution
> for this.

Not generally; this depends on your configuration. I just checked on my
laptop, and there is no extra space there.
It's presumably indeed what I wrote here:

>> 0.56. Or are you perhaps having the problem because you have
>> 1000+ interrupts, thus causing even the non-numeric strings to
>> get space padded on their left? In that case I'd rather think above
>> check should be either improved or removed (replaced by your
>> solution).

... and this left padding had been introduced a lot earlier than 2.6.37 iirc.

Jan