Hi all,

Firstly, please include me in any replies as I am not a list subscriber.

I'm trying to nail down a problem using Xen 4.2.1 & Kernel 3.7.1 (also 3.7.2). At seemingly random times I get the following via syslog:

Message from syslogd@xenhost at Jan 15 09:02:36 ...
 kernel:Disabling IRQ #16

Looking at IRQ16:
[root@xenhost xen]# cat /proc/interrupts | grep 16
16: 1900000 xen-pirq-ioapic-level sata_mv

I also see this in the dmesg:
irq 16: nobody cared (try booting with the "irqpoll" option)
Pid: 0, comm: swapper/0 Not tainted 3.7.2-1.el6xen.x86_64 #1
Call Trace:
 <IRQ> [<ffffffff810a77f2>] __report_bad_irq+0x3a/0xc6
 [<ffffffff810a79e7>] note_interrupt+0x169/0x1e5
 [<ffffffff810a59b7>] handle_irq_event_percpu+0x16e/0x1b6
 [<ffffffff810a5a37>] handle_irq_event+0x38/0x54
 [<ffffffff810a8199>] handle_fasteoi_irq+0x88/0xd5
 [<ffffffff812c23f5>] __xen_evtchn_do_upcall+0x15a/0x1f7
 [<ffffffff812c3707>] xen_evtchn_do_upcall+0x2f/0x42
 [<ffffffff814a44be>] xen_do_hypervisor_callback+0x1e/0x30
 <EOI> [<ffffffff810013aa>] ? xen_hypercall_sched_op+0xa/0x20
 [<ffffffff810013aa>] ? xen_hypercall_sched_op+0xa/0x20
 [<ffffffff810013aa>] ? xen_hypercall_sched_op+0xa/0x20
 [<ffffffff81007047>] ? xen_safe_halt+0x10/0x1a
 [<ffffffff810169b1>] ? default_idle+0x50/0x8a
 [<ffffffff81016318>] ? cpu_idle+0xc0/0xff
 [<ffffffff8148160e>] ? rest_init+0x72/0x74
 [<ffffffff81745b22>] ? start_kernel+0x3b0/0x3bd
 [<ffffffff817455a7>] ? repair_env_string+0x58/0x58
 [<ffffffff817452dd>] ? x86_64_start_reservations+0xb8/0xbd
 [<ffffffff81748cad>] ? xen_start_kernel+0x4f2/0x4f4
handlers:
[<ffffffffa012edd9>] mv_interrupt [sata_mv]
Disabling IRQ #16

I have tried booting with the irqpoll option on the kernel boot line, but the same problem occurs.

Disk throughput almost drops dead when this happens - the SATA controller seems to go into some different mode of operation.
It also seems like this has only started recently - I was using builds of 3.6.x as my Xen Dom0 kernel with no signs of this problem.

Has anyone else seen this in recent kernel releases? I'm not quite sure how to try and track this down.

Some system specs follow:

# dmidecode 2.11 SMBIOS 2.7 present. 75 structures occupying 3098 bytes. Table at 0x000EB420. Handle 0x0000, DMI type 0, 24 bytes BIOS Information Vendor: American Megatrends Inc. Version: U1f Release Date: 06/13/2012 Address: 0xF0000 Runtime Size: 64 kB ROM Size: 4096 kB Characteristics: PCI is supported BIOS is upgradeable BIOS shadowing is allowed Boot from CD is supported Selectable boot is supported BIOS ROM is socketed EDD is supported 5.25"/1.2 MB floppy services are supported (int 13h) 3.5"/720 kB floppy services are supported (int 13h) 3.5"/2.88 MB floppy services are supported (int 13h) Print screen service is supported (int 5h) 8042 keyboard services are supported (int 9h) Serial services are supported (int 14h) Printer services are supported (int 17h) ACPI is supported USB legacy is supported BIOS boot specification is supported Targeted content distribution is supported UEFI is supported BIOS Revision: 4.6 Handle 0x0001, DMI type 1, 27 bytes System Information Manufacturer: Gigabyte Technology Co., Ltd. Product Name: To be filled by O.E.M. Version: To be filled by O.E.M. Serial Number: To be filled by O.E.M. UUID: 03E50250-0449-054D-4A06-F60700080009 Wake-up Type: Power Switch SKU Number: To be filled by O.E.M. Family: To be filled by O.E.M. Handle 0x0002, DMI type 2, 15 bytes Base Board Information Manufacturer: Gigabyte Technology Co., Ltd. Product Name: Z68M-D2H Version: To be filled by O.E.M. Serial Number: To be filled by O.E.M. Asset Tag: To be filled by O.E.M. Features: Board is a hosting board Board is replaceable Location In Chassis: To be filled by O.E.M.
Chassis Handle: 0x0003 Type: Motherboard Contained Object Handles: 0 Handle 0x0003, DMI type 3, 22 bytes Chassis Information Manufacturer: Gigabyte Technology Co., Ltd. Type: Desktop Lock: Not Present Version: To Be Filled By O.E.M. Serial Number: To Be Filled By O.E.M. Asset Tag: To Be Filled By O.E.M. Boot-up State: Safe Power Supply State: Safe Thermal State: Safe Security Status: None OEM Information: 0x00000000 Height: Unspecified Number Of Power Cords: 1 Contained Elements: 0 SKU Number: To be filled by O.E.M. Handle 0x0004, DMI type 7, 19 bytes Cache Information Socket Designation: CPU Internal L1 Configuration: Enabled, Not Socketed, Level 1 Operational Mode: Write Through Location: Internal Installed Size: 128 kB Maximum Size: 128 kB Supported SRAM Types: Unknown Installed SRAM Type: Unknown Speed: Unknown Error Correction Type: Parity System Type: Other Associativity: 16-way Set-associative Handle 0x0005, DMI type 7, 19 bytes Cache Information Socket Designation: CPU Internal L2 Configuration: Enabled, Not Socketed, Level 2 Operational Mode: Write Through Location: Internal Installed Size: 1024 kB Maximum Size: 1024 kB Supported SRAM Types: Unknown Installed SRAM Type: Unknown Speed: Unknown Error Correction Type: Multi-bit ECC System Type: Instruction Associativity: 16-way Set-associative Handle 0x0006, DMI type 7, 19 bytes Cache Information Socket Designation: CPU Internal L3 Configuration: Enabled, Not Socketed, Level 3 Operational Mode: Write Back Location: Internal Installed Size: 6144 kB Maximum Size: 6144 kB Supported SRAM Types: Unknown Installed SRAM Type: Unknown Speed: Unknown Error Correction Type: Multi-bit ECC System Type: Instruction Associativity: 48-way Set-associative ... snip a bit ... 
Handle 0x0020, DMI type 9, 17 bytes System Slot Information Designation: J6B2 Type: x16 PCI Express Current Usage: In Use Length: Long ID: 0 Characteristics: 3.3 V is provided Opening is shared PME signal is supported Bus Address: 0000:00:02.0 Handle 0x0021, DMI type 9, 17 bytes System Slot Information Designation: J6B1 Type: x1 PCI Express Current Usage: In Use Length: Short ID: 1 Characteristics: 3.3 V is provided Opening is shared PME signal is supported Bus Address: 0000:00:1c.0 Handle 0x0022, DMI type 9, 17 bytes System Slot Information Designation: J6D1 Type: x8 PCI Express Current Usage: In Use Length: Short ID: 2 Characteristics: 3.3 V is provided Opening is shared PME signal is supported Bus Address: 0000:00:01.0 Handle 0x0023, DMI type 9, 17 bytes System Slot Information Designation: J7B1 Type: x16 PCI Express Current Usage: In Use Length: Short ID: 3 Characteristics: 3.3 V is provided Opening is shared PME signal is supported Bus Address: 0000:00:03.0 Handle 0x0024, DMI type 9, 17 bytes System Slot Information Designation: J8B4 Type: x1 PCI Express Current Usage: In Use Length: Short ID: 4 Characteristics: 3.3 V is provided Opening is shared PME signal is supported Bus Address: 0000:00:1c.7 Handle 0x0025, DMI type 9, 17 bytes System Slot Information Designation: J8B3 Type: 32-bit PCI Current Usage: In Use Length: Short ID: 6 Characteristics: 3.3 V is provided Opening is shared PME signal is supported Bus Address: 0000:14:1e.0 ... snip a bit more .... 
Handle 0x0043, DMI type 4, 42 bytes Processor Information Socket Designation: Intel(R) Core(TM) i5-2500K CPU @ 3.30GHz Type: Central Processor Family: Core i7 Manufacturer: Intel ID: A7 06 02 00 FF FB EB BF Signature: Type 0, Family 6, Model 42, Stepping 7 Flags: FPU (Floating-point unit on-chip) VME (Virtual mode extension) DE (Debugging extension) PSE (Page size extension) TSC (Time stamp counter) MSR (Model specific registers) PAE (Physical address extension) MCE (Machine check exception) CX8 (CMPXCHG8 instruction supported) APIC (On-chip APIC hardware supported) SEP (Fast system call) MTRR (Memory type range registers) PGE (Page global enable) MCA (Machine check architecture) CMOV (Conditional move instruction supported) PAT (Page attribute table) PSE-36 (36-bit page size extension) CLFSH (CLFLUSH instruction supported) DS (Debug store) ACPI (ACPI supported) MMX (MMX technology supported) FXSR (FXSAVE and FXSTOR instructions supported) SSE (Streaming SIMD extensions) SSE2 (Streaming SIMD extensions 2) SS (Self-snoop) HTT (Multi-threading) TM (Thermal monitor supported) PBE (Pending break enabled) Version: Intel(R) Core(TM) i5-2500K CPU @ 3.30GHz Voltage: 1.2 V External Clock: 100 MHz Max Speed: 7000 MHz Current Speed: 3700 MHz Status: Populated, Enabled Upgrade: Other L1 Cache Handle: 0x0004 L2 Cache Handle: 0x0005 L3 Cache Handle: 0x0006 Serial Number: Not Specified Asset Tag: Fill By OEM Part Number: Fill By OEM Core Count: 4 Core Enabled: 1 Characteristics: 64-bit capable ... 
end

# lspci
00:00.0 Host bridge: Intel Corporation 2nd Generation Core Processor Family DRAM Controller (rev 09)
00:01.0 PCI bridge: Intel Corporation Xeon E3-1200/2nd Generation Core Processor Family PCI Express Root Port (rev 09)
00:02.0 VGA compatible controller: Intel Corporation 2nd Generation Core Processor Family Integrated Graphics Controller (rev 09)
00:16.0 Communication controller: Intel Corporation 6 Series/C200 Series Chipset Family MEI Controller #1 (rev 04)
00:1a.0 USB controller: Intel Corporation 6 Series/C200 Series Chipset Family USB Enhanced Host Controller #2 (rev 05)
00:1c.0 PCI bridge: Intel Corporation 6 Series/C200 Series Chipset Family PCI Express Root Port 1 (rev b5)
00:1c.6 PCI bridge: Intel Corporation 6 Series/C200 Series Chipset Family PCI Express Root Port 7 (rev b5)
00:1d.0 USB controller: Intel Corporation 6 Series/C200 Series Chipset Family USB Enhanced Host Controller #1 (rev 05)
00:1f.0 ISA bridge: Intel Corporation Z68 Express Chipset Family LPC Controller (rev 05)
00:1f.2 SATA controller: Intel Corporation 6 Series/C200 Series Chipset Family SATA AHCI Controller (rev 05)
00:1f.3 SMBus: Intel Corporation 6 Series/C200 Series Chipset Family SMBus Controller (rev 05)
01:00.0 SCSI storage controller: Marvell Technology Group Ltd. 88SX7042 PCI-e 4-port SATA-II (rev 02)
03:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168B PCI Express Gigabit Ethernet controller (rev 06)

Disks are configured as such:
# cat /proc/mdstat
Personalities : [raid1] [raid6] [raid5] [raid4]
md0 : active raid1 sda1[1] sdb1[0]
      204788 blocks super 1.0 [2/2] [UU]

md2 : active raid6 sdc[5] sde[1] sdf[4] sdd[0]
      3907026688 blocks super 1.2 level 6, 128k chunk, algorithm 2 [4/4] [UUUU]

md1 : active raid1 sdb2[0] sda2[1]
      77942716 blocks super 1.1 [2/2] [UU]

--
Steven Haigh

Email: netwiz@crc.id.au
Web: http://www.crc.id.au
Phone: (03) 9001 6090 - 0412 935 897
Fax: (03) 8338 0299

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
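A side note on the `cat /proc/interrupts | grep 16` check in the report above. grep matches any line that contains the digits "16" anywhere, so it also catches IRQ 160 or a counter such as 1600, not only IRQ 16. `grep '^ *16:' /proc/interrupts` anchors the match to the IRQ column instead.

The same idea is sketched below in C. It assumes the usual /proc/interrupts layout: the IRQ number and a colon, then one counter column per (v)CPU, then the chip and handler names. The function `irq_line_total()` is hypothetical and written only for illustration; it is not part of any existing tool.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

/*
 * Check one line of /proc/interrupts against an exact IRQ number.
 * Returns 1 and stores the sum of the per-CPU counters in *total when the
 * line belongs to `irq`; returns 0 for any other line (other IRQs, the CPU
 * header row, named rows such as "NMI:").
 */
int irq_line_total(const char *line, long irq, unsigned long long *total)
{
    const char *p = line;
    char *end;

    assert(line != NULL && total != NULL);

    while (isspace((unsigned char)*p))
        p++;
    if (!isdigit((unsigned char)*p))
        return 0;                   /* header row or a named row like "NMI:" */

    long n = strtol(p, &end, 10);
    if (*end != ':' || n != irq)
        return 0;                   /* "160:" or "19:" must not match IRQ 16 */
    p = end + 1;

    /* Sum the numeric columns (one per CPU) until the chip name starts. */
    unsigned long long sum = 0;
    for (;;) {
        while (isspace((unsigned char)*p))
            p++;
        if (!isdigit((unsigned char)*p))
            break;
        sum += strtoull(p, &end, 10);
        p = end;
    }
    *total = sum;
    return 1;
}
```

To use it, call it on each line read with `fgets()` from /proc/interrupts. It then reports the total for exactly one IRQ, summed across all CPU columns.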
>>> On 15.01.13 at 04:27, Steven Haigh <netwiz@crc.id.au> wrote:
> irq 16: nobody cared (try booting with the "irqpoll" option)
> [... call trace snipped ...]
> handlers:
> [<ffffffffa012edd9>] mv_interrupt [sata_mv]
> Disabling IRQ #16
>
> I have tried booting with the irqpoll option on the kernel boot line,
> but the same problem occurs.
>
> It seems disk throughput almost drops dead when this happens - as the
> SATA controller seems to go into some different mode of operation. It
> also seems like this has only happened recently - I was using builds of
> 3.6.x as my Xen Dom0 kernel with no signs of this problem.
>
> Has anyone else seen this in recent kernel releases? I'm not quite sure
> how to try and track this down.

First of all, you'll want to clarify whether this problem is present _only_ when running under Xen, or also when running the same kernel without Xen underneath. This is primarily because the output you provided shows that IRQ 16 actually has a handler, just that it apparently ignores the interrupts (and that's nothing that Xen controls).

Then, if this is a Xen-only problem, you will want to provide full hypervisor and kernel (boot) logs, the hypervisor one including debug key 'i' output, and the kernel one once with and once without Xen.

Finally, you'll want to clarify whether, when updating the kernel, you also updated the hypervisor (and if so, try the known good and known bad kernels on identical hypervisors).

Jan
Hi Jan,

On 16/01/2013 2:23 AM, Jan Beulich wrote:
>>>> On 15.01.13 at 04:27, Steven Haigh <netwiz@crc.id.au> wrote:
>> [... original report snipped ...]
> First of all, you'll want to clarify whether this problem is present
> _only_ when running under Xen, or also when running the same
> kernel without Xen underneath. This is primarily because the
> output you provided shows that IRQ 16 actually has a handler,
> just that it apparently ignores the interrupts (and that's nothing
> that Xen controls).

I'm not 100% sure how to do this. I haven't been able to find a method to cause the problem to happen... It just does - and it seems random when it does happen. Part of the problem with running the system without the hypervisor in place is that I can't replicate any kind of workload that would normally trigger the problem.

> Then, if this is a Xen-only problem, you will want to provide full
> hypervisor and kernel (boot) logs, the hypervisor one including
> debug key 'i' output, and the kernel one once with and once
> without Xen.
>
> Finally you'll want to clarify whether, when updating the kernel,
> you also updated the hypervisor (and if so, try the known good
> and known bad kernels on identical hypervisors).

I have been running Xen 4.2.1 for a while - and used multiple kernel versions with it. Sadly, I don't have an archive of the RPMs that I used (even though I built them!). I've only really noticed this happening in the last month - since I've been running kernel 3.7.1+.

On the off chance, today I moved the card from one 16x PCIe slot to the second one on the mainboard. This has moved the card from IRQ16 to IRQ19. So far I haven't had the problem occur - however, as it is a seemingly random occurrence, there is no guarantee that the problem is solved.
I've tried loading up the I/O by doing a resync of the RAID6 (2 drives of which are on the sata_mv card) as well as hammering I/O in the DomUs (rather random stuff), but I still have no reliable way to force the problem to occur :(

I'm open to any suggestions :)

--
Steven Haigh
>>> On 15.01.13 at 18:15, Steven Haigh <netwiz@crc.id.au> wrote:
> I'm not 100% sure how to do this. I haven't been able to find a method
> to cause the problem to happen... It just does - and it seems random
> when it does happen. Part of the problem with running the system without
> the hypervisor in place is that I can't replicate any kind of workload
> that would normally trigger the problem.

That's pretty odd - there need to be almost 100,000 unhandled interrupts within a tenth of a second, so there _must_ be something triggering this if the device is otherwise working fine.

You're not by chance passing through to a guest any other device using the same IRQ?

Jan
On 16/01/2013 8:42 PM, Jan Beulich wrote:
>>>> On 15.01.13 at 18:15, Steven Haigh <netwiz@crc.id.au> wrote:
>> [...]
> That's pretty odd - there need to be almost 100,000 unhandled
> interrupts within a tenth of a second, so there _must_ be
> something triggering this if the device is otherwise working fine.
>
> You're not by chance passing through to a guest any other
> device using the same IRQ?

Hi Jan,

I don't pass any devices at all to any DomUs. All guests are PV Linux systems, all EL6. The only things each DomU has are a disk, a network interface, and 2 x vcpus.

So far, I have:
# uptime
 20:50:40 up 1 day, 1:11, 1 user, load average: 0.36, 0.17, 0.13

As I mentioned, I moved the sata card to the second 16x PCIe slot in the mainboard - which changed the IRQ from 16 to 19.
Currently I see:
# grep sata_mv /proc/interrupts
19: 21243495 xen-pirq-ioapic-level sata_mv

Which is, interestingly, more than the onboard SATA ports:
# grep ahci /proc/interrupts
50: 9004117 xen-pirq-msi ahci

I'm not sure if this will give any further info:
# xm dmesg
(XEN) Xen version 4.2.1 (mockbuild@crc.id.au) (gcc (GCC) 4.4.6 20120305 (Red Hat 4.4.6-4)) Wed Dec 19 01:32:40 EST 2012
(XEN) Latest ChangeSet: unavailable
(XEN) Bootloader: GNU GRUB 0.97
(XEN) Command line: dom0_mem=1024M cpufreq=xen dom0_max_vcpus=1 dom0_vcpus_pin
(XEN) Video information:
(XEN) VGA is text mode 80x25, font 8x16
(XEN) VBE/DDC methods: none; EDID transfer time: 0 seconds
(XEN) EDID info not retrieved because no DDC retrieval method detected
(XEN) Disc information:
(XEN) Found 2 MBR signatures
(XEN) Found 3 EDD information structures
(XEN) Xen-e820 RAM map:
(XEN) 0000000000000000 - 000000000009d800 (usable)
(XEN) 000000000009d800 - 00000000000a0000 (reserved)
(XEN) 00000000000e0000 - 0000000000100000 (reserved)
(XEN) 0000000000100000 - 0000000020000000 (usable)
(XEN) 0000000020000000 - 0000000020200000 (reserved)
(XEN) 0000000020200000 - 0000000040000000 (usable)
(XEN) 0000000040000000 - 0000000040200000 (reserved)
(XEN) 0000000040200000 - 00000000dbb1b000 (usable)
(XEN) 00000000dbb1b000 - 00000000dc3c7000 (reserved)
(XEN) 00000000dc3c7000 - 00000000dc647000 (ACPI NVS)
(XEN) 00000000dc647000 - 00000000dc64c000 (ACPI data)
(XEN) 00000000dc64c000 - 00000000dc68f000 (ACPI NVS)
(XEN) 00000000dc68f000 - 00000000dcdca000 (usable)
(XEN) 00000000dcdca000 - 00000000dcfdd000 (reserved)
(XEN) 00000000dcfdd000 - 00000000dd000000 (usable)
(XEN) 00000000dd800000 - 00000000dfa00000 (reserved)
(XEN) 00000000f8000000 - 00000000fc000000 (reserved)
(XEN) 00000000fec00000 - 00000000fec01000 (reserved)
(XEN) 00000000fed00000 - 00000000fed04000 (reserved)
(XEN) 00000000fed1c000 - 00000000fed20000 (reserved)
(XEN) 00000000fee00000 - 00000000fee01000 (reserved)
(XEN) 00000000ff000000 - 0000000100000000 (reserved)
(XEN) 0000000100000000 - 000000021f600000 (usable)
(XEN) ACPI: RSDP 000F0490, 0024 (r2 ALASKA)
(XEN) ACPI: XSDT DC629070, 0064 (r1 ALASKA A M I 1072009 AMI 10013)
(XEN) ACPI: FACP DC632928, 00F4 (r4 ALASKA A M I 1072009 AMI 10013)
(XEN) ACPI: DSDT DC629170, 97B8 (r2 ALASKA A M I 12 INTL 20051117)
(XEN) ACPI: FACS DC645F80, 0040
(XEN) ACPI: APIC DC632A20, 0072 (r3 ALASKA A M I 1072009 AMI 10013)
(XEN) ACPI: MCFG DC632A98, 003C (r1 ALASKA A M I 1072009 MSFT 97)
(XEN) ACPI: HPET DC632AD8, 0038 (r1 ALASKA A M I 1072009 AMI. 5)
(XEN) ACPI: SSDT DC632B10, 036D (r1 SataRe SataTabl 1000 INTL 20091112)
(XEN) ACPI: SSDT DC632E80, 09AA (r1 PmRef Cpu0Ist 3000 INTL 20051117)
(XEN) ACPI: SSDT DC633830, 0A92 (r1 PmRef CpuPm 3000 INTL 20051117)
(XEN) ACPI: MATS DC6342C8, 0034 (r2 ALASKA A M I 2 wx2 0)
(XEN) System RAM: 8116MB (8310872kB)
(XEN) Domain heap initialised
(XEN) ACPI: 32/64X FACS address mismatch in FADT - dc645f80/0000000000000000, using 32
(XEN) Processor #0 6:10 APIC version 21
(XEN) Processor #2 6:10 APIC version 21
(XEN) Processor #4 6:10 APIC version 21
(XEN) Processor #6 6:10 APIC version 21
(XEN) IOAPIC[0]: apic_id 2, version 32, address 0xfec00000, GSI 0-23
(XEN) Enabling APIC mode: Flat. Using 1 I/O APICs
(XEN) Table is not found!
(XEN) Using scheduler: SMP Credit Scheduler (credit)
(XEN) Detected 3303.320 MHz processor.
(XEN) Initing memory sharing.
(XEN) xstate_init: using cntxt_size: 0x340 and states: 0x7
(XEN) I/O virtualisation disabled
(XEN) Enabled directed EOI with ioapic_ack_old on!
(XEN) ENABLING IO-APIC IRQs
(XEN) -> Using old ACK method
(XEN) Platform timer is 14.318MHz HPET
(XEN) Allocated console ring of 16 KiB.
(XEN) VMX: Supported advanced features:
(XEN) - APIC MMIO access virtualisation
(XEN) - APIC TPR shadow
(XEN) - Extended Page Tables (EPT)
(XEN) - Virtual-Processor Identifiers (VPID)
(XEN) - Virtual NMI
(XEN) - MSR direct-access bitmap
(XEN) - Unrestricted Guest
(XEN) HVM: ASIDs enabled.
(XEN) HVM: VMX enabled
(XEN) HVM: Hardware Assisted Paging (HAP) detected
(XEN) HVM: HAP page sizes: 4kB, 2MB
(XEN) Brought up 4 CPUs
(XEN) *** LOADING DOMAIN 0 ***
(XEN) Xen kernel: 64-bit, lsb, compat32
(XEN) Dom0 kernel: 64-bit, PAE, lsb, paddr 0x1000000 -> 0x1d87000
(XEN) PHYSICAL MEMORY ARRANGEMENT:
(XEN) Dom0 alloc.: 0000000210000000->0000000214000000 (236799 pages to be allocated)
(XEN) Init. ramdisk: 000000021d2ff000->000000021f5ff800
(XEN) VIRTUAL MEMORY ARRANGEMENT:
(XEN) Loaded kernel: ffffffff81000000->ffffffff81d87000
(XEN) Init. ramdisk: ffffffff81d87000->ffffffff84087800
(XEN) Phys-Mach map: ffffffff84088000->ffffffff84288000
(XEN) Start info: ffffffff84288000->ffffffff842884b4
(XEN) Page tables: ffffffff84289000->ffffffff842ae000
(XEN) Boot stack: ffffffff842ae000->ffffffff842af000
(XEN) TOTAL: ffffffff80000000->ffffffff84400000
(XEN) ENTRY ADDRESS: ffffffff81745210
(XEN) Dom0 has maximum 1 VCPUs
(XEN) Scrubbing Free RAM: ......................................................................done.
(XEN) Initial low memory virq threshold set at 0x4000 pages.
(XEN) Std. Loglevel: Errors and warnings
(XEN) Guest Loglevel: Nothing (Rate-limited: Errors and warnings)
(XEN) Xen is relinquishing VGA console.
(XEN) *** Serial input -> DOM0 (type 'CTRL-a' three times to switch input to Xen)
(XEN) Freed 252kB init memory.
(XEN) no cpu_id for acpi_id 5
(XEN) no cpu_id for acpi_id 6
(XEN) no cpu_id for acpi_id 7
(XEN) no cpu_id for acpi_id 8

--
Steven Haigh
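For scale, the counters in the message above can be turned into rough average rates. This assumes both /proc/interrupts readings were taken at about the time of the uptime output, since the counters start at boot:

$$
\begin{aligned}
t_{\mathrm{up}} &= 1\ \mathrm{day} + 1\ \mathrm{h}\ 11\ \mathrm{min} = 86\,400 + 3\,600 + 660 = 90\,660\ \mathrm{s} \\
\bar r_{\mathrm{sata\_mv}} &= \frac{21\,243\,495}{90\,660\ \mathrm{s}} \approx 234\ \mathrm{s}^{-1} \\
\bar r_{\mathrm{ahci}} &= \frac{9\,004\,117}{90\,660\ \mathrm{s}} \approx 99\ \mathrm{s}^{-1} \\
\frac{100\,000}{234\ \mathrm{s}^{-1}} &\approx 427\ \mathrm{s}
\end{aligned}
$$

At that average, 100,000 interrupts on the sata_mv line take about seven minutes. Reaching the "almost 100,000 unhandled interrupts" Jan mentions therefore needs a dense burst rather than the steady load, and averages like these cannot show whether such bursts happen.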
>>> On 16.01.13 at 10:54, Steven Haigh <netwiz@crc.id.au> wrote:
> So far, I have:
> # uptime
> 20:50:40 up 1 day, 1:11, 1 user, load average: 0.36, 0.17, 0.13
>
> As I mentioned, I moved the sata card to the second 16x PCIe slot in the
> mainboard - which changed the IRQ from 16 to 19. Currently I see:
> # grep sata_mv /proc/interrupts
> 19: 21243495 xen-pirq-ioapic-level sata_mv
>
> Which is interestingly more than the onboard SATA ports:
> # grep ahci /proc/interrupts
> 50: 9004117 xen-pirq-msi ahci

Whether the former count is too high depends on the I/O amount going through each controller. Of course it is possible for there to be spikes that usually don't reach the 99,900 cutoff point, but once in a while do. Figuring out whether that's the case would require adding a little bit more verbosity to kernel/irq/spurious.c:note_interrupt(), e.g. to warn when having reached half the threshold.

Jan
On 16/01/2013 9:05 PM, Jan Beulich wrote:
>>>> On 16.01.13 at 10:54, Steven Haigh <netwiz@crc.id.au> wrote:
>> [... uptime and /proc/interrupts counts snipped ...]
> Whether the former count is too high depends on the I/O amount
> going through each controller. Of course it is possible for there to
> be spikes that usually don't reach the 99,900 cutoff point, but
> once in a while do. Figuring out whether that's the case would require
> adding a little bit more verbosity to
> kernel/irq/spurious.c:note_interrupt(), e.g. to warn when having
> reached half the threshold.

Interestingly, I just realised I have 3 of the 4 drives in this RAID6 on the sata_mv card. I had originally thought I had 2 drives on the onboard SATA ports and the other 2 on the sata_mv card. This would mean 3/4 of the RAID6 I/O goes via this card, and only 1/4 via the onboard ports.

# lsdrv
PCI [ahci] 00:1f.2 SATA controller: Intel Corporation 6 Series/C200 Series Chipset Family SATA AHCI Controller (rev 05) .scsi 0:0:0:0 ATA ST380815AS {6RAB72DZ} ..sda 74.53g [8:0] Partitioned (dos) . .sda1 200.00m [8:1] MD raid1 (1/2) (w/ sdb1) in_sync 'localhost.localdomain:0' {9f19116a-d280-8216-cc87-af34eae68242} . ..md0 199.99m [9:0] MD v1.0 raid1 (2) clean . . . Partitioned (dos) {6578dbc0-9e07-4ccc-8eff-15f2a1da8df1} . . .Mounted as /dev/md0 @ /boot . .sda2 74.33g [8:2] MD raid1 (1/2) (w/ sdb2) in_sync 'localhost.localdomain:1' {afb92c19-b9b1-e3ae-07af-315d738e38be} . .md1 74.33g [9:1] MD v1.1 raid1 (2) clean . . PV LVM2_member 74.33g used, 0 free {2koqPs-U1IA-9erV-ua4N-mxW1-BhRs-V3mlAH} .
.VG RAID1 74.33g 0 free {HEGjco-Ptil-M5ZG-2qQR-zNo4-3cc5-b9Z3Kj} . .dm-0 9.77g [253:0] LV xenhost ext4 {d2fa50d5-1a51-4599-9b72-f38f86b8f99e} . ..Mounted as /dev/mapper/RAID1-xenhost @ / . .dm-7 64.56g [253:7] LV zeus.vm ext4 {67310780-b15c-47e4-812e-d954aa7d8e3b} .scsi 1:0:0:0 ATA ST380815AS {6QZ6L9SD} ..sdb 74.53g [8:16] Partitioned (dos) . .sdb1 200.00m [8:17] MD raid1 (0/2) (w/ sda1) in_sync 'localhost.localdomain:0' {9f19116a-d280-8216-cc87-af34eae68242} . ..md0 199.99m [9:0] MD v1.0 raid1 (2) clean . . Partitioned (dos) {6578dbc0-9e07-4ccc-8eff-15f2a1da8df1} . .sdb2 74.33g [8:18] MD raid1 (0/2) (w/ sda2) in_sync 'localhost.localdomain:1' {afb92c19-b9b1-e3ae-07af-315d738e38be} . .md1 74.33g [9:1] MD v1.1 raid1 (2) clean . PV LVM2_member 74.33g used, 0 free {2koqPs-U1IA-9erV-ua4N-mxW1-BhRs-V3mlAH} .scsi 2:x:x:x [Empty] .scsi 3:0:0:0 ATA ST2000VX000-9YW1 {Z1E10QQJ} ..sdc 1.82t [8:32] MD raid6 (3/4) (w/ sdd,sde,sdf) in_sync 'xenhost.lan.crc.id.au:2' {cd8cc032-4898-fa88-3ba1-af64cf91583b} . .md2 3.64t [9:2] MD v1.2 raid6,left-sym (4) active, 128k Chunk . . PV LVM2_member 2.12t used, 1.52t free {8pyp2G-D268-fqKW-mBvf-wZbI-Qurt-aeTvOh} . .VG vg_raid6 3.64t 1.52t free {UrqTRc-AozJ-2RDf-qcZB-UdX3-tno9-3KHjjv} . .dm-6 2.00t [253:6] LV fileshare xfs {af405459-7569-4d82-82d9-ca27912316c7} . .dm-3 10.00g [253:3] LV lamp.vm ext4 {67310780-b15c-47e4-812e-d954aa7d8e3b} . .dm-2 40.00g [253:2] LV mail.vm ext4 {67310780-b15c-47e4-812e-d954aa7d8e3b} . .dm-4 20.00g [253:4] LV remotedesktop.vm Partitioned (dos) . .dm-5 2.00g [253:5] LV template.vm ext4 {67310780-b15c-47e4-812e-d954aa7d8e3b} . .dm-1 50.00g [253:1] LV tsm.vm ext4 {67310780-b15c-47e4-812e-d954aa7d8e3b} .scsi 4:x:x:x [Empty] .scsi 5:x:x:x [Empty] PCI [sata_mv] 04:00.0 SCSI storage controller: Marvell Technology Group Ltd.
88SX7042 PCI-e 4-port SATA-II (rev 02) .scsi 6:0:0:0 ATA ST2000VX000-9YW1 {Z1E11E7R} ..sdd 1.82t [8:48] MD raid6 (0/4) (w/ sdc,sde,sdf) in_sync 'xenhost.lan.crc.id.au:2' {cd8cc032-4898-fa88-3ba1-af64cf91583b} . .md2 3.64t [9:2] MD v1.2 raid6,left-sym (4) active, 128k Chunk . PV LVM2_member 2.12t used, 1.52t free {8pyp2G-D268-fqKW-mBvf-wZbI-Qurt-aeTvOh} .scsi 7:x:x:x [Empty] .scsi 8:0:0:0 ATA ST2000VX000-9YW1 {Z1E0MD58} ..sde 1.82t [8:64] MD raid6 (1/4) (w/ sdc,sdd,sdf) in_sync 'xenhost.lan.crc.id.au:2' {cd8cc032-4898-fa88-3ba1-af64cf91583b} . .md2 3.64t [9:2] MD v1.2 raid6,left-sym (4) active, 128k Chunk . PV LVM2_member 2.12t used, 1.52t free {8pyp2G-D268-fqKW-mBvf-wZbI-Qurt-aeTvOh} .scsi 9:0:0:0 ATA ST2000VX000-9YW1 {Z1E17C3X} .sdf 1.82t [8:80] MD raid6 (2/4) (w/ sdc,sdd,sde) in_sync 'xenhost.lan.crc.id.au:2' {cd8cc032-4898-fa88-3ba1-af64cf91583b} .md2 3.64t [9:2] MD v1.2 raid6,left-sym (4) active, 128k Chunk PV LVM2_member 2.12t used, 1.52t free {8pyp2G-D268-fqKW-mBvf-wZbI-Qurt-aeTvOh}

I'm going to leave it as is at the moment to see if it happens again, as it has been happening randomly over the last 3-4 weeks. This time I'll try to pull any info off before rebooting the system - I only recently found this problem. Hopefully either changing the slot, or even just reseating the card, has had some effect - but I guess only time will tell.

--
Steven Haigh