I'll wait for the real root cause to be discovered.

 -- Keir

On 9/8/07 14:35, "Stefan Berger" <stefanb@us.ibm.com> wrote:

> Check the signature of the alleged FADT and only parse it if it really
> is the FADT and not something else. I understand that this is not the
> real solution, but prevents other bad things from happening, like the
> timer going backwards if a bogus PM-Timer port is picked up.
>
> Signed-off-by: Stefan Berger <stefanb@us.ibm.com>

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Check the signature of the alleged FADT and only parse it if it really is the FADT and not something else. I understand that this is not the real solution, but prevents other bad things from happening, like the timer going backwards if a bogus PM-Timer port is picked up.

Signed-off-by: Stefan Berger <stefanb@us.ibm.com>
Keir Fraser <keir@xensource.com> wrote on 08/09/2007 09:33:01 AM:

> I'll wait for the real root cause to be discovered.

We have two blades of different types where we saw the problem with the time going backwards.

I used one of them to find out what is going on; it is also where the previous xm dmesg dumps came from. I gave that blade a BIOS update two days ago. On this one Xen parsed the DSDT as the FADT (see posted xm dmesg), but in the meantime this has corrected itself (no idea what triggered this) and now the correct table is parsed and the correct PM-Timer port is picked up, the same one that Linux finds. So the first blade has become useless for this type of debugging.

The second blade is up to date in terms of BIOS, and I even went into the BIOS setup to give it a chance to maybe correct some things from a previous update. On this one the alleged FADT's signature is 'A_AD' and a bogus timer port is still picked up there (0x4000 0837), while Linux (2.6.20) seems to get the port right (0x588).

On this second blade I now call the function parsing the FADT multiple times, and here is what happens:

(XEN) System RAM: 1023MB (1047860kB)
(XEN) (1) Mapped 0xfdfc0 to ff0fdfc0
(XEN) ACPI: RSDP (v000 IBM ) @ 0x000fdfc0
(XEN) (2) Mapped 0x3ffcff80 to fff9bf80
(XEN) (2) Mapped 0x3ffcff80 to fff9bf80
(XEN) sdt_entry[0].pa = 0x3ffcfec0
(XEN) sdt_entry[1].pa = 0x3ffcfe00
(XEN) sdt_entry[2].pa = 0x3ffcfdc0
(XEN) sign: RSDT; name=RSDT0
(XEN) ACPI: RSDT (v001 IBM SERBLADE 0x00001000 IBM 0x45444f43) @ 0x3ffcff80
(XEN) (2) Mapped 0x3ffcfec0 to fff9bec0
(XEN) (2) Mapped 0x3ffcfec0 to fff9bec0
(XEN) sign: FACP; name=FADT
(XEN) ACPI: FADT (v002 IBM SERBLADE 0x00001000 IBM 0x45444f43) @ 0x3ffcfec0
(XEN) std_entry[0].id = 7,matches sign FACP
(XEN) (2) Mapped 0x3ffcfe00 to fff9be00
(XEN) (2) Mapped 0x3ffcfe00 to fff9be00
(XEN) sign: APIC; name=MADT
(XEN) ACPI: MADT (v001 IBM SERBLADE 0x00001000 IBM 0x45444f43) @ 0x3ffcfe00
(XEN) std_entry[1].id = 1,matches sign APIC
(XEN) (2) Mapped 0x3ffcfdc0 to fff9bdc0
(XEN) (2) Mapped 0x3ffcfdc0 to fff9bdc0
(XEN) sign: MCFG; name=MCFG<
(XEN) ACPI: MCFG (v001 IBM SERBLADE 0x00001000 IBM 0x45444f43) @ 0x3ffcfdc0
(XEN) std_entry[2].id = 18,matches sign MCFG
(XEN) (2) Mapped 0x3ffcfec0 to fff9bec0
(XEN) (2) Mapped 0x3ffbe4a8 to fff9b4a8
(XEN) sign: DSDT; name=DSDT
(XEN) ACPI: DSDT (v001 IBM SERBLADE 0x00001000 INTL 0x02002025) @ 0x00000000
(XEN) NUMA turned off
(XEN) Faking a node at 0000000000000000-000000003ffb0000
(XEN) Xen heap: 9MB (10184kB)
(XEN) Domain heap initialised: DMA width 32 bits
(XEN) PAE enabled, limit: 16 GB
(XEN) found SMP MP-table at 0009d540
(XEN) DMI 2.3 present.
(XEN) (1) Mapped 0xf601f to ff0f601f
(XEN) Using APIC driver default
(XEN) (2) Mapped 0x3ffcfec0 to fff9bec0
(XEN) ACPI: Invalid FADT signature A__ADR

First call. That one is bad: it has a bad signature!

(XEN) (2) Mapped 0x3ffcfec0 to fff9bec0
(XEN) ACPI: PM-Timer IO Port: 0x588

Second call. This one is good! The port is also good.

(XEN) (2) Mapped 0x3ffcfd80 to fff9bd80
(XEN) ACPI: ACPI SLEEP INFO: pm1x_cnt[584,0], pm1x_evt[580,0]
(XEN) wakeup_vec[3ffcfd8c], vec_size[20]
(XEN) (2) Mapped 0x3ffcfec0 to fff9bec0
(XEN) ACPI: PM-Timer IO Port: 0x588

Third call. This one is also good!

Looks like the mapping does not work correctly.

   Stefan
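For readers following the "(2) Mapped" lines: every one of them lands in the same virtual page, with only the in-page offset taken from the physical address. The sketch below reproduces that arithmetic. It is only an illustration: the base 0xfff9b000 is the value printed in the later traces and is assumed to be fixed here, and the helper names are hypothetical, not taken from the Xen source.

```c
#include <stdint.h>

#define PAGE_SIZE 4096ULL
/* Virtual base printed as "base = 0xfff9b000" in the traces; assumed fixed. */
#define MAP_BASE  0xfff9b000ULL

/* Offset of a physical address within its 4 KiB page. */
static uint64_t page_offset(uint64_t phys)
{
    return phys & (PAGE_SIZE - 1);
}

/* Physical page frame (page-aligned address) containing phys. */
static uint64_t page_frame(uint64_t phys)
{
    return phys & ~(PAGE_SIZE - 1);
}

/* Virtual address the log would report for a table at phys. */
static uint64_t mapped_va(uint64_t phys)
{
    return MAP_BASE + page_offset(phys);
}
```

With these helpers, the FADT at 0x3ffcfec0 and the DSDT at 0x3ffbe4a8 map to 0xfff9bec0 and 0xfff9b4a8 respectively, yet they sit in different physical frames (0x3ffcf000 and 0x3ffbe000). The DSDT mapping is therefore the one point where the virtual page has to be pointed at a different frame.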
Did you check the signature on all three mappings, and only the first time you mapped it was broken? Weird.

 -- Keir

On 9/8/07 18:54, "Stefan Berger" <stefanb@us.ibm.com> wrote:

> (XEN) (2) Mapped 0x3ffcfec0 to fff9bec0
> (XEN) ACPI: Invalid FADT signature A__ADR
>
> That one is bad. It has a bad signature! First call.
>
> (XEN) (2) Mapped 0x3ffcfec0 to fff9bec0
> (XEN) ACPI: PM-Timer IO Port: 0x588
>
> 2nd call. This one is good! Port is also good.
>
> (XEN) (2) Mapped 0x3ffcfd80 to fff9bd80
> (XEN) ACPI: ACPI SLEEP INFO: pm1x_cnt[584,0], pm1x_evt[580,0]
> (XEN) wakeup_vec[3ffcfd8c], vec_size[20]
> (XEN) (2) Mapped 0x3ffcfec0 to fff9bec0
> (XEN) ACPI: PM-Timer IO Port: 0x588
>
> 3rd call. This one is also good!
>
> Looks like the mapping does not work correctly.
A TLB flushing issue (perhaps not even in software, as map_pages_to_xen() appears to not have any problems in that respect)? Can you check what is at the offset into the mapped page after mapping the DSDT (which is the one place where the virtual page fff9bXXX gets associated with a different physical one; all other mappings map the same physical page), but before (re-)mapping the FADT? If that's not the data you observe, any chance you can find out (e.g. under native Linux) where the observed data really lives, to get an understanding where else such a broken translation could come from?

Jan

>>> Stefan Berger <stefanb@us.ibm.com> 09.08.07 19:54 >>>
> [...]
> (XEN) (2) Mapped 0x3ffcfec0 to fff9bec0
> (XEN) ACPI: Invalid FADT signature A__ADR
>
> That one is bad. It has a bad signature! First call.
> [...]
> Looks like the mapping does not work correctly.
Again the output and the debugging changes to acpi/boot.c. There are 5 additional calls to the FADT parser.

   Stefan

[...]
(XEN) sign: RSDT; name=RSDT0
(XEN) ACPI: RSDT (v001 IBM SERBLADE 0x00001000 IBM 0x45444f43) @ 0x3ffcff80
(XEN) (2) Mapped 0x3ffcfec0 to fff9bec0, base = 0xfff9b000
(XEN) (2) Mapped 0x3ffcfec0 to fff9bec0, base = 0xfff9b000
(XEN) sign: FACP; name=FADT
(XEN) ACPI: FADT (v002 IBM SERBLADE 0x00001000 IBM 0x45444f43) @ 0x3ffcfec0
(XEN) std_entry[0].id = 7,matches sign FACP
(XEN) (2) Mapped 0x3ffcfe00 to fff9be00, base = 0xfff9b000
(XEN) (2) Mapped 0x3ffcfe00 to fff9be00, base = 0xfff9b000
(XEN) sign: APIC; name=MADT
(XEN) ACPI: MADT (v001 IBM SERBLADE 0x00001000 IBM 0x45444f43) @ 0x3ffcfe00
(XEN) std_entry[1].id = 1,matches sign APIC
(XEN) (2) Mapped 0x3ffcfdc0 to fff9bdc0, base = 0xfff9b000
(XEN) (2) Mapped 0x3ffcfdc0 to fff9bdc0, base = 0xfff9b000
(XEN) sign: MCFG; name=MCFG<
(XEN) ACPI: MCFG (v001 IBM SERBLADE 0x00001000 IBM 0x45444f43) @ 0x3ffcfdc0
(XEN) std_entry[2].id = 18,matches sign MCFG
(XEN) (2) Mapped 0x3ffcfec0 to fff9bec0, base = 0xfff9b000
(XEN) (2) Mapped 0x3ffbe4a8 to fff9b4a8, base = 0xfff9b000
(XEN) sign: DSDT; name=DSDT
(XEN) ACPI: DSDT (v001 IBM SERBLADE 0x00001000 INTL 0x02002025) @ 0x00000000
(XEN) NUMA turned off
(XEN) Faking a node at 0000000000000000-000000003ffb0000
(XEN) Xen heap: 9MB (10184kB)
(XEN) Domain heap initialised: DMA width 32 bits
(XEN) PAE enabled, limit: 16 GB
(XEN) found SMP MP-table at 0009d540
(XEN) DMI 2.3 present.
(XEN) (1) Mapped 0xf601f to ff0f601f
(XEN) Using APIC driver default
(XEN) IN acpi_parse_fadt.
(XEN) (2) Mapped 0x3ffcfec0 to fff9bec0, base = 0xfff9b000
(XEN) ACPI: Invalid FADT signature A__ADR
(XEN) IN acpi_parse_fadt.
(XEN) (2) Mapped 0x3ffcfec0 to fff9bec0, base = 0xfff9b000
(XEN) ACPI: PM-Timer IO Port: 0x588
(XEN) (2) Mapped 0x3ffcfd80 to fff9bd80, base = 0xfff9b000
(XEN) ACPI: ACPI SLEEP INFO: pm1x_cnt[584,0], pm1x_evt[580,0]
(XEN) wakeup_vec[3ffcfd8c], vec_size[20]
(XEN) IN acpi_parse_fadt.
(XEN) (2) Mapped 0x3ffcfec0 to fff9bec0, base = 0xfff9b000
(XEN) ACPI: PM-Timer IO Port: 0x588
(XEN) (2) Mapped 0x3ffcfd80 to fff9bd80, base = 0xfff9b000
(XEN) ACPI: ACPI SLEEP INFO: pm1x_cnt[584,0], pm1x_evt[580,0]
(XEN) wakeup_vec[3ffcfd8c], vec_size[20]
(XEN) IN acpi_parse_fadt.
(XEN) (2) Mapped 0x3ffcfec0 to fff9bec0, base = 0xfff9b000
(XEN) ACPI: PM-Timer IO Port: 0x588
(XEN) (2) Mapped 0x3ffcfd80 to fff9bd80, base = 0xfff9b000
(XEN) ACPI: ACPI SLEEP INFO: pm1x_cnt[584,0], pm1x_evt[580,0]
(XEN) wakeup_vec[3ffcfd8c], vec_size[20]

diff -r 7c5c3aa858cc xen/arch/x86/acpi/boot.c
--- a/xen/arch/x86/acpi/boot.c Tue Jul 31 15:09:45 2007 +0100
+++ b/xen/arch/x86/acpi/boot.c Fri Aug 10 07:34:46 2007 -0400
@@ -106,8 +106,10 @@ char *__acpi_map_table(unsigned long phy
     unsigned long base, offset, mapped_size;
     int idx;
 
-    if (phys + size < 8 * 1024 * 1024)
+    if (phys + size < 8 * 1024 * 1024) {
+        printk("(1) Mapped 0x%lx to %p\n",phys,__va(phys));
         return __va(phys);
+    }
 
     offset = phys & (PAGE_SIZE - 1);
     mapped_size = PAGE_SIZE - offset;
@@ -126,6 +128,7 @@ char *__acpi_map_table(unsigned long phy
         mapped_size += PAGE_SIZE;
     }
 
+    printk("(2) Mapped 0x%lx to %p, base = 0x%lx\n",phys,((char *)base + offset),base);
     return ((char *) base + offset);
 }
 
@@ -308,8 +311,10 @@ static int __init acpi_parse_sbf(unsigne
 {
     struct acpi_table_sbf *sb;
 
-    if (!phys_addr || !size)
+    if (!phys_addr || !size) {
+        printk("SBF: Bad phys addr. or size.\n");
         return -EINVAL;
+    }
 
     sb = (struct acpi_table_sbf *)__acpi_map_table(phys_addr, size);
     if (!sb) {
@@ -318,6 +323,7 @@ static int __init acpi_parse_sbf(unsigne
     }
 
     sbf_port = sb->sbf_cmos; /* Save CMOS port */
+printk("Successfully read SBF.\n");
 
     return 0;
 }
@@ -467,11 +473,19 @@ static int __init acpi_parse_fadt(unsign
 {
     struct fadt_descriptor_rev2 *fadt = NULL;
 
+printk("IN acpi_parse_fadt.\n");
     fadt = (struct fadt_descriptor_rev2 *)__acpi_map_table(phys, size);
     if (!fadt) {
         printk(KERN_WARNING PREFIX "Unable to map FADT\n");
         return 0;
     }
+
+    if (strncmp(fadt->signature, "FACP", 4)) {
+        printk(KERN_ERR PREFIX "Invalid FADT signature %s\n",
+               fadt->signature);
+        return 0;
+    }
+
 
 #ifdef CONFIG_ACPI_INTERPRETER
     /* initialize sci_int early for INT_SRC_OVR MADT parsing */
@@ -1002,19 +1016,24 @@ int __init acpi_boot_init(void)
     if (acpi_disabled && !acpi_ht)
         return 1;
 
+acpi_table_parse(ACPI_FADT, acpi_parse_fadt);
     acpi_table_parse(ACPI_BOOT, acpi_parse_sbf);
 
     /*
      * set sci_int and PM timer address
      */
+acpi_table_parse(ACPI_FADT, acpi_parse_fadt);
     acpi_table_parse(ACPI_FADT, acpi_parse_fadt);
 
     /*
      * Process the Multiple APIC Description Table (MADT), if present
      */
+acpi_table_parse(ACPI_FADT, acpi_parse_fadt);
     acpi_process_madt();
 
+acpi_table_parse(ACPI_FADT, acpi_parse_fadt);
     acpi_table_parse(ACPI_HPET, acpi_parse_hpet);
-
-    return 0;
-}
+acpi_table_parse(ACPI_FADT, acpi_parse_fadt);
+
+    return 0;
+}

Keir Fraser <keir@xensource.com> wrote on 08/10/2007 02:55:50 AM:

> Did you check the signature on all three mappings, and only the
> first time you mapped it was broken? Weird.
>
> -- Keir
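A side note on the "A__ADR" output: the ACPI table signature is a 4-byte field with no terminating NUL, so printing it with a plain %s runs on into whatever bytes follow, which is presumably why six characters show up for a four-character signature. A minimal sketch of a bounded comparison and print is given below. It uses standard C library calls rather than Xen's printk, and the helper names are hypothetical.

```c
#include <stdio.h>
#include <string.h>

#define ACPI_SIG_LEN 4

/* Compare exactly the 4 signature bytes; neither side needs a NUL. */
static int acpi_sig_matches(const char *sig, const char *expected)
{
    return memcmp(sig, expected, ACPI_SIG_LEN) == 0;
}

/*
 * Format the signature into out, reading at most 4 bytes from sig.
 * The "%.4s" precision bounds the read, so sig may be unterminated.
 */
static int format_sig(char *out, size_t outlen, const char *sig)
{
    return snprintf(out, outlen, "%.4s", sig);
}
```

Applied to the bytes seen in the trace, format_sig() would yield "A__A" instead of trailing into the neighbouring field.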
On 10/8/07 12:36, "Stefan Berger" <stefanb@us.ibm.com> wrote:

> (XEN) (1) Mapped 0xf601f to ff0f601f
> (XEN) Using APIC driver default
> (XEN) IN acpi_parse_fadt.
> (XEN) (2) Mapped 0x3ffcfec0 to fff9bec0, base = 0xfff9b000
> (XEN) ACPI: Invalid FADT signature A__ADR

This 'A__ADR' looks remarkably like the middle of another ACPI table. :-)

Perhaps you could scan all physical memory in the range 3ffcd000-3ffd0000, either via /dev/mem in native Linux, or hack some code to map-and-check within Xen, or whatever, and look for that string? It'd be interesting to know:
 a) does it appear at all?
 b) is it at the same offset in a page as we expect the FADT signature to appear (i.e., offset 0xec0)?
 c) is it in a page we mapped previously via __acpi_map_table()?

 -- Keir
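The scan suggested above can be sketched as a plain byte search over a buffer holding a copy of the physical range, plus a helper that turns a hit into its in-page offset for question (b). How the buffer gets filled (for example by reading /dev/mem under native Linux, which needs root and may be restricted by the kernel configuration) is left out and is an assumption; the function names are hypothetical.

```c
#include <stddef.h>
#include <string.h>

#define PAGE_SIZE 4096UL

/*
 * Return the index of the first occurrence of pat (patlen bytes) in
 * buf[start..len), or -1 if it does not occur.
 */
static long find_bytes(const unsigned char *buf, size_t len,
                       const char *pat, size_t patlen, size_t start)
{
    size_t i;

    if (patlen == 0 || patlen > len)
        return -1;
    for (i = start; i + patlen <= len; i++)
        if (memcmp(buf + i, pat, patlen) == 0)
            return (long)i;
    return -1;
}

/* In-page offset of a hit at buf[index], where buf[0] is at phys_base. */
static unsigned long offset_in_page(unsigned long phys_base, size_t index)
{
    return (phys_base + index) & (PAGE_SIZE - 1);
}
```

A hit whose offset_in_page() value is 0xec0 would answer (b) affirmatively; comparing the hit's page frame against the addresses in the "Mapped" lines addresses (c).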
Looking at the signature of the structure BEFORE the mapping happens (with the hard-coded address 0xfff9bec0) shows the same signature before and after the mapping, at least on the first try.
Hm, does the mapping code maybe not do the mapping again if it thinks that the memory has already been mapped, but the code for that test has a bug? Otherwise a caching problem?

   Stefan

[...]
(XEN) PAE enabled, limit: 16 GB
(XEN) found SMP MP-table at 0009d540
(XEN) DMI 2.3 present.
(XEN) (1) Mapped 0xf601f to ff0f601f
(XEN) Using APIC driver default

(XEN) IN acpi_parse_fadt.
(XEN) Signature of acpi str. @ fff9bec0 before mapping: A__ADR
(XEN) (2) Mapped 0x3ffcfec0 to fff9bec0, base = 0xfff9b000
(XEN) ACPI: Invalid FADT signature A__ADR

(XEN) IN acpi_parse_fadt.
(XEN) Signature of acpi str. @ fff9bec0 before mapping: A__ADR
(XEN) (2) Mapped 0x3ffcfec0 to fff9bec0, base = 0xfff9b000
(XEN) ACPI: PM-Timer IO Port: 0x588

(XEN) (2) Mapped 0x3ffcfd80 to fff9bd80, base = 0xfff9b000
(XEN) ACPI: ACPI SLEEP INFO: pm1x_cnt[584,0], pm1x_evt[580,0]
(XEN) wakeup_vec[3ffcfd8c], vec_size[20]
(XEN) IN acpi_parse_fadt.
(XEN) Signature of acpi str. @ fff9bec0 before mapping: FACP?
(XEN) (2) Mapped 0x3ffcfec0 to fff9bec0, base = 0xfff9b000
(XEN) ACPI: PM-Timer IO Port: 0x588
(XEN) (2) Mapped 0x3ffcfd80 to fff9bd80, base = 0xfff9b000
(XEN) ACPI: ACPI SLEEP INFO: pm1x_cnt[584,0], pm1x_evt[580,0]
(XEN) wakeup_vec[3ffcfd8c], vec_size[20]
(XEN) IN acpi_parse_fadt.

"Jan Beulich" <jbeulich@novell.com> wrote on 08/10/2007 04:51:57 AM:

> A TLB flushing issue (perhaps not even in software, as map_pages_to_xen()
> appears to not have any problems in that respect)? Can you check what
> is at the offset into the mapped page after mapping the DSDT (which is the
> one place where the virtual page fff9bXXX gets associated with a different
> physical one; all other mappings map the same physical page), but before
> (re-)mapping the FADT? If that's not the data you observe, any chance
> you can find out (e.g. under native Linux) where the observed data really
> lives, to get an understanding where else such a broken translation could
> come from?
>
> Jan
Not by my reading of the code, but your testing shows differently ;-)

Have you tried adding tracing to map_pages_to_xen()? That's the guts of the remapping code.

 -- Keir

On 10/8/07 14:58, "Stefan Berger" <stefanb@us.ibm.com> wrote:

> Looking at the signature of the structure BEFORE the mapping happens (with
> hard coded address 0xfff9bec0) shows the same signature before and after the
> mapping - at least on first try.
> Hm, does the mapping code maybe not do the mapping again if it thinks that the
> memory has already been mapped, but the code for that testing has a bug?
> Otherwise a caching problem?
>
> Stefan
>
> [...]
> (XEN) IN acpi_parse_fadt.
> (XEN) Signature of acpi str. @ fff9bec0 before mapping: A__ADR
> (XEN) (2) Mapped 0x3ffcfec0 to fff9bec0, base = 0xfff9b000
> (XEN) ACPI: Invalid FADT signature A__ADR
> [...]
Keir Fraser <keir@xensource.com> wrote on 08/10/2007 11:09:43 AM:

> Not by my reading of the code, but your testing shows differently ;-)
>
> Have you tried adding tracing to map_pages_to_xen()? That's the guts
> of the remapping code.

Now I did...

diff -r 7c5c3aa858cc xen/arch/x86/mm.c
--- a/xen/arch/x86/mm.c Tue Jul 31 15:09:45 2007 +0100
+++ b/xen/arch/x86/mm.c Fri Aug 10 11:21:49 2007 -0400
@@ -3539,6 +3539,7 @@ int map_pages_to_xen(
             nr_mfns -= 1UL;
         }
     }
+    local_flush_tlb_pge();
     return 0;
 }

This might be coarse but it does the trick to locate the problem.

(XEN) (2) Mapped 0x3ffcfec0 to fff9bec0, base = 0xfff9b000
(XEN) (2) Mapped 0x3ffbe4a8 to fff9b4a8, base = 0xfff9b000
(XEN) sign: DSDT; name=DSDT
(XEN) ACPI: DSDT (v001 IBM SERBLADE 0x00001000 INTL 0x02002025) @ 0x00000000
(XEN) NUMA turned off
(XEN) Faking a node at 0000000000000000-000000003ffb0000
(XEN) Xen heap: 9MB (10184kB)
(XEN) Domain heap initialised: DMA width 32 bits
(XEN) PAE enabled, limit: 16 GB
(XEN) found SMP MP-table at 0009d540
(XEN) DMI 2.3 present.
(XEN) (1) Mapped 0xf601f to ff0f601f
(XEN) Using APIC driver default
(XEN) IN acpi_parse_fadt.
(XEN) Signature of acpi str. @ fff9bec0 before mapping: A__ADR
(XEN) (2) Mapped 0x3ffcfec0 to fff9bec0, base = 0xfff9b000
(XEN) ACPI: PM-Timer IO Port: 0x588

success on first attempt :-)

(XEN) (2) Mapped 0x3ffcfd80 to fff9bd80, base = 0xfff9b000
(XEN) ACPI: ACPI SLEEP INFO: pm1x_cnt[584,0], pm1x_evt[580,0]
(XEN) wakeup_vec[3ffcfd8c], vec_size[20]
(XEN) IN acpi_parse_fadt.
(XEN) Signature of acpi str. @ fff9bec0 before mapping: FACP?
(XEN) (2) Mapped 0x3ffcfec0 to fff9bec0, base = 0xfff9b000
(XEN) ACPI: PM-Timer IO Port: 0x588

Stefan
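The "(2) Mapped ... base = 0xfff9b000" lines in the trace all follow one rule: the physical page is mapped at the page-aligned slot, and the table keeps its offset within the page. A minimal sketch of that arithmetic follows, assuming 4 KiB pages; fixmap_virt() and page_frame() are hypothetical helper names made up for this illustration, not Xen functions.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE 0x1000u                       /* 4 KiB pages (assumption) */
#define PAGE_MASK (~(uint64_t)(PAGE_SIZE - 1))

/* Hypothetical helper: virtual address at which a physical address shows up
 * once its page is mapped at the page-aligned slot `base`. */
static uint32_t fixmap_virt(uint32_t base, uint64_t phys)
{
    assert((base & (PAGE_SIZE - 1)) == 0);      /* slot must be page aligned */
    return base + (uint32_t)(phys & (PAGE_SIZE - 1));
}

/* Hypothetical helper: physical page frame the slot has to point at. */
static uint64_t page_frame(uint64_t phys)
{
    return phys & PAGE_MASK;
}
```

With the addresses from the trace, the FADT (0x3ffcfec0) and the FACS-style sleep info (0x3ffcfd80) share one physical page, while the DSDT (0x3ffbe4a8) lives in a different one. The same virtual page therefore has to be pointed at a new frame when the DSDT is mapped, and pointed back when the FADT is parsed later.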
So the bug could be either that the read is getting hoisted above the
pagetable update (unlikely) or that there is a stale TLB entry (more
likely). You could discount the first possibility by replacing your
local_flush_tlb_pge() with mb() at the end of map_pages_to_xen(). If that
does not fix the bug then the problem is not that the read is getting
hoisted.

The TLB handling looks correct though - if the modified PTE was not
previously empty then we execute an INVLPG on that virtual address. Might
be worth adding some tracing around there to see if the code thinks the PTE
was previously present, and hence whether the INVLPG actually gets
executed?

 -- Keir
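The flush rule described above (invalidate only when the PTE being replaced was present) can be illustrated with a toy model. This is a sketch under stated assumptions, not Xen code: struct toy_mmu, toy_translate() and toy_map() are invented for the example, and the "TLB" is a single cached entry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model, not Xen code: one page-table entry and a one-entry TLB. */
struct toy_mmu {
    bool     pte_present;   /* what the page table currently says */
    uint64_t pte_frame;
    bool     tlb_valid;     /* what the CPU has cached */
    uint64_t tlb_frame;
};

/* A read through the virtual address: a cached TLB entry always wins. */
static uint64_t toy_translate(struct toy_mmu *m)
{
    if (!m->tlb_valid) {            /* miss: walk the page table and cache */
        assert(m->pte_present);
        m->tlb_frame = m->pte_frame;
        m->tlb_valid = true;
    }
    return m->tlb_frame;
}

/* Remap the slot and flush only if the old PTE was present.
 * Returns true when the flush was performed. */
static bool toy_map(struct toy_mmu *m, uint64_t frame)
{
    bool was_present = m->pte_present;

    m->pte_frame = frame;
    m->pte_present = true;
    if (was_present)
        m->tlb_valid = false;       /* stands in for INVLPG */
    return was_present;
}
```

In this model the rule is sound as long as the PTE that the mapping code inspects is the one the CPU actually used. If that PTE ever looks empty while a translation is still cached, the flush is skipped and the next read returns data from the old frame; a second mapping attempt then sees a present PTE, flushes, and reads correctly.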
xen-devel-bounces@lists.xensource.com wrote on 08/10/2007 11:58:21 AM:

> So the bug could be either that the read is getting hoisted above
> the pagetable update (unlikely) or that there is a stale TLB entry
> (more likely). You could discount the first possibility by replacing
> your local_flush_tlb_pge() with mb() at the end of
> map_pages_to_xen(). If that does not fix the bug then the problem is
> not that the read is getting hoisted.
>
> The TLB handling looks correct though - if the modified PTE was not
> previously empty then we execute an INVLPG on that virtual address.
> Might be worth adding some tracing around there to see if the code
> thinks the PTE was previously present, and hence whether the INVLPG
> actually gets executed?

local_flush_tlb_one() does NOT get executed the first time, but upon the
second attempt.
The mb() alone did NOT help.

Stefan
You mean that local_flush_tlb_one() is NOT executed the first time we try
to map the FADT? That's obviously bogus, since we have mapped other ACPI
tables in that fixmap entry earlier during boot, and this is evidenced by
the fact that you can print out the current contents of that virtual
address before calling __acpi_map_table() and you do not fault (which you
would if the PTE did not have _PAGE_PRESENT set).

So, now the investigation moves on to: WHY does map_pages_to_xen() think
that the PTE was not present, when it quite obviously was??

I think we're getting somewhere, albeit rather slowly :-)

 -- Keir
xen-devel-bounces@lists.xensource.com wrote on 08/10/2007 12:15:33 PM:

> You mean that local_flush_tlb_one() is NOT executed the first time
> we try to map the FADT? That's obviously bogus, since we have mapped
> other ACPI tables in that fixmap entry earlier during boot, and this
> is evidenced by the fact that you can print out the current contents
> of that virtual address before calling __acpi_map_table() and you do
> not fault (which you would if the PTE did not have _PAGE_PRESENT
> set). So, now the investigation moves on to: WHY does
> map_pages_to_xen() think that the PTE was not present, when it quite
> obviously was??

(XEN) local_flush_tlb_one(0xfff9b000)
(XEN) (2) Mapped 0x3ffcfec0 to fff9bec0, base = 0xfff9b000
(XEN) map_pages_to_xen : 3533
(XEN) local_flush_tlb_one(0xfff9b000)    (*)
(XEN) (2) Mapped 0x3ffbe4a8 to fff9b4a8, base = 0xfff9b000
(XEN) sign: DSDT; name=DSDT
(XEN) ACPI: DSDT (v001 IBM SERBLADE 0x00001000 INTL 0x02002025) @ 0x00000000
(XEN) NUMA turned off
(XEN) Faking a node at 0000000000000000-000000003ffb0000
(XEN) Xen heap: 9MB (10200kB)
(XEN) Domain heap initialised: DMA width 32 bits
(XEN) PAE enabled, limit: 16 GB
(XEN) found SMP MP-table at 0009d540
(XEN) DMI 2.3 present.
(XEN) (1) Mapped 0xf601f to ff0f601f
(XEN) Using APIC driver default
(XEN) IN acpi_parse_fadt.
(XEN) Signature of acpi str. @ fff9bec0 before mapping: A__ADR
(XEN) map_pages_to_xen : 3533
(that's the line number)
(XEN) 0xfff9b000 was NOT present.

Something between (*) and here seems to trash this presence flag.
paging_init() and many others lie in between the upper call and this one
here. Could be a side effect of this? Maybe a TLB flush at the right place
in one of these functions would solve the problem?

(XEN) (2) Mapped 0x3ffcfec0 to fff9bec0, base = 0xfff9b000
(XEN) ACPI: Invalid FADT signature A__ADR
(XEN) IN acpi_parse_fadt.
(XEN) Signature of acpi str. @ fff9bec0 before mapping: A__ADR
(XEN) map_pages_to_xen : 3533
(XEN) local_flush_tlb_one(0xfff9b000)

Assumed to be present here...

(XEN) (2) Mapped 0x3ffcfec0 to fff9bec0, base = 0xfff9b000
(XEN) ACPI: PM-Timer IO Port: 0x588
(XEN) map_pages_to_xen : 3533

> I think we're getting somewhere, albeit rather slowly :-)

Yes, slow apprentice and now he has to take off. :-)

Cheers!
Stefan
On 10/8/07 18:00, "Stefan Berger" <stefanb@us.ibm.com> wrote:

> (XEN) map_pages_to_xen : 3533
> (that's the line number)
> (XEN) 0xfff9b000 was NOT present.
>
> Something between (*) and here seems to trash this presence flag.
> paging_init() and many others lie in between the upper call and this one
> here. Could be a side effect of this? Maybe that tlb flush at the right
> place in one of these functions would solve the problem?

Yes, this now looks likely and that's rather scary. We'll go after this
next week.

 -- Keir
A good debugging approach will be to write a function that walks the
pagetables for that virtual address and prints the PTE that maps it.
Scatter calls to this function between acpi_boot_table_init() and
acpi_boot_init() and hence narrow down exactly where the PTE is getting
zapped.

 -- Keir
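The suggested debug helper could look roughly like the sketch below. It is a self-contained toy and makes several assumptions: a two-level layout with 512 entries per table (real PAE adds a third level on top), a NULL pointer standing in for a non-present L2 entry, and invented names (struct toy_pagetables, toy_walk(), toy_dump_pte()). None of this is Xen's actual page-table API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define TOY_PRESENT 0x1u
#define TOY_ENTRIES 512u            /* entries per table, as with PAE */

typedef uint64_t toy_pte_t;

/* Toy layout: each L2 slot points at an L1 table, NULL if not present. */
struct toy_pagetables {
    toy_pte_t *l1_tables[TOY_ENTRIES];
};

/* Walk to the PTE that maps `va`. Returns 0 and stores the PTE in *out,
 * or -1 if there is no L1 table for that address. */
static int toy_walk(const struct toy_pagetables *pt, uint32_t va,
                    toy_pte_t *out)
{
    unsigned l2_idx = (va >> 21) & (TOY_ENTRIES - 1);   /* 2 MiB per slot */
    unsigned l1_idx = (va >> 12) & (TOY_ENTRIES - 1);   /* 4 KiB per PTE  */

    assert(pt != NULL && out != NULL);
    if (pt->l1_tables[l2_idx] == NULL)
        return -1;
    *out = pt->l1_tables[l2_idx][l1_idx];
    return 0;
}

/* Print the PTE for `va`, tagged with the call site being probed. */
static void toy_dump_pte(const struct toy_pagetables *pt, uint32_t va,
                         const char *where)
{
    toy_pte_t pte;

    if (toy_walk(pt, va, &pte) != 0)
        printf("%s: va %08x: no L1 table\n", where, (unsigned)va);
    else
        printf("%s: va %08x: pte %016llx (%spresent)\n", where, (unsigned)va,
               (unsigned long long)pte, (pte & TOY_PRESENT) ? "" : "NOT ");
}
```

Calling the dump function with a distinct `where` string after each boot step is what narrows the search: the first call site that reports the entry as not present brackets the code that changed it.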
xen-devel-bounces@lists.xensource.com wrote on 08/10/2007 09:15:57 PM:

> A good debugging approach will be to write a function that walks the
> pagetables for that virtual address and prints the PTE that maps it.
> Scatter calls to this function between acpi_boot_table_init() and
> acpi_boot_init() and hence narrow down exactly where the PTE is
> getting zapped.

What is happening is that the pl1e pointer used for mapping the ACPI table
entry changes between the calls before paging_init() and after. The
l1_pgentry_t that is used before paging_init() correctly shows that the
page is present, whereas the one used after indicates that the page is not
present. Then, when the ACPI table is mapped after paging_init(), the TLB
is not flushed and wrong information is read.

Stefan
Yes, this seems to make things clear: paging_init() (re-)creates the page
directory for the ioremap area, which was partially established already by
set_fixmap()/map_pages_to_xen(). While adding a check there seems trivial,
I wonder what the purpose of this initialization is, given that there's no
(real) ioremap anyway (so it would seem to me that the code there could as
well be removed).

Jan
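The overwrite described above, and the kind of check mentioned as a possible remedy, can be sketched with a toy model. Everything here is an assumption made for illustration: the names (struct toy_l2_slot, toy_install_l1(), toy_install_l1_checked(), toy_pte_present()) are invented, a NULL pointer stands in for a non-present L2 entry, and the thread does not say what the change that eventually went into the tree looks like.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PRESENT    0x1u
#define TOY_L1_ENTRIES 512u

typedef uint64_t toy_pte_t;

/* One L2 slot covering the fixmap/ioremap range in this toy model. */
struct toy_l2_slot {
    toy_pte_t *l1;      /* NULL means the L2 entry is not present */
};

/* Unconditional install: PTEs in the previous L1 table are forgotten,
 * although the CPU may still hold translations derived from them. */
static void toy_install_l1(struct toy_l2_slot *slot, toy_pte_t *fresh_l1)
{
    slot->l1 = fresh_l1;
}

/* Guarded install: keep an L1 table that earlier boot code already set up.
 * Returns true if the fresh table was actually installed. */
static bool toy_install_l1_checked(struct toy_l2_slot *slot,
                                   toy_pte_t *fresh_l1)
{
    if (slot->l1 != NULL)
        return false;
    slot->l1 = fresh_l1;
    return true;
}

/* What the mapping code would conclude about the PTE at index `idx`. */
static bool toy_pte_present(const struct toy_l2_slot *slot, unsigned idx)
{
    assert(idx < TOY_L1_ENTRIES);
    return slot->l1 != NULL && (slot->l1[idx] & TOY_PRESENT) != 0;
}
```

After the unconditional install, the mapping code sees an empty PTE and, following its own rule, skips the flush even though a translation for that address may still be cached. With the guarded variant the early table survives, the PTE is still reported as present, and the flush is issued on the next remap.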
The area is also used by domain_page.c routines.

 K.

On 24/8/07 08:13, "Jan Beulich" <jbeulich@novell.com> wrote:

> While adding a check there seems trivial I wonder what the purpose of
> this initialization is, given that there's no (real) ioremap anyway
> (so it would seem to me that the code there could as well be removed).
Okay, please try tip of staging tree (c/s 15773).

 -- Keir
> Okay, please try tip of staging tree (c/s 15773).

This solves the problem with mapping ACPI memory and the timer address is
read correctly. Thanks!

Stefan