Andrew Cooper
2012-Mar-12 14:53 UTC
[PATCH 0 of 3] KEXEC: Introduce new crashtables interface (v2)
This patch series implement changes to the kexec system in Xen, both along the lines of supporting a 32bit crash kernel with 64bit Xen, and providing more useful information to the crash kernel. Most of these patches have been submitted upstream in the past; The main new one is patch 4 which introduces the crashtables mechanism of passing information to the crashdump kernel using ELF CORE notes. The set of patches (ported to xen-4.1) have been tested against a wide range of hardware. Changes since v1: * Drop the top-of-mem patch. It was convaing information very similar information to other memory variables, in a less sensible way. Therefore, use max_page in preference. Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Currently, the buffers for crash notes are allocated per CPU when a KEXEC_CMD_kexec_get_range hypercall is made, referencing the CPU in question. This has certain problems including not being able to allocate the crash buffers if the host is out of memory when crashing. In addition, my forthcoming code to support 32bit kdump kernels on 64bit Xen on large (>64GB) boxes will require some guarentees as to where the crash note buffers are actually allocated in physical memory. This is far easier to sort out at boot time, rather than after dom0 has been booted and potentially using the physical memory required. Therefore, allocate the crash note buffers at boot time. Changes since v6: * Tweak kexec_init() to use xzmalloc_array(), and to defer registering the crashdump keyhandler until the crash notes have been successfully allocated. Changes since v5: * Introduce sizeof_cpu_notes to move calculation of note size into a separate location. * Tweak sizeof_note() to return size_t rather than int, as it is a function based upon sizeof(). Changes since v4: * Replace the current cpu crash note scheme of using void pointers and hand calculating the size each time is needed, by a range structure containing a pointer and a size. This removes duplicate times where the size is calculated. * Tweak kexec_get_cpu(). Don''t fail if a cpu is offline because it may already have crash notes, and may be up by the time a crash happens. Split the error conditions up to return ERANGE for an out-of-range cpu request rather than EINVAL. Finally, returning a range of zeros is acceptable, so do this in preference to failing. Changes since v3: * Alter the spinlocks to avoid calling xmalloc/xfree while holding the lock. * Tidy up the coding style used. Changes since v2: * Allocate crash_notes dynamically using nr_cpu_ids at boot time, rather than statically using NR_CPUS. * Fix the incorrect use of signed integers for cpu id. * Fix collateral damage to do_crashdump_trigger() and crashdump_trigger_handler caused by reordering sizeof_note() and setup_note() * Tweak the issue about returing -ENOMEM from kexec_init_cpu_note(). No functional change. * Change kexec_get_cpu() to attempt to allocate crash note buffers in case we have more free memory now than when the pcpu came up. * Now that there are two codepaths possibly allocating crash notes, protect the allocation itself with a spinlock. Changes since v1: * Use cpu hotplug notifiers to handle allocating of the notes buffers rather than assuming the boot state of cpus will be the same as the crash state. * Move crash_notes from being per_cpu. This is because the kdump kernel elf binary put in the crash area is hard coded to physical addresses which the dom0 kernel typically obtains at boot time. If a cpu is offlined, its buffer should not be deallocated because the kdump kernel would read junk when trying to get the crash notes. Similarly, the same problem would occur if the cpu was re-onlined later and its crash notes buffer was allocated elsewhere. * Only attempt to allocate buffers if a crash area has been specified. Else, allocating crash note buffers is a waste of space. Along with this, change the test in kexec_get_cpu to return -EINVAL if no buffers have been allocated. Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Andrew Cooper
2012-Mar-12 14:53 UTC
[PATCH 2 of 3] KEXEC: Allocate crash structures in low memory
On 64bit Xen with 32bit dom0 and crashkernel, xmalloc''ing items such as the CPU crash notes will go into the xenheap, which tends to be in upper memory. This causes problems on machines with more than 64GB (or 4GB if no PAE support) of ram as the crashkernel physically cant access the crash notes. The solution is to force Xen to allocate certain structures in lower memory. This is achieved by introducing two new command line parameters; low_crashinfo and crashinfo_maxaddr. Because of the potential impact on 32bit PV guests, and that this problem does not exist for 64bit dom0 on 64bit Xen, this new functionality defaults to the codebase''s previous behavior, requiring the user to explicitly add extra command line parameters to change the behavior. This patch consists of 3 logically distinct but closely related changes. 1) Add the two new command line parameters. 2) Change crash note allocation to use lower memory when instructed. 3) Change the conring buffer to use lower memory when instructed. There result is that the crash notes and console ring will be placed in lower memory so useful information can be recovered in the case of a crash. Changes since v1: - Patch xen-command-line.markdown to document new options Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Andrew Cooper
2012-Mar-12 14:53 UTC
[PATCH 3 of 3] KEXEC: Introduce new crashtables interface
Introduce the ''crashtables'' interface as an extensible method of passing data to the crashdump kernel, providing a substantially more flexability than the current method XenServer uses of hacking around with the symbol file. Care is taken to suitably align values, and to expliticly disabiguate pointers from sizes by use of a separate table, despite the raw data being compatabile between the two. In addition, all the details for "low_crashinfo=min" are passed into their respective tables. Changes since v1: * Fix up naming conventions and static qualifiers. * Remove XENLOG_WARNINGs, as in the case of a crash, they are little or no use. Instead, mark entries as invalid so as to reduce confusion to the crashdump kernel trying to parse the table. * Change strtab setup so a failure at that point wont fail the entire kexec setup. * Change the modifications to console.c to prevent marking conring and conring_size as global symbols. * Add more details into crashtables.h clarifying certain information. * Add emacs local variables at the end of crashtables.h Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Jan Beulich
2012-Mar-12 16:35 UTC
Re: [PATCH 3 of 3] KEXEC: Introduce new crashtables interface
>>> On 12.03.12 at 15:53, Andrew Cooper <andrew.cooper3@citrix.com> wrote: > Changes since v1: > * Fix up naming conventions and static qualifiers. > * Remove XENLOG_WARNINGs, as in the case of a crash, they are little or no > use. Instead, mark entries as invalid so as to reduce confusion to the > crashdump kernel trying to parse the table. > * Change strtab setup so a failure at that point wont fail the entire kexec > setup. > * Change the modifications to console.c to prevent marking conring and > conring_size as global symbols. > * Add more details into crashtables.h clarifying certain information.While you indeed added a few things, I don''t see any of my remarks against v1 having got addressed.> * Add emacs local variables at the end of crashtables.hI''d rather like to see them go away in the other public headers too.>+#define WRITE_STRTAB_ENTRY_CONST(v, s) \ >+ *(unsigned long*)ptr = (v); ptr += sl; \ >+ memcpy(ptr, s, sizeof(s)); ptr += round_up(sizeof(s)) >+ >+#define WRITE_STRTAB_ENTRY_VAR(v, s) \ >+ *(unsigned long*)ptr = (v); ptr += sl; \ >+ memcpy(ptr, s, strlen(s)); ptr += round_up(strlen(s)+1)Didn''t even notice in v1 that here you''re using "unsigned long" too.>+ switch( crash_val64tab[i].id ) >+ { >+ case XEN_VALTAB_MAX_PAGE: >+ crash_val64tab[i].val = (uint64_t)(max_page);Pointless cast and parentheses.>+ break; >+ case XEN_VALTAB_CONRING_SIZE: >+ console_write_val64tab(&crash_val64tab[i]);console_write_val64tab(crash_val64tab[i].id, &crash_val64tab[i].val); would eliminate the need to expose the crashval types to console.h. I''d further suggest folding the two console related functions. And my reservations remain: Why is this any better than the consumer looking at the symbol table (as it would continue to need to do for all other information it might be after). Jan
Andrew Cooper
2012-Mar-12 18:42 UTC
Re: [PATCH 3 of 3] KEXEC: Introduce new crashtables interface
On 12/03/12 16:35, Jan Beulich wrote:>>>> On 12.03.12 at 15:53, Andrew Cooper <andrew.cooper3@citrix.com> wrote: >> Changes since v1: >> * Fix up naming conventions and static qualifiers. >> * Remove XENLOG_WARNINGs, as in the case of a crash, they are little or no >> use. Instead, mark entries as invalid so as to reduce confusion to the >> crashdump kernel trying to parse the table. >> * Change strtab setup so a failure at that point wont fail the entire kexec >> setup. >> * Change the modifications to console.c to prevent marking conring and >> conring_size as global symbols. >> * Add more details into crashtables.h clarifying certain information. > While you indeed added a few things, I don''t see any of my remarks > against v1 having got addressed.There is a new comment section at the top, explaining the basis for numbering, and why unsigned longs are not a problem. To summarize: Numbering follows on from XEN_ELFNOTE_CRASH_* (0x100000x) and XEN_ELFNOTE_DUMPCORE_* (0x200000x) unsigned longs are more complicated. The justification was that Xen would only ever use the size it was compiled for, and this would be calculable from the core file class (ELFCLASS{32,64}) Any crashdump environment designed for the same bitness as Xen would be fine, and any general crashdump environment would have to implement all possible sizes, based on the class. I will see if there is a neat way to expose explicit structures for both possible bitnesses of Xen.>> * Add emacs local variables at the end of crashtables.h > I''d rather like to see them go away in the other public headers too.I am ambivalent, but will maintain consistency with the others.>> +#define WRITE_STRTAB_ENTRY_CONST(v, s) \ >> + *(unsigned long*)ptr = (v); ptr += sl; \ >> + memcpy(ptr, s, sizeof(s)); ptr += round_up(sizeof(s)) >> + >> +#define WRITE_STRTAB_ENTRY_VAR(v, s) \ >> + *(unsigned long*)ptr = (v); ptr += sl; \ >> + memcpy(ptr, s, strlen(s)); ptr += round_up(strlen(s)+1) > Didn''t even notice in v1 that here you''re using "unsigned long" too.Same justification as above.>> + switch( crash_val64tab[i].id ) >> + { >> + case XEN_VALTAB_MAX_PAGE: >> + crash_val64tab[i].val = (uint64_t)(max_page); > Pointless cast and parentheses.True>> + break; >> + case XEN_VALTAB_CONRING_SIZE: >> + console_write_val64tab(&crash_val64tab[i]); > console_write_val64tab(crash_val64tab[i].id, &crash_val64tab[i].val); > > would eliminate the need to expose the crashval types to console.h. > I''d further suggest folding the two console related functions.But the crashval #DEFINES would still need to be exposed and they are all in the same header file. The symtab and valtab functions are separate because they refer to separate tables. If I were to fold the functions, I would either need an extra parameter and nested switch statements, or settle with a dependence between ID values. The first is an ugly hack, and the second works against the design principles of separating the tables in the first place.> And my reservations remain: Why is this any better than the consumer > looking at the symbol table (as it would continue to need to do for all > other information it might be after).There are several different reasons. In the ''min'' case, it is true that all relevant information is in the symbol table, but there is information required for ''all'' which wont be in the symbol table. We have had cases in the past where users have updated Xen without updating the symbol file, resulting in confusing or wrong crash dump analyses. Whilst in an ideal world, we wouldn''t support this, that argument doesn''t fly politically. Finally, and most importantly along the line of 32bit crashdump support on 64bit Xen; It is problematic to find out (through the symbol file and page table walks) that an value you need is located in memory outside the first 64GB. By entering the value into the value table, it is explicitly being made accessible to a 32bit crash kernel.> Jan >-- Andrew Cooper - Dom0 Kernel Engineer, Citrix XenServer T: +44 (0)1223 225 900, http://www.citrix.com
Jan Beulich
2012-Mar-13 08:05 UTC
Re: [PATCH 3 of 3] KEXEC: Introduce new crashtables interface
>>> On 12.03.12 at 19:42, Andrew Cooper <andrew.cooper3@citrix.com> wrote: > On 12/03/12 16:35, Jan Beulich wrote: >>>>> On 12.03.12 at 15:53, Andrew Cooper <andrew.cooper3@citrix.com> wrote: >>> Changes since v1: >>> * Fix up naming conventions and static qualifiers. >>> * Remove XENLOG_WARNINGs, as in the case of a crash, they are little or no >>> use. Instead, mark entries as invalid so as to reduce confusion to the >>> crashdump kernel trying to parse the table. >>> * Change strtab setup so a failure at that point wont fail the entire kexec >>> setup. >>> * Change the modifications to console.c to prevent marking conring and >>> conring_size as global symbols. >>> * Add more details into crashtables.h clarifying certain information. >> While you indeed added a few things, I don''t see any of my remarks >> against v1 having got addressed. > > There is a new comment section at the top, explaining the basis for > numbering, and why unsigned longs are not a problem.Didn''t expect it there, and thus overlooked it - I''m sorry.> To summarize: > > Numbering follows on from XEN_ELFNOTE_CRASH_* (0x100000x) and > XEN_ELFNOTE_DUMPCORE_* (0x200000x)Then please call the new ones XEN_ELFNOTE_* too, fold the whole stuff into public/elfnote.h, and generate the data as normal notes. Or is there a reason not to? Even more so that part of what you do here is really overlapping with XEN_ELFNOTE_DUMPCORE_* (which you could simply re-use).> unsigned longs are more complicated. The justification was that Xen > would only ever use the size it was compiled for, and this would be > calculable from the core file class (ELFCLASS{32,64}) Any crashdump > environment designed for the same bitness as Xen would be fine, and any > general crashdump environment would have to implement all possible > sizes, based on the class.If going with "proper" notes, this will become mute anyway.>>> +#define WRITE_STRTAB_ENTRY_CONST(v, s) \ >>> + *(unsigned long*)ptr = (v); ptr += sl; \ >>> + memcpy(ptr, s, sizeof(s)); ptr += round_up(sizeof(s)) >>> + >>> +#define WRITE_STRTAB_ENTRY_VAR(v, s) \ >>> + *(unsigned long*)ptr = (v); ptr += sl; \ >>> + memcpy(ptr, s, strlen(s)); ptr += round_up(strlen(s)+1) >> Didn''t even notice in v1 that here you''re using "unsigned long" too. > > Same justification as above.Yet here it doesn''t even make sense - you''re not going to have a string table of 4Gb or more in size, and hence you can as well use uint32_t uniformly.>>> + break; >>> + case XEN_VALTAB_CONRING_SIZE: >>> + console_write_val64tab(&crash_val64tab[i]); >> console_write_val64tab(crash_val64tab[i].id, > &crash_val64tab[i].val); >> >> would eliminate the need to expose the crashval types to console.h. >> I''d further suggest folding the two console related functions. > > But the crashval #DEFINES would still need to be exposed and they are > all in the same header file.Then pass a bool_t instead of the ID. You''re wanting two values from the console code only after all.> The symtab and valtab functions are separate because they refer to > separate tables. If I were to fold the functions, I would either need > an extra parameter and nested switch statements, or settle with a > dependence between ID values. The first is an ugly hack, and the second > works against the design principles of separating the tables in the > first place.No, the separation can be completely transparent to the console code, just leveraging that the data type is always uint64_t.>> And my reservations remain: Why is this any better than the consumer >> looking at the symbol table (as it would continue to need to do for all >> other information it might be after). > > There are several different reasons. In the ''min'' case, it is true that > all relevant information is in the symbol table, but there is > information required for ''all'' which wont be in the symbol table. We > have had cases in the past where users have updated Xen without updating > the symbol file, resulting in confusing or wrong crash dump analyses.When installing a package (or even manually via "make install-xen") this can''t happen, so someone must have broken this by copying stuff manually. Which I don''t think the code ought to be prepared to deal with. You just can''t have provisions for all sort of admin mistakes - it won''t scale.> Whilst in an ideal world, we wouldn''t support this, that argument > doesn''t fly politically. > > Finally, and most importantly along the line of 32bit crashdump support > on 64bit Xen; It is problematic to find out (through the symbol file and > page table walks) that an value you need is located in memory outside > the first 64GB. By entering the value into the value table, it is > explicitly being made accessible to a 32bit crash kernel.I''m not following - Xen always resides in memory below 4G, so no symbol in the symbol table can possibly be inaccessible (with the pages tables for Xen''s code and data being static and hence below 4G too, eventual page table walks wouldn''t represent a problem either). Jan