thr3ads.net - Xen devel - [PATCH 0 of 3] KEXEC: Introduce new crashtables interface (v2) [Mar 2012]

If this information is useful, please help other people find it:
Share via:

Andrew Cooper

2012-Mar-12 14:53 UTC

[PATCH 0 of 3] KEXEC: Introduce new crashtables interface (v2)

This patch series implement changes to the kexec system in Xen, both along the
lines of supporting a 32bit crash kernel with 64bit Xen, and providing more
useful information to the crash kernel.

Most of these patches have been submitted upstream in the past; The main new one
is patch 4 which introduces the crashtables mechanism of passing information to
the crashdump kernel using ELF CORE notes.  The set of patches (ported to
xen-4.1) have been tested against a wide range of hardware.

Changes since v1:
 *  Drop the top-of-mem patch.  It was convaing information very similar
    information to other memory variables, in a less sensible way.  Therefore,
    use max_page in preference.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

Andrew Cooper

2012-Mar-12 14:53 UTC

head link

[PATCH 1 of 3] KEXEC: Allocate crash notes on boot

Currently, the buffers for crash notes are allocated per CPU when a
KEXEC_CMD_kexec_get_range hypercall is made, referencing the CPU in question.
This has certain problems including not being able to allocate the crash buffers
if the host is out of memory when crashing.

In addition, my forthcoming code to support 32bit kdump kernels on 64bit Xen on
large (>64GB) boxes will require some guarentees as to where the crash note
buffers are actually allocated in physical memory.  This is far easier to sort
out at boot time, rather than after dom0 has been booted and potentially using
the physical memory required.

Therefore, allocate the crash note buffers at boot time.

Changes since v6:
 *  Tweak kexec_init() to use xzmalloc_array(), and to defer registering the
    crashdump keyhandler until the crash notes have been successfully
    allocated.

Changes since v5:
 *  Introduce sizeof_cpu_notes to move calculation of note size into a
    separate location.
 *  Tweak sizeof_note() to return size_t rather than int, as it is a
    function based upon sizeof().

Changes since v4:

 *  Replace the current cpu crash note scheme of using void pointers
    and hand calculating the size each time is needed, by a range
    structure containing a pointer and a size.  This removes duplicate
    times where the size is calculated.
 *  Tweak kexec_get_cpu().  Don''t fail if a cpu is offline because it
    may already have crash notes, and may be up by the time a crash
    happens.  Split the error conditions up to return ERANGE for an
    out-of-range cpu request rather than EINVAL.  Finally, returning a
    range of zeros is acceptable, so do this in preference to failing.

Changes since v3:

 *  Alter the spinlocks to avoid calling xmalloc/xfree while holding
    the lock.
 *  Tidy up the coding style used.

Changes since v2:

 *  Allocate crash_notes dynamically using nr_cpu_ids at boot time,
    rather than statically using NR_CPUS.
 *  Fix the incorrect use of signed integers for cpu id.
 *  Fix collateral damage to do_crashdump_trigger() and
    crashdump_trigger_handler caused by reordering sizeof_note() and
    setup_note()
 *  Tweak the issue about returing -ENOMEM from kexec_init_cpu_note().
    No functional change.
 *  Change kexec_get_cpu() to attempt to allocate crash note buffers
    in case we have more free memory now than when the pcpu came up.
 *  Now that there are two codepaths possibly allocating crash notes,
    protect the allocation itself with a spinlock.

Changes since v1:

 *  Use cpu hotplug notifiers to handle allocating of the notes
    buffers rather than assuming the boot state of cpus will be the
    same as the crash state.
 *  Move crash_notes from being per_cpu.  This is because the kdump
    kernel elf binary put in the crash area is hard coded to physical
    addresses which the dom0 kernel typically obtains at boot time.
    If a cpu is offlined, its buffer should not be deallocated because
    the kdump kernel would read junk when trying to get the crash
    notes.  Similarly, the same problem would occur if the cpu was
    re-onlined later and its crash notes buffer was allocated elsewhere.
 *  Only attempt to allocate buffers if a crash area has been
    specified.  Else, allocating crash note buffers is a waste of
    space.  Along with this, change the test in kexec_get_cpu to
    return -EINVAL if no buffers have been allocated.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>




_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Andrew Cooper

2012-Mar-12 14:53 UTC

head link

[PATCH 2 of 3] KEXEC: Allocate crash structures in low memory

On 64bit Xen with 32bit dom0 and crashkernel, xmalloc''ing items such
as the CPU crash notes will go into the xenheap, which tends to be in
upper memory.  This causes problems on machines with more than 64GB
(or 4GB if no PAE support) of ram as the crashkernel physically cant
access the crash notes.

The solution is to force Xen to allocate certain structures in lower
memory.  This is achieved by introducing two new command line
parameters; low_crashinfo and crashinfo_maxaddr.  Because of the
potential impact on 32bit PV guests, and that this problem does not
exist for 64bit dom0 on 64bit Xen, this new functionality defaults to
the codebase''s previous behavior, requiring the user to explicitly
add extra command line parameters to change the behavior.

This patch consists of 3 logically distinct but closely related
changes.
 1) Add the two new command line parameters.
 2) Change crash note allocation to use lower memory when instructed.
 3) Change the conring buffer to use lower memory when instructed.

There result is that the crash notes and console ring will be placed
in lower memory so useful information can be recovered in the case of
a crash.

Changes since v1:
 -  Patch xen-command-line.markdown to document new options

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>




_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Andrew Cooper

2012-Mar-12 14:53 UTC

head link

[PATCH 3 of 3] KEXEC: Introduce new crashtables interface

Introduce the ''crashtables'' interface as an extensible method
of passing data to
the crashdump kernel, providing a substantially more flexability than the
current method XenServer uses of hacking around with the symbol file.

Care is taken to suitably align values, and to expliticly disabiguate pointers
from sizes by use of a separate table, despite the raw data being compatabile
between the two.

In addition, all the details for "low_crashinfo=min" are passed into
their
respective tables.

Changes since v1:
 *  Fix up naming conventions and static qualifiers.
 *  Remove XENLOG_WARNINGs, as in the case of a crash, they are little or no
    use. Instead, mark entries as invalid so as to reduce confusion to the
    crashdump kernel trying to parse the table.
 *  Change strtab setup so a failure at that point wont fail the entire kexec
    setup.
 *  Change the modifications to console.c to prevent marking conring and
    conring_size as global symbols.
 *  Add more details into crashtables.h clarifying certain information.
 *  Add emacs local variables at the end of crashtables.h

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>




_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Jan Beulich

2012-Mar-12 16:35 UTC

head link

Re: [PATCH 3 of 3] KEXEC: Introduce new crashtables interface

>>> On 12.03.12 at 15:53, Andrew Cooper
<andrew.cooper3@citrix.com> wrote:
> Changes since v1:
>  *  Fix up naming conventions and static qualifiers.
>  *  Remove XENLOG_WARNINGs, as in the case of a crash, they are little or
no
>     use. Instead, mark entries as invalid so as to reduce confusion to the
>     crashdump kernel trying to parse the table.
>  *  Change strtab setup so a failure at that point wont fail the entire
kexec
>     setup.
>  *  Change the modifications to console.c to prevent marking conring and
>     conring_size as global symbols.
>  *  Add more details into crashtables.h clarifying certain information.
While you indeed added a few things, I don''t see any of my remarks
against v1 having got addressed.
>  *  Add emacs local variables at the end of crashtables.h
I''d rather like to see them go away in the other public headers too.
>+#define WRITE_STRTAB_ENTRY_CONST(v, s)                      \
>+    *(unsigned long*)ptr = (v); ptr += sl;                  \
>+    memcpy(ptr, s, sizeof(s)); ptr += round_up(sizeof(s))
>+
>+#define WRITE_STRTAB_ENTRY_VAR(v, s)                        \
>+    *(unsigned long*)ptr = (v); ptr += sl;                  \
>+    memcpy(ptr, s, strlen(s)); ptr += round_up(strlen(s)+1)
Didn''t even notice in v1 that here you''re using "unsigned
long" too.
>+        switch( crash_val64tab[i].id )
>+        {
>+        case XEN_VALTAB_MAX_PAGE:
>+            crash_val64tab[i].val = (uint64_t)(max_page);
Pointless cast and parentheses.
>+            break;
>+        case XEN_VALTAB_CONRING_SIZE:
>+            console_write_val64tab(&crash_val64tab[i]);
            console_write_val64tab(crash_val64tab[i].id,
&crash_val64tab[i].val);

would eliminate the need to expose the crashval types to console.h.
I''d further suggest folding the two console related functions.

And my reservations remain: Why is this any better than the consumer
looking at the symbol table (as it would continue to need to do for all
other information it might be after).

Jan

Andrew Cooper

2012-Mar-12 18:42 UTC

head link

Re: [PATCH 3 of 3] KEXEC: Introduce new crashtables interface

On 12/03/12 16:35, Jan Beulich wrote:>>>> On 12.03.12 at 15:53, Andrew Cooper
<andrew.cooper3@citrix.com> wrote:
>> Changes since v1:
>>  *  Fix up naming conventions and static qualifiers.
>>  *  Remove XENLOG_WARNINGs, as in the case of a crash, they are little
or no
>>     use. Instead, mark entries as invalid so as to reduce confusion to
the
>>     crashdump kernel trying to parse the table.
>>  *  Change strtab setup so a failure at that point wont fail the entire
kexec
>>     setup.
>>  *  Change the modifications to console.c to prevent marking conring
and
>>     conring_size as global symbols.
>>  *  Add more details into crashtables.h clarifying certain information.
> While you indeed added a few things, I don''t see any of my remarks
> against v1 having got addressed.
There is a new comment section at the top, explaining the basis for
numbering, and why unsigned longs are not a problem.

To summarize:

Numbering follows on from XEN_ELFNOTE_CRASH_* (0x100000x) and
XEN_ELFNOTE_DUMPCORE_* (0x200000x)

unsigned longs are more complicated.  The justification was that Xen
would only ever use the size it was compiled for, and this would be
calculable from the core file class (ELFCLASS{32,64})  Any crashdump
environment designed for the same bitness as Xen would be fine, and any
general crashdump environment would have to implement all possible
sizes, based on the class.

I will see if there is a neat way to expose explicit structures for both
possible bitnesses of Xen.
>>  *  Add emacs local variables at the end of crashtables.h
> I''d rather like to see them go away in the other public headers
too.
I am ambivalent, but will maintain consistency with the others.
>> +#define WRITE_STRTAB_ENTRY_CONST(v, s)                      \
>> +    *(unsigned long*)ptr = (v); ptr += sl;                  \
>> +    memcpy(ptr, s, sizeof(s)); ptr += round_up(sizeof(s))
>> +
>> +#define WRITE_STRTAB_ENTRY_VAR(v, s)                        \
>> +    *(unsigned long*)ptr = (v); ptr += sl;                  \
>> +    memcpy(ptr, s, strlen(s)); ptr += round_up(strlen(s)+1)
> Didn''t even notice in v1 that here you''re using
"unsigned long" too.
Same justification as above.
>> +        switch( crash_val64tab[i].id )
>> +        {
>> +        case XEN_VALTAB_MAX_PAGE:
>> +            crash_val64tab[i].val = (uint64_t)(max_page);
> Pointless cast and parentheses.
True
>> +            break;
>> +        case XEN_VALTAB_CONRING_SIZE:
>> +            console_write_val64tab(&crash_val64tab[i]);
>             console_write_val64tab(crash_val64tab[i].id,
&crash_val64tab[i].val);
>
> would eliminate the need to expose the crashval types to console.h.
> I''d further suggest folding the two console related functions.
But the crashval #DEFINES would still need to be exposed and they are
all in the same header file.

The symtab and valtab functions are separate because they refer to
separate tables.  If I were to fold the functions, I would either need
an extra parameter and nested switch statements, or settle with a
dependence between ID values.  The first is an ugly hack, and the second
works against the design principles of separating the tables in the
first place.
> And my reservations remain: Why is this any better than the consumer
> looking at the symbol table (as it would continue to need to do for all
> other information it might be after).
There are several different reasons.  In the ''min'' case, it is
true that
all relevant information is in the symbol table, but there is
information required for ''all'' which wont be in the symbol
table.  We
have had cases in the past where users have updated Xen without updating
the symbol file, resulting in confusing or wrong crash dump analyses. 
Whilst in an ideal world, we wouldn''t support this, that argument
doesn''t fly politically.

Finally, and most importantly along the line of 32bit crashdump support
on 64bit Xen; It is problematic to find out (through the symbol file and
page table walks) that an value you need is located in memory outside
the first 64GB.  By entering the value into the value table, it is
explicitly being made accessible to a 32bit crash kernel.
> Jan
>-- 
Andrew Cooper - Dom0 Kernel Engineer, Citrix XenServer
T: +44 (0)1223 225 900, http://www.citrix.com

Jan Beulich

2012-Mar-13 08:05 UTC

head link

Re: [PATCH 3 of 3] KEXEC: Introduce new crashtables interface

>>> On 12.03.12 at 19:42, Andrew Cooper
<andrew.cooper3@citrix.com> wrote:
> On 12/03/12 16:35, Jan Beulich wrote:
>>>>> On 12.03.12 at 15:53, Andrew Cooper
<andrew.cooper3@citrix.com> wrote:
>>> Changes since v1:
>>>  *  Fix up naming conventions and static qualifiers.
>>>  *  Remove XENLOG_WARNINGs, as in the case of a crash, they are
little or no
>>>     use. Instead, mark entries as invalid so as to reduce confusion
to the
>>>     crashdump kernel trying to parse the table.
>>>  *  Change strtab setup so a failure at that point wont fail the
entire kexec
>>>     setup.
>>>  *  Change the modifications to console.c to prevent marking
conring and
>>>     conring_size as global symbols.
>>>  *  Add more details into crashtables.h clarifying certain
information.
>> While you indeed added a few things, I don''t see any of my
remarks
>> against v1 having got addressed.
> 
> There is a new comment section at the top, explaining the basis for
> numbering, and why unsigned longs are not a problem.
Didn''t expect it there, and thus overlooked it - I''m sorry.
> To summarize:
> 
> Numbering follows on from XEN_ELFNOTE_CRASH_* (0x100000x) and
> XEN_ELFNOTE_DUMPCORE_* (0x200000x)
Then please call the new ones XEN_ELFNOTE_* too, fold the whole
stuff into public/elfnote.h, and generate the data as normal notes.
Or is there a reason not to? Even more so that part of what you do
here is really overlapping with XEN_ELFNOTE_DUMPCORE_* (which
you could simply re-use).
> unsigned longs are more complicated.  The justification was that Xen
> would only ever use the size it was compiled for, and this would be
> calculable from the core file class (ELFCLASS{32,64})  Any crashdump
> environment designed for the same bitness as Xen would be fine, and any
> general crashdump environment would have to implement all possible
> sizes, based on the class.
If going with "proper" notes, this will become mute anyway.
>>> +#define WRITE_STRTAB_ENTRY_CONST(v, s)                      \
>>> +    *(unsigned long*)ptr = (v); ptr += sl;                  \
>>> +    memcpy(ptr, s, sizeof(s)); ptr += round_up(sizeof(s))
>>> +
>>> +#define WRITE_STRTAB_ENTRY_VAR(v, s)                        \
>>> +    *(unsigned long*)ptr = (v); ptr += sl;                  \
>>> +    memcpy(ptr, s, strlen(s)); ptr += round_up(strlen(s)+1)
>> Didn''t even notice in v1 that here you''re using
"unsigned long" too.
> 
> Same justification as above.
Yet here it doesn''t even make sense - you''re not going to have
a
string table of 4Gb or more in size, and hence you can as well use
uint32_t uniformly.
>>> +            break;
>>> +        case XEN_VALTAB_CONRING_SIZE:
>>> +            console_write_val64tab(&crash_val64tab[i]);
>>             console_write_val64tab(crash_val64tab[i].id, 
> &crash_val64tab[i].val);
>>
>> would eliminate the need to expose the crashval types to console.h.
>> I''d further suggest folding the two console related functions.
> 
> But the crashval #DEFINES would still need to be exposed and they are
> all in the same header file.
Then pass a bool_t instead of the ID. You''re wanting two values from
the console code only after all.
> The symtab and valtab functions are separate because they refer to
> separate tables.  If I were to fold the functions, I would either need
> an extra parameter and nested switch statements, or settle with a
> dependence between ID values.  The first is an ugly hack, and the second
> works against the design principles of separating the tables in the
> first place.
No, the separation can be completely transparent to the console code,
just leveraging that the data type is always uint64_t.
>> And my reservations remain: Why is this any better than the consumer
>> looking at the symbol table (as it would continue to need to do for all
>> other information it might be after).
> 
> There are several different reasons.  In the ''min'' case,
it is true that
> all relevant information is in the symbol table, but there is
> information required for ''all'' which wont be in the
symbol table.  We
> have had cases in the past where users have updated Xen without updating
> the symbol file, resulting in confusing or wrong crash dump analyses. 
When installing a package (or even manually via "make install-xen")
this can''t happen, so someone must have broken this by copying
stuff manually. Which I don''t think the code ought to be prepared
to deal with. You just can''t have provisions for all sort of admin
mistakes - it won''t scale.
> Whilst in an ideal world, we wouldn''t support this, that argument
> doesn''t fly politically.
> 
> Finally, and most importantly along the line of 32bit crashdump support
> on 64bit Xen; It is problematic to find out (through the symbol file and
> page table walks) that an value you need is located in memory outside
> the first 64GB.  By entering the value into the value table, it is
> explicitly being made accessible to a 32bit crash kernel.
I''m not following - Xen always resides in memory below 4G, so no
symbol in the symbol table can possibly be inaccessible (with the
pages tables for Xen''s code and data being static and hence below
4G too, eventual page table walks wouldn''t represent a problem
either).

Jan

Xen devel - Mar 2012 - [PATCH 0 of 3] KEXEC: Introduce new crashtables interface (v2)

[PATCH 0 of 3] KEXEC: Introduce new crashtables interface (v2)

[PATCH 1 of 3] KEXEC: Allocate crash notes on boot

[PATCH 2 of 3] KEXEC: Allocate crash structures in low memory

[PATCH 3 of 3] KEXEC: Introduce new crashtables interface

Re: [PATCH 3 of 3] KEXEC: Introduce new crashtables interface

Re: [PATCH 3 of 3] KEXEC: Introduce new crashtables interface

Re: [PATCH 3 of 3] KEXEC: Introduce new crashtables interface