thr3ads.net - Xen devel - [RFC] KEXEC: allocate crash note buffers at boot time [Nov 2011]

If this information is useful, please help other people find it:
Share via:

Keir Fraser

2011-Nov-29 11:19 UTC

Re: [RFC] KEXEC: allocate crash note buffers at boot time

On 29/11/2011 18:56, "Andrew Cooper" <andrew.cooper3@citrix.com>
wrote:
> Hello,
> 
> As I have little to no knowledge of this stage of the boot process, is
> this a sensible way to be setting up the per_cpu areas?  I have a
> sneaking suspicion that it will fall over if a CPU is onlined after
> boot, and may also fall over if a CPU is offlined and reonlined later.
> There appears to be no infrastructure currently in place for this type
> of initialization, which is quite possibly why the code exists in its
> current form.
No it''s bad. For starters you should use for_each_online_cpu, not do an
open-coded for-loop. Secondly you should register a cpu hotplug notifier
(register_cpu_notifier()) to pick up and handle future
CPU_UP_PREPARE/CPU_UP_CANCELED/CPU_DEAD events. This can be hung off an
__initcall. See common/stop_machine.c for example, or common/timer.c, which
doesn''t even require a for_each_online_cpu loop because its init code
gets
run before we bring up secondary CPUs.

 -- Keir 
> Thanks,

Andrew Cooper

2011-Nov-29 18:56 UTC

head link

[RFC] KEXEC: allocate crash note buffers at boot time

Hello,

As I have little to no knowledge of this stage of the boot process, is
this a sensible way to be setting up the per_cpu areas?  I have a
sneaking suspicion that it will fall over if a CPU is onlined after
boot, and may also fall over if a CPU is offlined and reonlined later. 
There appears to be no infrastructure currently in place for this type
of initialization, which is quite possibly why the code exists in its
current form.

Thanks,

-- 
Andrew Cooper - Dom0 Kernel Engineer, Citrix XenServer
T: +44 (0)1223 225 900, http://www.citrix.com



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

Jan Beulich

2011-Nov-30 09:20 UTC

head link

Re: [RFC] KEXEC: allocate crash note buffers at boot time

>>> On 29.11.11 at 19:56, Andrew Cooper
<andrew.cooper3@citrix.com> wrote:
> As I have little to no knowledge of this stage of the boot process, is
> this a sensible way to be setting up the per_cpu areas?  I have a
> sneaking suspicion that it will fall over if a CPU is onlined after
> boot, and may also fall over if a CPU is offlined and reonlined later. 
> There appears to be no infrastructure currently in place for this type
> of initialization, which is quite possibly why the code exists in its
> current form.
I actually wonder how you came to those 4 statements you make in
the description - none of these seem to me like they are really an
issue (this would instead be plain bugs in Dom0). Did you actually look
at the existing Dom0 implementation(s)?

Further, while not being a huge waste of memory, it still is one in case
kexec gets never enabled, especially when considering a Dom0 kernel
that''s being built without CONFIG_KEXEC (or an incapable on, like any
pv-ops kernel to date). So I also conceptually question that change.

Jan

Andrew Cooper

2011-Nov-30 13:14 UTC

head link

Re: [RFC] KEXEC: allocate crash note buffers at boot time

On 29/11/11 11:19, Keir Fraser wrote:> On 29/11/2011 18:56, "Andrew Cooper"
<andrew.cooper3@citrix.com> wrote:
>
>> Hello,
>>  
>> As I have little to no knowledge of this stage of the boot process, is
>> this a sensible way to be setting up the per_cpu areas?  I have a
>> sneaking suspicion that it will fall over if a CPU is onlined after
>> boot, and may also fall over if a CPU is offlined and reonlined later.
>> There appears to be no infrastructure currently in place for this type
>> of initialization, which is quite possibly why the code exists in its
>> current form.
> No it''s bad. For starters you should use for_each_online_cpu, not
do an
> open-coded for-loop. Secondly you should register a cpu hotplug notifier
> (register_cpu_notifier()) to pick up and handle future
> CPU_UP_PREPARE/CPU_UP_CANCELED/CPU_DEAD events. This can be hung off an
> __initcall. See common/stop_machine.c for example, or common/timer.c, which
> doesn''t even require a for_each_online_cpu loop because its init
code gets
> run before we bring up secondary CPUs.
>
>  -- Keir 
>
Thanks - this is exactly what I was looking for.

Sadly, after thinking about cpu hotplug safety, it is not safe to store
the crash note pointer in the per cpu data.  Once you have reported an
area to dom0 via KEXEC_CMD_get_reserve, the areas reported cant possibly
move.

Therefore, I need to redesign a little bit.

Thanks,

~Andrew
>> Thanks,
>
-- 
Andrew Cooper - Dom0 Kernel Engineer, Citrix XenServer
T: +44 (0)1223 225 900, http://www.citrix.com

Andrew Cooper

2011-Nov-30 14:01 UTC

head link

Re: [RFC] KEXEC: allocate crash note buffers at boot time

On 30/11/11 09:20, Jan Beulich wrote:>>>> On 29.11.11 at 19:56, Andrew Cooper
<andrew.cooper3@citrix.com> wrote:
>> As I have little to no knowledge of this stage of the boot process, is
>> this a sensible way to be setting up the per_cpu areas?  I have a
>> sneaking suspicion that it will fall over if a CPU is onlined after
>> boot, and may also fall over if a CPU is offlined and reonlined later. 
>> There appears to be no infrastructure currently in place for this type
>> of initialization, which is quite possibly why the code exists in its
>> current form.
> I actually wonder how you came to those 4 statements you make in
> the description - none of these seem to me like they are really an
> issue (this would instead be plain bugs in Dom0). Did you actually look
> at the existing Dom0 implementation(s)?
>
> Further, while not being a huge waste of memory, it still is one in case
> kexec gets never enabled, especially when considering a Dom0 kernel
> that''s being built without CONFIG_KEXEC (or an incapable on, like
any
> pv-ops kernel to date). So I also conceptually question that change.
>
> Jan
We (XenServer) have had many cases of the kexec path failing on customer
boxes under weird and seemingly inexplicable circumstances.  This is why
I am working on trying to bullet-proofing the entire path.

We have 1 ticket where the contents of a crash note is clearly bogus (a
PRSTATUS is not 2GB long).  We have a ticket where it appears that the
kdump kernel has failed to reassemble /proc/vmcore from elfcorehdr as a
few pcpus worth of crash notes are missing.  I seem to remember a ticket
from before my time with a crash while writing the crash notes in Xen
itself. We even have a ticket stating that you get different crash notes
depending on whether you crash using the Xen debug keys (crash notes
appear completely bogus) or /proc/sysrq-trigger in dom0 (seems to be fine).

All of these are uncertain on reproducibility (except the final one
which was shown to reproduce on Xen-3.x and not on Xen-4.x so was not
investigated further) and have a habit of being unreproducible on any of
our local hardware, which makes fixing the problems tricky.

So yes -  the 4 points I have made are certainly not regular or common
behavior, but given some of the tickets we have, I am fairly sure it is
not a bug-free path.  I have checked the 2.6.32 implementation of
dom0''s
side of this and agree that it looks ok.  However, it is my opinion that
relying on a certain hypercalling pattern from dom0 is a perilous route
for Xen, whether it is likely for that pattern to change in the future
or not.

Having said all of this, I agree with your second paragraph.  As already
noted in my other email in this thread, I need to change the
implementation of this, so I will key the initial allocation of memory
on whether crashkernel= has been passed.  This should be sufficient
indication as to whether the user minds having the space allocated or not.

-- 
Andrew Cooper - Dom0 Kernel Engineer, Citrix XenServer
T: +44 (0)1223 225 900, http://www.citrix.com

Andrew Cooper

2011-Nov-30 17:24 UTC

head link

[RFC] KEXEC: allocate crash note buffers at boot time v2

Hello,

Presented is version 2 of this patch, which uses cpu hotplug
notifications to be rather safer when allocating buffers.

It occurs to me that there is a bit of an API problem with it comes to
combining a crashdump kernel with potential hotplug events.

If dom0 does not get notification of new/removed pcpus, and if it
doesn''t re-evaluate /proc/iomem by recalling things like
KEXEC_CMD_get_range, then subsequent calls to /sbin/kexec -p will bundle
up stale information into the kdump magic bundle.

Even if dom0 does get a notification of pcpu hotplug events, and it
updates its iomem map, /sbin/kexec would still need to be called to
bundle the new information into the kdump magic bundle.  Possibly doable
off a udev CPU hotplug event.

Even if /sbin/kexec gets called after hotplug events, a crash before the
new kexec magic bundle has been loaded will still result in the old
bundle being used, with its stale information regarding the position and
number of crash notes.

Sadly, I don''t see any easy or sensible solution to these problems. 
However, it is probably worth knowing as a potential limitation.

I have worked around this problem by never deallocating crash notes, so
even if information is stale as to the number of crash notes to be
expected, the stale physical addresses still point to allocated notes
buffers which have been partially initialized to have sensible note
headers in.  This unfortunately means that an offlined cpu which was
present at boot time will have a notes section with zero''d data.  It
also means that a cpu onlined after boot which crashes will not have its
register state presented to the kdump environment.

I would like to hope that both of these cases are unlikely, but again,
it is still a potential limitation.

The cpu crash notes themselves (PRSTATUS and XEN_ELFNOTE_CRASH_REGS)
don''t contain a reference to which pcpu they represent.  This is
inferred by /sbin/kexec from the order in which they appear in dom0''s
/proc/iomem, meaning that the kdump environments idea of which pcpu is
which might differ from Xen''s. This depending on whether Xen allocates
the notes buffer in ascending order, whether dom0 sorts memory addresses
reported, and, as it currently gets it correct, whether either of these
behaviors change in the future.

This final issue is within my ability to fix and I will do so in the
near future, along with some other additions I already need to make to
the per cpu crash notes.

-- 
Andrew Cooper - Dom0 Kernel Engineer, Citrix XenServer
T: +44 (0)1223 225 900, http://www.citrix.com



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

Keir Fraser

2011-Dec-01 05:20 UTC

head link

Re: [RFC] KEXEC: allocate crash note buffers at boot time v3

On 01/12/2011 12:56, "Jan Beulich" <JBeulich@suse.com> wrote:
>> +    register_keyhandler(''C'',
&crashdump_trigger_keyhandler);
>> +
>> +    /* If no crash area, no need to allocate space for notes. */
>> +    if ( 0 == kexec_crash_area.size )
>> +        return 0;
> 
> Wouldn''t it make sense to switch the order of these?
I also really hate this constant-first ordering.

 -- Keir

Jan Beulich

2011-Dec-01 09:08 UTC

head link

Re: [RFC] KEXEC: allocate crash note buffers at boot time v2

>>> On 30.11.11 at 18:24, Andrew Cooper
<andrew.cooper3@citrix.com> wrote:
>-static DEFINE_PER_CPU_READ_MOSTLY(void *, crash_notes);
>+static void * crash_notes[NR_CPUS];
If at all possible, can you please allocate this dynamically using
nr_cpu_ids for the size?
>+static int kexec_init_cpu_notes(const int cpu)
If the function''s argument was made ''unsigned int''
(as it already is in
its caller), then ...
>+    BUG_ON( cpu < 0 || cpu >= NR_CPUS );
... only the second check would be needed. Furthermore, it''s
pointless to use signed types for CPU numbers (and it''s
inefficient on various architectures), despite there being numerous
(bad) examples throughout the tree.
>+    if ( 0 == cpu )
>+        nr_bytes = nr_bytes +
>+            sizeof_note("Xen", sizeof(crash_xen_info_t));
Minor nit: Why not use += here (presumably allowing the statement to
fit on one line)?
>+    if ( ! note )
>+        /* Ideally, this would be -ENOMEM.  However, there are more
problems
>+         * assocated with trying to deal with -ENOMEM sensibly than just
>+         * pretending that the crash note area doesn''t exist. */
>+        return 0;
The only current caller ignores the return value, so why the bogus
return value?
>+    if ( nr < 0 || nr >= nr_cpu_ids || !cpu_online(nr) ||
!crash_notes[nr] )
>...
>-    if ( per_cpu(crash_notes, nr) == NULL )
>-    {
>...
>-    }
I would suggest to retry allocation here. Even if this isn''t suitable
for
a 32-bit kdump kernel on large systems, there#s no reason to penalize
fully 64-bit environment.

And here it would also become meaningful that kexec_init_cpu_notes()
actually returns a meaningful error upon failure.

Also, I can''t see the reason for the patch to move around
do_crashdump_trigger() and crashdump_trigger_keyhandler.

Jan

Andrew Cooper

2011-Dec-01 09:49 UTC

head link

Re: [RFC] KEXEC: allocate crash note buffers at boot time v2

On 01/12/11 09:08, Jan Beulich wrote:>>>> On 30.11.11 at 18:24, Andrew Cooper
<andrew.cooper3@citrix.com> wrote:
>> -static DEFINE_PER_CPU_READ_MOSTLY(void *, crash_notes);
>> +static void * crash_notes[NR_CPUS];
> If at all possible, can you please allocate this dynamically using
> nr_cpu_ids for the size?
Ok
>> +static int kexec_init_cpu_notes(const int cpu)
> If the function''s argument was made ''unsigned
int'' (as it already is in
> its caller), then ...
>
>> +    BUG_ON( cpu < 0 || cpu >= NR_CPUS );
> ... only the second check would be needed. Furthermore, it''s
> pointless to use signed types for CPU numbers (and it''s
> inefficient on various architectures), despite there being numerous
> (bad) examples throughout the tree.
Good point.  I will clean this up
>> +    if ( 0 == cpu )
>> +        nr_bytes = nr_bytes +
>> +            sizeof_note("Xen", sizeof(crash_xen_info_t));
> Minor nit: Why not use += here (presumably allowing the statement to
> fit on one line)?
Because the next few patches in my series adds a new crash notes at this
point.  I felt it was neater to leave in this form for a reduced diff
next patch, and gcc will optimize it to +>> +    if ( ! note )
>> +        /* Ideally, this would be -ENOMEM.  However, there are more
problems
>> +         * assocated with trying to deal with -ENOMEM sensibly than
just
>> +         * pretending that the crash note area doesn''t exist.
*/
>> +        return 0;
> The only current caller ignores the return value, so why the bogus
> return value?
Originally it passed the return value back up to the cpu hotplug
notifier, but I decided that causing an -ENOMEM at this point was a
little silly given that:
1) a lack of mem on boot will quickly kill the host in other ways, and
2) while there is no way useful way to ensure that the crashdump kernel
gets reloaded with new notes pointers, blocking on not being able to
allocate notes is silly.

Would you consider this enough of a problem to actually fail the
CPU_PREPARE_UP ?

On further thought from my musings yesterday, it would be fantastic if
we were able to point the kdump kernel at a PT_NOTE header without
discerned length, at which point Xen can rewrite its crash notes without
having to get /sbin/kexec to repackage its magic elf blog pointing to
new/different memory locations.  However, as this would involve changes
to kexec-tools, Linux (and Xen) in a not trivially backward compatible
fashion, the chances of it happening are slim.
>> +    if ( nr < 0 || nr >= nr_cpu_ids || !cpu_online(nr) ||
!crash_notes[nr] )
>> ...
>> -    if ( per_cpu(crash_notes, nr) == NULL )
>> -    {
>> ...
>> -    }
> I would suggest to retry allocation here. Even if this isn''t
suitable for
> a 32-bit kdump kernel on large systems, there#s no reason to penalize
> fully 64-bit environment.
At this point, none of the allocation makes it suitable for a 32bit
system.  For that, I need to start investigating something akin to
xalloc_below(), which is probably going to be todays task.  If we have
previously failed to allocate the notes buffer (and are somehow still
running), what if anything makes it likely that we will successfully
allocate memory this time?
> And here it would also become meaningful that kexec_init_cpu_notes()
> actually returns a meaningful error upon failure.
> Also, I can''t see the reason for the patch to move around
> do_crashdump_trigger() and crashdump_trigger_keyhandler.
This is probably collateral damage from having to reorder sizeof_note()
and setup_note() in preference to making a forward declaration of them. 
I will see about fixing.
> Jan
>
-- 
Andrew Cooper - Dom0 Kernel Engineer, Citrix XenServer
T: +44 (0)1223 225 900, http://www.citrix.com

Jan Beulich

2011-Dec-01 10:01 UTC

head link

Re: [RFC] KEXEC: allocate crash note buffers at boot time v2

>>> On 01.12.11 at 10:49, Andrew Cooper
<andrew.cooper3@citrix.com> wrote:
> On 01/12/11 09:08, Jan Beulich wrote:
>>>>> On 30.11.11 at 18:24, Andrew Cooper
<andrew.cooper3@citrix.com> wrote:
>>> +    if ( ! note )
>>> +        /* Ideally, this would be -ENOMEM.  However, there are
more problems
>>> +         * assocated with trying to deal with -ENOMEM sensibly
than just
>>> +         * pretending that the crash note area doesn''t
exist. */
>>> +        return 0;
>> The only current caller ignores the return value, so why the bogus
>> return value?
> 
> Originally it passed the return value back up to the cpu hotplug
> notifier, but I decided that causing an -ENOMEM at this point was a
> little silly given that:
> 1) a lack of mem on boot will quickly kill the host in other ways, and
> 2) while there is no way useful way to ensure that the crashdump kernel
> gets reloaded with new notes pointers, blocking on not being able to
> allocate notes is silly.
> 
> Would you consider this enough of a problem to actually fail the
> CPU_PREPARE_UP ?
No, absolutely not. Ignoring the return value there is fine.
>>> +    if ( nr < 0 || nr >= nr_cpu_ids || !cpu_online(nr) ||
!crash_notes[nr] )
>>> ...
>>> -    if ( per_cpu(crash_notes, nr) == NULL )
>>> -    {
>>> ...
>>> -    }
>> I would suggest to retry allocation here. Even if this isn''t
suitable for
>> a 32-bit kdump kernel on large systems, there#s no reason to penalize
>> fully 64-bit environment.
> 
> At this point, none of the allocation makes it suitable for a 32bit
> system.  For that, I need to start investigating something akin to
> xalloc_below(), which is probably going to be todays task.  If we have
> previously failed to allocate the notes buffer (and are somehow still
> running), what if anything makes it likely that we will successfully
> allocate memory this time?
Because memory got freed meanwhile? I''m particularly having post-
boot onlining of CPUs in mind; of course, if allocation failed during
boot we have bigger problems than this one.

Jan

Andrew Cooper

2011-Dec-01 12:29 UTC

head link

Re: [RFC] KEXEC: allocate crash note buffers at boot time v3

Here is v3 of this patch.

After some consideration, I am not so certain the spinlock is strictly
necessary.  However, as it is not a common codepath, I figure that it is
better to be safe than sorry.

-- 
Andrew Cooper - Dom0 Kernel Engineer, Citrix XenServer
T: +44 (0)1223 225 900, http://www.citrix.com



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

Jan Beulich

2011-Dec-01 12:56 UTC

head link

Re: [RFC] KEXEC: allocate crash note buffers at boot time v3

>>> On 01.12.11 at 13:29, Andrew Cooper
<andrew.cooper3@citrix.com> wrote:
>+static spinlock_t crash_notes_lock = SPIN_LOCK_UNLOCKED;
Please use DEFINE_SPINLOCK() here.
>+    register_keyhandler(''C'',
&crashdump_trigger_keyhandler);
>+
>+    /* If no crash area, no need to allocate space for notes. */
>+    if ( 0 == kexec_crash_area.size )
>+        return 0;
Wouldn''t it make sense to switch the order of these?
>+    crash_notes = xmalloc_bytes(nr_cpu_ids * sizeof(void*));
Please use xmalloc_array() here.
>+    if ( !crash_notes[nr] && 0 != kexec_init_cpu_notes(nr) )
The first check is pointless - the function will return zero if the
allocation was already done.

Further, you shouldn''t take a lock around a call to xmalloc() or alike
unless absolutely necessary. It is pretty simple to avoid here - you
really only need to lock around the storing of the pointer and maybe
the setup_note() calls (but be careful with returning -ENOMEM - you
shouldn''t if the allocation fails, but you then find - under the lock -
that a pointer was already set by another CPU).

Finally, one thing I failed to notice on the previous version - the
nr_bytes calculations are now being done twice. This should
probably be moved into a helper function, especially since you
said you intend to add stuff here subsequently.

Jan

Andrew Cooper

2011-Dec-01 13:59 UTC

head link

Re: [RFC] KEXEC: allocate crash note buffers at boot time v3

On 01/12/11 12:56, Jan Beulich wrote:>>>> On 01.12.11 at 13:29, Andrew Cooper
<andrew.cooper3@citrix.com> wrote:
>> +static spinlock_t crash_notes_lock = SPIN_LOCK_UNLOCKED;
> Please use DEFINE_SPINLOCK() here.
Ok
>> +    register_keyhandler(''C'',
&crashdump_trigger_keyhandler);
>> +
>> +    /* If no crash area, no need to allocate space for notes. */
>> +    if ( 0 == kexec_crash_area.size )
>> +        return 0;
> Wouldn''t it make sense to switch the order of these?
Possibly.  In the case where a crash kernel has not been loaded, it
would degrade to a reboot, so it is still of some use if the there is no
kexec area.  Having said that, there is an explicit reboot handler, so
making this one disappear is probably a good thing.
>> +    crash_notes = xmalloc_bytes(nr_cpu_ids * sizeof(void*));
> Please use xmalloc_array() here.
Yes - it was dim of me to forget that.
>> +    if ( !crash_notes[nr] && 0 != kexec_init_cpu_notes(nr) )
> The first check is pointless - the function will return zero if the
> allocation was already done.
Good point - I missed that.
> Further, you shouldn''t take a lock around a call to xmalloc() or
alike
> unless absolutely necessary. It is pretty simple to avoid here - you
> really only need to lock around the storing of the pointer and maybe
> the setup_note() calls (but be careful with returning -ENOMEM - you
> shouldn''t if the allocation fails, but you then find - under the
lock -
> that a pointer was already set by another CPU).
So what we should do is this:

xmalloc
take lock
check to see if the entry is been filled in the meantime.  if so, free
the malloc''d region
release lock
only return -ENOMEM if we fail the malloc and the crash_note is still
NULL when we take the lock

I think this ought to cover all possible cases ?

(In reality I think the xmalloc itself should be covered by the fact we
will fail the !cpu_online(nr) test before we consider trying to
reallocate the buffer, but that doesn''t preclude future proofing the
code)
> Finally, one thing I failed to notice on the previous version - the
> nr_bytes calculations are now being done twice. This should
> probably be moved into a helper function, especially since you
> said you intend to add stuff here subsequently.
I had noticed this and was going to let it slide for now, considering
what would be best to do about it.  Playing with void pointers and
calculating lengths with sizeof is always more dangerous than
calculating a size, malloc''ing it and filling in a range start and
size.

Given that it is such a rare codepath, I am honestly not sure which is
the better tradeof - an extra function call in 2 places or doubling the
size of the crash_notes array by introducing a size as well as a start. 
Both seem very minor in the grand scheme of things.
> Jan
>
-- 
Andrew Cooper - Dom0 Kernel Engineer, Citrix XenServer
T: +44 (0)1223 225 900, http://www.citrix.com

Andrew Cooper

2011-Dec-01 14:00 UTC

head link

Re: [RFC] KEXEC: allocate crash note buffers at boot time v3

On 01/12/11 05:20, Keir Fraser wrote:> On 01/12/2011 12:56, "Jan Beulich" <JBeulich@suse.com>
wrote:
>
>>> +    register_keyhandler(''C'',
&crashdump_trigger_keyhandler);
>>> +
>>> +    /* If no crash area, no need to allocate space for notes. */
>>> +    if ( 0 == kexec_crash_area.size )
>>> +        return 0;
>> Wouldn''t it make sense to switch the order of these?
> I also really hate this constant-first ordering.
>
>  -- Keir
>
It was a useful habit I got into when learning C/C++ and has stayed with
me.  I will endeavor to not do it.

-- 
Andrew Cooper - Dom0 Kernel Engineer, Citrix XenServer
T: +44 (0)1223 225 900, http://www.citrix.com

Andrew Cooper

2011-Dec-01 15:02 UTC

head link

Re: [RFC] KEXEC: allocate crash note buffers at boot time v3

On 01/12/11 12:56, Jan Beulich wrote:>>>> On 01.12.11 at 13:29, Andrew Cooper
<andrew.cooper3@citrix.com> wrote:
>> +static spinlock_t crash_notes_lock = SPIN_LOCK_UNLOCKED;
> Please use DEFINE_SPINLOCK() here.
>
>> +    register_keyhandler(''C'',
&crashdump_trigger_keyhandler);
>> +
>> +    /* If no crash area, no need to allocate space for notes. */
>> +    if ( 0 == kexec_crash_area.size )
>> +        return 0;
> Wouldn''t it make sense to switch the order of these?
>
>> +    crash_notes = xmalloc_bytes(nr_cpu_ids * sizeof(void*));
> Please use xmalloc_array() here.
>
>> +    if ( !crash_notes[nr] && 0 != kexec_init_cpu_notes(nr) )
> The first check is pointless - the function will return zero if the
> allocation was already done.
I forgot to say in my previous email.  Short circuit evaluation should
prevent the kexec_init_cpu_notes() function call actually being made.
> Further, you shouldn''t take a lock around a call to xmalloc() or
alike
> unless absolutely necessary. It is pretty simple to avoid here - you
> really only need to lock around the storing of the pointer and maybe
> the setup_note() calls (but be careful with returning -ENOMEM - you
> shouldn''t if the allocation fails, but you then find - under the
lock -
> that a pointer was already set by another CPU).
>
> Finally, one thing I failed to notice on the previous version - the
> nr_bytes calculations are now being done twice. This should
> probably be moved into a helper function, especially since you
> said you intend to add stuff here subsequently.
>
> Jan
>
-- 
Andrew Cooper - Dom0 Kernel Engineer, Citrix XenServer
T: +44 (0)1223 225 900, http://www.citrix.com

Jan Beulich

2011-Dec-01 15:14 UTC

head link

Re: [RFC] KEXEC: allocate crash note buffers at boot time v3

>>> On 01.12.11 at 14:59, Andrew Cooper
<andrew.cooper3@citrix.com> wrote:
> On 01/12/11 12:56, Jan Beulich wrote:
>> Further, you shouldn''t take a lock around a call to xmalloc()
or alike
>> unless absolutely necessary. It is pretty simple to avoid here - you
>> really only need to lock around the storing of the pointer and maybe
>> the setup_note() calls (but be careful with returning -ENOMEM - you
>> shouldn''t if the allocation fails, but you then find - under
the lock -
>> that a pointer was already set by another CPU).
> 
> So what we should do is this:
> 
> xmalloc
> take lock
> check to see if the entry is been filled in the meantime.  if so, free
> the malloc''d region
Don''t call xfree() with the lock held either, if possible.
> release lock
> only return -ENOMEM if we fail the malloc and the crash_note is still
> NULL when we take the lock
Jan

Jan Beulich

2011-Dec-01 15:15 UTC

head link

Re: [RFC] KEXEC: allocate crash note buffers at boot time v3

>>> On 01.12.11 at 16:02, Andrew Cooper
<andrew.cooper3@citrix.com> wrote:
> On 01/12/11 12:56, Jan Beulich wrote:
>>>>> On 01.12.11 at 13:29, Andrew Cooper
<andrew.cooper3@citrix.com> wrote:
>>> +static spinlock_t crash_notes_lock = SPIN_LOCK_UNLOCKED;
>> Please use DEFINE_SPINLOCK() here.
>>
>>> +    register_keyhandler(''C'',
&crashdump_trigger_keyhandler);
>>> +
>>> +    /* If no crash area, no need to allocate space for notes. */
>>> +    if ( 0 == kexec_crash_area.size )
>>> +        return 0;
>> Wouldn''t it make sense to switch the order of these?
>>
>>> +    crash_notes = xmalloc_bytes(nr_cpu_ids * sizeof(void*));
>> Please use xmalloc_array() here.
>>
>>> +    if ( !crash_notes[nr] && 0 != kexec_init_cpu_notes(nr)
)
>> The first check is pointless - the function will return zero if the
>> allocation was already done.
> 
> I forgot to say in my previous email.  Short circuit evaluation should
> prevent the kexec_init_cpu_notes() function call actually being made.
What would that buy you? Performance is not an issue on this code
path.

Jan

Andrew Cooper

2011-Dec-01 17:14 UTC

head link

Re: [RFC] KEXEC: allocate crash note buffers at boot time v4

Version 4 attached.

Fixed the logic regarding locking and x{malloc,free}''ing, added a few
more comments in places, and changed the coding style to avoid
constant-first compares.

-- 
Andrew Cooper - Dom0 Kernel Engineer, Citrix XenServer
T: +44 (0)1223 225 900, http://www.citrix.com



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

Jan Beulich

2011-Dec-02 08:02 UTC

head link

Re: [RFC] KEXEC: allocate crash note buffers at boot time v4

>>> On 01.12.11 at 18:14, Andrew Cooper
<andrew.cooper3@citrix.com> wrote:
Looks good to me now except for one minor thing (below), and the
fact that you still decided to retain the two duplicates of the size
calculation (I''ll have to remember to clean this up if you
don''t, unless
Keir explicitly agrees with the duplication).
>+    /* Try once again to allocate room for the crash notes.  It is just
possible
>+     * that more space has become available since we last tried.  If space
has
>+     * already been allocated, kexec_init_cpu_notes() will return early
with 0.
>+     */
>+    if ( kexec_init_cpu_notes(nr) )
>         return -EINVAL;
The function can fail only with -ENOMEM, so why not return this here?

Jan

Andrew Cooper

2011-Dec-02 12:33 UTC

head link

Re: [RFC] KEXEC: allocate crash note buffers at boot time v4

On 02/12/11 08:02, Jan Beulich wrote:>>>> On 01.12.11 at 18:14, Andrew Cooper
<andrew.cooper3@citrix.com> wrote:
> Looks good to me now except for one minor thing (below), and the
> fact that you still decided to retain the two duplicates of the size
> calculation (I''ll have to remember to clean this up if you
don''t, unless
> Keir explicitly agrees with the duplication).
Ok - I will turn them into a start,size pair
>> +    /* Try once again to allocate room for the crash notes.  It is
just possible
>> +     * that more space has become available since we last tried.  If
space has
>> +     * already been allocated, kexec_init_cpu_notes() will return
early with 0.
>> +     */
>> +    if ( kexec_init_cpu_notes(nr) )
>>         return -EINVAL;
> The function can fail only with -ENOMEM, so why not return this here?
>
> Jan
>
Actually, returning -EINVAL here is counter productive.  -EINVAL is used
by dom0 to work out when it has asked for each CPU.  This in itself is a
little broken because there is nothing stopping a middle CPU from being
offline at the time these hypercalls are made.  The other thing is that
there is nothing stopping an offline cpu from having a valid notes
section.  Therefore, the test of online should be removed, so -EINVAL
only gets returned for cpu out of range, or not set up notes at all in
the first place.

-- 
Andrew Cooper - Dom0 Kernel Engineer, Citrix XenServer
T: +44 (0)1223 225 900, http://www.citrix.com

Andrew Cooper

2011-Dec-02 15:19 UTC

head link

KEXEC: allocate crash note buffers at boot time v5

Here is v5.

I have tested quite carefully the impact of changing the error
conditions in kexec_get_cpu().  dom0 is happy to accept ERANGE as well
as EINVAL, and ERANGE is a more sensible response.  Moreover, this new
scheme (ignoring for a moment ENOMEM) allows for all nr_cpu_ids to give
note ranges to dom0 whether they are up or not at the moment.   It is
better to have more notes provided with some of them potentially not
filled in, than to miss a crashed CPU register state because it was
offline when the crash kernel was loaded.

All this testing has been done via backport to 4.1.2.  Minimal changes
were necessary, such as the lack of xzalloc_array() and NR_CPUS instead
of nr_cpu_ids, but it was otherwise functionally identical.

I am not sure whether to recomend this for backport into 4.1-testing or
not.  It does make the crashkernel more likely to get crash notes,
although in reality, it was only special circumstances which caused
problems.  If you consider it worth backporting, I can provide my
already-backported patch.

-- 
Andrew Cooper - Dom0 Kernel Engineer, Citrix XenServer
T: +44 (0)1223 225 900, http://www.citrix.com



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

Jan Beulich

2011-Dec-02 16:04 UTC

head link

Re: KEXEC: allocate crash note buffers at boot time v5

(Andrew, my previous reply dropped all Cc-s, but I''m re-sending not
only
because of that - I also adjusted the first part of the response.)
>>> On 02.12.11 at 16:19, Andrew Cooper
<andrew.cooper3@citrix.com> wrote:
I just had another look at the Dom0 side of things, and I fail to see why
you think moving this out of per-CPU data is necessary: All Dom0 does
with the provided info is set up the resource tree. The data doesn''t
get
stored for any post-boot use. What am I overlooking?
>+    /* In the case of still not having enough memory to allocate buffer
room,
>+     * returning a range of 0,0 is still valid. */
>+    range->start = __pa((unsigned long)crash_notes[nr].start);
>+    range->size = crash_notes[nr].size;
Comment and implementation don''t match - __pa(NULL) is not zero
(and the cast to ''unsigned long'' is pointless afaict).

Jan

Andrew Cooper

2011-Dec-02 16:10 UTC

head link

Re: KEXEC: allocate crash note buffers at boot time v

On 02/12/11 16:04, Jan Beulich wrote:> (Andrew, my previous reply dropped all Cc-s, but I''m re-sending
not only
> because of that - I also adjusted the first part of the response.)
>
>>>> On 02.12.11 at 16:19, Andrew Cooper
<andrew.cooper3@citrix.com> wrote:
> I just had another look at the Dom0 side of things, and I fail to see why
> you think moving this out of per-CPU data is necessary: All Dom0 does
> with the provided info is set up the resource tree. The data
doesn''t get
> stored for any post-boot use. What am I overlooking?
(for the benifit of the list)

/sbin/kexec opens /proc/iomem and parses the strings looking for "Crash
Note" and pulls the ranges out that way.  (It also assumes that the
lowest crash note is pcpu0 and so on, which is also something I will
work around in later notes changes).
>> +    /* In the case of still not having enough memory to allocate
buffer room,
>> +     * returning a range of 0,0 is still valid. */
>> +    range->start = __pa((unsigned long)crash_notes[nr].start);
>> +    range->size = crash_notes[nr].size;
> Comment and implementation don''t match - __pa(NULL) is not zero
> (and the cast to ''unsigned long'' is pointless afaict).
> Jan
Very silly mistake on my behalf.  v6 is attached with that fixed.

-- 
Andrew Cooper - Dom0 Kernel Engineer, Citrix XenServer
T: +44 (0)1223 225 900, http://www.citrix.com



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

Xen devel - Nov 2011 - [RFC] KEXEC: allocate crash note buffers at boot time

Re: [RFC] KEXEC: allocate crash note buffers at boot time

[RFC] KEXEC: allocate crash note buffers at boot time

Re: [RFC] KEXEC: allocate crash note buffers at boot time

Re: [RFC] KEXEC: allocate crash note buffers at boot time

Re: [RFC] KEXEC: allocate crash note buffers at boot time

[RFC] KEXEC: allocate crash note buffers at boot time v2

Re: [RFC] KEXEC: allocate crash note buffers at boot time v3

Re: [RFC] KEXEC: allocate crash note buffers at boot time v2

Re: [RFC] KEXEC: allocate crash note buffers at boot time v2

Re: [RFC] KEXEC: allocate crash note buffers at boot time v2

Re: [RFC] KEXEC: allocate crash note buffers at boot time v3

Re: [RFC] KEXEC: allocate crash note buffers at boot time v3

Re: [RFC] KEXEC: allocate crash note buffers at boot time v3

Re: [RFC] KEXEC: allocate crash note buffers at boot time v3

Re: [RFC] KEXEC: allocate crash note buffers at boot time v3

Re: [RFC] KEXEC: allocate crash note buffers at boot time v3

Re: [RFC] KEXEC: allocate crash note buffers at boot time v3

Re: [RFC] KEXEC: allocate crash note buffers at boot time v4

Re: [RFC] KEXEC: allocate crash note buffers at boot time v4

Re: [RFC] KEXEC: allocate crash note buffers at boot time v4

KEXEC: allocate crash note buffers at boot time v5

Re: KEXEC: allocate crash note buffers at boot time v5

Re: KEXEC: allocate crash note buffers at boot time v