Keir Fraser
2011-Nov-29 11:19 UTC
Re: [RFC] KEXEC: allocate crash note buffers at boot time
On 29/11/2011 18:56, "Andrew Cooper" <andrew.cooper3@citrix.com> wrote:> Hello, > > As I have little to no knowledge of this stage of the boot process, is > this a sensible way to be setting up the per_cpu areas? I have a > sneaking suspicion that it will fall over if a CPU is onlined after > boot, and may also fall over if a CPU is offlined and reonlined later. > There appears to be no infrastructure currently in place for this type > of initialization, which is quite possibly why the code exists in its > current form.No it''s bad. For starters you should use for_each_online_cpu, not do an open-coded for-loop. Secondly you should register a cpu hotplug notifier (register_cpu_notifier()) to pick up and handle future CPU_UP_PREPARE/CPU_UP_CANCELED/CPU_DEAD events. This can be hung off an __initcall. See common/stop_machine.c for example, or common/timer.c, which doesn''t even require a for_each_online_cpu loop because its init code gets run before we bring up secondary CPUs. -- Keir> Thanks,
Hello, As I have little to no knowledge of this stage of the boot process, is this a sensible way to be setting up the per_cpu areas? I have a sneaking suspicion that it will fall over if a CPU is onlined after boot, and may also fall over if a CPU is offlined and reonlined later. There appears to be no infrastructure currently in place for this type of initialization, which is quite possibly why the code exists in its current form. Thanks, -- Andrew Cooper - Dom0 Kernel Engineer, Citrix XenServer T: +44 (0)1223 225 900, http://www.citrix.com _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Jan Beulich
2011-Nov-30 09:20 UTC
Re: [RFC] KEXEC: allocate crash note buffers at boot time
>>> On 29.11.11 at 19:56, Andrew Cooper <andrew.cooper3@citrix.com> wrote: > As I have little to no knowledge of this stage of the boot process, is > this a sensible way to be setting up the per_cpu areas? I have a > sneaking suspicion that it will fall over if a CPU is onlined after > boot, and may also fall over if a CPU is offlined and reonlined later. > There appears to be no infrastructure currently in place for this type > of initialization, which is quite possibly why the code exists in its > current form.I actually wonder how you came to those 4 statements you make in the description - none of these seem to me like they are really an issue (this would instead be plain bugs in Dom0). Did you actually look at the existing Dom0 implementation(s)? Further, while not being a huge waste of memory, it still is one in case kexec gets never enabled, especially when considering a Dom0 kernel that''s being built without CONFIG_KEXEC (or an incapable on, like any pv-ops kernel to date). So I also conceptually question that change. Jan
Andrew Cooper
2011-Nov-30 13:14 UTC
Re: [RFC] KEXEC: allocate crash note buffers at boot time
On 29/11/11 11:19, Keir Fraser wrote:> On 29/11/2011 18:56, "Andrew Cooper" <andrew.cooper3@citrix.com> wrote: > >> Hello, >> >> As I have little to no knowledge of this stage of the boot process, is >> this a sensible way to be setting up the per_cpu areas? I have a >> sneaking suspicion that it will fall over if a CPU is onlined after >> boot, and may also fall over if a CPU is offlined and reonlined later. >> There appears to be no infrastructure currently in place for this type >> of initialization, which is quite possibly why the code exists in its >> current form. > No it''s bad. For starters you should use for_each_online_cpu, not do an > open-coded for-loop. Secondly you should register a cpu hotplug notifier > (register_cpu_notifier()) to pick up and handle future > CPU_UP_PREPARE/CPU_UP_CANCELED/CPU_DEAD events. This can be hung off an > __initcall. See common/stop_machine.c for example, or common/timer.c, which > doesn''t even require a for_each_online_cpu loop because its init code gets > run before we bring up secondary CPUs. > > -- Keir >Thanks - this is exactly what I was looking for. Sadly, after thinking about cpu hotplug safety, it is not safe to store the crash note pointer in the per cpu data. Once you have reported an area to dom0 via KEXEC_CMD_get_reserve, the areas reported cant possibly move. Therefore, I need to redesign a little bit. Thanks, ~Andrew>> Thanks, >-- Andrew Cooper - Dom0 Kernel Engineer, Citrix XenServer T: +44 (0)1223 225 900, http://www.citrix.com
Andrew Cooper
2011-Nov-30 14:01 UTC
Re: [RFC] KEXEC: allocate crash note buffers at boot time
On 30/11/11 09:20, Jan Beulich wrote:>>>> On 29.11.11 at 19:56, Andrew Cooper <andrew.cooper3@citrix.com> wrote: >> As I have little to no knowledge of this stage of the boot process, is >> this a sensible way to be setting up the per_cpu areas? I have a >> sneaking suspicion that it will fall over if a CPU is onlined after >> boot, and may also fall over if a CPU is offlined and reonlined later. >> There appears to be no infrastructure currently in place for this type >> of initialization, which is quite possibly why the code exists in its >> current form. > I actually wonder how you came to those 4 statements you make in > the description - none of these seem to me like they are really an > issue (this would instead be plain bugs in Dom0). Did you actually look > at the existing Dom0 implementation(s)? > > Further, while not being a huge waste of memory, it still is one in case > kexec gets never enabled, especially when considering a Dom0 kernel > that''s being built without CONFIG_KEXEC (or an incapable on, like any > pv-ops kernel to date). So I also conceptually question that change. > > JanWe (XenServer) have had many cases of the kexec path failing on customer boxes under weird and seemingly inexplicable circumstances. This is why I am working on trying to bullet-proofing the entire path. We have 1 ticket where the contents of a crash note is clearly bogus (a PRSTATUS is not 2GB long). We have a ticket where it appears that the kdump kernel has failed to reassemble /proc/vmcore from elfcorehdr as a few pcpus worth of crash notes are missing. I seem to remember a ticket from before my time with a crash while writing the crash notes in Xen itself. We even have a ticket stating that you get different crash notes depending on whether you crash using the Xen debug keys (crash notes appear completely bogus) or /proc/sysrq-trigger in dom0 (seems to be fine). All of these are uncertain on reproducibility (except the final one which was shown to reproduce on Xen-3.x and not on Xen-4.x so was not investigated further) and have a habit of being unreproducible on any of our local hardware, which makes fixing the problems tricky. So yes - the 4 points I have made are certainly not regular or common behavior, but given some of the tickets we have, I am fairly sure it is not a bug-free path. I have checked the 2.6.32 implementation of dom0''s side of this and agree that it looks ok. However, it is my opinion that relying on a certain hypercalling pattern from dom0 is a perilous route for Xen, whether it is likely for that pattern to change in the future or not. Having said all of this, I agree with your second paragraph. As already noted in my other email in this thread, I need to change the implementation of this, so I will key the initial allocation of memory on whether crashkernel= has been passed. This should be sufficient indication as to whether the user minds having the space allocated or not. -- Andrew Cooper - Dom0 Kernel Engineer, Citrix XenServer T: +44 (0)1223 225 900, http://www.citrix.com
Andrew Cooper
2011-Nov-30 17:24 UTC
[RFC] KEXEC: allocate crash note buffers at boot time v2
Hello, Presented is version 2 of this patch, which uses cpu hotplug notifications to be rather safer when allocating buffers. It occurs to me that there is a bit of an API problem with it comes to combining a crashdump kernel with potential hotplug events. If dom0 does not get notification of new/removed pcpus, and if it doesn''t re-evaluate /proc/iomem by recalling things like KEXEC_CMD_get_range, then subsequent calls to /sbin/kexec -p will bundle up stale information into the kdump magic bundle. Even if dom0 does get a notification of pcpu hotplug events, and it updates its iomem map, /sbin/kexec would still need to be called to bundle the new information into the kdump magic bundle. Possibly doable off a udev CPU hotplug event. Even if /sbin/kexec gets called after hotplug events, a crash before the new kexec magic bundle has been loaded will still result in the old bundle being used, with its stale information regarding the position and number of crash notes. Sadly, I don''t see any easy or sensible solution to these problems. However, it is probably worth knowing as a potential limitation. I have worked around this problem by never deallocating crash notes, so even if information is stale as to the number of crash notes to be expected, the stale physical addresses still point to allocated notes buffers which have been partially initialized to have sensible note headers in. This unfortunately means that an offlined cpu which was present at boot time will have a notes section with zero''d data. It also means that a cpu onlined after boot which crashes will not have its register state presented to the kdump environment. I would like to hope that both of these cases are unlikely, but again, it is still a potential limitation. The cpu crash notes themselves (PRSTATUS and XEN_ELFNOTE_CRASH_REGS) don''t contain a reference to which pcpu they represent. This is inferred by /sbin/kexec from the order in which they appear in dom0''s /proc/iomem, meaning that the kdump environments idea of which pcpu is which might differ from Xen''s. This depending on whether Xen allocates the notes buffer in ascending order, whether dom0 sorts memory addresses reported, and, as it currently gets it correct, whether either of these behaviors change in the future. This final issue is within my ability to fix and I will do so in the near future, along with some other additions I already need to make to the per cpu crash notes. -- Andrew Cooper - Dom0 Kernel Engineer, Citrix XenServer T: +44 (0)1223 225 900, http://www.citrix.com _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Keir Fraser
2011-Dec-01 05:20 UTC
Re: [RFC] KEXEC: allocate crash note buffers at boot time v3
On 01/12/2011 12:56, "Jan Beulich" <JBeulich@suse.com> wrote:>> + register_keyhandler(''C'', &crashdump_trigger_keyhandler); >> + >> + /* If no crash area, no need to allocate space for notes. */ >> + if ( 0 == kexec_crash_area.size ) >> + return 0; > > Wouldn''t it make sense to switch the order of these?I also really hate this constant-first ordering. -- Keir
Jan Beulich
2011-Dec-01 09:08 UTC
Re: [RFC] KEXEC: allocate crash note buffers at boot time v2
>>> On 30.11.11 at 18:24, Andrew Cooper <andrew.cooper3@citrix.com> wrote: >-static DEFINE_PER_CPU_READ_MOSTLY(void *, crash_notes); >+static void * crash_notes[NR_CPUS];If at all possible, can you please allocate this dynamically using nr_cpu_ids for the size?>+static int kexec_init_cpu_notes(const int cpu)If the function''s argument was made ''unsigned int'' (as it already is in its caller), then ...>+ BUG_ON( cpu < 0 || cpu >= NR_CPUS );... only the second check would be needed. Furthermore, it''s pointless to use signed types for CPU numbers (and it''s inefficient on various architectures), despite there being numerous (bad) examples throughout the tree.>+ if ( 0 == cpu ) >+ nr_bytes = nr_bytes + >+ sizeof_note("Xen", sizeof(crash_xen_info_t));Minor nit: Why not use += here (presumably allowing the statement to fit on one line)?>+ if ( ! note ) >+ /* Ideally, this would be -ENOMEM. However, there are more problems >+ * assocated with trying to deal with -ENOMEM sensibly than just >+ * pretending that the crash note area doesn''t exist. */ >+ return 0;The only current caller ignores the return value, so why the bogus return value?>+ if ( nr < 0 || nr >= nr_cpu_ids || !cpu_online(nr) || !crash_notes[nr] ) >... >- if ( per_cpu(crash_notes, nr) == NULL ) >- { >... >- }I would suggest to retry allocation here. Even if this isn''t suitable for a 32-bit kdump kernel on large systems, there#s no reason to penalize fully 64-bit environment. And here it would also become meaningful that kexec_init_cpu_notes() actually returns a meaningful error upon failure. Also, I can''t see the reason for the patch to move around do_crashdump_trigger() and crashdump_trigger_keyhandler. Jan
Andrew Cooper
2011-Dec-01 09:49 UTC
Re: [RFC] KEXEC: allocate crash note buffers at boot time v2
On 01/12/11 09:08, Jan Beulich wrote:>>>> On 30.11.11 at 18:24, Andrew Cooper <andrew.cooper3@citrix.com> wrote: >> -static DEFINE_PER_CPU_READ_MOSTLY(void *, crash_notes); >> +static void * crash_notes[NR_CPUS]; > If at all possible, can you please allocate this dynamically using > nr_cpu_ids for the size?Ok>> +static int kexec_init_cpu_notes(const int cpu) > If the function''s argument was made ''unsigned int'' (as it already is in > its caller), then ... > >> + BUG_ON( cpu < 0 || cpu >= NR_CPUS ); > ... only the second check would be needed. Furthermore, it''s > pointless to use signed types for CPU numbers (and it''s > inefficient on various architectures), despite there being numerous > (bad) examples throughout the tree.Good point. I will clean this up>> + if ( 0 == cpu ) >> + nr_bytes = nr_bytes + >> + sizeof_note("Xen", sizeof(crash_xen_info_t)); > Minor nit: Why not use += here (presumably allowing the statement to > fit on one line)?Because the next few patches in my series adds a new crash notes at this point. I felt it was neater to leave in this form for a reduced diff next patch, and gcc will optimize it to +>> + if ( ! note ) >> + /* Ideally, this would be -ENOMEM. However, there are more problems >> + * assocated with trying to deal with -ENOMEM sensibly than just >> + * pretending that the crash note area doesn''t exist. */ >> + return 0; > The only current caller ignores the return value, so why the bogus > return value?Originally it passed the return value back up to the cpu hotplug notifier, but I decided that causing an -ENOMEM at this point was a little silly given that: 1) a lack of mem on boot will quickly kill the host in other ways, and 2) while there is no way useful way to ensure that the crashdump kernel gets reloaded with new notes pointers, blocking on not being able to allocate notes is silly. Would you consider this enough of a problem to actually fail the CPU_PREPARE_UP ? On further thought from my musings yesterday, it would be fantastic if we were able to point the kdump kernel at a PT_NOTE header without discerned length, at which point Xen can rewrite its crash notes without having to get /sbin/kexec to repackage its magic elf blog pointing to new/different memory locations. However, as this would involve changes to kexec-tools, Linux (and Xen) in a not trivially backward compatible fashion, the chances of it happening are slim.>> + if ( nr < 0 || nr >= nr_cpu_ids || !cpu_online(nr) || !crash_notes[nr] ) >> ... >> - if ( per_cpu(crash_notes, nr) == NULL ) >> - { >> ... >> - } > I would suggest to retry allocation here. Even if this isn''t suitable for > a 32-bit kdump kernel on large systems, there#s no reason to penalize > fully 64-bit environment.At this point, none of the allocation makes it suitable for a 32bit system. For that, I need to start investigating something akin to xalloc_below(), which is probably going to be todays task. If we have previously failed to allocate the notes buffer (and are somehow still running), what if anything makes it likely that we will successfully allocate memory this time?> And here it would also become meaningful that kexec_init_cpu_notes() > actually returns a meaningful error upon failure. > Also, I can''t see the reason for the patch to move around > do_crashdump_trigger() and crashdump_trigger_keyhandler.This is probably collateral damage from having to reorder sizeof_note() and setup_note() in preference to making a forward declaration of them. I will see about fixing.> Jan >-- Andrew Cooper - Dom0 Kernel Engineer, Citrix XenServer T: +44 (0)1223 225 900, http://www.citrix.com
Jan Beulich
2011-Dec-01 10:01 UTC
Re: [RFC] KEXEC: allocate crash note buffers at boot time v2
>>> On 01.12.11 at 10:49, Andrew Cooper <andrew.cooper3@citrix.com> wrote: > On 01/12/11 09:08, Jan Beulich wrote: >>>>> On 30.11.11 at 18:24, Andrew Cooper <andrew.cooper3@citrix.com> wrote: >>> + if ( ! note ) >>> + /* Ideally, this would be -ENOMEM. However, there are more problems >>> + * assocated with trying to deal with -ENOMEM sensibly than just >>> + * pretending that the crash note area doesn''t exist. */ >>> + return 0; >> The only current caller ignores the return value, so why the bogus >> return value? > > Originally it passed the return value back up to the cpu hotplug > notifier, but I decided that causing an -ENOMEM at this point was a > little silly given that: > 1) a lack of mem on boot will quickly kill the host in other ways, and > 2) while there is no way useful way to ensure that the crashdump kernel > gets reloaded with new notes pointers, blocking on not being able to > allocate notes is silly. > > Would you consider this enough of a problem to actually fail the > CPU_PREPARE_UP ?No, absolutely not. Ignoring the return value there is fine.>>> + if ( nr < 0 || nr >= nr_cpu_ids || !cpu_online(nr) || !crash_notes[nr] ) >>> ... >>> - if ( per_cpu(crash_notes, nr) == NULL ) >>> - { >>> ... >>> - } >> I would suggest to retry allocation here. Even if this isn''t suitable for >> a 32-bit kdump kernel on large systems, there#s no reason to penalize >> fully 64-bit environment. > > At this point, none of the allocation makes it suitable for a 32bit > system. For that, I need to start investigating something akin to > xalloc_below(), which is probably going to be todays task. If we have > previously failed to allocate the notes buffer (and are somehow still > running), what if anything makes it likely that we will successfully > allocate memory this time?Because memory got freed meanwhile? I''m particularly having post- boot onlining of CPUs in mind; of course, if allocation failed during boot we have bigger problems than this one. Jan
Andrew Cooper
2011-Dec-01 12:29 UTC
Re: [RFC] KEXEC: allocate crash note buffers at boot time v3
Here is v3 of this patch. After some consideration, I am not so certain the spinlock is strictly necessary. However, as it is not a common codepath, I figure that it is better to be safe than sorry. -- Andrew Cooper - Dom0 Kernel Engineer, Citrix XenServer T: +44 (0)1223 225 900, http://www.citrix.com _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Jan Beulich
2011-Dec-01 12:56 UTC
Re: [RFC] KEXEC: allocate crash note buffers at boot time v3
>>> On 01.12.11 at 13:29, Andrew Cooper <andrew.cooper3@citrix.com> wrote: >+static spinlock_t crash_notes_lock = SPIN_LOCK_UNLOCKED;Please use DEFINE_SPINLOCK() here.>+ register_keyhandler(''C'', &crashdump_trigger_keyhandler); >+ >+ /* If no crash area, no need to allocate space for notes. */ >+ if ( 0 == kexec_crash_area.size ) >+ return 0;Wouldn''t it make sense to switch the order of these?>+ crash_notes = xmalloc_bytes(nr_cpu_ids * sizeof(void*));Please use xmalloc_array() here.>+ if ( !crash_notes[nr] && 0 != kexec_init_cpu_notes(nr) )The first check is pointless - the function will return zero if the allocation was already done. Further, you shouldn''t take a lock around a call to xmalloc() or alike unless absolutely necessary. It is pretty simple to avoid here - you really only need to lock around the storing of the pointer and maybe the setup_note() calls (but be careful with returning -ENOMEM - you shouldn''t if the allocation fails, but you then find - under the lock - that a pointer was already set by another CPU). Finally, one thing I failed to notice on the previous version - the nr_bytes calculations are now being done twice. This should probably be moved into a helper function, especially since you said you intend to add stuff here subsequently. Jan
Andrew Cooper
2011-Dec-01 13:59 UTC
Re: [RFC] KEXEC: allocate crash note buffers at boot time v3
On 01/12/11 12:56, Jan Beulich wrote:>>>> On 01.12.11 at 13:29, Andrew Cooper <andrew.cooper3@citrix.com> wrote: >> +static spinlock_t crash_notes_lock = SPIN_LOCK_UNLOCKED; > Please use DEFINE_SPINLOCK() here.Ok>> + register_keyhandler(''C'', &crashdump_trigger_keyhandler); >> + >> + /* If no crash area, no need to allocate space for notes. */ >> + if ( 0 == kexec_crash_area.size ) >> + return 0; > Wouldn''t it make sense to switch the order of these?Possibly. In the case where a crash kernel has not been loaded, it would degrade to a reboot, so it is still of some use if the there is no kexec area. Having said that, there is an explicit reboot handler, so making this one disappear is probably a good thing.>> + crash_notes = xmalloc_bytes(nr_cpu_ids * sizeof(void*)); > Please use xmalloc_array() here.Yes - it was dim of me to forget that.>> + if ( !crash_notes[nr] && 0 != kexec_init_cpu_notes(nr) ) > The first check is pointless - the function will return zero if the > allocation was already done.Good point - I missed that.> Further, you shouldn''t take a lock around a call to xmalloc() or alike > unless absolutely necessary. It is pretty simple to avoid here - you > really only need to lock around the storing of the pointer and maybe > the setup_note() calls (but be careful with returning -ENOMEM - you > shouldn''t if the allocation fails, but you then find - under the lock - > that a pointer was already set by another CPU).So what we should do is this: xmalloc take lock check to see if the entry is been filled in the meantime. if so, free the malloc''d region release lock only return -ENOMEM if we fail the malloc and the crash_note is still NULL when we take the lock I think this ought to cover all possible cases ? (In reality I think the xmalloc itself should be covered by the fact we will fail the !cpu_online(nr) test before we consider trying to reallocate the buffer, but that doesn''t preclude future proofing the code)> Finally, one thing I failed to notice on the previous version - the > nr_bytes calculations are now being done twice. This should > probably be moved into a helper function, especially since you > said you intend to add stuff here subsequently.I had noticed this and was going to let it slide for now, considering what would be best to do about it. Playing with void pointers and calculating lengths with sizeof is always more dangerous than calculating a size, malloc''ing it and filling in a range start and size. Given that it is such a rare codepath, I am honestly not sure which is the better tradeof - an extra function call in 2 places or doubling the size of the crash_notes array by introducing a size as well as a start. Both seem very minor in the grand scheme of things.> Jan >-- Andrew Cooper - Dom0 Kernel Engineer, Citrix XenServer T: +44 (0)1223 225 900, http://www.citrix.com
Andrew Cooper
2011-Dec-01 14:00 UTC
Re: [RFC] KEXEC: allocate crash note buffers at boot time v3
On 01/12/11 05:20, Keir Fraser wrote:> On 01/12/2011 12:56, "Jan Beulich" <JBeulich@suse.com> wrote: > >>> + register_keyhandler(''C'', &crashdump_trigger_keyhandler); >>> + >>> + /* If no crash area, no need to allocate space for notes. */ >>> + if ( 0 == kexec_crash_area.size ) >>> + return 0; >> Wouldn''t it make sense to switch the order of these? > I also really hate this constant-first ordering. > > -- Keir >It was a useful habit I got into when learning C/C++ and has stayed with me. I will endeavor to not do it. -- Andrew Cooper - Dom0 Kernel Engineer, Citrix XenServer T: +44 (0)1223 225 900, http://www.citrix.com
Andrew Cooper
2011-Dec-01 15:02 UTC
Re: [RFC] KEXEC: allocate crash note buffers at boot time v3
On 01/12/11 12:56, Jan Beulich wrote:>>>> On 01.12.11 at 13:29, Andrew Cooper <andrew.cooper3@citrix.com> wrote: >> +static spinlock_t crash_notes_lock = SPIN_LOCK_UNLOCKED; > Please use DEFINE_SPINLOCK() here. > >> + register_keyhandler(''C'', &crashdump_trigger_keyhandler); >> + >> + /* If no crash area, no need to allocate space for notes. */ >> + if ( 0 == kexec_crash_area.size ) >> + return 0; > Wouldn''t it make sense to switch the order of these? > >> + crash_notes = xmalloc_bytes(nr_cpu_ids * sizeof(void*)); > Please use xmalloc_array() here. > >> + if ( !crash_notes[nr] && 0 != kexec_init_cpu_notes(nr) ) > The first check is pointless - the function will return zero if the > allocation was already done.I forgot to say in my previous email. Short circuit evaluation should prevent the kexec_init_cpu_notes() function call actually being made.> Further, you shouldn''t take a lock around a call to xmalloc() or alike > unless absolutely necessary. It is pretty simple to avoid here - you > really only need to lock around the storing of the pointer and maybe > the setup_note() calls (but be careful with returning -ENOMEM - you > shouldn''t if the allocation fails, but you then find - under the lock - > that a pointer was already set by another CPU). > > Finally, one thing I failed to notice on the previous version - the > nr_bytes calculations are now being done twice. This should > probably be moved into a helper function, especially since you > said you intend to add stuff here subsequently. > > Jan >-- Andrew Cooper - Dom0 Kernel Engineer, Citrix XenServer T: +44 (0)1223 225 900, http://www.citrix.com
Jan Beulich
2011-Dec-01 15:14 UTC
Re: [RFC] KEXEC: allocate crash note buffers at boot time v3
>>> On 01.12.11 at 14:59, Andrew Cooper <andrew.cooper3@citrix.com> wrote: > On 01/12/11 12:56, Jan Beulich wrote: >> Further, you shouldn''t take a lock around a call to xmalloc() or alike >> unless absolutely necessary. It is pretty simple to avoid here - you >> really only need to lock around the storing of the pointer and maybe >> the setup_note() calls (but be careful with returning -ENOMEM - you >> shouldn''t if the allocation fails, but you then find - under the lock - >> that a pointer was already set by another CPU). > > So what we should do is this: > > xmalloc > take lock > check to see if the entry is been filled in the meantime. if so, free > the malloc''d regionDon''t call xfree() with the lock held either, if possible.> release lock > only return -ENOMEM if we fail the malloc and the crash_note is still > NULL when we take the lockJan
Jan Beulich
2011-Dec-01 15:15 UTC
Re: [RFC] KEXEC: allocate crash note buffers at boot time v3
>>> On 01.12.11 at 16:02, Andrew Cooper <andrew.cooper3@citrix.com> wrote: > On 01/12/11 12:56, Jan Beulich wrote: >>>>> On 01.12.11 at 13:29, Andrew Cooper <andrew.cooper3@citrix.com> wrote: >>> +static spinlock_t crash_notes_lock = SPIN_LOCK_UNLOCKED; >> Please use DEFINE_SPINLOCK() here. >> >>> + register_keyhandler(''C'', &crashdump_trigger_keyhandler); >>> + >>> + /* If no crash area, no need to allocate space for notes. */ >>> + if ( 0 == kexec_crash_area.size ) >>> + return 0; >> Wouldn''t it make sense to switch the order of these? >> >>> + crash_notes = xmalloc_bytes(nr_cpu_ids * sizeof(void*)); >> Please use xmalloc_array() here. >> >>> + if ( !crash_notes[nr] && 0 != kexec_init_cpu_notes(nr) ) >> The first check is pointless - the function will return zero if the >> allocation was already done. > > I forgot to say in my previous email. Short circuit evaluation should > prevent the kexec_init_cpu_notes() function call actually being made.What would that buy you? Performance is not an issue on this code path. Jan
Andrew Cooper
2011-Dec-01 17:14 UTC
Re: [RFC] KEXEC: allocate crash note buffers at boot time v4
Version 4 attached. Fixed the logic regarding locking and x{malloc,free}''ing, added a few more comments in places, and changed the coding style to avoid constant-first compares. -- Andrew Cooper - Dom0 Kernel Engineer, Citrix XenServer T: +44 (0)1223 225 900, http://www.citrix.com _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Jan Beulich
2011-Dec-02 08:02 UTC
Re: [RFC] KEXEC: allocate crash note buffers at boot time v4
>>> On 01.12.11 at 18:14, Andrew Cooper <andrew.cooper3@citrix.com> wrote:Looks good to me now except for one minor thing (below), and the fact that you still decided to retain the two duplicates of the size calculation (I''ll have to remember to clean this up if you don''t, unless Keir explicitly agrees with the duplication).>+ /* Try once again to allocate room for the crash notes. It is just possible >+ * that more space has become available since we last tried. If space has >+ * already been allocated, kexec_init_cpu_notes() will return early with 0. >+ */ >+ if ( kexec_init_cpu_notes(nr) ) > return -EINVAL;The function can fail only with -ENOMEM, so why not return this here? Jan
Andrew Cooper
2011-Dec-02 12:33 UTC
Re: [RFC] KEXEC: allocate crash note buffers at boot time v4
On 02/12/11 08:02, Jan Beulich wrote:>>>> On 01.12.11 at 18:14, Andrew Cooper <andrew.cooper3@citrix.com> wrote: > Looks good to me now except for one minor thing (below), and the > fact that you still decided to retain the two duplicates of the size > calculation (I''ll have to remember to clean this up if you don''t, unless > Keir explicitly agrees with the duplication).Ok - I will turn them into a start,size pair>> + /* Try once again to allocate room for the crash notes. It is just possible >> + * that more space has become available since we last tried. If space has >> + * already been allocated, kexec_init_cpu_notes() will return early with 0. >> + */ >> + if ( kexec_init_cpu_notes(nr) ) >> return -EINVAL; > The function can fail only with -ENOMEM, so why not return this here? > > Jan >Actually, returning -EINVAL here is counter productive. -EINVAL is used by dom0 to work out when it has asked for each CPU. This in itself is a little broken because there is nothing stopping a middle CPU from being offline at the time these hypercalls are made. The other thing is that there is nothing stopping an offline cpu from having a valid notes section. Therefore, the test of online should be removed, so -EINVAL only gets returned for cpu out of range, or not set up notes at all in the first place. -- Andrew Cooper - Dom0 Kernel Engineer, Citrix XenServer T: +44 (0)1223 225 900, http://www.citrix.com
Here is v5. I have tested quite carefully the impact of changing the error conditions in kexec_get_cpu(). dom0 is happy to accept ERANGE as well as EINVAL, and ERANGE is a more sensible response. Moreover, this new scheme (ignoring for a moment ENOMEM) allows for all nr_cpu_ids to give note ranges to dom0 whether they are up or not at the moment. It is better to have more notes provided with some of them potentially not filled in, than to miss a crashed CPU register state because it was offline when the crash kernel was loaded. All this testing has been done via backport to 4.1.2. Minimal changes were necessary, such as the lack of xzalloc_array() and NR_CPUS instead of nr_cpu_ids, but it was otherwise functionally identical. I am not sure whether to recomend this for backport into 4.1-testing or not. It does make the crashkernel more likely to get crash notes, although in reality, it was only special circumstances which caused problems. If you consider it worth backporting, I can provide my already-backported patch. -- Andrew Cooper - Dom0 Kernel Engineer, Citrix XenServer T: +44 (0)1223 225 900, http://www.citrix.com _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
(Andrew, my previous reply dropped all Cc-s, but I''m re-sending not only because of that - I also adjusted the first part of the response.)>>> On 02.12.11 at 16:19, Andrew Cooper <andrew.cooper3@citrix.com> wrote:I just had another look at the Dom0 side of things, and I fail to see why you think moving this out of per-CPU data is necessary: All Dom0 does with the provided info is set up the resource tree. The data doesn''t get stored for any post-boot use. What am I overlooking?>+ /* In the case of still not having enough memory to allocate buffer room, >+ * returning a range of 0,0 is still valid. */ >+ range->start = __pa((unsigned long)crash_notes[nr].start); >+ range->size = crash_notes[nr].size;Comment and implementation don''t match - __pa(NULL) is not zero (and the cast to ''unsigned long'' is pointless afaict). Jan
On 02/12/11 16:04, Jan Beulich wrote:> (Andrew, my previous reply dropped all Cc-s, but I''m re-sending not only > because of that - I also adjusted the first part of the response.) > >>>> On 02.12.11 at 16:19, Andrew Cooper <andrew.cooper3@citrix.com> wrote: > I just had another look at the Dom0 side of things, and I fail to see why > you think moving this out of per-CPU data is necessary: All Dom0 does > with the provided info is set up the resource tree. The data doesn''t get > stored for any post-boot use. What am I overlooking?(for the benifit of the list) /sbin/kexec opens /proc/iomem and parses the strings looking for "Crash Note" and pulls the ranges out that way. (It also assumes that the lowest crash note is pcpu0 and so on, which is also something I will work around in later notes changes).>> + /* In the case of still not having enough memory to allocate buffer room, >> + * returning a range of 0,0 is still valid. */ >> + range->start = __pa((unsigned long)crash_notes[nr].start); >> + range->size = crash_notes[nr].size; > Comment and implementation don''t match - __pa(NULL) is not zero > (and the cast to ''unsigned long'' is pointless afaict). > JanVery silly mistake on my behalf. v6 is attached with that fixed. -- Andrew Cooper - Dom0 Kernel Engineer, Citrix XenServer T: +44 (0)1223 225 900, http://www.citrix.com _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel