Andrew Cooper
2011-Nov-25 15:12 UTC
[RFC] KEXEC: Clean up logic to choose a range for the crash kernel
-- Andrew Cooper - Dom0 Kernel Engineer, Citrix XenServer T: +44 (0)1223 225 900, http://www.citrix.com _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Jan Beulich
2011-Nov-25 15:36 UTC
Re: [RFC] KEXEC: Clean up logic to choose a range for the crash kernel
>>> On 25.11.11 at 16:12, Andrew Cooper <andrew.cooper3@citrix.com> wrote: >Before this patch, using "classic" crashdump= syntax (<size>@<start>) >causes an exact size and location to be reserved in the e820 map >before Xen and modules are relocated. However, using "new" syntax >(<start>-[<end>]:<size>) encounteres a myriad of problems, most >notibly not considering the range when allocating its region.I''m sure I tested this (with some basic values), so the "myriad of problems" make me curious as to what they are. What does "range" refer to here, and what is "its region"? You also seem to overlook that even the old syntax allowed for only specifying a size.>This patch tries to unify both codepaths, and fix the broken logic >when using "newer" syntax. > >Changes addressed: > >1) Remove use of kexec_crash_area from parse_crashkernel(). > Use ranges[0] in preference, as there is no support for multiple > ranges using classic syntax. This prevents some wierd logic in > __setup_xen() depending on whether kexec_crash_area.size is set > or not.What weird about that?>2) Change set_kexec_crash_area_size() to choose_kexec_range() > The new method of choosing a range provides an entire rather thanAnd entire what?> just setting kexec_crash_area.size. Also, fix the very broken > logic which will only consider a range if its .end is greater > than system_ram.Where''s that happening? Are you perhaps mis-interpreting the purpose of these ranges?> Additionally, prefer range with the largest > size if we have a choice of valid ones, and prefer ranges with > more flexibility in their limits. > >3) Move kexec_reserve_area() from setup.c to kexec.c > It makes more sense here, Also, change its functionality a little > to always work from the kexec range provided by choose_kexec_range() > >4) Remove the kexec reservation code when considering modules > This code only had any effect for people using the "newer" syntax > on the command line, as people using the "classic" syntax would > reserve a memory region before considering modules.Are you sure? How do you prevent the kexec area from overlapping any of the modules (in particular the initrd, which can get passed in place to Dom0)? Jan
Andrew Cooper
2011-Nov-25 16:03 UTC
Re: [RFC] KEXEC: Clean up logic to choose a range for the crash kernel
On 25/11/11 15:36, Jan Beulich wrote:>>>> On 25.11.11 at 16:12, Andrew Cooper <andrew.cooper3@citrix.com> wrote: >> Before this patch, using "classic" crashdump= syntax (<size>@<start>) >> causes an exact size and location to be reserved in the e820 map >> before Xen and modules are relocated. However, using "new" syntax >> (<start>-[<end>]:<size>) encounteres a myriad of problems, most >> notibly not considering the range when allocating its region. > I''m sure I tested this (with some basic values), so the "myriad of > problems" make me curious as to what they are.The comment was a bit old from before I twigged what some of the code was doing. The two key problems were only considering the size, and a bug in set_kexec_crash_area_size() which will only consider a range as being valid if its end is greater than system_ram. This bug prevents specifying a range as "128M anywhere in 1G-2G" actually working on a box with more than 2G of ram.> What does "range" refer to here, and what is "its region"?I will need to clean up my use of the term range and region, although it doesn''t help that there is ambiguity in the source code as well. Loosely, I have been using "range" to mean a possible range as given on the command line, and "region" to be the actual area in memory where the kdump kernel is going to be put.> You also seem to overlook that even the old syntax allowed for only > specifying a size.No. In the case of using the old syntax, both kexec_crash_area.{start,size} are set, which caused the earlier call to kexec_reserve_region() in __setup_xen() to attempt to reserve an exact region at an exact address. Using the newer syntax, kexec_crash_area.size did not get set in the parsing function, and got considered in set_kexec_crash_area_size(), which only sets the size. The subsiquent call to kexec_reverse_area() on the next line fails silently because kexec_crash_area.start is 0. Later, when considering the modules, kexec_crash_area.start gets set to e-size, at which point the call to kexec_reserve_area() actually does attempt to reserve a region, based on the limits of an arbitrary e820 entry, not on the limits provided on the command line.>> This patch tries to unify both codepaths, and fix the broken logic >> when using "newer" syntax. >> >> Changes addressed: >> >> 1) Remove use of kexec_crash_area from parse_crashkernel(). >> Use ranges[0] in preference, as there is no support for multiple >> ranges using classic syntax. This prevents some wierd logic in >> __setup_xen() depending on whether kexec_crash_area.size is set >> or not. > What weird about that?As described above.>> 2) Change set_kexec_crash_area_size() to choose_kexec_range() >> The new method of choosing a range provides an entire rather than > And entire what?Oops. An entire "kexec_range" entry giving a start, end and size value, rather than the current case which takes either an exact size/start pair from the ''classic'' case, or a size anywhere in memory for the ''newer'' case.>> just setting kexec_crash_area.size. Also, fix the very broken >> logic which will only consider a range if its .end is greater >> than system_ram. > Where''s that happening? Are you perhaps mis-interpreting the > purpose of these ranges?I might possibly be misinterpreting these ranges as the only documentation is the code, but I believe I have got it correct based on the upstream Linux code. What are your interpretations of the ranges?>> Additionally, prefer range with the largest >> size if we have a choice of valid ones, and prefer ranges with >> more flexibility in their limits. >> >> 3) Move kexec_reserve_area() from setup.c to kexec.c >> It makes more sense here, Also, change its functionality a little >> to always work from the kexec range provided by choose_kexec_range() >> >> 4) Remove the kexec reservation code when considering modules >> This code only had any effect for people using the "newer" syntax >> on the command line, as people using the "classic" syntax would >> reserve a memory region before considering modules. > Are you sure? How do you prevent the kexec area from overlapping > any of the modules (in particular the initrd, which can get passed in > place to Dom0)?Good point, which is why this is an RFC. Note that the "classic" syntax, which everyone refers to in documents on the subject, will never consider any of the modules. I am not really sure which is best, but I would consider positioning the kdump kernel more important than the location of initrd. The user is likely to know more about why their kdump kernel needs to be located where specified than Xen does. Perhaps after deciding where the kdump kernel should go, we should move any overlapping modules?> Jan >-- Andrew Cooper - Dom0 Kernel Engineer, Citrix XenServer T: +44 (0)1223 225 900, http://www.citrix.com
Andrew Cooper
2011-Nov-25 16:09 UTC
Re: [RFC] KEXEC: Clean up logic to choose a range for the crash kernel
On 25/11/11 16:03, Andrew Cooper wrote:> On 25/11/11 15:36, Jan Beulich wrote: >>>>> On 25.11.11 at 16:12, Andrew Cooper <andrew.cooper3@citrix.com> wrote: >>> Before this patch, using "classic" crashdump= syntax (<size>@<start>) >>> causes an exact size and location to be reserved in the e820 map >>> before Xen and modules are relocated. However, using "new" syntax >>> (<start>-[<end>]:<size>) encounteres a myriad of problems, most >>> notibly not considering the range when allocating its region. >> I''m sure I tested this (with some basic values), so the "myriad of >> problems" make me curious as to what they are. > The comment was a bit old from before I twigged what some of the code > was doing. The two key problems were only considering the size, and a > bug in set_kexec_crash_area_size() which will only consider a range as > being valid if its end is greater than system_ram. This bug prevents > specifying a range as "128M anywhere in 1G-2G" actually working on a box > with more than 2G of ram. > >> What does "range" refer to here, and what is "its region"? > I will need to clean up my use of the term range and region, although it > doesn''t help that there is ambiguity in the source code as well. > Loosely, I have been using "range" to mean a possible range as given on > the command line, and "region" to be the actual area in memory where the > kdump kernel is going to be put. > >> You also seem to overlook that even the old syntax allowed for only >> specifying a size. > No. In the case of using the old syntax, both > kexec_crash_area.{start,size} are set, which caused the earlier call to > kexec_reserve_region() in __setup_xen() to attempt to reserve an exact > region at an exact address. Using the newer syntax, > kexec_crash_area.size did not get set in the parsing function, and got > considered in set_kexec_crash_area_size(), which only sets the size. > The subsiquent call to kexec_reverse_area() on the next line fails > silently because kexec_crash_area.start is 0. Later, when considering > the modules, kexec_crash_area.start gets set to e-size, at which point > the call to kexec_reserve_area() actually does attempt to reserve a > region, based on the limits of an arbitrary e820 entry, not on the > limits provided on the command line. > >>> This patch tries to unify both codepaths, and fix the broken logic >>> when using "newer" syntax. >>> >>> Changes addressed: >>> >>> 1) Remove use of kexec_crash_area from parse_crashkernel(). >>> Use ranges[0] in preference, as there is no support for multiple >>> ranges using classic syntax. This prevents some wierd logic in >>> __setup_xen() depending on whether kexec_crash_area.size is set >>> or not. >> What weird about that? > As described above. > >>> 2) Change set_kexec_crash_area_size() to choose_kexec_range() >>> The new method of choosing a range provides an entire rather than >> And entire what? > Oops. An entire "kexec_range" entry giving a start, end and size value, > rather than the current case which takes either an exact size/start pair > from the ''classic'' case, or a size anywhere in memory for the ''newer'' case. > >>> just setting kexec_crash_area.size. Also, fix the very broken >>> logic which will only consider a range if its .end is greater >>> than system_ram. >> Where''s that happening? Are you perhaps mis-interpreting the >> purpose of these ranges? > I might possibly be misinterpreting these ranges as the only > documentation is the code, but I believe I have got it correct based on > the upstream Linux code. What are your interpretations of the ranges? > >>> Additionally, prefer range with the largest >>> size if we have a choice of valid ones, and prefer ranges with >>> more flexibility in their limits. >>> >>> 3) Move kexec_reserve_area() from setup.c to kexec.c >>> It makes more sense here, Also, change its functionality a little >>> to always work from the kexec range provided by choose_kexec_range() >>> >>> 4) Remove the kexec reservation code when considering modules >>> This code only had any effect for people using the "newer" syntax >>> on the command line, as people using the "classic" syntax would >>> reserve a memory region before considering modules. >> Are you sure? How do you prevent the kexec area from overlapping >> any of the modules (in particular the initrd, which can get passed in >> place to Dom0)? > Good point, which is why this is an RFC. Note that the "classic" > syntax, which everyone refers to in documents on the subject, will never > consider any of the modules. I am not really sure which is best, but I > would consider positioning the kdump kernel more important than the > location of initrd. The user is likely to know more about why their > kdump kernel needs to be located where specified than Xen does. Perhaps > after deciding where the kdump kernel should go, we should move any > overlapping modules?On second thoughts, how about this? If a user provides an exact position and size for the kdump kernel, place the kdump kernel there and move any overlapping modules. If the user provides a range, attempt to fit the kdump kernel around other modules, and in the case where we cant, move the offending module(s) elsewhere.>> Jan >>-- Andrew Cooper - Dom0 Kernel Engineer, Citrix XenServer T: +44 (0)1223 225 900, http://www.citrix.com
Jan Beulich
2011-Nov-25 16:49 UTC
Re: [RFC] KEXEC: Clean up logic to choose a range for the crash kernel
>>> On 25.11.11 at 17:03, Andrew Cooper <andrew.cooper3@citrix.com> wrote: > On 25/11/11 15:36, Jan Beulich wrote: >>>>> On 25.11.11 at 16:12, Andrew Cooper <andrew.cooper3@citrix.com> wrote: >> Where''s that happening? Are you perhaps mis-interpreting the >> purpose of these ranges? > > I might possibly be misinterpreting these ranges as the only > documentation is the code, but I believe I have got it correct based on > the upstream Linux code. What are your interpretations of the ranges?The ranges are to say "if the system has between <start> and <end> amount of memory, then allocate a <size> block (possibly located at <offset>).>>> 4) Remove the kexec reservation code when considering modules >>> This code only had any effect for people using the "newer" syntax >>> on the command line, as people using the "classic" syntax would >>> reserve a memory region before considering modules. >> Are you sure? How do you prevent the kexec area from overlapping >> any of the modules (in particular the initrd, which can get passed in >> place to Dom0)? > > Good point, which is why this is an RFC. Note that the "classic" > syntax, which everyone refers to in documents on the subject, will never > consider any of the modules.It won''t in either case when addresses are specified. It will in both cases when only sizes are provided.> I am not really sure which is best, but I > would consider positioning the kdump kernel more important than the > location of initrd. The user is likely to know more about why their > kdump kernel needs to be located where specified than Xen does. Perhaps > after deciding where the kdump kernel should go, we should move any > overlapping modules?And that''s already happening (or is at least supposed to be). Jan
Andrew Cooper
2011-Nov-25 17:03 UTC
Re: [RFC] KEXEC: Clean up logic to choose a range for the crash kernel
On 25/11/11 16:49, Jan Beulich wrote:>>>> On 25.11.11 at 17:03, Andrew Cooper <andrew.cooper3@citrix.com> wrote: >> On 25/11/11 15:36, Jan Beulich wrote: >>>>>> On 25.11.11 at 16:12, Andrew Cooper <andrew.cooper3@citrix.com> wrote: >>> Where''s that happening? Are you perhaps mis-interpreting the >>> purpose of these ranges? >> I might possibly be misinterpreting these ranges as the only >> documentation is the code, but I believe I have got it correct based on >> the upstream Linux code. What are your interpretations of the ranges? > The ranges are to say "if the system has between <start> and <end> > amount of memory, then allocate a <size> block (possibly located at > <offset>).Ah ok. That actually makes more logical sense. Is there some documentation as to exactly what is permitted to specify? I have not been able to find any.>>>> 4) Remove the kexec reservation code when considering modules >>>> This code only had any effect for people using the "newer" syntax >>>> on the command line, as people using the "classic" syntax would >>>> reserve a memory region before considering modules. >>> Are you sure? How do you prevent the kexec area from overlapping >>> any of the modules (in particular the initrd, which can get passed in >>> place to Dom0)? >> Good point, which is why this is an RFC. Note that the "classic" >> syntax, which everyone refers to in documents on the subject, will never >> consider any of the modules. > It won''t in either case when addresses are specified. It will in > both cases when only sizes are provided.How can you specify an offset while using the range syntax? This leaves the classic syntax as the only way to specific an exact location in memory.>> I am not really sure which is best, but I >> would consider positioning the kdump kernel more important than the >> location of initrd. The user is likely to know more about why their >> kdump kernel needs to be located where specified than Xen does. Perhaps >> after deciding where the kdump kernel should go, we should move any >> overlapping modules? > And that''s already happening (or is at least supposed to be). > > Jan >-- Andrew Cooper - Dom0 Kernel Engineer, Citrix XenServer T: +44 (0)1223 225 900, http://www.citrix.com
Andrew Cooper
2011-Nov-25 17:49 UTC
Re: [RFC] KEXEC: Clean up logic to choose a range for the crash kernel
On 25/11/11 17:03, Andrew Cooper wrote:> > On 25/11/11 16:49, Jan Beulich wrote: >>>>> On 25.11.11 at 17:03, Andrew Cooper <andrew.cooper3@citrix.com> wrote: >>> On 25/11/11 15:36, Jan Beulich wrote: >>>>>>> On 25.11.11 at 16:12, Andrew Cooper <andrew.cooper3@citrix.com> wrote: >>>> Where''s that happening? Are you perhaps mis-interpreting the >>>> purpose of these ranges? >>> I might possibly be misinterpreting these ranges as the only >>> documentation is the code, but I believe I have got it correct based on >>> the upstream Linux code. What are your interpretations of the ranges? >> The ranges are to say "if the system has between <start> and <end> >> amount of memory, then allocate a <size> block (possibly located at >> <offset>). > Ah ok. That actually makes more logical sense. Is there some > documentation as to exactly what is permitted to specify? I have not > been able to find any. > >>>>> 4) Remove the kexec reservation code when considering modules >>>>> This code only had any effect for people using the "newer" syntax >>>>> on the command line, as people using the "classic" syntax would >>>>> reserve a memory region before considering modules. >>>> Are you sure? How do you prevent the kexec area from overlapping >>>> any of the modules (in particular the initrd, which can get passed in >>>> place to Dom0)? >>> Good point, which is why this is an RFC. Note that the "classic" >>> syntax, which everyone refers to in documents on the subject, will never >>> consider any of the modules. >> It won''t in either case when addresses are specified. It will in >> both cases when only sizes are provided. > How can you specify an offset while using the range syntax? This leaves > the classic syntax as the only way to specific an exact location in memory. > >>> I am not really sure which is best, but I >>> would consider positioning the kdump kernel more important than the >>> location of initrd. The user is likely to know more about why their >>> kdump kernel needs to be located where specified than Xen does. Perhaps >>> after deciding where the kdump kernel should go, we should move any >>> overlapping modules? >> And that''s already happening (or is at least supposed to be). >> >> JanAfter comparing the Xen parsing code with the Linux parsing code, with the correct idea of what we are trying to do, Xen still lacks the ability to specify an exact offset using the newer style. I will implement this. -- Andrew Cooper - Dom0 Kernel Engineer, Citrix XenServer T: +44 (0)1223 225 900, http://www.citrix.com
Jan Beulich
2011-Nov-28 07:45 UTC
Re: [RFC] KEXEC: Clean up logic to choose a range for the crash kernel
>>> On 25.11.11 at 18:49, Andrew Cooper <andrew.cooper3@citrix.com> wrote: > After comparing the Xen parsing code with the Linux parsing code, with > the correct idea of what we are trying to do, Xen still lacks the > ability to specify an exact offset using the newer style. I will > implement this.What am I overlooking? set_kexec_crash_area_size() takes the new syntax into consideration, and kexec_reserve_area() gets called only afterwards. Jan