Arnd Bergmann
2021-Nov-19 15:57 UTC
[PATCH 00/17] Add memberof(), split some headers, and slightly simplify code
On Fri, Nov 19, 2021 at 4:06 PM Alejandro Colomar (man-pages)
<alx.manpages at gmail.com> wrote:
> On 11/19/21 15:47, Arnd Bergmann wrote:
> > On Fri, Nov 19, 2021 at 12:36 PM Alejandro Colomar
>
> Yes, I would like to untangle the dependencies.
>
> The main reason I started doing this splitting
> is because I wouldn't be able to include
> <linux/stddef.h> in some headers,
> because it pulled too much stuff that broke unrelated things.
>
> So that's why I started from there.
>
> I for example would like to get NULL in memberof()
> without pulling anything else,
> so <linux/NULL.h> makes sense for that.
>
> It's clear that every .c wants NULL,
> but it's not so clear that every .c wants
> everything that <linux/stddef.h> pulls indirectly.

From what I can tell, linux/stddef.h is tiny, I don't think it's really
worth optimizing this part. I spent some time last year
trying to untangle some of the more interesting headers, but ended
up not completing this as there are some really hard problems
once you start getting to the interesting bits.

The approach I tried was roughly:

- For each header in the kernel, create a preprocessed version
  that includes all the indirect includes; from that, start a set
  of lookup tables that record which header is eventually included
  by which ones, and the size of each preprocessed header in
  bytes.

- For a given kernel configuration (e.g. defconfig or allmodconfig)
  that I'm most interested in, look at which files are built, and what
  the direct includes are in the source files.

- Sort the headers by the product of the number of direct includes
  and the preprocessed size: the largest ones are those that are
  worth looking at first.

- Use graphviz to visualize the directed graph showing the includes
  between the top 100 headers in that list. You get something like
  I had in [1], or the version afterwards at [2].

- Split out unneeded indirect includes from the headers in the center
  of that graph, typically by splitting out struct definitions.

- Repeat.

The main problem with this approach is that as soon as you start
actually reducing the unneeded indirect includes, you end up with
countless .c files that no longer build because they are missing a
direct include for something that was always included somewhere
deep underneath, so I needed a second set of scripts to add
direct includes to every .c file.

On the plus side, I did see something on the order of a 30%
compile speed improvement with clang, which is insane
given that this only removed dead definitions.

> But I'll note that linux/fs.h, linux/sched.h, linux/mm.h are
> interesting headers for further splitting.
>
>
> BTW, I also have a longstanding doubt about
> how header files are organized in the kernel,
> and which headers can and cannot be included
> from which other files.
>
> For example I see that files in samples or scripts or tools,
> that redefine many things such as offsetof() or ARRAY_SIZE(),
> and I don't know if there's a good reason for that,
> or if I should simply remove all that stuff and
> include <linux/offsetof.h> everywhere I see offsetof() being used.

The main issue here is that user space code should not
include anything outside of include/uapi/ and arch/*/include/uapi/.

offsetof() is defined in include/linux/stddef.h, so this is by
definition not accessible here. It appears that there is also
an include/uapi/linux/stddef.h that is really strange because
it includes linux/compiler_types.h, which in turn is outside
of uapi/. This should probably be fixed.

        Arnd

[1] https://drive.google.com/file/d/14IKifYDadg2W5fMsefxr4373jizo9bLl/view?usp=sharing
[2] https://drive.google.com/file/d/1pWQcv3_ZXGqZB8ogV-JOfoV-WJN2UNnd/view?usp=sharing
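
The ranking and graph steps in the list above can be sketched roughly as
follows. This is only an illustration, not the scripts used for the
measurements in this mail: the function names rank_headers() and to_dot(),
the input dictionaries, and all file names and byte sizes in the example
are made up. It assumes the direct includes per source file and the
preprocessed size of each header have already been collected.

```python
from collections import Counter


def rank_headers(direct_includes, preprocessed_size):
    """Rank headers by (number of direct includers) * (preprocessed bytes).

    direct_includes:   dict mapping a source file to the headers it
                       includes directly (hypothetical input format).
    preprocessed_size: dict mapping a header to the size in bytes of its
                       fully preprocessed form (hypothetical input format).
    """
    counts = Counter()
    for headers in direct_includes.values():
        # Count each header once per source file, even if listed twice.
        counts.update(set(headers))
    scores = {h: n * preprocessed_size.get(h, 0) for h, n in counts.items()}
    # Highest score first; break ties by name so the order is stable.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


def to_dot(edges, keep):
    """Emit a Graphviz digraph restricted to the headers in `keep`."""
    lines = ["digraph includes {"]
    for src, dst in sorted(edges):
        if src in keep and dst in keep:
            lines.append(f'    "{src}" -> "{dst}";')
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Made-up example data, for illustration only.
    direct = {
        "a.c": ["linux/fs.h", "linux/stddef.h"],
        "b.c": ["linux/fs.h"],
        "c.c": ["linux/stddef.h", "linux/sched.h"],
    }
    size = {"linux/fs.h": 1000, "linux/stddef.h": 10, "linux/sched.h": 1500}
    ranked = rank_headers(direct, size)
    top = {name for name, _ in ranked[:2]}
    edges = {("linux/fs.h", "linux/sched.h"), ("linux/fs.h", "linux/stddef.h")}
    print(ranked)
    print(to_dot(edges, top))
```

With weights like these, a tiny header such as linux/stddef.h ends up at
the bottom of the list even when many files include it, which matches the
point that it is probably not worth optimizing first. The DOT text can be
passed to the graphviz "dot" tool to render the picture.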
Andy Shevchenko
2021-Nov-19 16:10 UTC
[PATCH 00/17] Add memberof(), split some headers, and slightly simplify code
On Fri, Nov 19, 2021 at 04:57:46PM +0100, Arnd Bergmann wrote:
> On Fri, Nov 19, 2021 at 4:06 PM Alejandro Colomar (man-pages)
> <alx.manpages at gmail.com> wrote:
> > On 11/19/21 15:47, Arnd Bergmann wrote:
> > > On Fri, Nov 19, 2021 at 12:36 PM Alejandro Colomar
> >
> > Yes, I would like to untangle the dependencies.
> >
> > The main reason I started doing this splitting
> > is because I wouldn't be able to include
> > <linux/stddef.h> in some headers,
> > because it pulled too much stuff that broke unrelated things.
> >
> > So that's why I started from there.
> >
> > I for example would like to get NULL in memberof()
> > without pulling anything else,
> > so <linux/NULL.h> makes sense for that.
> >
> > It's clear that every .c wants NULL,
> > but it's not so clear that every .c wants
> > everything that <linux/stddef.h> pulls indirectly.
>
> From what I can tell, linux/stddef.h is tiny, I don't think it's really
> worth optimizing this part. I spent some time last year
> trying to untangle some of the more interesting headers, but ended
> up not completing this as there are some really hard problems
> once you start getting to the interesting bits.
>
> The approach I tried was roughly:
>
> - For each header in the kernel, create a preprocessed version
>   that includes all the indirect includes; from that, start a set
>   of lookup tables that record which header is eventually included
>   by which ones, and the size of each preprocessed header in
>   bytes.
>
> - For a given kernel configuration (e.g. defconfig or allmodconfig)
>   that I'm most interested in, look at which files are built, and what
>   the direct includes are in the source files.
>
> - Sort the headers by the product of the number of direct includes
>   and the preprocessed size: the largest ones are those that are
>   worth looking at first.
>
> - Use graphviz to visualize the directed graph showing the includes
>   between the top 100 headers in that list. You get something like
>   I had in [1], or the version afterwards at [2].
>
> - Split out unneeded indirect includes from the headers in the center
>   of that graph, typically by splitting out struct definitions.
>
> - Repeat.
>
> The main problem with this approach is that as soon as you start
> actually reducing the unneeded indirect includes, you end up with
> countless .c files that no longer build because they are missing a
> direct include for something that was always included somewhere
> deep underneath, so I needed a second set of scripts to add
> direct includes to every .c file.

Can't it be done with cocci support?

> On the plus side, I did see something on the order of a 30%
> compile speed improvement with clang, which is insane
> given that this only removed dead definitions.

Thumbs up!

> > But I'll note that linux/fs.h, linux/sched.h, linux/mm.h are
> > interesting headers for further splitting.
> >
> >
> > BTW, I also have a longstanding doubt about
> > how header files are organized in the kernel,
> > and which headers can and cannot be included
> > from which other files.
> >
> > For example I see that files in samples or scripts or tools,
> > that redefine many things such as offsetof() or ARRAY_SIZE(),
> > and I don't know if there's a good reason for that,
> > or if I should simply remove all that stuff and
> > include <linux/offsetof.h> everywhere I see offsetof() being used.
>
> The main issue here is that user space code should not
> include anything outside of include/uapi/ and arch/*/include/uapi/.
>
> offsetof() is defined in include/linux/stddef.h, so this is by
> definition not accessible here. It appears that there is also
> an include/uapi/linux/stddef.h that is really strange because
> it includes linux/compiler_types.h, which in turn is outside
> of uapi/. This should probably be fixed.
>
>         Arnd
>
> [1] https://drive.google.com/file/d/14IKifYDadg2W5fMsefxr4373jizo9bLl/view?usp=sharing
> [2] https://drive.google.com/file/d/1pWQcv3_ZXGqZB8ogV-JOfoV-WJN2UNnd/view?usp=sharing

-- 
With Best Regards,
Andy Shevchenko