Pekka Paalanen
2009-Aug-10 17:40 UTC
[Nouveau] [RFC] drm/nouveau: optimize code emission of inline functions
Younes Manton
2009-Aug-11 00:14 UTC
[Nouveau] [RFC] drm/nouveau: optimize code emission of inline functions
On Mon, Aug 10, 2009 at 1:40 PM, Pekka Paalanen <pq at iki.fi> wrote:
> Before this patch:
>
> $ objdump -t nouveau.ko --section=.text | cut -f2 | sort -k2 | uniq -d -c
>       4
>       9 0000000000000010 BEGIN_RING
>       5 0000000000000051 FIRE_RING
>       2 00000000000000b3 NVLockVgaCrtcs
>       4 000000000000008b NVReadVgaCrtc
>       2 000000000000008c NVReadVgaCrtc
>       2 0000000000000011 NVVgaSeqReset
>       2 000000000000006b NVWriteCRTC
>       2 0000000000000066 NVWriteRAMDAC
>       4 0000000000000081 NVWriteVgaCrtc
>       3 0000000000000082 NVWriteVgaCrtc
>      11 000000000000001a OUT_RING
>       9 0000000000000028 RING_SPACE
>       2 0000000000000019 crtc_wr_cio_state
>       3 0000000000000012 drm_gem_object_unreference
>       2 0000000000000005 kmalloc
>       3 000000000000000b kzalloc
>       4 0000000000000051 nouveau_bo_ref
>       2 0000000000000050 nvReadMC
>       2 0000000000000052 nvWriteMC
>       3 0000000000000029 nv_gf4_disp_arch
>       4 000000000000001b nv_rd08
>       3 000000000000001c nv_rd08
>      29 0000000000000012 nv_rd32
>       2 0000000000000012 nv_ri32
>       5 000000000000001c nv_ro32
>       4 000000000000008b nv_two_heads
>      11 0000000000000022 nv_wo32
>       8 0000000000000015 nv_wr08
>      29 0000000000000014 nv_wr32
>       2 0000000000000013 pci_read_config_dword
>
> After this patch:
>
> $ objdump -t nouveau.ko --section=.text | cut -f2 | sort -k2 | uniq -d -c
>       4
>       9 0000000000000010 BEGIN_RING
>       5 0000000000000051 FIRE_RING
>       2 00000000000000b3 NVLockVgaCrtcs
>       5 00000000000000a7 NVReadVgaCrtc
>       2 0000000000000011 NVVgaSeqReset
>       2 0000000000000073 NVWriteCRTC
>       3 0000000000000072 NVWriteRAMDAC
>       4 0000000000000091 NVWriteVgaCrtc
>       3 0000000000000092 NVWriteVgaCrtc
>      11 000000000000001a OUT_RING
>       9 0000000000000028 RING_SPACE
>       2 0000000000000019 crtc_wr_cio_state
>       3 0000000000000012 drm_gem_object_unreference
>       2 0000000000000005 kmalloc
>       3 000000000000000b kzalloc
>       3 0000000000000051 nouveau_bo_ref
>       2 0000000000000052 nouveau_bo_ref
>       3 000000000000005d nvReadMC
>       2 000000000000005c nvWriteMC
>       3 0000000000000029 nv_gf4_disp_arch
>       4 000000000000008b nv_two_heads
>       2 0000000000000013 pci_read_config_dword
>
> As you can see, the static inline functions changed to extern
> inline functions no longer appear many times in the final kernel
> module. But, at the same time nouveau.ko file size
> before: 583683 B (.text size 0x000312c8)
> after: 681075 B (.text size 0x00039474)
> That's .text size increase by 32 kB.
>
> So something is definitely inlined a lot more. This was tested on
> x86_64, gcc 4.1.2, CONFIG_OPTIMIZE_INLINING=y,
> CONFIG_CC_OPTIMIZE_FOR_SIZE=y.
>
> Now, I'm not sure if this patch would be a good thing or not.
> Comments?

Well if the goal is a small module then I guess it's not a good idea, but then we should be disabling some other optimizations that excessively bloat the module. I don't think it's a bad idea, but I'd be curious where all the extra text comes from. I'm guessing more inlining and/or loop unrolling.
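
The size figures quoted in the message can be cross-checked with plain shell arithmetic. This is only a minimal sketch: the numbers are copied from the quoted text, and the variable names are made up here for illustration.

```shell
#!/bin/sh
# Sizes copied from the quoted message; variable names are illustrative only.
text_before=$((0x000312c8))   # .text size before the patch, in bytes
text_after=$((0x00039474))    # .text size after the patch, in bytes
file_before=583683            # nouveau.ko file size before, in bytes
file_after=681075             # nouveau.ko file size after, in bytes

# Growth of the .text section and of the whole module file.
text_delta=$((text_after - text_before))
file_delta=$((file_after - file_before))

echo ".text growth: $text_delta bytes (about $((text_delta / 1024)) KiB)"
echo "file growth:  $file_delta bytes"
```

The .text difference works out to 33196 bytes, which matches the "32 kB" quoted above. The file as a whole grows by 97392 bytes, so roughly two thirds of the growth lies outside .text; the thread does not say which sections account for it.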