Martin Peres
2014-Jun-12 23:14 UTC
[Nouveau] EVoC Proposal: REclock - Reverse-engineer and implement NVA3/5/8 Voltage- and Frequency Scaling in Nouveau
On 11/06/2014 13:59, Roy Spliet wrote:
> Dear Mr. Dew,
>
> I hereby wish to propose the X.org EVoC project "REclock -
> Reverse-engineer and implement NVA3/5/8 Voltage- and Frequency Scaling
> in Nouveau" in which I am willing to participate, and apply for the
> associated funding. Full details below or on
> http://nouveau.spliet.org/evoc.html . For any further questions feel
> free to contact me either on Freenode IRC (rspliet) or by e-mail to
> this address.
> Thank you for your consideration, and I look forward to hearing more
> from you soon. Yours,
>
> Roy Spliet

Hello Roy,

Thank you for your proposal. After careful consideration by the board
of directors, we have accepted it. You may start your EVoC on June 16th.
Our treasurer will contact you in private to get your banking
information and send you an initial payment along with 250€ for buying
the hardware you need.

We wish you and your mentor the best of luck on this project! Do not
hesitate to contact us if you have any questions.

Martin Peres, on behalf of the board of directors

> ---
>
>
> REclock: Reverse-engineer and implement NVA3/5/8 Voltage- and
> Frequency Scaling in Nouveau
>
> NVIDIA graphics cards often support running at a variety of different
> performance "levels". This aids in reducing the power demand and heat
> dissipation of the devices when idle, while unleashing full potential
> under load. A performance level comprises the clock speed and voltage
> for several subcomponents in the GPU. The difference between the
> lowest and highest performance level can be as much as a factor of 10
> in clock speed.
>
> Despite hard work from many developers, reclocking support in Nouveau
> still has quite a few loose ends: engine reclocking is mostly in place
> but not always reliable, there are several missing routines related to
> memory reclocking, and in general the actions required to perform
> voltage and frequency scaling are not or only partially understood.
> Because of this, NVIDIA GPUs driven by nouveau are limited to using
> the boot speed and voltage only, severely limiting performance and
> usability.
>
> For this project, I aim to tie these loose ends together for NVIDIA's
> NVA3/5/8 GPUs. I intend to fully reverse engineer several
> subcomponents related to voltage and frequency scaling, try to get a
> full understanding of the clock tree and use this gained knowledge to
> further improve the nouveau voltage and frequency scaling
> implementation for said GPUs.
>
>
> Personal information
>
> My name is Roy Spliet. I hold a master's degree from Delft
> University of Technology (TU Delft) and plan to continue my academic
> career as a PhD student in computer architecture. My background
> includes kernel/driver development (nouveau, LITMUS^RT) and GPGPU
> programming in OpenCL.
>
> Previous involvement in Nouveau has led to successfully
> reverse-engineering and implementing reclocking support for the
> memory-less NVIDIA NVAA and NVAC chipsets, alongside many
> contributions to memory reclocking for pre-NVC0 (Fermi) GPUs. For more
> details about my personal background, please consult
> http://roy.spliet.org.
>
>
> Background
>
> NVIDIA GPUs feature a complex multi-layer clock tree that allows for
> per-subcomponent alteration of clock speeds. The precise clock tree is
> a complex network consisting of one or more input clocks, several
> fixed dividers, and a lot of routing to distribute these clocks to
> every subcomponent. On the last level there is usually a Phase-Locked
> Loop (PLL) that can take either the original clock or one of several
> divided clocks as an input, and bring this clock up to the desired
> level for the associated subcomponent. Control registers alter the
> precise input of these PLLs, and can in addition be configured to
> bypass the PLLs.
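
As a rough illustration of the final PLL stage described above, the sketch below picks coefficients for a target frequency within given limits. It assumes a generic integer PLL model, out = ref * N / (M * P), which is a textbook simplification and not the documented behaviour of NVA3/5/8 hardware. The names `PllLimits`, `pll_output_khz` and `find_coefficients`, and any numeric ranges passed to them, are hypothetical; real ranges would come from the VBIOS.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class PllLimits:
    """Hypothetical coefficient and VCO ranges; real ones come from the VBIOS."""
    n_range: Tuple[int, int]        # feedback multiplier, inclusive
    m_range: Tuple[int, int]        # input divider, inclusive
    p_range: Tuple[int, int]        # post divider, inclusive
    vco_khz_range: Tuple[int, int]  # allowed VCO frequency, inclusive


def pll_output_khz(ref_khz: int, n: int, m: int, p: int) -> int:
    """Generic integer PLL model (an assumption): out = ref * N / (M * P)."""
    return ref_khz * n // (m * p)


def find_coefficients(ref_khz: int, target_khz: int,
                      limits: PllLimits) -> Optional[Tuple[int, int, int]]:
    """Brute-force the (N, M, P) triple whose modelled output is closest
    to the target. Returns None if no triple satisfies the VCO range."""
    best, best_err = None, None
    for m in range(limits.m_range[0], limits.m_range[1] + 1):
        for n in range(limits.n_range[0], limits.n_range[1] + 1):
            vco_khz = ref_khz * n // m
            if not limits.vco_khz_range[0] <= vco_khz <= limits.vco_khz_range[1]:
                continue  # VCO frequency outside the allowed range
            for p in range(limits.p_range[0], limits.p_range[1] + 1):
                err = abs(pll_output_khz(ref_khz, n, m, p) - target_khz)
                if best_err is None or err < best_err:
                    best, best_err = (n, m, p), err
    return best
```

The reference passed in can be either the original input clock or one of the divided clocks, which is how the routing choice in front of the PLL would enter this model. Bypassing the PLL corresponds to not using these coefficients at all and feeding the input clock straight through.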
>
> The video BIOS (VBIOS) provides two services: it takes care of
> bringing the GPU into an initial valid state, and it contains crucial
> information regarding reclocking. Most importantly, the VBIOS
> describes the ranges of each PLL in the system. On a higher level, the
> VBIOS also contains several "performance levels". Each level consists
> of a clock speed for each subcomponent. NVIDIA's driver switches
> between these performance levels based on the load. For most engines
> this routine consists of bypassing the PLL, setting it to a new value,
> testing the newly set values, and then re-enabling the PLL.
>
>
> Memory reclocking
>
> Memory reclocking is a bit more difficult than reclocking other
> engines. Besides an input clock, the memory controller also needs to
> know a variety of latencies, which are usually defined in clock ticks
> but mandated in nanoseconds. These latencies, or timings, are
> described in the VBIOS.
>
> To keep the memory controller and the engines running in sync, a form
> of link training is also required. Updating all this information must
> be done according to strict timing requirements, and failure to meet
> these deadlines results in corrupted memory and all associated
> consequences. Although the memory is often well documented in the
> public, NVIDIA's memory controller is not. Reverse engineering it is a
> difficult challenge, as there is very little feedback beyond either a
> working system or a complete crash.
>
>
> Reclocking engine
>
> To facilitate the action of reclocking from within the GPU itself,
> increasing stability on operating system failures, NVIDIA added a
> subcomponent called PDAEMON. This component has full access to many
> registers accessible through MMIO, including the registers controlling
> the clocks, latencies and other power-management related features.
> PDAEMON is a programmable engine supporting the Falcon or fµc ISA.
> NVIDIA's driver uploads the firmware for this engine, dubbed PMU.
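
Since the latencies mentioned under "Memory reclocking" are mandated in nanoseconds but defined in clock ticks, they have to be recomputed whenever the memory clock changes. The sketch below shows that conversion only. The function names are hypothetical, the timing names and values in the usage note are made up for illustration, and rounding up to the next whole tick is an assumption about how a minimum latency would be honoured, not a description of what NVIDIA's driver or Nouveau does.

```python
from typing import Dict


def latency_to_ticks(latency_ps: int, clock_khz: int) -> int:
    """Smallest whole number of clock ticks lasting at least latency_ps.

    ticks = ceil(latency_ps * clock_khz / 1e9), done in integer
    arithmetic so that exact multiples are not rounded up by accident.
    """
    return (latency_ps * clock_khz + 10**9 - 1) // 10**9


def timings_in_ticks(timings_ps: Dict[str, int], clock_khz: int) -> Dict[str, int]:
    """Convert a table of minimum latencies (picoseconds) to clock ticks."""
    return {name: latency_to_ticks(ps, clock_khz)
            for name, ps in timings_ps.items()}
```

For instance, a hypothetical 13.75 ns minimum latency needs 11 ticks at 800 MHz (`latency_to_ticks(13750, 800000)`) but only 5 ticks at 324 MHz, which is why a table of timings cannot simply be carried over between performance levels.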
>
> PMU is responsible for many power-management related functions,
> including monitoring temperature, controlling fan speed and monitoring
> the load on the GPU. To alter clock speeds, the NVIDIA driver can
> upload special scripts in a language called "seq" that will be
> interpreted by PMU. These scripts contain sequences of registers that
> need to be adjusted in order, along with required pause commands and
> other logic. Full understanding of the seq ISA gives full
> understanding of the actions executed by NVIDIA's driver on a reclock
> operation and their timing.
>
> Nouveau has its own implementation of the PMU microcode, including a
> scriptable engine offering many of the capabilities implemented in
> older hardware. However, its capabilities might be insufficient to
> perform all the tasks that NVIDIA's driver performs through PMU.
>
>
> Current state
>
> Nouveau has a lot of code in place for engine reclocking. Many of the
> PLLs have been identified, and some of the control registers have been
> reverse engineered either partially or completely. Although known to
> work on some GPUs, engine reclocking does not work reliably, at least
> on my NVA8.
>
> For memory reclocking, some code exists to determine the latencies
> that the memory and the memory controller need to know. Still, there
> are some other features vital for memory reclocking that are
> ill-understood, unimplemented and/or incorrect. In addition, the order
> of events is likely wrong. As a result, clocking memory to any
> performance level higher than the boot clocks likely results in memory
> corruption. The link training unit found on some GPUs with DDR3 is one
> important example of a feature not handled by Nouveau currently.
>
> Large parts of the VBIOS are well understood and parsed both by the
> nouveau kernel driver and the envytools VBIOS parsing tool. Any bits
> left could lead to interesting clues on actions required for reclocking.
>
>
> Project
>
>
> Scope
>
> In this project I aim to get a better understanding of the reclocking
> features of the NVA3/5/8, as utilised by NVIDIA's official device
> driver. The eventual goal of this project is complete voltage and
> frequency scaling for these GPUs in nouveau. Gained knowledge could
> benefit the implementation of newer generations of cards as well.
>
> I limit myself to the core features and aim for manual control of
> the voltage and clock frequencies based on profiles in the VBIOS;
> dynamic reclocking based on load information is beyond the scope of
> this project.
>
> Initial code contributions will not make use of Nouveau's PMU engine.
> If it is established that this is absolutely necessary, the firmware
> could be extended to support the desired functionality. However, until
> this is established, reclocking through PDAEMON is considered a
> nice-to-have feature with low priority.
>
>
> Benefits to the community
>
> Users will benefit from the increased performance that nouveau can
> offer under higher clocks, while having the capability to save energy
> when the processing power is not required. This could lead to
> prolonged battery life for mobile systems using the Open Source
> NVIDIA driver stack.
>
> This work combined with the GSoC project on performance counters
> provides the prerequisites for implementing dynamic frequency scaling
> in future work, enabling all users of the open source graphics driver
> stack to profit from these benefits without manual intervention.
>
>
> Deliverables
>
> Implementation will be done entirely in the Nouveau kernel module,
> forked from an upstream kernel. Produced patches are intended to be
> merged back into the mainline kernel at the end of the project, but
> might require some after-care when conflicting maintenance is done on
> nouveau. Controls are exposed through sysfs.
>
> Documentation will be added to the "envytools" Git repository where
> applicable.
>
>
> Mentor
>
> Ilia Mirkin
>
>
> Schedule
>
> My availability is roughly full time between now and the start of the
> new academic year in October. Tentative planning:
>
> Description | Deliverable | Timeframe | Required
> Reverse engineer seq ISA | Documentation (envytools) | 1 week | X
> Write seq script decoder | Decoding tool (envytools) | 1 week | X
> RE clock tree for NVA3/5/8 | Documentation (envytools), full graph | 1-2 weeks | X
> Finish/fix engine reclocking for NVA3/5/8 | Kernel code allowing users to successfully select any performance level through sysfs | 1 week | X
> RE+implement DDR3 link training unit | Documentation (envytools) + kernel code (no directly visible changes) | 1 week | X
> RE+implement DDR3 memory reclocking | Kernel code, observable performance improvements for highest performance level on affected GPUs | 3 weeks | X
> RE+implement GDDR3 memory reclocking | Kernel code, observable performance improvements for highest performance level on affected GPUs | 3 weeks | X
> RE+implement GDDR5 memory reclocking* | Kernel code, observable performance improvements for highest performance level on affected GPUs | ? |
> RE+implement DDR2 memory reclocking* | Kernel code, observable performance improvements for highest performance level on affected GPUs | ? |
>
> * If hardware available
>
>
> Risks
>
> There is little risk attached to all tasks resulting in documentation
> of the clock tree. Patches to the nouveau kernel tree are expected,
> but chances exist that the code does not generalise to all cards.
> Earlier experience makes me confident engine reclocking can be
> implemented with low risk. Achievements for memory reclocking are not
> guaranteed given the complexity of the job, although progress is
> definitely expected.
>
>
> Hardware
>
> I currently possess one NVA8 GPU with DDR3 memory. More NVA3/5/8
> hardware is available through Martin Peres and accessible remotely.
> Possibly missing in our combined collection are NVA3/5/8 graphics
> cards with DDR2.
> If budget is available, this could be purchased (approximately
> €50 new) by either Martin Peres or myself for
> reverse-engineering purposes.
>
>
> _______________________________________________
> board at foundation.x.org: X.Org Foundation Board of Directors
> Archives: http://foundation.x.org/cgi-bin/mailman/private/board
> Info: http://foundation.x.org/cgi-bin/mailman/listinfo/board