thr3ads.net - Linux Virtualization - + stupid-hack-to-make-mainline-build.patch added to -mm tree [Apr 2007]

Thomas Gleixner

2007-Apr-18 13:02 UTC

+ stupid-hack-to-make-mainline-build.patch added to -mm tree

On Tue, 2007-03-06 at 00:55 -0800, Zachary Amsden wrote:
> > a proper CE device also has the added bonus of making high-res timers
> > guests work automatically. It should be simple: just pass it through to
> > your hypervisor, a hyper-CE-device, like a hyper-clocksource device has
> > essentially no guest-side complexity.
> >
> 
> It is not so simple.  In theory it works great.  In reality, the i386 
> implementation is completely hardwired to work the way hardware works, 
> and breaking the clockevent code out of the deep ties to the APIC is 
> extremely non-trivial.  We tried, and could not accomplish it for 2.6.21 
> because the hrtimers integration was complex, and introduced many bugs 
> for us.

Why is this so non-trivial? All you have to do is _NOT_ register
PIT/HPET/APIC timers and register a per CPU hyper-CE-device instead,
which uses the hypervisor timer emulation instead of real hardware.
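
The idea in the paragraph above can be sketched in plain userspace C. This is only an illustration under stated assumptions: `hv_set_timer()` is a hypothetical stand-in for whatever timer hypercall the hypervisor offers, and `struct ce_device` merely mimics the rough shape of a clock event device; it is not the kernel's actual structure or registration API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NR_CPUS 2 /* illustrative value only */

/* Hypothetical hypervisor timer service. A real guest would issue a
 * hypercall here; this mock just records the requested delta per CPU. */
static uint64_t hv_timer_deadline[NR_CPUS];

static int hv_set_timer(int cpu, uint64_t delta_ns)
{
    hv_timer_deadline[cpu] = delta_ns;
    return 0;
}

/* Simplified stand-in for a clock event device (not the kernel struct). */
struct ce_device {
    const char *name;
    int cpu;
    int (*set_next_event)(struct ce_device *dev, uint64_t delta_ns);
    void (*event_handler)(struct ce_device *dev);
};

/* Programming the next event is a pass-through to the hypervisor;
 * no PIT, HPET or APIC register is touched. */
static int hyper_ce_set_next_event(struct ce_device *dev, uint64_t delta_ns)
{
    return hv_set_timer(dev->cpu, delta_ns);
}

static struct ce_device hyper_ce[NR_CPUS];

/* Set up one device per CPU instead of the hardware timer devices. */
static void hyper_ce_register(int cpu)
{
    assert(cpu >= 0 && cpu < NR_CPUS);
    hyper_ce[cpu].name = "hyper-ce";
    hyper_ce[cpu].cpu = cpu;
    hyper_ce[cpu].set_next_event = hyper_ce_set_next_event;
    hyper_ce[cpu].event_handler = NULL; /* would be set by the timer core */
}
```

In this sketch the guest-side part reduces to one callback per CPU that forwards the requested delta; the interrupt delivery path, which is the contested part of this thread, is deliberately left out.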

clockevents breaks the hardwired assumptions of the old timer code and
allows you to remove _ALL_ the hardwired hackery in vmitimer.c, i.e.
stuff like

        /* Disable PIT. */
        outb_p(0x3a, PIT_MODE); /* binary, mode 5, LSB/MSB, ch 0 */

> We worked around this by keeping NO_IDLE_HZ support, which now 
> you deprecated.  So now we are using NO_HZ without a hyper-CE device, 
> and it is working fine.  We understand the benefits of moving to the CE 
> model - but it cannot be done overnight.
This is ugly as hell. NO_HZ enables the dyntick functions in idle(),
irq_enter() and irq_exit() so the clockevents code is actually invoked.
I have not looked closely enough to see why this works at all.
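
A rough userspace model of the dyntick flow described above, for orientation only. All names (`tick_state`, `idle_enter`, `irq_enter_update`, `idle_exit`) and the 1 ms tick period are assumptions made for this sketch, not the kernel's functions. The point it illustrates is that each hook ends up programming a one-shot event, which presupposes a registered clock event device.

```c
#include <stdint.h>

#define TICK_NS 1000000ULL /* assumed 1 ms tick period */

struct tick_state {
    uint64_t now_ns;        /* current time */
    uint64_t last_tick_ns;  /* time of the last accounted tick */
    uint64_t jiffies;       /* number of accounted ticks */
    uint64_t programmed_ns; /* absolute expiry programmed into the device */
    int tick_stopped;       /* nonzero while the periodic tick is off */
};

/* Idle entry: stop the periodic tick and program a single event for the
 * next pending timer instead. */
static void idle_enter(struct tick_state *ts, uint64_t next_timer_ns)
{
    ts->tick_stopped = 1;
    ts->programmed_ns = next_timer_ns;
}

/* Interrupt entry while the tick is stopped: account the skipped ticks. */
static void irq_enter_update(struct tick_state *ts)
{
    uint64_t missed;

    if (!ts->tick_stopped)
        return;
    missed = (ts->now_ns - ts->last_tick_ns) / TICK_NS;
    ts->jiffies += missed;
    ts->last_tick_ns += missed * TICK_NS;
}

/* Idle exit: resume the periodic tick at the next tick boundary. */
static void idle_exit(struct tick_state *ts)
{
    ts->tick_stopped = 0;
    ts->programmed_ns = ts->last_tick_ns + TICK_NS;
}
```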

I have the feeling that "working fine" means something like "does not explode".

We really want to fix this now instead of pushing some hack into the
kernel without knowing why it works.

	tglx

Zachary Amsden

2007-Apr-18 13:02 UTC

+ stupid-hack-to-make-mainline-build.patch added to -mm tree

Ingo Molnar wrote:
> * Ingo Molnar <mingo@elte.hu> wrote:
>
>   
>> no, that's not the case: next_timer_interrupt() is the NO_IDLE_HZ 
>> method of doing things - while in the NO_HZ case you are supposed to 
>> use clockevent devices to program timer hardware.
>>

We don't have a clockevent device.  But we need NO_IDLE_HZ support, 
which NO_HZ has now subsumed.

> a proper CE device also has the added bonus of making high-res timers 
> guests work automatically. It should be simple: just pass it through to 
> your hypervisor, a hyper-CE-device, like a hyper-clocksource device has 
> essentially no guest-side complexity.
>   
It is not so simple.  In theory it works great.  In reality, the i386 
implementation is completely hardwired to work the way hardware works, 
and breaking the clockevent code out of the deep ties to the APIC is 
extremely non-trivial.  We tried, and could not accomplish it for 2.6.21 
because the hrtimers integration was complex, and introduced many bugs 
for us.  We worked around this by keeping NO_IDLE_HZ support, which now 
you deprecated.  So now we are using NO_HZ without a hyper-CE device, 
and it is working fine.  We understand the benefits of moving to the CE 
model - but it cannot be done overnight.

Xen has the same requirements for integrating their timer code.

Zach

Zachary Amsden

2007-Apr-18 13:02 UTC

hardwired VMI crap

Ingo Molnar wrote:
> * Zachary Amsden <zach@vmware.com> wrote:
>
>   
>> The correct solution here is to properly separate the APIC, SMP, and 
>> timer code so the logic of it which we want to reuse is separated from 
>> the hardware dependence.  Clock events and clocksources take care of 
>> most of the timer issues, but there is still ugliness from SMP timer 
>> events depending on having part of the APIC infrastructure for wiring 
>> the interrupt gates.
>>     
>
> what are you talking about? A clockevents driver does not need to know 
> about lapic details, at all. In terms of interrupt gates for the 
> hypervisor to notify about clock events - use a virtual interrupt 
> controller via genirq.
>   
See my last e-mail.  It is not possible on i386, since local per-cpu 
interrupts are only supported via the APIC.
> if you want to use hardwired hardware details as your API: DO IT WITHOUT 
> MODIFYING LINUX. If you want anything more intelligent, something more 
> 'paravirtual' - WORK WITH US AND WORK WITH THE OTHER HYPERVISORS. So far
> all i've seen from you was excuses and stonewalling on every step! We 

So far, all you have done is not complain about our code until it was 
merged, then pursue every tactic possible to break it.  It is not us that 
are stonewalling.

> told you about the need to do VMI-timer ontop of clockevents last year 
> already! You resisted virtually EVERY SINGLE cleanup suggestion since 
> your stuff got upstream and you ONLY acted when a change was force-fed 
> to you. Just count the number of emails you wrote, versus the patches 
> you did. And your code is barely 2 weeks in! That is unacceptable.
Which cleanups have we resisted in particular?  I can't recall any.  
Just count the number of emails you wrote versus the patches and helpful 
suggestions you made.  No, instead, you broke our code, in many ways, 
with the untouchable aim of cleaning up the kernel source to do things 
the way you think they should be done in a future release.

Our code is in the tree now, and any attempts to break it using such 
justifications as easing maintenance for kernel developers in future 
releases are flat out false and improper.  We are working to correct 
flaws that we have and properly conform to the changing interfaces such 
as the timer subsystem, and also to interoperate properly with the full 
set of available configurations.

In the meantime, having code that uses slightly older interfaces in the 
kernel tree is not wrong in any way - it is pragmatic, because that code 
is working today, and it is also the sanest thing to do in a release 
cycle.  And our code in the tree to be released imposes zero burden on 
anyone except for us.  Are we stopping you from rewriting the timer 
subsystem in the -rc tree?  How?  Because this code is supposed to be 
settled.  Your deliberate breaking of our code forces us to come up with 
workarounds that might be considered inappropriate, but nevertheless, 
necessary.  Who has to deal with and adapt to this?  Certainly not you.  
The burden to maintain the correctness of our code is on us.

Working together to make sure that this code completely integrates with 
all this new development is the right thing to do - in the development 
tree.  Why you insist on stopping our code in the tip kernel release 
tree is beyond me, as there is no purpose to it other than to block our 
code.

Zach
