Anthony Liguori
2007-Mar-13 19:32 UTC
[Xen-devel] vram_dirty vs. shadow paging dirty tracking
When thinking about multithreading the device model, it occurred to me that it's a little odd that we're doing a memcmp to determine which portions of the VRAM have changed. Couldn't we just use dirty page tracking in the shadow paging code? That should significantly lower the overhead, plus I believe the infrastructure is already mostly there in the shadow2 code.

Is this a sane idea?

Regards,

Anthony Liguori
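In outline, the memcmp scan in question looks something like this (a minimal sketch with hypothetical names, not the actual qemu-dm code):

    /*
     * Sketch of the memcmp-based VRAM scan: compare each page of the
     * guest-visible framebuffer against a private shadow copy, and
     * mark the pages whose contents changed since the last scan.
     */
    #include <stdint.h>
    #include <string.h>

    #define VRAM_SIZE (4 * 1024 * 1024)
    #define PAGE_SIZE 4096
    #define NR_PAGES  (VRAM_SIZE / PAGE_SIZE)

    static uint8_t vram[VRAM_SIZE];        /* guest-visible framebuffer   */
    static uint8_t shadow_vram[VRAM_SIZE]; /* private copy from last scan */
    static uint8_t vram_dirty[NR_PAGES];   /* which pages changed         */

    /* Called from the display refresh timer, up to ~50 times a second. */
    static void scan_vram(void)
    {
        for (unsigned i = 0; i < NR_PAGES; i++) {
            uint8_t *p = vram + i * PAGE_SIZE;
            uint8_t *s = shadow_vram + i * PAGE_SIZE;
            if (memcmp(p, s, PAGE_SIZE) != 0) {
                memcpy(s, p, PAGE_SIZE); /* remember the new contents */
                vram_dirty[i] = 1;       /* redraw this region        */
            }
        }
    }

The memcmp itself is cheap per page, but touching the entire framebuffer every refresh is what burns the CPU time discussed below.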
Ian Pratt
2007-Mar-13 21:02 UTC
[Xen-devel] RE: vram_dirty vs. shadow paging dirty tracking
> When thinking about multithreading the device model, it occurred to me
> that it's a little odd that we're doing a memcmp to determine which
> portions of the VRAM have changed. Couldn't we just use dirty page
> tracking in the shadow paging code? That should significantly lower
> the overhead, plus I believe the infrastructure is already mostly
> there in the shadow2 code.

Yep, it's been in the roadmap doc for quite a while. However, the log dirty code isn't ideal for this. We'd need to extend it so that it can be turned on for just a subset of the GFN range (we could use a xen rangeset for this). Even so, I'm not super keen on the idea of tearing down and rebuilding 1024 PTEs up to 50 times a second.

A lower-overhead solution would be to scan and reset the dirty bits on the PTEs (followed by a global TLB flush). In the general case this is tricky, as the framebuffer could be mapped by multiple PTEs; in practice, I believe this doesn't happen for either Linux or Windows. There's always a good fallback of just returning 'all dirty' if the heuristic is violated. Would be good to knock this up.

Best,
Ian
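The dirty-bit scan Ian describes might look like this in outline (a sketch only; fb_shadow_pte and flush_tlb_all are invented helpers, and a real implementation would need the shadow lock and atomic PTE updates):

    /*
     * Sketch: scan and reset the x86 dirty bit on the shadow PTEs that
     * map the framebuffer, instead of comparing VRAM contents.
     */
    #include <stdint.h>

    #define _PAGE_DIRTY (1UL << 6)  /* x86 PTE dirty bit */
    #define FB_PAGES    1024        /* e.g. a 4MB framebuffer */

    extern uint64_t *fb_shadow_pte[FB_PAGES]; /* hypothetical: PTE per fb page */
    extern void flush_tlb_all(void);          /* hypothetical global flush */

    static void scan_fb_dirty(uint8_t *dirty) /* one byte per fb page */
    {
        int any = 0;

        for (int i = 0; i < FB_PAGES; i++) {
            uint64_t *pte = fb_shadow_pte[i];
            if (pte && (*pte & _PAGE_DIRTY)) {
                *pte &= ~_PAGE_DIRTY; /* real code: atomic clear */
                dirty[i] = 1;
                any = 1;
            }
        }

        /*
         * The dirty bit is cached in the TLB entry, so after clearing
         * it in the PTE we must flush, or later guest writes through a
         * stale TLB entry would never re-set it.
         */
        if (any)
            flush_tlb_all();
    }

Unlike tearing the PTEs down, this never forces the guest to fault: writes keep succeeding, and the hardware re-sets the dirty bit on the next page walk after each flush.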
Anthony Liguori
2007-Mar-13 21:30 UTC
[Xen-devel] Re: vram_dirty vs. shadow paging dirty tracking
Ian Pratt wrote:
>> When thinking about multithreading the device model, it occurred to me
>> that it's a little odd that we're doing a memcmp to determine which
>> portions of the VRAM have changed. Couldn't we just use dirty page
>> tracking in the shadow paging code? That should significantly lower
>> the overhead, plus I believe the infrastructure is already mostly
>> there in the shadow2 code.
>
> Yep, it's been in the roadmap doc for quite a while. However, the log
> dirty code isn't ideal for this. We'd need to extend it so that it can
> be turned on for just a subset of the GFN range (we could use a xen
> rangeset for this).

Okay, I was curious whether the log dirty stuff could do ranges. I guess not.

> Even so, I'm not super keen on the idea of tearing down and rebuilding
> 1024 PTEs up to 50 times a second.
>
> A lower-overhead solution would be to scan and reset the dirty bits on
> the PTEs (followed by a global TLB flush).

Right, this is the approach I was assuming. There's really no point in tearing down the whole PTE, since you would then take an extraneous read fault.

> In the general case this is tricky, as the framebuffer could be mapped
> by multiple PTEs; in practice, I believe this doesn't happen for
> either Linux or Windows.

I wouldn't think so, but showing my ignorance for a moment: does shadow2 not provide a mechanism to look up VAs given a GFN? This lookup could be cheap if the structures were built during shadow page table construction.

Sounds like this is a good long-term goal, but I think I'll stick with the threading as an intermediate goal. I've got a minor concern that threading isn't going to help us much when dom0 is UP, since the VGA scanning won't happen while an MMIO/PIO request is being handled. With an SMP dom0, you could potentially do all the VGA scanning on one processor, ensuring that qemu-dm is never "busy" when a request arrives. I'm slightly concerned, though, that a thread as CPU-hungry as the VGA scanning may increase context switches during MMIO/PIO handling, which would actually hurt performance. We'll see soon enough.

Regards,

Anthony Liguori

> There's always a good fallback of just returning 'all dirty' if the
> heuristic is violated. Would be good to knock this up.
>
> Best,
> Ian
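The intermediate threading step Anthony mentions could be as simple as the sketch below (hypothetical names; real qemu-dm would also need to make the shared display state thread-safe):

    /*
     * Sketch: run the VGA scan in its own thread so it no longer
     * blocks MMIO/PIO handling on the main event loop.
     */
    #include <pthread.h>
    #include <unistd.h>

    extern void vga_scan_and_update(void); /* hypothetical: scan + redraw */

    static void *vga_thread(void *arg)
    {
        (void)arg;
        for (;;) {
            vga_scan_and_update();
            usleep(1000000 / 50); /* ~50Hz refresh, as discussed */
        }
        return NULL;
    }

    static pthread_t vga_tid;

    static void start_vga_thread(void)
    {
        /* MMIO/PIO requests stay on the main thread. */
        pthread_create(&vga_tid, NULL, vga_thread, NULL);
    }

On a UP dom0 this only helps if the scanner gets pre-empted when an ioreq arrives, which is exactly the concern raised above.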
Ian Pratt
2007-Mar-14 00:17 UTC
RE: [Xen-devel] Re: vram_dirty vs. shadow paging dirty tracking
> > Yep, it's been in the roadmap doc for quite a while. However, the
> > log dirty code isn't ideal for this. We'd need to extend it so that
> > it can be turned on for just a subset of the GFN range (we could use
> > a xen rangeset for this).
>
> Okay, I was curious whether the log dirty stuff could do ranges. I
> guess not.

It could certainly be added, but I prefer the dirty bit solution to this particular problem.

> > Even so, I'm not super keen on the idea of tearing down and
> > rebuilding 1024 PTEs up to 50 times a second.
> >
> > A lower-overhead solution would be to scan and reset the dirty bits
> > on the PTEs (followed by a global TLB flush).
>
> Right, this is the approach I was assuming. There's really no point in
> tearing down the whole PTE, since you would then take an extraneous
> read fault.
>
> > In the general case this is tricky, as the framebuffer could be
> > mapped by multiple PTEs; in practice, I believe this doesn't happen
> > for either Linux or Windows.
>
> I wouldn't think so, but showing my ignorance for a moment: does
> shadow2 not provide a mechanism to look up VAs given a GFN? This
> lookup could be cheap if the structures were built during shadow page
> table construction.

No, it deliberately doesn't, because threading all the PTEs that point to a GFN can consume quite a bit of memory, introduces locking complexity that will affect future scalability, and turns out to be completely unnecessary for normal shadow mode operation, since some simple heuristics get a near-perfect hit rate.

> Sounds like this is a good long-term goal, but I think I'll stick with
> the threading as an intermediate goal.

Yes, that's more immediately useful, thanks.

> I've got a minor concern that threading isn't going to help us much
> when dom0 is UP, since the VGA scanning won't happen while an MMIO/PIO
> request is being handled.

I think the VGA scanning burns enough CPU to stand a good chance of being pre-empted when an MMIO/PIO request arrives. We need to make sure there's no synchronization required that prevents this.

Best,
Ian
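The 'all dirty' fallback mentioned earlier in the thread is cheap to express (a sketch; fb_heuristic_ok is an invented check for the one-mapping-per-GFN assumption):

    /*
     * Sketch: fall back to reporting the whole framebuffer dirty when
     * the single-mapping heuristic is violated. Correct but
     * pessimistic; the caller simply redraws everything.
     */
    #include <stdint.h>
    #include <string.h>

    #define FB_PAGES 1024

    extern int fb_heuristic_ok(void);          /* hypothetical check      */
    extern void scan_fb_dirty(uint8_t *dirty); /* per-PTE scan from above */

    static void get_fb_dirty(uint8_t dirty[FB_PAGES])
    {
        if (!fb_heuristic_ok()) {
            memset(dirty, 1, FB_PAGES); /* give up on precision */
            return;
        }
        scan_fb_dirty(dirty);
    }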
Zhai, Edwin
2007-Mar-14 08:22 UTC
Re: [Xen-devel] vram_dirty vs. shadow paging dirty tracking
On Tue, Mar 13, 2007 at 02:32:56PM -0500, Anthony Liguori wrote:
> When thinking about multithreading the device model, it occurred to me
> that it's a little odd that we're doing a memcmp to determine which
> portions of the VRAM have changed. Couldn't we just use dirty page
> tracking in the shadow paging code? That should significantly lower
> the overhead, plus I believe the infrastructure is already mostly
> there in the shadow2 code.
>
> Is this a sane idea?

We added this code a while ago to improve VNC responsiveness. QEMU now has a new VNC implementation that resolves that issue, and this code introduces a performance drop for Linux guests running X and for Windows guests. So I'd like to send a patch to revert it and make a proper solution in the future.

Thanks,

-- 
best rgds,
edwin
Anthony Liguori
2007-Mar-14 16:00 UTC
Re: [Xen-devel] vram_dirty vs. shadow paging dirty tracking
Zhai, Edwin wrote:
> We added this code a while ago to improve VNC responsiveness. QEMU now
> has a new VNC implementation that resolves that issue, and this code
> introduces a performance drop for Linux guests running X and for
> Windows guests.

Compared to what, just updating the full screen 30 times a second? I suspect that's not as bad as it sounds, since SDL will be using an XShmImage.

The VNC minimization is done based on a timer, however, so moving the timer work into a thread is still useful. Of course, we should be able to determine quickly how useful this is by just changing SDL to update the whole image...

Regards,

Anthony Liguori

> So I'd like to send a patch to revert it and make a proper solution in
> the future.
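The quick experiment Anthony suggests is nearly a one-liner with the SDL 1.2 API (a sketch; `screen` stands in for the display surface qemu-dm draws into):

    /*
     * Sketch: skip dirty-rectangle minimization entirely and repaint
     * the whole frame each refresh. In SDL 1.2, passing w = h = 0 to
     * SDL_UpdateRect updates the entire surface.
     */
    #include <SDL/SDL.h>

    extern SDL_Surface *screen; /* display surface qemu-dm draws into */

    static void refresh_full(void)
    {
        SDL_UpdateRect(screen, 0, 0, 0, 0); /* no scan, just repaint */
    }

With an XShmImage backend the blit itself is cheap, so this isolates the cost of the memcmp scan from the cost of actually pushing pixels.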
Zhai, Edwin
2007-Mar-15 02:59 UTC
Re: [Xen-devel] vram_dirty vs. shadow paging dirty tracking
On Wed, Mar 14, 2007 at 11:00:17AM -0500, Anthony Liguori wrote:
> Compared to what, just updating the full screen 30 times a second? I
> suspect that's not as bad as it sounds, since SDL will be using an
> XShmImage.

Removing the memcpy and updating the whole screen each time gives better performance.

-- 
best rgds,
edwin
Dong, Eddie
2007-Mar-15 03:22 UTC
RE: [Xen-devel] vram_dirty vs. shadow paging dirty tracking
> Compared to what, just updating the full screen 30 times a second? I
> suspect that's not as bad as it sounds, since SDL will be using an
> XShmImage.

It depends on how you run the benchmark. With multiple VMs, where several qemus (say 8) are running at once, this kind of comparison eats an unacceptable number of CPU cycles.

Eddie