>> When the system boots, the processor is normally in "real-mode", and
>> it's definitely not got paging enabled. So we have to "make the guest
>> OS believe this is the case". But at the same time, the guest OS is
>> most likely not loaded at address zero in memory, so we need paging
>> enabled to remap the GUEST PHYSICAL address to match the machine
>> physical address. So we have a "linear map" to translate the "address
>> zero" to the "start of guest memory", and so on for every page of
>> memory in the guest.
>>
>> This is not hard to do, since the AMD-V/VT feature of the processor
>> expects the paging-bit to be different between what the guest "thinks"
>> and the actual case. In AMD-V, there's even support to run real-mode
>> with paging enabled, so all the BIOS code and such will be running in
>> this mode. VT has to do a bunch of tricky stuff to work around that
>> problem.
>>
>> Ok, fine. Does this argument hold true even for non-VT and
>> non-Pacifica processors?
>> I doubt it.
>
> Not precisely. I'm talking only about HVM mode, which is "full
> virtualization". PV mode uses a different paging interface, which for
> the most part consists of changing the whole area of code in the
> kernel that updates the page tables, adding code that is aware of the
> THREE types of address (guest-virtual, guest-physical and
> machine-physical). This means that there's no real need for the
> "read-only page-tables" and "shadow mode" - the page table just
> contains the right value for the machine-physical address. [That's not
> to say that read-only page tables can't be used in a PV system too -
> I'm not 100% sure how the page-table management works in PV mode.]

That is very interesting info on the paging system. Mats, could you
please explain a bit the working of the PV paging? How do the guest and
host page tables work together? What does the guest page table point to,
i.e. how and when is it mapped onto the host page table?
I have seen in the code that there are different cases of guest+host
paging table heights. Why?

thanks. Armand

>>> I hope I made myself clear.
>>> Please enlighten me :-).
>>>
>>> When paging is enabled, we use a shadow page-table, which essentially
>>> means that the GUEST sees one page-table, and the processor another
>>> (thanks to the fact that the hypervisor intercepts the CR3 read/write
>>> operations, and when CR3 is read back by the guest, we don't send
>>> back the value it's ACTUALLY POINTING TO IN THE PROCESSOR, but the
>>> value that was set by the guest). So there are two page-tables.
>>>
>>> Got this well, thanks Mats :).
>>>
>>> To make the page-table updates by the guest visible to the
>>> hypervisor, all of the guest page-tables are made read-only (by
>>> scanning the new CR3 value whenever one is set).
>>>
>>> I didn't get this well either :(
>>> Sorry, but do you mean CR3 for the guest or for the processor? I
>>> hope you mean the guest?
>>
>> Yes, scan the guest CR3 to see where it placed the page-tables.
>>
>>> Whenever a page-fault happens, the hypervisor has "first look", and
>>> determines if the update is for a page-table or not. If it is a
>>> page-table update, the guest operation is emulated (in
>>> x86_emulate.c), and the result is written to the shadow page-table
>>> AND the
>>>
>>> Why do we need emulation? Is there some peculiar reason for emulating?
>>> Do you mean to say that if I am running a 32-bit domU on top of a
>>> 64-bit processor, the guest operation for updating the page table is
>>> emulated by the hypervisor. Am I right?
>>
>> No, it's simply because we need to see the result of the instruction
>> and write it to two places (with some modification in one of those
>> places).
>> So if the code is doing, for example: "*pte |= 1;" (set a
>> page-table-entry to "present"), we need to mark both the
>> guest-page-table-entry "present", and mark our shadow-entry "present"
>> (and perhaps do some other work too, but that's the minimum work
>> needed).
>>
>> This brings one more question to my mind. Why do we use pinning then?
>
> I believe there are two types of pinning! Page-pinning, which is
> blocking a page from being accessed in an incorrect way [again, I'm not
> 100% sure how this works, or exactly what it does - just that it's a
> term used in the general way I described in the previous sentence].
>
>> As I see it: to avoid shadow page tables being swapped out before the
>> page tables they actually point to are swapped. Am I right?
>>
>> But according to the interface manual, to bind a VCPU to a specific
>> CPU in an SMP environment we use pinning. These two look like pretty
>> orthogonal statements to me, which means I may be wrong :(.
>> Can somebody help me in this regard?
>
> CPU pinning is to tie a VCPU to a (set of) processor(s). For example,
> you may want to pin Dom0 to run only on CPU0, and pin a DomU to run on
> CPUs 1, 2 and 3. That way, Dom0 is ALWAYS able to run on its own CPU,
> and it's never in contention about which CPU to use, and DomU can run
> on three CPUs as much as it likes. You could have another DomU pinned
> to CPU 3 if you wish. That means that CPUs 1 and 2 are exclusively for
> the first DomU, whilst the second DomU shares CPU3 with the first DomU
> (so they both get half the CPU performance of one CPU - on average
> over a reasonable amount of time).
>
> -- Mats

_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Petersson, Mats
2007-Mar-12 16:19 UTC
RE: [Xen-devel] Re: Xen-devel Digest, Vol 25, Issue 93
> -----Original Message-----
> From: xen-devel-bounces@lists.xensource.com
> [mailto:xen-devel-bounces@lists.xensource.com] On Behalf Of
> PUCCETTI Armand
> Sent: 12 March 2007 16:11
> To: xen-devel@lists.xensource.com
> Subject: [Xen-devel] Re: Xen-devel Digest, Vol 25, Issue 93
>
> >> When the system boots, the processor is normally in "real-mode",
> >> and it's definitely not got paging enabled. So we have to "make the
> >> guest OS believe this is the case". But at the same time, the guest
> >> OS is most likely not loaded at address zero in memory, so we need
> >> paging enabled to remap the GUEST PHYSICAL address to match the
> >> machine physical address. So we have a "linear map" to translate
> >> the "address zero" to the "start of guest memory", and so on for
> >> every page of memory in the guest.
> >>
> >> This is not hard to do, since the AMD-V/VT feature of the processor
> >> expects the paging-bit to be different between what the guest
> >> "thinks" and the actual case. In AMD-V, there's even support to run
> >> real-mode with paging enabled, so all the BIOS code and such will
> >> be running in this mode. VT has to do a bunch of tricky stuff to
> >> work around that problem.
> >>
> >> Ok, fine. Does this argument hold true even for non-VT and
> >> non-Pacifica processors?
> >> I doubt it.
> >
> > Not precisely. I'm talking only about HVM mode, which is "full
> > virtualization". PV mode uses a different paging interface, which
> > for the most part consists of changing the whole area of code in the
> > kernel that updates the page tables, adding code that is aware of
> > the THREE types of address (guest-virtual, guest-physical and
> > machine-physical). This means that there's no real need for the
> > "read-only page-tables" and "shadow mode" - the page table just
> > contains the right value for the machine-physical address.
> > [That's not to say that read-only page tables can't be used in a PV
> > system too - I'm not 100% sure how the page-table management works
> > in PV mode].
>
> That is very interesting info on the paging system. Mats, could you
> please explain a bit the working of the PV paging? How do the
> guest+host page tables work together? What does the guest page table
> point to, i.e. how and when is it mapped onto the host page table?
>
> I have seen in the code that there are different cases of guest+host
> paging table heights. Why?

I'm sorry, I don't quite know this. I believe that the page-table has to
be the same number of levels in both Xen and the PV guest.

There's been some recent work to implement 32-bit PV on 64-bit HV, which
I think changes this by allowing a 32-bit PAE guest to run on a 64-bit
hypervisor. Someone else who works more on PV is probably better to
answer this...

In HVM, you definitely have 32-bit both PAE and non-PAE on 64-bit HV,
which obviously means different numbers of page-table levels (2, 3 or 4
respectively for non-PAE, PAE and 64-bit).

-- Mats

> thanks. Armand
On 12/3/07 16:19, "Petersson, Mats" <Mats.Petersson@amd.com> wrote:

>> I have seen in the code that there are different cases of guest+host
>> paging table heights. Why?
>
> I'm sorry, I don't quite know this. I believe that the page-table has
> to be the same number of levels in both Xen and the PV guest.
>
> There's been some recent work to implement 32-bit PV on 64-bit HV,
> which I think changes this by allowing a 32-bit PAE guest to run on a
> 64-bit hypervisor. Someone else who works more on PV is probably
> better to answer this...

For PV guests, there are no separate Xen/shadow page tables. Xen
reserves a bit of space at the top end of guest pagetables to map
itself. Hence normally the guest and Xen pagetables must be the same
height as they are actually the same pagetables.

Supporting PAE guests on 64-bit Xen is the only exception. Xen maintains
a hidden top-level page directory and one of the entries in that
directory points at the guest's three-level pagetable. But again there
is no shadowing of the guest three-level pagetable: it is directly
hooked into the hidden top-level directory, and the real physical %cr3
points at that hidden directory.

-- Keir
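Keir's point that Xen maps itself into the top of every guest pagetable can be made concrete with a little arithmetic. The constants below assume the classic 32-bit non-PAE layout (a hypervisor hole starting at 0xFC000000, i.e. the top 64MB of virtual address space); treat them as an illustrative assumption rather than a statement about any particular Xen version:

```c
#include <stdint.h>

/* Which top-level (page directory) slots belong to Xen rather than the
 * guest, assuming the conventional 32-bit non-PAE split where the
 * hypervisor owns everything from HYPERVISOR_VIRT_START upward. */
#define HYPERVISOR_VIRT_START 0xFC000000UL
#define PGDIR_SHIFT 22          /* each top-level entry maps 4MB */
#define PTRS_PER_PGD 1024

static int first_xen_pgd_slot(void)
{
    return (int)(HYPERVISOR_VIRT_START >> PGDIR_SHIFT);
}
```

With these numbers, slots 1008..1023 (16 entries, 16 x 4MB = 64MB) are reserved for Xen in every guest pagetable, which is why guest and hypervisor pagetables can literally be the same tables.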
> -----Original Message-----
> From: Keir Fraser [mailto:keir@xensource.com]
> Sent: 12 March 2007 16:23
> To: Petersson, Mats; PUCCETTI Armand; xen-devel@lists.xensource.com
> Subject: Re: [Xen-devel] Re: Xen-devel Digest, Vol 25, Issue 93
>
> On 12/3/07 16:19, "Petersson, Mats" <Mats.Petersson@amd.com> wrote:
>
> >> I have seen in the code that there are different cases of
> >> guest+host paging table heights. Why?
> >
> > I'm sorry, I don't quite know this. I believe that the page-table
> > has to be the same number of levels in both Xen and the PV guest.
> >
> > There's been some recent work to implement 32-bit PV on 64-bit HV,
> > which I think changes this by allowing a 32-bit PAE guest to run on
> > a 64-bit hypervisor. Someone else who works more on PV is probably
> > better to answer this...
>
> For PV guests, there are no separate Xen/shadow page tables. Xen
> reserves a bit of space at the top end of guest pagetables to map
> itself. Hence normally the guest and Xen pagetables must be the same
> height as they are actually the same pagetables.
>
> Supporting PAE guests on 64-bit Xen is the only exception. Xen
> maintains a hidden top-level page directory and one of the entries in
> that directory points at the guest's three-level pagetable. But again
> there is no shadowing of the guest three-level pagetable: it is
> directly hooked into the hidden top-level directory, and the real
> physical %cr3 points at that hidden directory.

Are the page-tables ever updated directly by the guest, or is it all
done via hypercalls?

-- Mats
On 12/3/07 16:26, "Petersson, Mats" <Mats.Petersson@amd.com> wrote:

> Are the page-tables ever updated directly by the guest, or is it all
> done via hypercalls?

Leaf PTEs (i.e., really just PTEs, not PDEs) can be directly written
from the point of view of the guest. In fact they are trapped and
emulated by Xen. The guest is somewhat aware of this because it has
explicitly write-protected all its pagetables, so if it were to attempt
the direct write on native hardware in these circumstances it would
receive a page fault.

-- Keir
> -----Original Message-----
> From: Keir Fraser [mailto:keir@xensource.com]
> Sent: 12 March 2007 16:32
> To: Petersson, Mats; Keir Fraser; PUCCETTI Armand;
> xen-devel@lists.xensource.com
> Subject: Re: [Xen-devel] More page-table questions.
>
> On 12/3/07 16:26, "Petersson, Mats" <Mats.Petersson@amd.com> wrote:
>
> > Are the page-tables ever updated directly by the guest, or is it all
> > done via hypercalls?
>
> Leaf PTEs (i.e., really just PTEs, not PDEs) can be directly written
> from the point of view of the guest. In fact they are trapped and
> emulated by Xen. The guest is somewhat aware of this because it has
> explicitly write-protected all its pagetables, so if it were to
> attempt the direct write on native hardware in these circumstances it
> would receive a page fault.

So in one way or another, the hypervisor knows about every write to the
page-table, yes?

-- Mats
On 12/3/07 16:35, "Petersson, Mats" <Mats.Petersson@amd.com> wrote:

> So in one way or another, the hypervisor knows about every write to
> the page-table, yes?

Only the hypervisor ever actually updates pagetables. Guest attempts are
trapped and emulated, or the guest explicitly executes a hypercall.

-- Keir
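The trap-and-emulate path Keir describes can be sketched as a toy model in plain C. This is not Xen's actual code (the real logic lives in the pagetable-validation and x86_emulate.c paths); the function name, the frame-ownership table, and the 12-bit frame shift are illustrative assumptions standing in for Xen's real checks:

```c
#include <stdbool.h>
#include <stdint.h>

#define PTE_PRESENT 0x1ULL
#define NFRAMES     16

/* Which domain owns each machine frame (0 = unowned/hypervisor). */
static int frame_owner[NFRAMES];

/* Toy model of the emulated leaf-PTE write: the guest's store faults
 * (its pagetables are mapped read-only), the hypervisor decodes the
 * intended new value, checks that the machine frame being mapped
 * actually belongs to the writing domain, and only then performs the
 * update on the real pagetable. */
static bool emulate_pte_write(int domid, uint64_t *pte, uint64_t newval)
{
    uint64_t mfn = newval >> 12;       /* machine frame the PTE maps */

    if ((newval & PTE_PRESENT) &&
        (mfn >= NFRAMES || frame_owner[mfn] != domid))
        return false;                  /* refuse: frame not owned by guest */

    *pte = newval;                     /* hypervisor applies the write */
    return true;
}
```

The key property the model captures is the one Keir states: the guest never modifies the pagetable itself; every change is performed (and vetted) by the hypervisor.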
Keir Fraser wrote:

> On 12/3/07 16:26, "Petersson, Mats" <Mats.Petersson@amd.com> wrote:
>
>> Are the page-tables ever updated directly by the guest, or is it all
>> done via hypercalls?
>
> Leaf PTEs (i.e., really just PTEs, not PDEs) can be directly written
> from the point of view of the guest. In fact they are trapped and
> emulated by Xen. The guest is somewhat aware of this because it has
> explicitly write-protected all its pagetables, so if it were to
> attempt the direct write on native hardware in these circumstances it
> would receive a page fault.
>
> -- Keir

This is unclear to me: "a guest believes it can write PTEs" means that
its source code for accessing the page tables is left unchanged between
the legacy and PV versions?

Merely, the hypervisor traps the guest's accesses to the page tables, to
control what it is doing (e.g. not overlapping any other domain's pages)
and to allow or deny any writes. This should apply to any page-table
level, so why only block writes to PTEs?

This is for 4K pages, but how are 2M pages mixed in? Or do we assume
that every domain's pages are 4K?

Armand
> -----Original Message-----
> From: PUCCETTI Armand [mailto:armand.puccetti@cea.fr]
> Sent: 12 March 2007 17:27
> To: Keir Fraser
> Cc: Petersson, Mats; xen-devel@lists.xensource.com
> Subject: Re: [Xen-devel] More page-table questions.
>
> This is unclear to me: "a guest believes it can write PTEs" means that
> its source code for accessing the page tables is left unchanged
> between the legacy and PV versions?
>
> Merely, the hypervisor traps the guest's accesses to the page tables,
> to control what it is doing (e.g. not overlapping any other domain's
> pages) and to allow or deny any writes. This should apply to any
> page-table level, so why only block writes to PTEs?

No, it's the other way around (and I'm sure Keir will correct me if I'm
wrong). The guest is not allowed to write AT ALL to the upper levels of
the page-table (aside from via hypercalls). So code in the guest can be
unmodified as long as it's touching just the bottom level of the
page-table (i.e. the individual 4K pages).

> This is for 4K pages, but how are 2M pages mixed in? Or do we assume
> that every domain's pages are 4K?

As far as I know, Xen _ONLY_ supports small pages (4K); no large-page
support at present.
-- Mats
> > This is unclear to me: "a guest believes it can write PTEs" means
> > that its source code for accessing the page tables is left unchanged
> > between the legacy and PV versions?
> >
> > Merely, the hypervisor traps the guest's accesses to the page
> > tables, to control what it is doing (e.g. not overlapping any other
> > domain's pages) and to allow or deny any writes. This should apply
> > to any page-table level, so why only block writes to PTEs?
>
> No, it's the other way around (and I'm sure Keir will correct me if
> I'm wrong). The guest is not allowed to write AT ALL to the upper
> levels of the page-table (aside from via hypercalls). So code in the
> guest can be unmodified as long as it's touching just the bottom level
> of the page-table (i.e. the individual 4K pages).

The guest doesn't actually do explicit hypercalls in PV these days; it
tries to write to the page-table leaf nodes and these writes cause a
fault (because the page tables must be mapped read-only). Xen then
validates the change being made and applies it to the page table.

Guests have to be modified to translate pseudophysical->machine
addresses and to map pagetables read-only, but they don't make explicit
hypercalls anymore (although the effect is much the same).

> > This is for 4K pages, but how are 2M pages mixed in? Or do we assume
> > that every domain's pages are 4K?
>
> As far as I know, Xen _ONLY_ supports small pages (4K); no large-page
> support at present.

Large-page support hasn't been figured out yet, so 4K pages is fixed on
x86. I think the IA64 guys (and maybe PPC?) may have considered large
pages (IA64 at least has a far wider range of allowed page sizes than
x86).

Cheers,
Mark

--
Dave: Just a question. What use is a unicycle with no seat? And no pedals!
Mark: To answer a question with a question: What use is a skateboard?
Dave: Skateboards have wheels.
Mark: My wheel has a wheel!
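Mark's point that a PV guest must be modified to translate pseudophysical to machine addresses before writing a PTE can be illustrated with a minimal sketch. The table contents and the helper name here are made up for the example; real PV Linux keeps such a table (the phys-to-machine mapping) supplied by Xen:

```c
#include <stdint.h>

/* Toy pseudophysical-to-machine table: index = guest pseudo-physical
 * frame number, value = machine frame number (arbitrary example data). */
static const unsigned long p2m[4] = { 0x1234, 0x0042, 0x0777, 0x0001 };

#define PTE_PRESENT 0x1ULL

/* A PV guest building a leaf PTE must put the MACHINE frame number in
 * it, not the pseudo-physical one - this lookup is the kind of source
 * modification Mark refers to. The write of the resulting PTE then
 * faults and is validated by Xen, as described above. */
static uint64_t guest_make_pte(unsigned long pfn, uint64_t flags)
{
    return ((uint64_t)p2m[pfn] << 12) | flags;
}
```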
Liang Yang
2007-Mar-15 22:15 UTC
[Xen-devel] Questions about device/event channels in Xen.
Hello,

I just have several questions about device and event channels:

1. From the implementation point of view, are device and event channels
the same (i.e. both based on shared memory)?

2. In Xen papers, it is said up to 1024 channels are supported per
domain. Does 1024 include both device channels and event channels?

3. Are these device/event channels allocated dynamically or statically
for each domain?

4. It seems I need to allocate one device channel per device; is this
true?

Thanks,

Liang
Mark Williamson
2007-Mar-16 00:34 UTC
Re: [Xen-devel] Questions about device/event channels in Xen.
The terminology may be confusing you here, so let me just say: Device
channels are not like Event channels. They're different concepts... let
me elaborate:

> I just have several questions about device and event channels:
> 1. From the implementation point of view, are device and event
> channels the same (i.e. both based on shared memory)?

Event channels don't use interdomain shared memory. They're like an
interdomain interrupt line, provided as a service by Xen. Basically a
way for a pair of domains to "poke" each other to say "Something just
happened and there's work for you to do".

The "device channel" uses interdomain shared memory (using grant tables)
and event channels to emulate the functionality of a device. For
instance, the blkfront and blkback drivers do something like the
following:

1. blkfront wants to access a block of data
   -> queue a "read request" into memory it shares with blkback
   -> notify blkback in dom0 using an event channel
2. blkback experiences an "interrupt" as a result of the event sent to it
   -> looks in the shared memory to find the request
   -> executes the read operation
   -> puts a response in shared memory
   -> notifies blkfront in the domU using an event channel
3. blkfront experiences an "interrupt" due to the event sent to it
   -> completes processing of the new data

The combination of the shared memory (containing a ring buffer for
requests and responses) and the event channel provides the facilities
for the front and back drivers to talk to each other; this is the device
channel.

> 2. In Xen papers, it is said up to 1024 channels are supported per
> domain. Does 1024 include both device channels and event channels?

This should be answered by the text above; device channels are a
different thing, built using event channels.

> 3. Are these device/event channels allocated dynamically or statically
> for each domain?

XenLinux virtual device drivers bind event channels dynamically when
they set up their communications with another domain.
I think there are some statically allocated event channels for essential
services (e.g. for XenStore and the domain's console).

> 4. It seems I need to allocate one device channel per device; is this
> true?

Yes, but the device channel is something you build yourself using shared
memory and event channels - it's up to you how you implement it.

In summary: event channels and shared memory are concrete services
provided by Xen using an API. A "device channel" is a high-level term
for the way drivers use these facilities to communicate.

I hope this helps; please ask if you need any clarification.

Cheers,
Mark
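The three-step request/response flow Mark describes can be modelled in plain C. This is a single-process illustration only - the real interface uses the shared-ring macros in Xen's io/ring.h, grant tables for the shared page, and real event channels rather than a flag - so treat every name below as an assumption made for the sketch:

```c
#include <assert.h>
#include <stdbool.h>

#define RING_SIZE 8   /* a small power of two, as in Xen's shared rings */

/* Toy "device channel": a shared ring of requests and responses, plus a
 * pending flag standing in for the event-channel "poke". */
struct ring {
    int req[RING_SIZE], rsp[RING_SIZE];
    unsigned req_prod, req_cons;   /* frontend produces requests  */
    unsigned rsp_prod, rsp_cons;   /* backend produces responses  */
    bool event_pending;            /* notification to the peer    */
};

/* Frontend (think blkfront): queue a read request and notify. */
static void front_submit(struct ring *r, int sector)
{
    r->req[r->req_prod++ % RING_SIZE] = sector;
    r->event_pending = true;                    /* "event" to backend */
}

/* Backend (think blkback): service all queued requests, notify back. */
static void back_service(struct ring *r)
{
    if (!r->event_pending)
        return;
    r->event_pending = false;
    while (r->req_cons != r->req_prod) {
        int sector = r->req[r->req_cons++ % RING_SIZE];
        r->rsp[r->rsp_prod++ % RING_SIZE] = sector * 10; /* fake data */
    }
    r->event_pending = true;                    /* "event" to frontend */
}

/* Frontend: collect one response. */
static int front_complete(struct ring *r)
{
    assert(r->rsp_cons != r->rsp_prod);
    return r->rsp[r->rsp_cons++ % RING_SIZE];
}
```

The point of the model is the division of labour: the ring carries the data, the event only says "look at the ring" - exactly the combination Mark calls a device channel.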
Daniel Stodden
2007-Mar-16 03:17 UTC
Re: [Xen-devel] Questions about device/event channels in Xen.
On Thu, 2007-03-15 at 15:15 -0700, Liang Yang wrote:

> Hello,
>
> I just have several questions about device and event channels:
> 1. From the implementation point of view, are device and event
> channels the same (i.e. both based on shared memory)?
>
> 2. In Xen papers, it is said up to 1024 channels are supported per
> domain. Does 1024 include both device channels and event channels?

actually it depends on the architecture. on 64-bit systems it's 4096.
there's a page of memory every domain shares with xen. this specific
limitation is due to the length of a bitvector where every event channel
marked pending sets a unique bit to 1, according to its port number (you
may think of this as a 'channel number', but actually the number depends
on who's holding the endpoint, similar to TCP/UDP connections: two
numbers connecting two domains by one channel). the length of the
bitvector in turn is more or less fixed, due to the way it is indexed to
speed up searches a little. when interrupted, domains receiving events
search the vector in order to determine which device sent the
notification.

> 3. Are these device/event channels allocated dynamically or statically
> for each domain?

the channel itself is allocated dynamically. it's actually the port
numbers per domain being limited. but that is not much space.

> 4. It seems I need to allocate one device channel per device; is this
> true?

yes, as mark correctly explained. equivalent to the way different
interrupt lines in a physical host would be assigned to different
devices. one *may* share them, but it's tedious, and event channels are
cheaper than actual wire. :)

note: correctly termed, there's no such thing as a 'device channel'.
there are 'devices', being an event channel (for notification) and
shared memory (for the data).
regards,
daniel

--
Daniel Stodden
LRR - Lehrstuhl für Rechnertechnik und Rechnerorganisation
Institut für Informatik der TU München
D-85748 Garching
http://www.lrr.in.tum.de/~stodden mailto:stodden@cs.tum.edu
PGP Fingerprint: F5A4 1575 4C56 E26A 0B33 3D80 457E 82AE B0D8 735B
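The bitvector search Daniel describes can be sketched in plain C. This models the idea only: in real Xen the pending bits live in the shared_info page and there is an extra selector word indexing the vector to shorten the search, which this sketch omits; the 4096-port figure matches Daniel's 64-bit number:

```c
#include <stdint.h>

/* Toy per-domain pending-event bitvector: one bit per event-channel
 * port, set when an event is delivered on that port. */
#define NR_PORTS 4096                 /* 64-bit layout, per the thread */
#define NR_WORDS (NR_PORTS / 64)

static uint64_t pending[NR_WORDS];

static void set_pending(unsigned port)
{
    pending[port / 64] |= 1ULL << (port % 64);
}

/* What an interrupted domain does: scan the vector for a pending port,
 * clear it, and return it so the right driver can be dispatched.
 * Returns -1 if nothing is pending. */
static int next_pending(void)
{
    for (unsigned w = 0; w < NR_WORDS; w++) {
        if (pending[w]) {
            unsigned bit = (unsigned)__builtin_ctzll(pending[w]);
            pending[w] &= pending[w] - 1;       /* clear lowest set bit */
            return (int)(w * 64 + bit);
        }
    }
    return -1;
}
```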
Liang Yang
2007-Mar-16 06:02 UTC
RE: [Xen-devel] Questions about device/event channels in Xen.
Hi Mark,

Thanks for your clarification. It is clear now. But I still have several
questions.

First: it seems Xen uses at least two different types of event
"channels". The first type is for interrupt notification (upcall, or
uni-directional) and the second is for the notification of queued
descriptors (bi-directional). So is the type of an event channel fixed
when Xen allocates it, or not fixed (for the same device)? E.g. event
channel 2 was a uni-directional type and later can be changed to a
bi-directional type.

Second: as these events are handled asynchronously, does Xen treat
different types of event differently? For example, does Xen always
respond to an interrupt event immediately (unlike queuing more
descriptors and then setting up an event)?

Third: for a PCIe device, I can choose to use MSI or the legacy
line-based interrupt. Does the type of interrupt-handling mechanism
affect the event-channel set-up?

Liang

-----Original Message-----
From: M.A. Williamson [mailto:maw48@hermes.cam.ac.uk] On Behalf Of Mark Williamson
Sent: Thursday, March 15, 2007 5:34 PM
To: xen-devel@lists.xensource.com
Cc: Liang Yang; Petersson, Mats
Subject: Re: [Xen-devel] Questions about device/event channels in Xen.

The terminology may be confusing you here, so let me just say: Device
channels are not like Event channels. They're different concepts... let
me elaborate:

> I just have several questions about device and event channels:
> 1. From the implementation point of view, are device and event
> channels the same (i.e. both based on shared memory)?

Event channels don't use interdomain shared memory. They're like an
interdomain interrupt line, provided as a service by Xen. Basically a
way for a pair of domains to "poke" each other to say "Something just
happened and there's work for you to do".

The "device channel" uses interdomain shared memory (using grant tables)
and event channels to emulate the functionality of a device.
For instance, the blkfront and blkback drivers do something like the
following:

1. blkfront wants to access a block of data
   -> queue a "read request" into memory it shares with blkback
   -> notify blkback in dom0 using an event channel
2. blkback experiences an "interrupt" as a result of the event sent to it
   -> looks in the shared memory to find the request
   -> executes the read operation
   -> puts a response in shared memory
   -> notifies blkfront in the domU using an event channel
3. blkfront experiences an "interrupt" due to the event sent to it
   -> completes processing of the new data

The combination of the shared memory (containing a ring buffer for
requests and responses) and the event channel provides the facilities
for the front and back drivers to talk to each other; this is the device
channel.

> 2. In Xen papers, it is said up to 1024 channels are supported per
> domain. Does 1024 include both device channels and event channels?

This should be answered by the text above; device channels are a
different thing, built using event channels.

> 3. Are these device/event channels allocated dynamically or statically
> for each domain?

XenLinux virtual device drivers bind event channels dynamically when
they set up their communications with another domain. I think there are
some statically allocated event channels for essential services (e.g.
for XenStore and the domain's console).

> 4. It seems I need to allocate one device channel per device; is this
> true?

Yes, but the device channel is something you build yourself using shared
memory and event channels - it's up to you how you implement it.

In summary: event channels and shared memory are concrete services
provided by Xen using an API. A "device channel" is a high-level term
for the way drivers use these facilities to communicate.

I hope this helps; please ask if you need any clarification.
Cheers,
Mark
Petersson, Mats
2007-Mar-16 08:38 UTC
[Xen-devel] RE: Questions about device/event channels in Xen.
I have no idea on any of the below questions. Perhaps you may want to
send it to xen-devel.

-- Mats

> -----Original Message-----
> From: Liang Yang [mailto:multisyncfe991@hotmail.com]
> Sent: 15 March 2007 22:15
> To: xen-devel@lists.xensource.com
> Cc: Petersson, Mats
> Subject: Questions about device/event channels in Xen.
>
> Hello,
>
> I just have several questions about device and event channels:
> 1. From the implementation point of view, are device and event
> channels the same (i.e. both based on shared memory)?
>
> 2. In Xen papers, it is said up to 1024 channels are supported per
> domain. Does 1024 include both device channels and event channels?
>
> 3. Are these device/event channels allocated dynamically or statically
> for each domain?
>
> 4. It seems I need to allocate one device channel per device; is this
> true?
>
> Thanks,
>
> Liang
Liang Yang
2007-Mar-16 17:30 UTC
[Xen-devel] Does Dom0 always get interrupts first before they are delivered to other guest domains?
Hello,

It seems that if HVM domains access devices using emulation mode with
the device model in domain0, the Xen hypervisor will send the interrupt
event to domain0 first, and then the device model in domain0 will send
an event to the HVM domain.

However, if I'm using the split driver model and I only run the BE
driver on domain0, does domain0 still get the interrupt first (assuming
this interrupt is not owned by the Xen hypervisor, e.g. the local APIC
timer), or will the Xen hypervisor send the event directly to the HVM
domain, bypassing domain0, for the split driver model?

Another question: for interrupt delivery, does Xen treat a
para-virtualized domain differently from an HVM domain, considering the
device model and the split driver model?

Thanks a lot,

Liang
Petersson, Mats
2007-Mar-16 17:40 UTC
RE: [Xen-devel] Does Dom0 always get interrupts first before they are delivered to other guest domains?
> -----Original Message-----
> From: xen-devel-bounces@lists.xensource.com
> [mailto:xen-devel-bounces@lists.xensource.com] On Behalf Of Liang Yang
> Sent: 16 March 2007 17:30
> To: xen-devel@lists.xensource.com
> Subject: [Xen-devel] Does Dom0 always get interrupts first
> before they are delivered to other guest domains?
>
> Hello,
>
> It seems that if HVM domains access devices using emulation mode with
> the device model in domain0, the Xen hypervisor will send the
> interrupt event to domain0 first, and then the device model in
> domain0 will send an event to the HVM domain.

Ok, so let's see if I've understood your question first:

If we do a disk read (for example), the actual disk-read operation
itself will generate an interrupt, which goes into the Xen HV where it's
converted to an event that goes to Dom0, which in turn wakes up the
pending call to read (in this case) that was requesting the disk IO, and
then when the read call is finished an event is sent to the HVM DomU.

Is this the sequence of events that you're talking about? If that's what
you are talking about, it must be done this way.

> However, if I'm using the split driver model and I only run the BE
> driver on domain0, does domain0 still get the interrupt first
> (assuming this interrupt is not owned by the Xen hypervisor, e.g. the
> local APIC timer), or will the Xen hypervisor send the event directly
> to the HVM domain, bypassing domain0, for the split driver model?

Not in the above type of scenario. The interrupt must go to the driver
domain (normally Dom0) to indicate that the hardware is ready to deliver
the data. This will wake up the user-mode call that waited for the data,
and then the data can be delivered to the guest domain from there (which
in turn is awakened by the event sent from the driver domain). There is
no difference in the number of events in these two cases.
There is, however, a big difference in the number of hypervisor-to-dom0 events that occur: the HVM model will require something in the order of 5 writes to the IDE controller to perform one disk read/write operation. Each of those will incur one event to wake up qemu-dm, and one event to wake the domU (which will most likely just run one or two instructions forward before hitting the next write to the IDE controller).

> Another question is: for interrupt delivery, does Xen treat
> para-virtualized domain differently from HVM domain considering
> using device model and split driver model?

Not in interrupt delivery, no. Except for the fact that HVM domains obviously have full hardware interfaces for interrupt controllers etc., which adds a little bit of overhead (because each interrupt needs to be acknowledged/cancelled on the interrupt controller, for example).

-- Mats
Liang Yang
2007-Mar-16 18:48 UTC
Re: [Xen-devel] Does Dom0 always get interrupts first before they are delivered to other guest domains?
Hi Mats,

Thanks. I still have two more questions.

First, you once gave another excellent explanation of the communication between an HVM domain and the HV (15 Feb 2007). Here I quote part of it: "...Since these IO events are synchronous in a real processor, the hypervisor will wait for a 'return event' before the guest is allowed to continue. Qemu-dm runs as a normal user-process in Dom0..." My question is about those synchronous I/O events. Why can't we make them asynchronous? E.g. whenever the I/O is done, we can interrupt the HV again and let the HV resume I/O processing. Is there any specific limitation that forces the Xen hypervisor to do I/O in synchronous mode?

Second, you just mentioned there is a big difference in the number of HV-to-domain0 events between the device model and the split driver model. Could you elaborate on the details of how the split driver model can reduce HV-to-domain0 events compared with using the qemu device model?

Have a wonderful weekend,

Liang

----- Original Message -----
From: "Petersson, Mats" <Mats.Petersson@amd.com>
To: "Liang Yang" <multisyncfe991@hotmail.com>; <xen-devel@lists.xensource.com>
Sent: Friday, March 16, 2007 10:40 AM
Subject: RE: [Xen-devel] Does Dom0 always get interrupts first before they are delivered to other guest domains?
Liang Yang
2007-Mar-19 16:33 UTC
[Xen-devel] Does Xen also plan to move the back-end driver to the stub domain for HVM?
Hi, Based on the roadmap from the Xen summit, there is a plan to move QEMU and let it run in a stub domain to improve HVM performance. However, compared with the QEMU device model, it would be much easier to move the BE driver and let it run in a stub domain instead of dom0, as the BE part runs in kernel space (QEMU runs in user space). But I'm a little bit confused about the relationship between the stub domain and the guest domain. Is the stub domain part of the guest domain? Does each guest domain have a stub domain which is created when the guest domain is created? If the stub domain is part of the guest domain, does porting the device model to the stub domain compromise the original design purpose of an isolated device domain? Thanks, Liang
Petersson, Mats
2007-Mar-19 16:45 UTC
RE: [Xen-devel] Does Xen also plan to move the back-end driver to the stub domain for HVM?
> -----Original Message-----
> From: xen-devel-bounces@lists.xensource.com
> [mailto:xen-devel-bounces@lists.xensource.com] On Behalf Of Liang Yang
> Sent: 19 March 2007 16:34
> To: xen-devel@lists.xensource.com
> Subject: [Xen-devel] Does Xen also plan to move the back-end
> driver to the stub domain for HVM?
>
> Based on the roadmap on Xen summit, there is a plan to move
> QEMU and let it run on the stub domain to improve HVM performance.
> However, comparing with QEMU device model, it will be much easier
> to move BE driver and let it run in stub domain instead of dom0 as
> BE part is running on the kernel space (QEMU is running on user space).

But that wouldn't serve the same purpose. What would you solve by doing this? The purpose of the stub domain is to ensure that QEMU-DM runs on the same CPU as the domain needing the device model, which in turn serves several purposes:

1. It reduces the load on Dom0. Dom0 can end up being the bottleneck quite quickly for an HVM system with many domains.
2. It reduces the latency of switching (because there is no OTHER processor to wake up, no wait for qemu-dm to react, etc.).

The back-end driver, on the other hand, is there to serve as a bridge between the virtual device in the guest and the hardware owner (Dom0). Since there's no plan to let guest domains straight onto the hardware (besides what's currently allowed with pci-hide and pci-passthrough, where the guest domain OWNS that hardware exclusively), there's still a need to communicate from DomU to Dom0 (or whichever domain it is that owns the hardware involved).

> but I'm little bit confused about the relationship between
> stub domain and guest domain. Is the stub domain part of guest domain?
> Does each guest domain have a stub domain which is created when
> the guest domain is created?

Yes, each guest domain will have a stub domain, according to what I understand.

> If the stub domain is part of guest domain, does porting device
> model to stub domain compromise the original design purpose of
> isolated device domain?

No, because the stub domain will still communicate with Dom0 once it's got a full packet of IO requests (cf. our discussion on the IDE controller, for example). The purpose of the stub domain is primarily to reduce the overhead on Dom0. There are quite a few IO requests that can be resolved almost entirely in qemu-dm itself, which means that Dom0 wouldn't have to be bothered at all. Other requests do require that Dom0 is involved. But if 1 in 4 requests goes to Dom0, that means the stub domain can resolve 3 in 4 requests without going through Dom0 - that's where the big saving is.

-- Mats
Anthony Liguori
2007-Mar-19 18:20 UTC
[Xen-devel] Re: Does Xen also plan to move the back-end driver to the stub domain for HVM?
Liang Yang wrote:
> Hi,
>
> Based on the roadmap on Xen summit, there is a plan to move QEMU and
> let it run on the stub domain to improve HVM performance.

Using a stub domain won't improve HVM performance. It will improve accountability and scalability, but running a single HVM guest you shouldn't see any improvement.

> However, comparing with QEMU device model, it will be much easier to
> move BE driver and let it run in stub domain instead of dom0 as BE
> part is running on the kernel space (QEMU is running on user space).

Actually, this cannot make performance better, since you're technically adding another layer of indirection to the picture. Within dom0, qemu-dm has direct access to the hardware. Fortunately, the Xen BE/FE model is quite good performance-wise, so there shouldn't be a performance regression here.

> but I'm little bit confused about the relationship between stub
> domain and guest domain. Is the stub domain part of guest domain?
> Does each guest domain have a stub domain which is created when the
> guest domain is created?

A lot of this is still being worked out. From a user perspective, the idea would be that creating an HVM domain would be identical to how it's done today. What happens under the covers, though, remains to be seen.

Regards,

Anthony Liguori
Liang Yang
2007-Mar-19 19:21 UTC
Re: [Xen-devel] Re: Does Xen also plan to move the back-end driver to the stub domain for HVM?
"QEMU has direct access to hardware" - does this mean the QEMU device model does not need to communicate with the native device driver which is also sitting in dom0?

----- Original Message -----
From: "Anthony Liguori" <aliguori@us.ibm.com>
To: "Liang Yang" <multisyncfe991@hotmail.com>
Sent: Monday, March 19, 2007 11:20 AM
Subject: [Xen-devel] Re: Does Xen also plan to move the back-end driver to the stub domain for HVM?
Anthony Liguori
2007-Mar-19 20:20 UTC
Re: [Xen-devel] Re: Does Xen also plan to move the back-end driver to the stub domain for HVM?
Liang Yang wrote:
> "QEMU has direct access to hardware", does this mean the QEMU device
> model does not need to communicate with the native device driver
> which is also sitting in dom0?

No, it means that it communicates with the native device drivers directly, instead of going through another indirection layer (namely, the front-end and back-end drivers).

Regards,

Anthony Liguori
Liang Yang
2007-Mar-19 21:56 UTC
[Xen-devel] Question about reserving one CPU for the Xen hypervisor in case of vm exit.
Hi, My platform has two dual-core processors with VT-x enabled. Suppose I use the "xm vcpu-pin" command to set up a fixed mapping between each physical processor/core and a virtual CPU (to avoid possible migration). I have three domains: one is dom0, the second is domUP, and the third is domUF (an HVM domain). I give each domain one CPU and reserve one for the hypervisor. What I want to do is to always keep one CPU idle (reserving it for the VMM); the Xen hypervisor can thus always use this idle CPU whenever a "vm exit" happens, and the guest HVM domain still has its own CPU to do some overlapping processing (to improve performance). Is this feasible? Thanks, Liang
Petersson, Mats
2007-Mar-20 10:03 UTC
RE: [Xen-devel] Re: Does Xen also plan to move the back-end driver to the stub domain for HVM?
> -----Original Message-----
> From: xen-devel-bounces@lists.xensource.com
> [mailto:xen-devel-bounces@lists.xensource.com] On Behalf Of Liang Yang
> Sent: 19 March 2007 19:21
> To: Anthony Liguori
> Cc: xen-devel@lists.xensource.com
> Subject: Re: [Xen-devel] Re: Does Xen also plan to move the
> back-end driver to the stub domain for HVM?
>
> "QEMU has direct access to hardware", does this mean the QEMU
> device model does not need to communicate with the native device
> driver which is also sitting in dom0?

No, it needs the Dom0 device driver.

-- Mats
Petersson, Mats
2007-Mar-20 10:13 UTC
RE: [Xen-devel] Question about reserving one CPU for the Xen hypervisor in case of vm exit.
> -----Original Message-----
> From: xen-devel-bounces@lists.xensource.com
> [mailto:xen-devel-bounces@lists.xensource.com] On Behalf Of Liang Yang
> Sent: 19 March 2007 21:56
> To: xen-devel@lists.xensource.com
> Subject: [Xen-devel] Question about reserving one CPU for the
> Xen hypervisor in case of vm exit.
>
> What I want to do is to always keep one CPU idle (reserving it
> for the VMM), Xen hypervisor can thus always use this idle CPU
> whenever a "vm exit" happens and the guest HVM domain still has
> its own CPU to do some overlapping processing (to improve
> performance).

That will leave you with one CPU sitting there doing absolutely nothing, as the VMEXIT handling is all done on the CPU that caused the VMEXIT in the first place. The same applies to hypercalls on the PV side: they all happen on the same CPU that the guest is running on.

It's a good idea to give Dom0 its own CPU, but beyond that you're better off sharing the three CPUs between your two guests in one way or another. Obviously, you can't give one and a half CPUs to a guest, so you will probably have to give both guests two CPUs to make efficient use of the system, or give one CPU to one guest and two to the other guest.

-- Mats

> Is this feasible?
>
> Thanks,
>
> Liang
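Mats's suggestion (dom0 pinned to its own core, the remaining three cores shared between the two guests) could be expressed with the `xm vcpu-pin <domain> <vcpu> <cpus>` command the thread already mentions. This is a sketch only: the domain names domUP and domUF come from the question, and it assumes both guests were configured with two vCPUs:

```shell
# Keep dom0's single vCPU on physical core 0
xm vcpu-pin Domain-0 0 0

# Give each guest two of the remaining three cores, sharing core 2
xm vcpu-pin domUP 0 1
xm vcpu-pin domUP 1 2
xm vcpu-pin domUF 0 2
xm vcpu-pin domUF 1 3

# Verify the resulting affinities in the "CPU Affinity" column
xm vcpu-list
```

Note that no core is left idle "for the hypervisor": as Mats explains, VMEXITs and hypercalls are always handled on the CPU the guest was running on.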
Mark Williamson
2007-Mar-21 00:37 UTC
Re: [Xen-devel] Does Dom0 always get interrupts first before they are delivered to other guest domains?
Hi,

> My question is about those synchronous I/O events. Why can't we make
> them asynchronous? e.g. whenever I/O are done, we can interrupt HV
> again and let HV resume I/O processing. Is there any specific
> limitation to force Xen hypervisor do I/O in synchronous mode?

Was this talking about IO port reads/writes? The problem with IO port reads is that the guest expects the hardware to have responded to an IO port read, and for the result to be available, as soon as the inb (or whatever) instruction has finished. Therefore, in a virtual machine, we can't return to the guest until we've figured out (by emulating using the device model) what that read should return. Consecutive writes can potentially be batched, I believe, and there has been talk of implementing that. I don't see any reason why other VCPUs shouldn't keep running in the meantime, though.

> Second, you just mentioned there is big difference between the number
> of HV-to-domain0 events for device model and split driver model.
> Could you elaborate the details about how split driver model can
> reduce the HV-to-domain0 events compared with using qemu device model?

The PV split drivers are designed to minimise events: they'll queue up a load of IO requests in a batch and then notify dom0 that the IO requests are ready. In contrast, the FV device emulation can't do this: we have to consult dom0 for the emulation of any device operation the guest does (e.g. each IO port read the guest does), so the batching is less efficient.
Cheers,
Mark
--
Dave: Just a question. What use is a unicycle with no seat? And no pedals!
Mark: To answer a question with a question: What use is a skateboard?
Dave: Skateboards have wheels.
Mark: My wheel has a wheel!
Liang Yang
2007-Mar-21 01:23 UTC
RE: [Xen-devel] Does Dom0 always get interrupts first before they are delivered to other guest domains?
Hi Mark,

Thanks. I have another question, about using VT-x and hypercalls to support para-virtualized and fully-virtualized domains simultaneously.

It seems Xen does not need to use hypercalls to replace all problematic instructions (e.g. HLT, POPF, etc.). For example, there is an instruction called CLTS. Instead of replacing it with a hypercall, the Xen hypervisor will first delegate it to ring 0 when a GP fault occurs and then run it from there, to solve the ring-aliasing issue (http://www.linuxjournal.com/comment/reply/8909 talked about this).

Now my first question: if I'm running both a para-virtualized and a fully-virtualized domain on a single CPU (I think the Xen hypervisor will set up the exception bitmap for the CLTS instruction for the HVM domain), won't the Xen hypervisor be confused about how to handle CLTS running in ring 1? Does the Xen hypervisor do a VM EXIT, or still delegate CLTS to ring 0? How does the Xen hypervisor distinguish whether the instruction came from a para-virtualized domain or from a fully-virtualized domain? Does Xen have to replace all problematic instructions with hypercalls for a para-domain (even CLTS)? Why does Xen need different strategies in a para-virtualized domain to handle CLTS (delegation to ring 0) and other problematic instructions (hypercalls)?

My second question: it seems each processor has its own exception bitmap. If I have multiple processors (VT-x enabled), does the Xen hypervisor use the same exception bitmap on all processors, or does Xen allow each processor to have its own (maybe different) exception bitmap?

Best regards,

Liang
Hi,> First, you once gave another excellent explanation about the communication > between HVM domain and HV (15 Feb 2007 ). Here I quote part of it > "...Since these IO events are synchronous in a real processor, the > hypervisor will wait for a "return event" before the guest is allowed to > continue. Qemu-dm runs as a normal user-process in Dom0..." > My question is about those Synchronous I/O events. Why can''t we make them > asynchronous? e.g. whenever I/O are done, we can interrupt HV again andlet> HV resume I/O processing. Is there any specific limiation to force Xen > hypervisor do I/O in synchronous mode?Was this talking about IO port reads / writes? The problem with IO port reads is that the guest expects the hardware to have responded to an IO port read and for the result to be available as soon as the inb (or whatever) instruction has finished... Therefore in a virtual machine, we can''t return to the guest until we''ve figured out (by emulating using the device model) what that read should return. Consecutive writes can potentially be batched, I believe, and there has been talk of implementing that. I don''t see any reason why other VCPUs shouldn''t keep running in the meantime, though.> Second, you just mentioned there is big difference between the number of > HV-to-domain0 events for device model and split driver model. Could you > elaborate the details about how split driver model can reduce the > HV-to-domain0 events compared with using qemu device model?The PV split drivers are designed to minimise events: they''ll queue up a load of IO requests in a batch and then notify dom0 that the IO requests are ready. In contrast, the FV device emulation can''t do this: we have to consult dom0 for the emulation of any device operations the guest does (e.g. each IO port read the guest does) so the batching is less efficient. 
Cheers,
Mark

> Have a wonderful weekend,
>
> Liang
>
> ----- Original Message -----
> From: "Petersson, Mats" <Mats.Petersson@amd.com>
> To: "Liang Yang" <multisyncfe991@hotmail.com>; <xen-devel@lists.xensource.com>
> Sent: Friday, March 16, 2007 10:40 AM
> Subject: RE: [Xen-devel] Does Dom0 always get interrupts first before they are delivered to other guest domains?
>
> > -----Original Message-----
> > From: xen-devel-bounces@lists.xensource.com
> > [mailto:xen-devel-bounces@lists.xensource.com] On Behalf Of Liang Yang
> > Sent: 16 March 2007 17:30
> > To: xen-devel@lists.xensource.com
> > Subject: [Xen-devel] Does Dom0 always get interrupts first before they are delivered to other guest domains?
> >
> > Hello,
> >
> > It seems that if HVM domains access devices using emulation mode with a
> > device model in domain0, the Xen hypervisor will send the interrupt event
> > to domain0 first, and then the device model in domain0 will send an event
> > to the HVM domains.
>
> Ok, so let's see if I've understood your question first:
> If we do a disk-read (for example), the actual disk-read operation itself
> will generate an interrupt, which goes into the Xen HV where it's converted
> to an event that goes to Dom0, which in turn wakes up the pending call to
> read (in this case) that was requesting the disk IO; then, when the
> read-call is finished, an event is sent to the HVM DomU. Is this the
> sequence of events that you're talking about?
>
> If that's what you are talking about, it must be done this way.
>
> > However, if I'm using the split driver model and I only run the BE driver
> > on domain0, does domain0 still get the interrupt first (assuming this
> > interrupt is not owned by the Xen hypervisor, e.g. the local APIC timer),
> > or will the Xen hypervisor send the event directly to the HVM domain,
> > bypassing domain0, for the split driver model?
>
> Not in the above type of scenario.
> The interrupt must go to the driver-domain (normally Dom0) to indicate
> that the hardware is ready to deliver the data. This will wake up the
> user-mode call that waited for the data, and then the data can be
> delivered to the guest domain from there (which in turn is awakened by
> the event sent from the driver domain).
>
> There is no difference in the number of events in these two cases.
>
> There is, however, a big difference in the number of hypervisor-to-dom0
> events that occur: the HVM model will require something on the order of
> 5 writes to the IDE controller to perform one disk read/write operation.
> Each of those will incur one event to wake up qemu-dm, and one event to
> wake the domU (which will most likely just execute one or two instructions
> forward to hit the next write to the IDE controller).
>
> > Another question is: for interrupt delivery, does Xen treat
> > para-virtualized domains differently from HVM domains, considering the
> > device model and the split driver model?
>
> Not in interrupt delivery, no. Except for the fact that HVM domains
> obviously have full hardware interfaces for interrupt controllers etc.,
> which adds a little bit of overhead (because each interrupt needs to be
> acknowledged/cancelled on the interrupt controller, for example).
>
> --
> Mats
>
> > Thanks a lot,
> >
> > Liang

--
Dave: Just a question. What use is a unicycle with no seat? And no pedals!
Mark: To answer a question with a question: What use is a skateboard?
Dave: Skateboards have wheels.
Mark: My wheel has a wheel!

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Tian, Kevin
2007-Mar-21 08:31 UTC
RE: [Xen-devel] Does Dom0 always get interrupts first before they are delivered to other guest domains?
> From: Liang Yang
> Sent: 21 March 2007 9:23
>
> Now my first question comes up: if I'm running both a para-virtualized and
> a full-virtualized domain on a single CPU (I think the Xen hypervisor will
> set up the exception bitmap for the CLTS instruction for the HVM domain),
> then the Xen hypervisor will be confused and will not know how to handle
> it when running CLTS in ring 1.

Whenever the Xen hypervisor is running, there's always a current vcpu context from which Xen can easily know whether the current domain is para-virtualized or not. Para-virtualized and HVM guests also have different entry points for the CLTS example above: for a para-virtualized guest, it's the GP fault handler of Xen that is invoked at that point; for an HVM guest, it's the VM-EXIT handler that is invoked, with a detailed exit reason. When a guest is running, the hardware knows whether the running environment has hardware virtualization assistance or not, and can therefore decide which path to enter when the fault happens.

Thanks,
Kevin
Petersson, Mats
2007-Mar-21 09:13 UTC
RE: [Xen-devel] Does Dom0 always get interrupts first before they are delivered to other guest domains?
First of all: forgive me for top-posting, but I think this message should be seen by all, and isn't really a response to the post below anyway.

Could you (Liang Yang) please avoid sending the SAME question to both me privately and the mailing list. It's called cross-posting and is not a "nice" thing, as I may not realize that it's been posted to two different places.

To everyone else: I've already answered the questions below (aside from the bit that wasn't in the mail to me, but that's been answered by Kevin anyway).

--
Mats

> -----Original Message-----
> From: Liang Yang [mailto:multisyncfe991@hotmail.com]
> Sent: 21 March 2007 01:23
> To: 'Mark Williamson'; xen-devel@lists.xensource.com
> Cc: Petersson, Mats
> Subject: RE: [Xen-devel] Does Dom0 always get interrupts first before they are delivered to other guest domains?
>
> Hi Mark,
>
> Thanks.
>
> I have another question about using VT-x and hypercalls to support
> para-virtualized and full-virtualized domains simultaneously:
>
> It seems Xen does not need to use a hypercall to replace all problematic
> instructions (e.g. HLT, POPF etc.). For example, there is an instruction
> called CLTS. Instead of replacing it with a hypercall, the Xen hypervisor
> will first delegate it to ring 0 when a GP fault occurs and then run it
> from there to solve the ring-aliasing issue
> (http://www.linuxjournal.com/comment/reply/8909 talked about this).
>
> Now my first question comes up: if I'm running both a para-virtualized and
> a full-virtualized domain on a single CPU (I think the Xen hypervisor will
> set up the exception bitmap for the CLTS instruction for the HVM domain),
> then the Xen hypervisor will be confused and will not know how to handle
> it when running CLTS in ring 1.
>
> Does the Xen hypervisor do a VM EXIT, or does it still delegate CLTS to
> ring 0? How does the Xen hypervisor distinguish whether the instruction is
> from a para-virtualized domain or from a full-virtualized domain?