Here are some notes about performance that I prepared a while ago.
> TX is "packets from the guest", RX is "packets for the
guest".
>
> For discussion purposes, here's how the TX path works (this is the
> fast case - if there are resource shortages, ring fills, etc. things
> are more complex):
>
> domU: xnf is passed a packet chain (typically only a single packet). It:
> - flattens the message to a single mblk which is contained in a
> single page (this might be a no-op),
> - allocates a grant reference,
> - grants the backend access to the page containing the packet,
> - gets a slot in the tx ring,
> - updates the tx ring,
> - hypercall notifies the backend that a packet is ready.
>
> The TX ring is cleaned lazily, usually when getting a slot from
> the ring fails. Cleaning the ring results in freeing any buffers
> that were used for transmit.
>
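> As a concrete (and simplified) sketch, the per-packet work in the
> frontend looks roughly like this. The ring macros and the
> netif_tx_request fields come from the Xen public headers; xnfp, oeid
> and helpers like xnf_buffer_mfn(), xnf_claim_tx_id() and the notify
> call are illustrative rather than the actual xnf code, and error
> handling is omitted:
>
>   /* Flatten so the whole packet lives in one page (often a no-op). */
>   if (mp->b_cont != NULL)
>           (void) pullupmsg(mp, -1);
>
>   /* Grant the backend (peer domain oeid) read-only access to the page. */
>   gref = gnttab_grant_foreign_access(oeid, xnf_buffer_mfn(mp), 1);
>
>   /* Take the next free slot in the TX ring and describe the packet. */
>   txreq = RING_GET_REQUEST(&xnfp->xnf_tx_ring,
>       xnfp->xnf_tx_ring.req_prod_pvt);
>   txreq->id = xnf_claim_tx_id(xnfp, mp, gref);   /* for later reaping */
>   txreq->gref = gref;
>   txreq->offset = (uintptr_t)mp->b_rptr & PAGEOFFSET;
>   txreq->size = MBLKL(mp);
>   txreq->flags = 0;
>   xnfp->xnf_tx_ring.req_prod_pvt++;
>
>   /* Publish the request and notify the backend if it asked for that. */
>   RING_PUSH_REQUESTS_AND_CHECK_NOTIFY(&xnfp->xnf_tx_ring, notify);
>   if (notify)
>           ec_notify_via_evtchn(xnfp->xnf_evtchn);
>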
> dom0: xnb receives an interrupt to say that the xnf sent one or more
> packets. It:
> - for each consumed slot in the ring:
> - add the grant reference of the page containing the packet to a
> list.
> - hypercall to map all of the pages for which we have grant
> references.
> - for each consumed slot in the ring:
> - allocate an mblk for the packet.
> - copy data from the granted page to the mblk.
> - store mblk in a list.
> - hypercall to unmap all of the granted pages.
> - pass the packet chain down to the NIC (typically a VNIC).
>
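> In rough code, the map/copy/unmap part of that is as follows. The
> GNTTABOP structures and flags are from the Xen public headers; xnbp,
> map_va[] and the request walking are illustrative, with error
> handling omitted:
>
>   /* One map operation per consumed TX request. */
>   for (i = 0; i < npkts; i++) {
>           txreq = RING_GET_REQUEST(&xnbp->xnb_tx_ring, cons + i);
>           mop[i].host_addr = map_va[i];       /* VA reserved in dom0 */
>           mop[i].dom = xnbp->xnb_peer;
>           mop[i].ref = txreq->gref;
>           mop[i].flags = GNTMAP_host_map | GNTMAP_readonly;
>   }
>
>   /* A single hypercall maps all of the granted pages into dom0. */
>   (void) HYPERVISOR_grant_table_op(GNTTABOP_map_grant_ref, mop, npkts);
>
>   /* Copy each packet out of its mapped page into a fresh mblk. */
>   for (i = 0; i < npkts; i++) {
>           txreq = RING_GET_REQUEST(&xnbp->xnb_tx_ring, cons + i);
>           mp = allocb(txreq->size, BPRI_MED);
>           bcopy((caddr_t)map_va[i] + txreq->offset, mp->b_wptr,
>               txreq->size);
>           mp->b_wptr += txreq->size;
>           /* ... chain mp for the NIC, fill in the matching unmap op. */
>           uop[i].host_addr = map_va[i];
>           uop[i].dev_bus_addr = 0;
>           uop[i].handle = mop[i].handle;
>   }
>
>   /* A single hypercall unmaps them all again. */
>   (void) HYPERVISOR_grant_table_op(GNTTABOP_unmap_grant_ref, uop, npkts);
>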
> Simpler improvements:
> - Add support for the scatter-gather extension to our
> frontend/backend driver pair. This would mean that we don't need
> to flatten mblk chains that belong to a single packet in the
> frontend driver. I have a quick prototype of this based on some
> work that Russ did (the Windows driver tends to use long packet
> chains, so it's wanted in our backend). There's a sketch of the
> ring usage after this list.
> - Look at using the 'hypervisor copy' hypercall to move data from
> guest pages into mblks in the backend driver. This would remove
> the need to map the granted pages into dom0 (which is
> expensive). Prototyping this should be straightforward and it may
> provide a big win, but without trying we don't know. Certainly it
> would push the dom0 CPU time down (by moving the work into the
> hypervisor). A sketch of what that copy loop might look like
> follows this list.
> - Use the guest-provided buffers directly (esballoc) rather than
> copying the data into more buffers. I had an implementation of
> this and it suffered in three ways:
> - The buffer management was poor, causing a lot of lock contention
> over the ring (the tx completion freed the buffer and this
> contended with the tx lock used to reap packets from the
> ring). This could be fixed with a little time.
> - There are a limited number of ring entries (256) and they cannot
> be reused until the associated buffer is freed. If the dom0
> stack or a driver holds on to transmit buffers for a long time,
> we see ring exhaustion. The Neptune driver was particularly bad
> for this.
> - Guests grant read-only mappings for these pages. Unfortunately
> the Solaris IP stack expects to be able to modify packets which
> causes page faults. There are a couple of workarounds available:
> - Modify Solaris guests to grant read/write mappings and
> indicate this. I have this implemented and it works, but it's
> somewhat undesirable (and doesn't help with Linux or Windows
> guests).
> - Indicate to the MAC layer that these packets are 'read only'
> and have it copy them if they are for the local stack.
> - Implement an address space manager for the pages used for
> these packets and handle faults as they occur - somewhat
> blue-sky this one :-)
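>
> For the scatter-gather item above, the wire format already allows a
> packet to span several TX requests chained with NETTXF_more_data.
> Roughly what the frontend would do instead of flattening (granting
> each fragment as in the earlier frontend sketch; push/notify
> unchanged, and each fragment still has to fit within a single page):
>
>   total = msgsize(mp);
>   for (bp = mp; bp != NULL; bp = bp->b_cont) {
>           txreq = RING_GET_REQUEST(&xnfp->xnf_tx_ring,
>               xnfp->xnf_tx_ring.req_prod_pvt++);
>           txreq->gref = xnf_grant_fragment(xnfp, bp);
>           txreq->offset = (uintptr_t)bp->b_rptr & PAGEOFFSET;
>           /* By convention the first slot carries the whole packet
>              size, the later slots just their own fragment size. */
>           txreq->size = (bp == mp) ? total : MBLKL(bp);
>           txreq->flags = (bp->b_cont != NULL) ? NETTXF_more_data : 0;
>           txreq->id = xnf_claim_tx_id(xnfp, bp, txreq->gref);
>   }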
>
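> To make the 'hypervisor copy' idea concrete, the backend loop above
> would become something like this, with no mapping or unmapping at
> all. gnttab_copy_t and the GNTCOPY flags are from the Xen public
> headers; xnb_va_to_gmfn() is an illustrative helper, the other names
> are reused from the earlier backend sketch, and a real implementation
> would have to split any copy whose destination straddles a page
> boundary (cf. footnote [4]):
>
>   gnttab_copy_t cop[256];    /* one per TX ring slot */
>
>   for (i = 0; i < npkts; i++) {
>           txreq = RING_GET_REQUEST(&xnbp->xnb_tx_ring, cons + i);
>           mp[i] = allocb(txreq->size, BPRI_MED);
>
>           cop[i].source.u.ref = txreq->gref;   /* guest page, by gref */
>           cop[i].source.domid = xnbp->xnb_peer;
>           cop[i].source.offset = txreq->offset;
>           cop[i].dest.u.gmfn = xnb_va_to_gmfn(mp[i]->b_wptr);
>           cop[i].dest.domid = DOMID_SELF;
>           cop[i].dest.offset = (uintptr_t)mp[i]->b_wptr & PAGEOFFSET;
>           cop[i].len = txreq->size;
>           cop[i].flags = GNTCOPY_source_gref;
>
>           mp[i]->b_wptr += txreq->size;
>   }
>
>   /* A single hypercall copies every packet into dom0 memory. */
>   (void) HYPERVISOR_grant_table_op(GNTTABOP_copy, cop, npkts);
>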
> More complex improvements:
> - Avoid mapping the guest pages into dom0 completely if the packet
> is not destined for dom0. If the guest is sending a packet to a
> third party, dom0 doesn't need to map in the packet at all - it
> can pass the MA[1] to the DMA engine of the NIC without ever
> acquiring a VA. Issues:
> - We need the destination MAC address of the packet to be included
> in the TX ring so that we can route the packet (e.g. decide if
> it's for dom0, another domU or external). There's no room for it
> in the current ring structures, see "netchannel2" comments
> further on.
> - The MAC layer and any interested drivers would need to learn
> about packets for which there is currently no VA. This will
> require *big* changes.
> - Cache mappings of the granted pages from the guest domain. It's
> not clear how much benefit this would have for the transmit path -
> we'd need to see how often the same pages are reused as transmit
> buffers by the guest.
>
> Here's the RX path (again, simpler case):
>
> domU: When the interface is created, domU:
> - for each entry in the RX ring:
> - allocate an MTU sized buffer,
> - find the PA and MFN[2] of the buffer,
> - allocate a grant reference for the buffer,
> - update the ring with the details of the buffer (gref and id)
> - signal the backend that RX buffers are available
>
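> A sketch of that posting loop (netif_rx_request and the ring macros
> are from the Xen public headers; the xnf_buf_t bookkeeping, oeid and
> ring_size are illustrative):
>
>   for (i = 0; i < ring_size; i++) {
>           xnf_buf_t *buf = xnf_buf_alloc(xnfp);   /* MTU sized, 1 page */
>
>           /* Grant the backend access so it can copy into the buffer. */
>           buf->gref = gnttab_grant_foreign_access(oeid, buf->mfn, 0);
>
>           rxreq = RING_GET_REQUEST(&xnfp->xnf_rx_ring,
>               xnfp->xnf_rx_ring.req_prod_pvt++);
>           rxreq->id = buf->id;      /* finds 'buf' again on completion */
>           rxreq->gref = buf->gref;
>   }
>
>   RING_PUSH_REQUESTS_AND_CHECK_NOTIFY(&xnfp->xnf_rx_ring, notify);
>   if (notify)
>           ec_notify_via_evtchn(xnfp->xnf_evtchn);
>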
> dom0: When a packet arrives[3]:
> - driver calls mac_rx() having prepared a packet,
> - MAC layer classifies the packet (unless that already comes for
> free from the hardware ring it arrived on),
> - MAC layer passes packet chain (usually just one packet) to xnb
> RX function
> - xnb RX function:
> - for each packet in the chain (b_next):
> - get a slot in the RX ring
> - for each mblk in the packet (b_cont):
> - for each page in the mblk[4]:
> - fill in a hypervisor copy request for this chunk
> - hypercall to perform the copies
> - mark the RX ring entry completed
> - notify the frontend of new packets (if required[5]).
> - free the packet chain.
>
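> Building the copy requests is the fiddly part, because of the
> per-page rule in footnote [4]. For one packet going into one RX ring
> slot it is roughly this (cop[], rxreq and xnb_va_to_gmfn() are
> illustrative, as in the earlier copy sketch):
>
>   n = 0;
>   dst_off = 0;
>   for (bp = mp; bp != NULL; bp = bp->b_cont) {
>           uchar_t *rp = bp->b_rptr;
>           size_t left = MBLKL(bp);
>
>           while (left > 0) {
>                   /* Never let a single copy cross a page boundary. */
>                   size_t chunk = MIN(left,
>                       PAGESIZE - ((uintptr_t)rp & PAGEOFFSET));
>
>                   cop[n].source.u.gmfn = xnb_va_to_gmfn(rp);
>                   cop[n].source.domid = DOMID_SELF;
>                   cop[n].source.offset = (uintptr_t)rp & PAGEOFFSET;
>                   cop[n].dest.u.ref = rxreq->gref;  /* guest RX buffer */
>                   cop[n].dest.domid = xnbp->xnb_peer;
>                   cop[n].dest.offset = dst_off;
>                   cop[n].len = chunk;
>                   cop[n].flags = GNTCOPY_dest_gref;
>
>                   dst_off += chunk;
>                   rp += chunk;
>                   left -= chunk;
>                   n++;
>           }
>   }
>
>   /* In practice the ops for every packet in the chain are batched
>      into a single GNTTABOP_copy hypercall. */
>   (void) HYPERVISOR_grant_table_op(GNTTABOP_copy, cop, n);
>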
> domU: When a packet arrives (notified by the backend):
> - for each dirty entry in the RX ring:
> - allocate an mblk for the data
> - copy the data from the RX buffer to the mblk
> - add the mblk to the packet chain
> - mark the ring entry free (e.g. re-post the buffer)
> - notify the backend that the ring has free entries (if required).
> - pass the packet chain to mac_rx().
>
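> In code, the per-interrupt work in the frontend is roughly as below.
> The netif_rx_response fields and ring macros are from the Xen public
> headers; xnf_buf_find(), xnf_buf_repost() and xnf_mh are
> illustrative:
>
>   mblk_t *head = NULL, **tail = &head;
>
>   while (RING_HAS_UNCONSUMED_RESPONSES(&xnfp->xnf_rx_ring)) {
>           rxresp = RING_GET_RESPONSE(&xnfp->xnf_rx_ring,
>               xnfp->xnf_rx_ring.rsp_cons++);
>           buf = xnf_buf_find(xnfp, rxresp->id);    /* posted earlier */
>
>           /* Today: a fresh mblk and a copy ('status' is the length). */
>           mp = allocb(rxresp->status, BPRI_MED);
>           bcopy(buf->va + rxresp->offset, mp->b_wptr, rxresp->status);
>           mp->b_wptr += rxresp->status;
>
>           *tail = mp;
>           tail = &mp->b_next;
>
>           xnf_buf_repost(xnfp, buf);   /* re-grant, new ring request */
>   }
>
>   /* Notify the backend of free ring entries if required, then ... */
>   mac_rx(xnfp->xnf_mh, NULL, head);
>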
> Simpler improvements:
> - Don't allocate a new mblk and copy the data in the domU interrupt
> path, rather wrap an mblk around the buffer (sketched after this
> list) and re-post a new one. This looks like it would be a good
> win - definitely worth building something to see how it behaves.
> Obviously the buffer management gets a little more complicated,
> but it may be worth it. The downside is that it reduces the likely
> benefit of having the backend cache mappings for the pre-posted RX
> buffers, as we are much less likely to recycle the same buffers
> over and over again (which is what happens today).
> - Update the frontend driver to use the Crossbow polling
> implementation, significantly reducing the interrupt load on the
> guest. Max started on this but it has languished since he left
> us.
>
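> The wrap-instead-of-copy change above would replace the allocb() and
> bcopy() pair in the previous sketch with something like this
> (desballoc() and frtn_t are standard DDI; the buffer bookkeeping and
> xnf_buf_recycle() are illustrative):
>
>   /* Wrap the posted RX buffer directly in an mblk - no copy. */
>   buf->free_rtn.free_func = xnf_buf_recycle;  /* runs at freemsg() time */
>   buf->free_rtn.free_arg = (caddr_t)buf;
>   mp = desballoc(buf->va + rxresp->offset, rxresp->status,
>       BPRI_MED, &buf->free_rtn);
>   mp->b_wptr += rxresp->status;
>
>   /* Post a *new* buffer into the ring slot; the wrapped one goes back
>      to the pool via xnf_buf_recycle() once the stack frees the mblk. */
>   xnf_buf_post(xnfp, xnf_buf_alloc(xnfp));
>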
> More complex improvements:
> - Given that the guest pre-posts the buffers that it will use for
> received data, push these buffers down into the MAC layer,
> allowing the driver to directly place packets into guest
> buffers. This presumes that we can get an RX ring in the driver
> assigned for the MAC address of the guest.
>
> General things (TX and RX):
> - Implementing scatter gather should improve some cases, but it's
> not that big a win. It allows us to implement jumbo-frames, which
> will show improvements in benchmarks. It also leads to...
> - Implementing LSO/LRO between dom0 and domU could have big
> benefits, as it will reduce the number of interrupts and the
> number of hypercalls.
> - All of the backend xnb instances currently operate independently -
> they share no state. If there are a large number of active guests
> it will probably be worth looking at a scheme where we shift to a
> worker thread per CPU and have that thread responsible for
> multiple xnb instances. This would allow us to reduce the
> hypercall count even more.
> - netchannel2 is a new inter-domain protocol implementation intended
> to address some of the shortcomings in the current protocol. It
> includes:
> - multiple pages of TX/RX descriptors which can either be just
> bigger rings or independent rings,
> - multiple event channels (which means multiple interrupts),
> - improved ring structure (space for MAC addresses, ...).
> With it there is a proposal for a soft IOMMU implementation to
> improve the use of grant mappings.
>
> We've done nothing with netchannel2 so far. In Linux it's
> currently a prototype with changes to an Intel driver to use it
> with VMDQ.
>
> Footnotes:
> [1] Machine address. In Xen it's no longer the case that all memory is
> mapped into the dom0 kernel - you may not even have a physical
> mapping for the memory.
> [2] Machine frame number, analogous to PFN.
> [3] This assumes packets from an external source. Locally generated
> packets destined for a guest jump into the flow a couple of items
> down the list.
> [4] Each chunk passed to the hypervisor copy routine must only contain
> a single page, as we don't know that the pages are machine
> contiguous (and it's pretty expensive to find out).
> [5] The frontend controls whether or not notification takes place using
> a watermark in the ring.
dme.
--
David Edmondson, Sun Microsystems, http://dme.org