Nicolas Droux
2007-Aug-28 06:40 UTC
[crossbow-discuss] Updated Crossbow virtualization architecture document
Folks,

I just posted an updated Crossbow virtualization architecture document. The new revision is available at:

http://opensolaris.org/os/project/crossbow/Docs/crossbow-virt.pdf

The main changes are the addition of support for multiple MAC addresses per client, and an explicit separation between consolidation private and project private MAC API entry points. See in particular the updated section 4.3 and chapter 5.

Nicolas.

--
Nicolas Droux - Solaris Core OS - Sun Microsystems, Inc.
droux at sun.com - http://blogs.sun.com/droux
James Carlson
2007-Aug-28 13:44 UTC
Nicolas Droux writes:
> http://opensolaris.org/os/project/crossbow/Docs/crossbow-virt.pdf

I have a few questions about this. I've also read through as much of the crossbow-discuss archives as seemed to be related to these topics, and didn't find answers there.

- Why are bandwidth, CPU control, and MAC address assignment exclusively a VNIC feature, at least at the administrative level? Section 4.7 seems to say that MAC instances will get these features, so shouldn't this be "modify-dev" instead?

  Needing to create a "dummy" VNIC on top of a regular interface just to interpose these new features seems like an implementation artifact.

- I assume we need a redesign of the VLAN code in order to get per-VLAN bandwidth control. Is that redesign part of Crossbow, or is it some later project? In reading the archives, it seems that it's been proposed as part of Crossbow, but in reading this document it seems to be part of something else.

- If per-VLAN control appears, do the units of administration change? Does it then become reasonable to talk about bandwidth and CPU control using "set-linkprop"?

- Do bandwidth and CPU controls rely on squeues? If so, then VNICs may not be able to control utilization from non-IP traffic, such as with bridging.

- I'm not sure I understand the (undocumented? -- not in summary) "-F" option for move-vnic. If I'm using a factory address on one NIC and I move a VNIC to another NIC, does this cause the VNIC to continue using the _same_ address but just on a new NIC?

  If so, how is duplication avoided if that factory address is ever reused from the original NIC?

  I would have expected that a VNIC using a factory address would just get a *new* address during a forced move to a new NIC. Changing MAC address during reconfiguration doesn't seem like a disaster to me -- in fact, it seems expected. Why should it try to retain the address?

- For showing statistics with "show-vnic -s", are these the same as "show-link -s"?
  If so, wouldn't the existing "show-link -s" do the job?

- What do "up" and "down" mean? Are these equivalent to controlling the "RUNNING" bit from user space (i.e., some way of marking link up and link down manually)? Or are they something else? Should regular MAC instances (other than VNICs) have the ability to be set administratively up and down?

  What would happen if VNICs were always "up?"

- What happens if a NIC is oversubscribed by the amount of bandwidth configured for the VNICs? Is the result proportionate (and thus "fair") allocation, or do they compete on some other grounds?

  What kind of bandwidth control exists here? How granular is it, and what effects do clients see from restricted bandwidth? Are packets dropped (they have to be, if bandwidth limits apply to forwarded traffic)? If so, is it tail drop or something more sophisticated?

- Can a VNIC be built atop another non-anchor VNIC? (Seems like the answer is "yes.")

- When VNICs share rings due to a lack of hardware resources, what happens when the client of one VNIC is using polling and the client of the other one is not? Won't one client end up blanking the interrupts for another?

- Instead of adding more arguments to mac_open() to handle priority and bandwidth, I'd suggest making these separate calls. You'll need the separate call anyway to implement the "modify" mechanism.

- What exactly does exclusive MAC access do? If mac_exclusive_set is called, are other client requests blocked (sleeping)? Or are they rejected (return error)? Or are they just let through, and all clients are expected to bracket requests with exclusive set/clear calls?

- MAC_UNICAST_AUTO seems unnecessary to me. Why not just call first with MAC_UNICAST_FACTORY and, if that fails, call again with MAC_UNICAST_RANDOM? Doing that would even have better functionality as MAC_UNICAST_AUTO seems to omit the possibility of desiring a particular factory address when available.
  I think having MAC_UNICAST_AUTO in the mix ends up pushing some of the control-path complexity out of the user space and into the kernel. It'd be better to simplify the kernel parts.

- What sorts of privileges are required to create and administer VNICs? Are these things that can be delegated to non-global zones?

- Why is [V]NIC the right level of bandwidth control? If I want to give a zone 100Mbps worth of bandwidth, but I'm giving it multiple VNICs, how do I do that -- can the bandwidth control logic do accounting based on multiple interfaces (aggregate control, rather than individual interface control)?

  If I have application-level controls, such as HTTP virtual servers or a sendmail configuration handling multiple domains, how can I control bandwidth for those things? Won't the application need to be involved?

--
James Carlson, Solaris Networking <james.d.carlson at sun.com>
Sun Microsystems / 1 Network Drive 71.232W Vox +1 781 442 2084
MS UBUR02-212 / Burlington MA 01803-2757 42.496N Fax +1 781 442 1677
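The two-call alternative to MAC_UNICAST_AUTO proposed in the message above (ask for a factory address first, fall back to a random one only if that fails) can be sketched roughly as follows. Everything here is hypothetical and for illustration only: `FakeNic`, `assign_unicast`, `pick_address`, and the Python constants merely borrow the names MAC_UNICAST_FACTORY and MAC_UNICAST_RANDOM from the thread. This is not the Crossbow MAC API.

```python
import random

# Names borrowed from the thread; the values are made up for this sketch.
MAC_UNICAST_FACTORY = "factory"
MAC_UNICAST_RANDOM = "random"


class NoFactoryAddress(Exception):
    """Raised when no factory address (or not the requested slot) is free."""


class FakeNic:
    """Hypothetical stand-in for a NIC with a fixed pool of factory addresses."""

    def __init__(self, factory_addrs):
        self.free = dict(enumerate(factory_addrs))  # slot -> address

    def assign_unicast(self, kind, slot=None, rng=random):
        if kind == MAC_UNICAST_FACTORY:
            if slot is None and self.free:
                slot = min(self.free)  # no preference: lowest free slot
            if slot not in self.free:
                raise NoFactoryAddress(slot)
            return self.free.pop(slot)
        if kind == MAC_UNICAST_RANDOM:
            # First octet 0x02: locally administered, unicast.
            octets = [0x02] + [rng.randrange(256) for _ in range(5)]
            return ":".join(f"{o:02x}" for o in octets)
        raise ValueError(kind)


def pick_address(nic, slot=None):
    """Try a factory address (optionally a specific slot), else a random one."""
    try:
        return nic.assign_unicast(MAC_UNICAST_FACTORY, slot), MAC_UNICAST_FACTORY
    except NoFactoryAddress:
        return nic.assign_unicast(MAC_UNICAST_RANDOM), MAC_UNICAST_RANDOM
```

In this shape the caller keeps the option of naming a particular factory slot, which is the functionality the message argues an AUTO mode would omit; the cost is a second call when no factory address is available.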
Nicolas Droux
2007-Aug-28 22:47 UTC
Jim,

Thanks for the comments.

James Carlson wrote:
> Nicolas Droux writes:
>> http://opensolaris.org/os/project/crossbow/Docs/crossbow-virt.pdf
>
> I have a few questions about this. I've also read through as much of
> the crossbow-discuss archives as seemed to be related to these
> topics, and didn't find answers there.
>
> - Why are bandwidth, CPU control, and MAC address assignment
>   exclusively a VNIC feature, at least at the administrative level?
>   Section 4.7 seems to say that MAC instances will get these
>   features, so shouldn't this be "modify-dev" instead?

Bandwidth control, CPU mapping, and fanout are not exclusive to VNICs. They will be expressed as properties, and applicable to non-VNIC data-links as well. This will be described in detail by another upcoming document. I'll see what I can do to make that clearer in the virtualization document I sent out for review.

From the administration interface point of view, there are two ways to associate properties with data-links. For data-links that are created through a dladm subcommand like create-vnic, the initial set of properties can be specified during the creation of the data-link itself through a dedicated option. In addition, the properties can be set on any data-link through the set-linkprop subcommand. The former allows the administrator to create a VNIC with bandwidth control in a single command instead of having to go through a two-step dance.

>   Needing to create a "dummy" VNIC on top of a regular interface
>   just to interpose these new features seems like an implementation
>   artifact.

No, that won't be needed, see above.

> - I assume we need a redesign of the VLAN code in order to get
>   per-VLAN bandwidth control. Is that redesign part of Crossbow, or
>   is it some later project?
>   In reading the archives, it seems that
>   it's been proposed as part of Crossbow, but in reading this
>   document it seems to be part of something else.

Yes, we're currently planning to move VLAN processing down to the MAC layer itself, and the VLAN processing currently in the DLS layer will be removed. This still needs to be properly documented.

> - If per-VLAN control appears, do the units of administration
>   change? Does it then become reasonable to talk about bandwidth
>   and CPU control using "set-linkprop"?

Yes, the properties will apply to VLAN data-links as well, see above.

> - Do bandwidth and CPU controls rely on squeues? If so, then VNICs
>   may not be able to control utilization from non-IP traffic, such
>   as with bridging.

There is a level of bandwidth control done by squeue, but there's also a bandwidth control done by the MAC layer itself, which is useful when there's a need to do bandwidth control before fanout to multiple CPUs at the MAC layer, and also for non-IP protocols, or when the MAC is being used by a virtual machine's back-end drivers in the host OS. See also Sunay's writeup at http://www.opensolaris.org/os/project/crossbow/Design_softringset.txt for more details on this topic.

> - I'm not sure I understand the (undocumented? -- not in summary)
>   "-F" option for move-vnic. If I'm using a factory address on one
>   NIC and I move a VNIC to another NIC, does this cause the VNIC to
>   continue using the _same_ address but just on a new NIC?
>
>   If so, how is duplication avoided if that factory address is ever
>   reused from the original NIC?
>
>   I would have expected that a VNIC using a factory address would
>   just get a *new* address during a forced move to a new NIC.
>   Changing MAC address during reconfiguration doesn't seem like a
>   disaster to me -- in fact, it seems expected. Why should it try
Why should it try > to retain the address?I was trying to allow the system administrator to minimize the impact on the existing MAC address assignment when moving a VNIC to be moved off and back to a device. But I agree that it''s not optimal. If the folks on this list feel that the MAC address changing is not an issue, I''ve no problem using the simpler scheme of reassigning a new MAC address to the VNIC/MAC client.> - For showing statistics with "show-vnic -s", are these the same as > "show-link -s"? If so, wouldn''t the existing "show-link -s" do > the job?Agreed, show-link -s should do fine here.> - What do "up" and "down" mean? Are these equivalent to controlling > the "RUNNING" bit from user space (i.e., some way of marking link > up and link down manually)? Or are they something else? Should > regular MAC instances (other than VNICs) have the ability to be > set administratively up and down? > > What would happen if VNICs were always "up?"Here it means causing the VNIC MACs to register with the framework. The same functionality already exists for link aggregations. Meem suggested init-vnic instead, which would be fine to me and avoid potential confusions with ifconfig up. I still need to update that part of the document.> > - What happens if a NIC is oversubscribed by the amount of bandwidth > configured for the VNICs? Is the result proportionate (and thus > "fair") allocation, or do they compete on some other grounds?In that case it will depend on other factors such as the type of traffic, the CPU(s) processing that traffic, etc.> > What kind of bandwidth control exists here? How granular is it, > and what effects do clients see from restricted bandwidth? Are > packets dropped (they have to be, if bandwidth limits apply to > forwarded traffic)? 
>   If so, is it tail drop or something more
>   sophisticated?

In general if a SRS or flow is assigned its own hardware ring, then the polling thread will poll packets directly from the ring, and there's no dropping from the host. Packets will be polled from the rings when allowed as per bandwidth limits and consumption. The polling thread is scheduled every tick, and we compute a maximum number of bytes per tick.

If more than one SRS/squeue share a ring, there's no polling of the ring. Instead, traffic will be interrupt driven, and packets will be deposited on queues associated with the SRS/squeue. Packets are then pulled from these queues based on bandwidth limits. If the maximum number of packets in these queues is exceeded, then there's tail drop. Again, see the SRS design doc.

> - Can a VNIC be built atop another non-anchor VNIC? (Seems like the
>   answer is "yes.")

Correct.

> - When VNICs share rings due to a lack of hardware resources, what
>   happens when the client of one VNIC is using polling and the
>   client of the other one is not?
>   Won't one client end up blanking the interrupts for another?

If there's one ring shared by multiple VNICs, traffic arrival will be interrupt based, and after software classification, traffic will be deposited to software rings.

If there are multiple hardware rings but only one interrupt, then the driver does not disable the hardware interrupt. Instead, it takes note of the request from the stack to not interrupt for specific rings. When a hardware interrupt is received, it avoids consuming packets from these rings, and continues delivering traffic to the MAC layer otherwise. Again, see the document on SRS and bandwidth control for more details.

> - Instead of adding more arguments to mac_open() to handle priority
>   and bandwidth, I'd suggest making these separate calls. You'll
You''ll > need the separate call anyway to implement the "modify" mechanism.Having the parameters specified in mac_open() is useful since they allow these parameters to be specified when the resources are allocated to the MAC client. This avoids allocating a set of default resources and then immediately changing these resources through a separate modify mechanism. If we can specify through 2-3 arguments I don''t think this should be an issue.> - What exactly does exclusive MAC access do? If mac_exclusive_set > is called, are other client requests blocked (sleeping)? Or are > they rejected (return error)? Or are they just let through, and > all clients are expected to bracket requests with exclusive > set/clear calls?This is basically the equivalent of the mac_active_set()/mac_active_clear() we have in Nevada today. I''m looking into whether the same semantics could be implemented indirectly through the mac_unicst_set() with the primary MAC address, since there''s only one and it can be assigned only to one MAC client.> > - MAC_UNICAST_AUTO seems unnecessary to me. Why not just call first > with MAC_UNICAST_FACTORY and, if that fails, call again with > MAC_UNICAST_RANDOM? Doing that would even have better > functionality as MAC_UNICAST_AUTO seems to omit the possibility of > desiring a particular factory address when available.The intent was for AUTO to allow the slot to be specified. That option should allow the slot number to be specified via addr_slot.> I think having MAC_UNICAST_AUTO in the mix ends up pushing some of > the control-path complexity out of the user space and into the > kernel. It''d be better to simplify the kernel parts.This is very simple logic we''re talking about here, I don''t see the problem doing that selection in kernel space. In addition, it avoids having two system calls per VNIC created on top of NICs which do not provide multiple factory MAC addresses.> - What sorts of privileges are required to create and administer > VNICs? 
>   Are these things that can be delegated to non-global
>   zones?

Basically the same that are needed for administering other data-links, i.e. sys_net_config and net_rawaccess. In a zones environment data-link administration is limited to the global zone.

> - Why is [V]NIC the right level of bandwidth control? If I want to
>   give a zone 100Mbps worth of bandwidth, but I'm giving it multiple
>   VNICs, how do I do that -- can the bandwidth control logic do
>   accounting based on multiple interfaces (aggregate control, rather
>   than individual interface control)?

No, the bandwidth control is on a per-interface or per-flow basis. This is because the bandwidth is basically controlled by polling on a per ring (software or hardware) basis, not across a set of rings.

>   If I have application-level controls, such as HTTP virtual servers
>   or a sendmail configuration handling multiple domains, how can I
>   control bandwidth for those things? Won't the application need to
>   be involved?

Then you will use flowadm(1M), which we are also introducing as part of Crossbow, and which will be described separately. My document focuses on the virtualization aspects of the project.

Nicolas.

--
Nicolas Droux - Solaris Networking - Sun Microsystems, Inc.
droux at sun.com - http://blogs.sun.com/droux
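The shared-ring case described in the message above (packets queued on arrival, pulled from the queue under a per-tick byte budget, tail drop once the queue is full) can be modelled with a toy sketch. This is an illustration under stated assumptions, not the SRS implementation: the class name `LimitedQueue`, the rate of 100 ticks per second, and the packet-count cap are all assumptions chosen for the example, and unused budget is simply discarded at the end of each tick.

```python
from collections import deque


class LimitedQueue:
    """Toy model: enqueue on arrival, drain per tick, tail drop on overflow."""

    def __init__(self, limit_bps, ticks_per_sec=100, max_packets=4):
        # Maximum number of bytes that may be pulled from the queue per tick.
        self.bytes_per_tick = limit_bps // 8 // ticks_per_sec
        self.max_packets = max_packets
        self.queue = deque()
        self.dropped = 0

    def enqueue(self, packet_len):
        """Called on packet arrival; the new packet is dropped if the queue is full."""
        if len(self.queue) >= self.max_packets:
            self.dropped += 1  # tail drop: the newest arrival is discarded
            return False
        self.queue.append(packet_len)
        return True

    def tick(self):
        """Called once per tick; returns the packet lengths released this tick."""
        budget = self.bytes_per_tick
        released = []
        # Release packets in order while the head of the queue fits the budget.
        while self.queue and self.queue[0] <= budget:
            pkt = self.queue.popleft()
            budget -= pkt
            released.append(pkt)
        return released
```

With a 1.2 Mbit/s limit and 100 ticks per second, the budget works out to 1500 bytes per tick, so a backlog of 1000-byte packets drains at one packet per tick while further arrivals beyond the cap are dropped.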
James Carlson
2007-Aug-29 13:27 UTC
Nicolas Droux writes:
> From the administration interface point of view, there are two ways to
> associate properties with data-links. For data-links that are created
> through a dladm subcommand like create-vnic, the initial set of
> properties can be specified during the creation of the data-link itself
> through a dedicated option. In addition, the properties can be set on
> any data-link through the set-linkprop subcommand. The former allows the
> administrator to create a VNIC with bandwidth control in a single
> command instead of having to go through a two-step dance.

Does this mean that the same properties will be accessible via both "modify-vnic" and "set-linkprop"?

I can understand wanting to set some initial properties at create time, but it seems odd that the new general properties are segregated into VNIC-specific commands.

> > - Do bandwidth and CPU controls rely on squeues? If so, then VNICs
> >   may not be able to control utilization from non-IP traffic, such
> >   as with bridging.
>
> There is a level of bandwidth control done by squeue, but there's also a
> bandwidth control done by the MAC layer itself, which is useful when
> there's a need to do bandwidth control before fanout to multiple CPUs at
> the MAC layer, and also for non-IP protocols, or when the MAC is being
> used by a virtual machine's back-end drivers in the host OS. See also
> Sunay's writeup at
> http://www.opensolaris.org/os/project/crossbow/Design_softringset.txt
> for more details on this topic.

I had found and read that document before writing my comment.

I still don't quite see the relationship here. What are the responsibilities of the two mechanisms (the mac layer and the squeues)?

To put the question in another way: suppose I have a non-IP protocol using a VNIC with a bandwidth control set on it. What happens? Are there features that were related to squeues that I won't be able to use? If so, then what are those features?
Or, to put it another way still: are there things that non-IP protocols should or could be doing in order to "cooperate" with this bandwidth control so that they behave as well as IP's squeues will?

> I was trying to allow the system administrator to minimize the impact on
> the existing MAC address assignment when a VNIC is moved off
> and back to a device. But I agree that it's not optimal. If the folks on
> this list feel that the MAC address changing is not an issue, I've no
> problem using the simpler scheme of reassigning a new MAC address to the
> VNIC/MAC client.

If I (as a system administrator) say "factory" as part of the configuration of the interface, then I'd expect to get a factory-supplied address. My expectation would be that when the factory-supplied components are swapped out underneath, the address changes.

Having the factory-supplied address come unmoored from the device itself seems odd to me, and almost certain to cause trouble. I suppose it could be possible to create an "adopt the factory address and treat it as though it were my own statically-configured address" option, but I'd certainly want to see it come with adequate warnings about the dangers and a clear user interface (not "factory" but "steal-from-factory" ;-}). I'm not sure that it'd be administratively interesting, though.

> > - What do "up" and "down" mean? Are these equivalent to controlling
> >   the "RUNNING" bit from user space (i.e., some way of marking link
> >   up and link down manually)? Or are they something else? Should
> >   regular MAC instances (other than VNICs) have the ability to be
> >   set administratively up and down?
> >
> >   What would happen if VNICs were always "up?"
>
> Here it means causing the VNIC MACs to register with the framework. The
> same functionality already exists for link aggregations. Meem suggested
> init-vnic instead, which would be fine to me and avoid potential
> confusion with ifconfig up.
> I still need to update that part of the
> document.

Ah, ok. Yes, that would make this a lot clearer.

> > - What happens if a NIC is oversubscribed by the amount of bandwidth
> >   configured for the VNICs? Is the result proportionate (and thus
> >   "fair") allocation, or do they compete on some other grounds?
>
> In that case it will depend on other factors such as the type of
> traffic, the CPU(s) processing that traffic, etc.

I suggest putting more effort into characterizing this, because oversubscribing is a common and fairly well understood way to balance risk versus utilization and occurs often in handling failure scenarios (such as with aggregation).

I've seen similar schemes for access servers (most have proprietary RADIUS extensions for setting bandwidth limits), and the usual way this works is that once the link is saturated, the configured limits become shares. Thus, the clients are all hurt in proportion to the amount of bandwidth they're given.

> >   What kind of bandwidth control exists here? How granular is it,
> >   and what effects do clients see from restricted bandwidth? Are
> >   packets dropped (they have to be, if bandwidth limits apply to
> >   forwarded traffic)? If so, is it tail drop or something more
> >   sophisticated?
>
> In general if a SRS or flow is assigned its own hardware ring, then the
> polling thread will poll packets directly from the ring, and there's no
> dropping from the host. Packets will be polled from the rings when
> allowed as per bandwidth limits and consumption. The polling thread is
> scheduled every tick, and we compute a maximum number of bytes per tick.
>
> If more than one SRS/squeue share a ring, there's no polling of the
> ring. Instead, traffic will be interrupt driven, and packets will be
> deposited on queues associated with the SRS/squeue. Packets are then
> pulled from these queues based on bandwidth limits. If the maximum
> number of packets in these queues is exceeded, then there's tail drop.
> Again, see the SRS design doc.

"Tail drop" looks like the answer I was looking for.

In that case, you might want to consider (at least as an RFE) including basic RED support here. There can be a big difference in behavior between hardware-imposed limits (ones that presumably affect both the sender and receiver in most cases) and artificial limits because the network behavior is quite different, and tail-drop is known to cause poor TCP performance.

> > - When VNICs share rings due to a lack of hardware resources, what
> >   happens when the client of one VNIC is using polling and the
> >   client of the other one is not?
> >   Won't one client end up blanking the interrupts for another?
>
> If there's one ring shared by multiple VNICs, traffic arrival will be
> interrupt based, and after software classification, traffic will be
> deposited to software rings.

OK; that's the part I was looking for.

> > - Instead of adding more arguments to mac_open() to handle priority
> >   and bandwidth, I'd suggest making these separate calls. You'll
> >   need the separate call anyway to implement the "modify" mechanism.
>
> Having the parameters specified in mac_open() is useful since they allow
> these parameters to be specified when the resources are allocated to
> the MAC client. This avoids allocating a set of default resources and
> then immediately changing these resources through a separate modify
> mechanism. If we can specify through 2-3 arguments I don't think this
> should be an issue.

I think it's much more flexible and easier to do it later. You're going to need a function to change the values after mac_open() time. By supplying the same values during mac_open(), you're just duplicating that functionality.

Worse, mac_open() is a core function, while resource control is at the periphery.
If you need to modify mac_open() every time resource controls are tweaked -- consider what happens when shared resources are introduced (allowing control of multiple interfaces as a group), or when more advanced queuing disciplines are allowed -- then this interface will never settle down and never be appropriate as a DDI function.

Separating these two allows you to add new control functions in the future without having to modify every mac_open() caller. It's as though every fcntl(2) feature needed to be supplied in open(2).

Why is the resource allocation itself an important thing to optimize versus the interface stability and scalability?

> > - What exactly does exclusive MAC access do? If mac_exclusive_set
> >   is called, are other client requests blocked (sleeping)? Or are
> >   they rejected (return error)? Or are they just let through, and
> >   all clients are expected to bracket requests with exclusive
> >   set/clear calls?
>
> This is basically the equivalent of the
> mac_active_set()/mac_active_clear() we have in Nevada today. I'm looking
> into whether the same semantics could be implemented indirectly through
> the mac_unicst_set() with the primary MAC address, since there's only
> one and it can be assigned only to one MAC client.

I thought that the "active" flag was there to allow passive users (such as snoop) to monitor interfaces that would otherwise be off-bounds, such as aggregation members. It's not clear to me how there's an equivalent of that here. Maybe this section just needs more explanation or a usage scenario.

> > - MAC_UNICAST_AUTO seems unnecessary to me. Why not just call first
> >   with MAC_UNICAST_FACTORY and, if that fails, call again with
> >   MAC_UNICAST_RANDOM? Doing that would even have better
> >   functionality as MAC_UNICAST_AUTO seems to omit the possibility of
> >   desiring a particular factory address when available.
>
> The intent was for AUTO to allow the slot to be specified.
> That option
> should allow the slot number to be specified via addr_slot.

The document says it must be -1.

> > I think having MAC_UNICAST_AUTO in the mix ends up pushing some of
> > the control-path complexity out of the user space and into the
> > kernel. It'd be better to simplify the kernel parts.
>
> This is very simple logic we're talking about here, I don't see the
> problem doing that selection in kernel space. In addition, it avoids
> having two system calls per VNIC created on top of NICs which do not
> provide multiple factory MAC addresses.

It's also duplicate logic. Why optimize for system call counts versus kernel code complexity?

> > - What sorts of privileges are required to create and administer
> >   VNICs? Are these things that can be delegated to non-global
> >   zones?
>
> Basically the same that are needed for administering other data-links,
> i.e. sys_net_config and net_rawaccess. In a zones environment data-link
> administration is limited to the global zone.

That latter part might not be right for IP Instances, particularly since VNICs can be built atop other VNICs. (Maybe that's just an issue for the future, though.)

> > - Why is [V]NIC the right level of bandwidth control? If I want to
> >   give a zone 100Mbps worth of bandwidth, but I'm giving it multiple
> >   VNICs, how do I do that -- can the bandwidth control logic do
> >   accounting based on multiple interfaces (aggregate control, rather
> >   than individual interface control)?
>
> No, the bandwidth control is on a per-interface or per-flow basis.
> This is because the bandwidth is basically controlled by polling on a
> per ring (software or hardware) basis, not across a set of rings.

That's quite different from what most QoS implementations I've seen do. The usual model is to map interfaces and flows into a "QoS group," which is then controlled as a single unit, as in Cisco's "qos-group" feature and policy maps.
I'd suggest making sure that potential customers of this new bandwidth control feature are keenly aware of the no-resource-aggregation limitation. It sounds like it's intended as a fundamental design feature, and not something that might be a temporary feature limitation that could be removed later. (As a user, I wouldn't be surprised to find that the controls at initial release don't match what I actually need, but I'd be very surprised if the controls couldn't be fixed later.)

> > If I have application-level controls, such as HTTP virtual servers
> > or a sendmail configuration handling multiple domains, how can I
> > control bandwidth for those things? Won't the application need to
> > be involved?
>
> Then you will use flowadm(1M) which we are also introducing as part of
> Crossbow, and will be described separately. My document focuses on the
> virtualization aspects of the project.

OK.

--
James Carlson, Solaris Networking <james.d.carlson at sun.com>
Sun Microsystems / 1 Network Drive 71.232W Vox +1 781 442 2084
MS UBUR02-212 / Burlington MA 01803-2757 42.496N Fax +1 781 442 1677
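The "configured limits become shares" behavior described in the message above for oversubscribed links comes down to simple proportional arithmetic, sketched here. It illustrates the model James describes from access servers, not the behavior of the Crossbow design; the function name `effective_rates` and the VNIC names in the usage note are made up for the example.

```python
from fractions import Fraction


def effective_rates(limits_mbps, link_mbps):
    """Scale configured limits proportionally when their sum exceeds the link.

    limits_mbps: mapping of interface name -> configured limit (integer Mbps).
    link_mbps:   capacity of the underlying link (integer Mbps).
    """
    total = sum(limits_mbps.values())
    if total <= link_mbps:
        # Not oversubscribed: every interface keeps its configured limit.
        return {name: Fraction(limit) for name, limit in limits_mbps.items()}
    # Oversubscribed: each interface gets link * (its limit / sum of limits).
    scale = Fraction(link_mbps, total)
    return {name: limit * scale for name, limit in limits_mbps.items()}
```

For example, limits of 600, 300 and 300 Mbps on a 1000 Mbps link add up to 1200 Mbps, so each is scaled by 1000/1200 and the interfaces end up with 500, 250 and 250 Mbps respectively: everyone is reduced in proportion to what they were given.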
Nicolas Droux
2007-Aug-29 22:15 UTC
Jim,

James Carlson wrote:
> Nicolas Droux writes:
>> From the administration interface point of view, there are two ways to
>> associate properties with data-links. For data-links that are created
>> through a dladm subcommand like create-vnic, the initial set of
>> properties can be specified during the creation of the data-link itself
>> through a dedicated option. In addition, the properties can be set on
>> any data-link through the set-linkprop subcommand. The former allows the
>> administrator to create a VNIC with bandwidth control in a single
>> command instead of having to go through a two-step dance.
>
> Does this mean that the same properties will be accessible via both
> "modify-vnic" and "set-linkprop"?
>
> I can understand wanting to set some initial properties at create
> time, but it seems odd that the new general properties are segregated
> into VNIC-specific commands.

No, only set-linkprop will be used to change these properties, not modify-vnic. We'll send out updated man pages to reflect these changes, and they will be different from the man pages that were published as part of our current bits.

>>> - Do bandwidth and CPU controls rely on squeues? If so, then VNICs
>>>   may not be able to control utilization from non-IP traffic, such
>>>   as with bridging.
>> There is a level of bandwidth control done by squeue, but there's also a
>> bandwidth control done by the MAC layer itself, which is useful when
>> there's a need to do bandwidth control before fanout to multiple CPUs at
>> the MAC layer, and also for non-IP protocols, or when the MAC is being
>> used by a virtual machine's back-end drivers in the host OS. See also
>> Sunay's writeup at
>> http://www.opensolaris.org/os/project/crossbow/Design_softringset.txt
>> for more details on this topic.
>
> I had found and read that document before writing my comment.
>
> I still don't quite see the relationship here.
> What are the
> responsibilities of the two mechanisms (the mac layer and the
> squeues)?
>
> To put the question in another way: suppose I have a non-IP protocol
> using a VNIC with a bandwidth control set on it. What happens? Are
> there features that were related to squeues that I won't be able to
> use? If so, then what are those features?

The client will see a MAC which has a bandwidth limit, nothing else is required.

> Or, to put it another way still: are there things that non-IP
> protocols should or could be doing in order to "cooperate" with this
> bandwidth control so that they behave as well as IP's squeues will?

No, no special requirements. The bandwidth limits set on a MAC will be enforced by the MAC layer SRS. We'll also have a flow API which will be available to MAC clients to define bandwidth limits for services, etc, and used by clients like IP when needed.

>> I was trying to allow the system administrator to minimize the impact on
>> the existing MAC address assignment when a VNIC is moved off
>> and back to a device. But I agree that it's not optimal. If the folks on
>> this list feel that the MAC address changing is not an issue, I've no
>> problem using the simpler scheme of reassigning a new MAC address to the
>> VNIC/MAC client.
>
> If I (as a system administrator) say "factory" as part of the
> configuration of the interface, then I'd expect to get a factory-
> supplied address. My expectation would be that when the factory-
> supplied components are swapped out underneath, the address changes.

Actually there are three sub-cases to this I think:

1. If the administrator does not specify an address (automatic assignment) and a factory MAC address is assigned to the VNIC, I think it's fine to assign a different MAC address, e.g. a random one, to the VNIC if the VNIC is moved to a NIC which does not have available factory MAC addresses.

2. If the administrator requested a factory MAC address explicitly, then the VNIC could be moved to a different NIC which has an available factory MAC address. Otherwise the operation would fail unless a force flag is set.

3. If the administrator requested a factory MAC address of a specific slot, then there's a clear intent of using a specific MAC address of the device underneath. In that case the move operation would fail unless a force flag is set.

> Having the factory-supplied address come unmoored from the device
> itself seems odd to me, and almost certain to cause trouble. I
> suppose it could be possible to create an "adopt the factory address
> and treat it as though it were my own statically-configured address"
> option, but I'd certainly want to see it come with adequate warnings
> about the dangers and a clear user interface (not "factory" but
> "steal-from-factory" ;-}). I'm not sure that it'd be administratively
> interesting, though.

Yes, there's a risk of duplicate addresses if that option was chosen and the source NIC ends up being recycled later; that's less than ideal.

>>> - What happens if a NIC is oversubscribed by the amount of bandwidth
>>>   configured for the VNICs? Is the result proportionate (and thus
>>>   "fair") allocation, or do they compete on some other grounds?
>> In that case it will depend on other factors such as the type of
>> traffic, the CPU(s) processing that traffic, etc.
>
> I suggest putting more effort into characterizing this, because
> oversubscribing is a common and fairly well understood way to balance
> risk versus utilization and occurs often in handling failure scenarios
> (such as with aggregation).
>
> I've seen similar schemes for access servers (most have proprietary
> RADIUS extensions for setting bandwidth limits), and the usual way
> this works is that once the link is saturated, the configured limits
> become shares.
Thus, the clients are all hurt in proportion to the > amount of bandwidth they''re given.The limits are really used to clamp down on bandwidth utilization by a MAC, but they do not imply any guaranteed bandwidth. As a future deliverable we''re also planning to provide bandwidth guarantees which is what you seem to be referring to here.>>> What kind of bandwidth control exists here? How granular is it, >>> and what effects do clients see from restricted bandwidth? Are >>> packets dropped (they have to be, if bandwidth limits apply to >>> forwarded traffic)? If so, is it tail drop or something more >>> sophisticated? >> In general if a SRS or flow is assigned its own hardware ring, then the >> polling thread will poll packets directly from the ring, and there''s no >> dropping from the host. Packets will be polled from the rings when >> allowed as per bandwidth limits and consumption. The polling thread is >> scheduled every tick, and we compute a maximum number of bytes per tick. >> >> If more than one SRS/squeue share a ring, there''s no polling of the >> ring. Instead, traffic will be interrupt driven, and packets will be >> deposited on queues associated with the SRS/squeue. Packets are then >> pulled from these queues based on bandwidth limits. If the maximum >> number of packets in these queues is exceeded, then there''s tail drop. >> Again, see the SRS design doc. > > "Tail drop" looks like the answer I was looking for. > > In that case, you might want to consider (at least as an RFE) > including basic RED support here. There can be a big difference in > behavior between hardware-imposed limits (ones that presumably affect > both the sender and receiver in most cases) and artificial limits > because the network behavior is quite different, and tail-drop is > known to cause poor TCP performance.Agreed. 
We still need to document our existing scheme here in more detail, and
we should discuss alternatives as part of that text.

>>> - Instead of adding more arguments to mac_open() to handle priority
>>>   and bandwidth, I'd suggest making these separate calls. You'll
>>>   need the separate call anyway to implement the "modify" mechanism.
>> Having the parameters specified in mac_open() is useful since they allow
>> these parameters to be specified when the resources are allocated to
>> the MAC client. This avoids allocating a set of default resources and
>> then immediately changing these resources through a separate modify
>> mechanism. If we can specify through 2-3 arguments I don't think this
>> should be an issue.
>
> I think it's much more flexible and easier to do it later.
>
> You're going to need a function to change the values after mac_open()
> time. By supplying the same values during mac_open(), you're just
> duplicating that functionality.

It might be a single "piece of code" which can be called to allocate
resources according to these parameters from both the open and modify
functions. I think the duplication can be avoided.

> Worse, mac_open() is a core function, while resource control is at the
> periphery. If you need to modify mac_open() every time resource
> controls are tweaked -- consider what happens when shared resources
> are introduced (allowing control of multiple interfaces as a group),
> or when more advanced queuing disciplines are allowed -- then this
> interface will never settle down and never be appropriate as a DDI
> function.
>
> Separating these two allows you to add new control functions in the
> future without having to modify every mac_open() caller.
>
> It's as though every fcntl(2) feature needed to be supplied in
> open(2).
>
> Why is the resource allocation itself an important thing to optimize
> versus the interface stability and scalability?

I don't agree with the "core function" vs "periphery" argument.
The resource control is becoming an integral part of the MAC layer, and there shouldn''t be a need to do "extra steps" to enable that functionality. But I agree with your point about designing an API which allows more options to be added in the future without breaking backward compatibility. However I think this can be made to work without requiring a separate call. I''ll need to take a closer look at this.>>> - MAC_UNICAST_AUTO seems unnecessary to me. Why not just call first >>> with MAC_UNICAST_FACTORY and, if that fails, call again with >>> MAC_UNICAST_RANDOM? Doing that would even have better >>> functionality as MAC_UNICAST_AUTO seems to omit the possibility of >>> desiring a particular factory address when available. >> The intent was for AUTO to allow the slot to be specified. That option >> should allow the slot number to be specified via addr_slot. > > The document says it must be -1.Yes, and I need to fix the document to allow a slot number to be passed when that MAC address type is specified.>>> I think having MAC_UNICAST_AUTO in the mix ends up pushing some of >>> the control-path complexity out of the user space and into the >>> kernel. It''d be better to simplify the kernel parts. >> This is very simple logic we''re talking about here, I don''t see the >> problem doing that selection in kernel space. In addition, it avoids >> having two system calls per VNIC created on top of NICs which do not >> provide multiple factory MAC addresses. > > It''s also duplicate logic. Why optimize for system call counts versus > kernel code complexity?There''s additional code in the kernel, but that logic is very simple.>>> - What sorts of privileges are required to create and administer >>> VNICs? Are these things that can be delegated to non-global >>> zones? >> Basically the same that are needed for administrating other data-links, >> i.e. sys_net_config and net_rawaccess. In a zones environment data-link >> administration is limited to the global zone. 
> > That latter part might not be right for IP Instances, particularly > since VNICs can be built atop other VNICs. (Maybe that''s just an > issue for the future, though.)Even with IP instances, data-link control remains in the global zone.>>> - Why is [V]NIC the right level of bandwidth control? If I want to >>> give a zone 100Mbps worth of bandwidth, but I''m giving it multiple >>> VNICs, how do I do that -- can the bandwidth control logic do >>> accounting based on multiple interfaces (aggregate control, rather >>> than individual interface control)? >> No, the bandwidth control is on a per-interface on a per-flow basis. >> This is because the bandwidth is basically controlled by polling on a >> per ring (software or hardware) basis, not across a set of rings. > > That''s quite different from what most QoS implementations I''ve seen > do. The usual model is to map interfaces and flows into a "QoS > group," which is then controlled as a single unit, as in Cisco''s > "qos-group" feature and policy maps. > > I''d suggest making sure that potential customers of this new bandwidth > control feature are keenly aware of the no-resource-aggregation > limitation. It sounds like it''s intended as a fundamental design > feature, and not something that might be a temporary feature > limitation that could be removed later. (As a user, I wouldn''t be > surprised to find that the controls at initial release don''t match > what I actually need, but I''d be very surprised if the controls > couldn''t be fixed later.)Yes, this will be of course fully documented. If we find an efficient way to do banwidth control across multiple rings in the future, I don''t see why we wouldn''t be able to made use of that functionality. Thanks, Nicolas. -- Nicolas Droux - Solaris Networking - Sun Microsystems, Inc. droux at sun.com - http://blogs.sun.com/droux
James Carlson
2007-Aug-30 19:01 UTC
[crossbow-discuss] Updated Crossbow virtualization architecture document
Nicolas Droux writes:> James Carlson wrote: > > I can understand wanting to set some initial properties at create > > time, but it seems odd that the new general properties are segregated > > into VNIC-specific commands. > > No, only set-linkprop will be used to change these properties, not > modify-vnic. We''ll send out updated man pages to reflect these changes, > and they will be different than the man pages that were published as > part of our current bits.OK; thanks.> > To put the question in another way: suppose I have a non-IP protocol > > using a VNIC with a bandwidth control set on it. What happens? Are > > there features that were related to squeues that I won''t be able to > > use? If so, then what are those features? > > The client will see a MAC which has a bandwidth limit, nothing else is > required.That''s what I wanted to know.> > If I (as a system administrator) say "factory" as part of the > > configuration of the interface, then I''d expect to get a factory- > > supplied address. My expectation would be that when the factory- > > supplied components are swapped out underneath, the address changes. > > Actually there are three sub-cases to this I think: > > 1. If the administrator does not specify an address (automatic > assignment), and a factory MAC address is assigned to the VNIC. In this > case, I think it''s fine to assign a different MAC address, e.g. a random > one, to the VNIC if the VNIC is moved to a NIC which does not have > available factory MAC addresses.Yes, I agree with that. That''d be the "auto" case, and I was talking about "factory."> 2. If the administrator requested a factory MAC addresses explicitly, > then the VNIC could be moved to a different NIC which has an available > factory MAC address. Otherwise the operation would fail unless a force > flag is set.Why would "force" be useful in this case? What exactly happens if the operation is "forced," and why couldn''t I configure the interface in that way in the first place? 
I''m very leery of administrative options that leave the system in a state where I couldn''t have configured it that way in the first place. In this case, I can''t configure a VNIC as "factory" if there are no addresses available, but I can "force" a "factory" VNIC into an interface with no addresses available. Does the configuration pop loose (become something other than "factory") during such a forced move, or does the configuration just become incorrect, saying "factory" but meaning something else?> 3. If the administrator requested a factory MAC address of a specific > slot, then there''s a clear intent of using a specific MAC address of the > device underneath. In that case the move operation would fail unless a > force flag is set.I still think this is too divorced from administrative expectations. When do I move VNICs around and what do I need and expect? I think this document should work through some actual usage scenarios and then come up with usable interfaces based on that, because the current interfaces seem to be self-referential: they do what they do because that''s what they do. The "force" flag seems particularly problematic, as it indicates that things the administrator should be able to do aren''t doable. The scenarios I can see are: - User configures VNIC for the first time on a given NIC. What happens when the "factory" address desired doesn''t exist or is in use? - User wants a VNIC to move from one NIC to another. Forget about "forcing" the operation, and look at the need. Why am I moving it from one to another and what should I expect? - The system needs to move a VNIC from one NIC to another (or to none at all!) due to DR removal of the assigned NIC. There might be other variations here. Here''s one possible answer that I think would make a bit more sense, at least to me, and would be much simpler. The "auto" keyword and the "-F" flag go away. 
All configurations that specify "factory" are implicitly automatic: if the requested factory address isn''t available, then you get an auto-generated one and perhaps a warning message. If you really care which kind of address you get, then look at the MAC address -- it''ll have the "local" flag set if it was auto-generated. When moving from one interface to another, if "factory" is selected, it''s the same as configuring the interface for the very first time. If the requested address is available on the new (destination) NIC, then it''s used. If it''s not, then an auto- generated address is used instead. The system doesn''t have a way to be obstinate about using a particular factory-assigned address, and failing otherwise. If you need to have a never-changing address, then assign one manually or use the "random" option, as neither of these options relies on data supplied by the hardware itself. Factory addresses are, by definition, "ephemeral" from the point of view of a VNIC -- they''re tied to the hardware, not to the VNIC.> > Having the factory-supplied address come unmoored from the device > > itself seems odd to me, and almost certain to cause trouble. I > > suppose it could be possible to create a "adopt the factory address > > and treat it as though it were my own statically-configured address" > > option, but I''d certainly want to see it come with adequate warnings > > about the dangers and a clear user interface (not "factory" but > > "steal-from-factory" ;-}). I''m not sure that it''d be administratively > > interesting, though. > > Yes, there''s a risk of duplicate addresses if that option was chosen, > and the source NIC ends-up being recycled later, that''s less than ideal.Actually, it''s potentially a disaster if it happens. If moving factory addresses around among NICs is actually an important administrative requirement, then, in terms of ARC review, I''d feel TCR-strong that the system _must_ prevent duplicates from forming somehow. 
Or just not include that feature.> > I''ve seen similar schemes for access servers (most have proprietary > > RADIUS extensions for setting bandwidth limits), and the usual way > > this works is that once the link is saturated, the configured limits > > become shares. Thus, the clients are all hurt in proportion to the > > amount of bandwidth they''re given. > > The limits are really used to clamp down on bandwidth utilization by a > MAC, but they do not imply any guaranteed bandwidth. As a future > deliverable we''re also planning to provide bandwidth guarantees which is > what you seem to be referring to here.Actually, no, that''s not quite what I''m referring to. A bandwidth limit is an upper bound. If the user tries to send more than that, then he''ll experience delay and loss. There''s no guarantee that he''ll be able to send that much, but he won''t be able to send more. A bandwidth guarantee is a lower bound. It''s a reservation. The user must always be able to get at least a given amount. This project doesn''t supply guarantees. Quite apart from those definitions, though, is the issue of fairness. In this case, I *am* talking about limits, but I''m also talking about what happens when the limit is unachievable. In the implementations I''ve seen (Cisco and Ascend are pretty good references for this), the limit becomes a share because this sort of behavior preserves fairness. Suppose we have twenty users with 10Mbps limits, and one user with a 50Mbps limit. They''re all on a 100Mbps pipe. If ten of those 10Mbps users can together lock out all of the others from using any of the pipe bandwidth at all, then that''s an "unfair" result. A very simple, but "fair," result would be that, in the limit with everyone sending flat-out, the 50Mbps-limited user would get 20% of the bandwidth, or 20Mbps. The 10Mbps users would get the remaining 80%, or 4Mbps. Thus, each user would end up with 40% (which is 100/250 and 20/50 and 4/10) of his maximum. 
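As a worked check of the arithmetic above, here is a minimal Python sketch of the "limits become shares" behavior. It is only an illustration of the proportional scheme described in this message, not a description of what Crossbow implements; the function name and its parameters are made up for the example.

```python
def proportional_shares(limits_mbps, link_mbps):
    """Scale configured limits down proportionally when the link is oversubscribed.

    limits_mbps: configured per-user bandwidth limits (upper bounds).
    link_mbps:   capacity of the shared link.
    """
    total = sum(limits_mbps)
    if total <= link_mbps:
        # No contention: every user can reach its configured limit.
        return list(limits_mbps)
    # Oversubscribed: each user gets the same fraction of its own limit.
    return [limit * link_mbps / total for limit in limits_mbps]


# One 50Mbps user and twenty 10Mbps users on a 100Mbps pipe.
shares = proportional_shares([50] + [10] * 20, 100)
print(shares[0], shares[1])  # prints "20.0 4.0"
```

With everyone sending flat-out, the configured limits add up to 250Mbps, so each user is scaled to 100/250 = 40% of its limit.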
Other results are possible, including splitting the various kinds of users into priority classes. I assume that''s not what''s going on here, though it''s not clear. The point is that, although the answer could just be that it''s inherently unfair, and that''s how it is, I don''t see how an inherently unfair system is something that people could use in practice. Does it make sense to do that?> > You''re going to need a function to change the values after mac_open() > > time. By supplying the same values during mac_open(), you''re just > > duplicating that functionality. > > It might be a single "piece of code" which can be called to allocate > resources according to these parameters from both the open and modify > functions. I think the duplication can be avoided.Then the duplication is only in the API.> > Why is the resource allocation itself an important thing to optimize > > versus the interface stability and scalability? > > I don''t agree with the "core function" vs "periphery" argument. The > resource control is becoming an integral part of the MAC layer, and > there shouldn''t be a need to do "extra steps" to enable that functionality.Opening the device is clearly core functionality -- you can''t do much if you can''t open it. It sounds like you agree that if those arguments weren''t present, then some "default" set of resources would need to be allocated. Thus, I argue that the functionality isn''t core to the goal of getting access to the mac layer. So, the disagreement is on whether every consumer needs to set up resource controls. I''m not sure that they do. But if they do, aren''t there other things they also "need to" set up, and should all of those things be mac_open() arguments?> But I agree with your point about designing an API which allows more > options to be added in the future without breaking backward > compatibility. However I think this can be made to work without > requiring a separate call. 
I''ll need to take a closer look at this.OK.> > The document says it must be -1. > > Yes, and I need to fix the document to allow a slot number to be passed > when that MAC address type is specified.OK.> > It''s also duplicate logic. Why optimize for system call counts versus > > kernel code complexity? > > There''s additional code in the kernel, but that logic is very simple.I''ll give up on this point. I don''t think the duplication is worthwhile, even if it''s "simple," as this sort of thing often leads to trouble when alternate policies are devised, but it''s something hidden in the implementation that can be ripped back up later if necessary.> Yes, this will be of course fully documented. If we find an efficient > way to do banwidth control across multiple rings in the future, I don''t > see why we wouldn''t be able to made use of that functionality.Not just "fully documented," but the design constraint around the units of control (being individual NIC instances) needs to be clearly described. Maybe I''m atypical but, as a user, this wouldn''t be obvious to me. -- James Carlson, Solaris Networking <james.d.carlson at sun.com> Sun Microsystems / 1 Network Drive 71.232W Vox +1 781 442 2084 MS UBUR02-212 / Burlington MA 01803-2757 42.496N Fax +1 781 442 1677
Nicolas Droux
2007-Aug-31 20:58 UTC
[crossbow-discuss] Updated Crossbow virtualization architecture document
James Carlson wrote:
> When do I move VNICs around and what do I need and expect? I think
> this document should work through some actual usage scenarios and then
> come up with usable interfaces based on that, because the current
> interfaces seem to be self-referential: they do what they do because
> that's what they do. The "force" flag seems particularly problematic,
> as it indicates that things the administrator should be able to do
> aren't doable.

<snip>

We need the ability to assign a factory MAC address, or fail the
operation if none is available. I don't see a problem with that. There's
the "auto" option (also the default) for administrators who want a
factory address if one is available, but don't mind having a random
address assigned if no factory address is available.

I'm fine with not having a "force" option during the move operation.
This means that if a VNIC was created with the "factory" option, and the
destination NIC doesn't have a factory MAC address available, then the
move will fail. It also means that if a user explicitly specified a
factory MAC address slot, the move will fail.

>>> I've seen similar schemes for access servers (most have proprietary
>>> RADIUS extensions for setting bandwidth limits), and the usual way
>>> this works is that once the link is saturated, the configured limits
>>> become shares. Thus, the clients are all hurt in proportion to the
>>> amount of bandwidth they're given.
>> The limits are really used to clamp down on bandwidth utilization by a
>> MAC, but they do not imply any guaranteed bandwidth. As a future
>> deliverable we're also planning to provide bandwidth guarantees which is
>> what you seem to be referring to here.
>
> Actually, no, that's not quite what I'm referring to.
>
> A bandwidth limit is an upper bound. If the user tries to send more
> than that, then he'll experience delay and loss. There's no guarantee
> that he'll be able to send that much, but he won't be able to send
> more.
>
> A bandwidth guarantee is a lower bound. It's a reservation. The user
> must always be able to get at least a given amount. This project
> doesn't supply guarantees.

Right.

> Quite apart from those definitions, though, is the issue of fairness.
> In this case, I *am* talking about limits, but I'm also talking about
> what happens when the limit is unachievable. In the implementations
> I've seen (Cisco and Ascend are pretty good references for this), the
> limit becomes a share because this sort of behavior preserves
> fairness.
>
> Suppose we have twenty users with 10Mbps limits, and one user with a
> 50Mbps limit. They're all on a 100Mbps pipe. If ten of those 10Mbps
> users can together lock out all of the others from using any of the
> pipe bandwidth at all, then that's an "unfair" result.
>
> A very simple, but "fair," result would be that, in the limit with
> everyone sending flat-out, the 50Mbps-limited user would get 20% of
> the bandwidth, or 20Mbps. The 10Mbps users would get the remaining
> 80%, or 4Mbps. Thus, each user would end up with 40% (which is
> 100/250 and 20/50 and 4/10) of his maximum.

I think this is an RFE we should consider. I'll let the rest of the team
chime in if they disagree or have more to add.

>>> Why is the resource allocation itself an important thing to optimize
>>> versus the interface stability and scalability?
>> I don't agree with the "core function" vs "periphery" argument. The
>> resource control is becoming an integral part of the MAC layer, and
>> there shouldn't be a need to do "extra steps" to enable that functionality.
>
> Opening the device is clearly core functionality -- you can't do much
> if you can't open it. It sounds like you agree that if those
> arguments weren't present, then some "default" set of resources would
> need to be allocated. Thus, I argue that the functionality isn't core
> to the goal of getting access to the mac layer.

If we have the hint at open time, then we can do it right there instead
of doing a first default allocation followed by a new allocation when
the hint is specified.

>> Yes, this will be of course fully documented. If we find an efficient
>> way to do bandwidth control across multiple rings in the future, I don't
>> see why we wouldn't be able to make use of that functionality.
>
> Not just "fully documented," but the design constraint around the
> units of control (being individual NIC instances) needs to be clearly
> described. Maybe I'm atypical but, as a user, this wouldn't be
> obvious to me.

Sure, we're planning to document this already.

Nicolas.

--
Nicolas Droux - Solaris Networking - Sun Microsystems, Inc.
droux at sun.com - http://blogs.sun.com/droux
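The two positions in this exchange (resource hints supplied at open time versus a separate modify call) can be reconciled in a small sketch. The Python below is purely hypothetical: the class, method, and property names are invented for illustration and are not the MAC layer API. It assumes resource properties are passed as optional named options, so new controls can be added without changing the open signature, and that open and modify share one allocation routine.

```python
# Hypothetical property names with their defaults; not taken from Crossbow.
DEFAULTS = {"maxbw_mbps": None, "priority": "normal"}


class MacClient:
    """Toy model of a client handle with resource properties."""

    def __init__(self, link, **hints):
        self.link = link
        self.allocations = 0  # how many times resources were (re)allocated
        self.props = {}
        # Hints given at open time are applied in the first allocation.
        self._allocate({**DEFAULTS, **hints})

    def modify(self, **props):
        """Change properties later; same code path as open."""
        self._allocate({**self.props, **props})

    def _allocate(self, props):
        unknown = set(props) - set(DEFAULTS)
        if unknown:
            raise ValueError(f"unknown properties: {sorted(unknown)}")
        self.props = props
        self.allocations += 1


hinted = MacClient("vnic0", maxbw_mbps=100)   # one allocation
late = MacClient("vnic1")
late.modify(maxbw_mbps=100)                   # default allocation, then a second one
print(hinted.allocations, late.allocations)   # prints "1 2"
```

In this sketch a caller that does not care about resource controls passes nothing extra, while a caller with hints avoids the default-then-reallocate sequence.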
James Carlson
2007-Sep-04 14:45 UTC
[crossbow-discuss] Updated Crossbow virtualization architecture document
Nicolas Droux writes:> James Carlson wrote: > > > When do I move VNICs around and what do I need and expect? I think > > this document should work through some actual usage scenarios and then > > come up with usable interfaces based on that, because the current > > interfaces seem to be self-referential: they do what they do because > > that''s what they do. The "force" flag seems particularly problematic, > > as it indicates that things the administrator should be able to do > > aren''t doable. > > <snip> > > We need the ability to assign a factory MAC address or fail the > operation is none is available.The question I''m asking here is: "why?" Under what circumstances does it make sense to provide this failure mode for administrators? How does it help rather than hinder? That''s what I''d like to see in the document -- some explanation that shows what administrative problem is being addressed by the functionality that''s provided. One way to do that is by providing usage scenarios. (Preferably ones that don''t assume the outcome. I.e., not "the user wants to make sure configuration of vnic0 fails if a factory address in slot 2 isn''t available, so ...")> I''m fine for not having a "force" option during the move operation. This > means that if a VNIC was created with a "factory" option, and the > destination NIC doesn''t have a factory MAC address available, then the > move will fail. It also means that if a user explicitly specified a > factory MAC address slot, the move will fail.Unless there''s some reason to believe that factory address slot numbers are allocated in a common way across NICs, I think that moving a VNIC and preserving the slot number is a sketchy idea. It''s akin to moving a zone from one machine to another and expecting that "qfe1" will be the same interface on the same network on the destination machine. It might be, with sufficient advance planning. It probably isn''t though. 
--
James Carlson, Solaris Networking              <james.d.carlson at sun.com>
Sun Microsystems / 1 Network Drive         71.232W   Vox +1 781 442 2084
MS UBUR02-212 / Burlington MA 01803-2757   42.496N   Fax +1 781 442 1677
Nicolas Droux
2007-Sep-06 05:42 UTC
[crossbow-discuss] Updated Crossbow virtualization architecture document
On Sep 4, 2007, at 8:45 AM, James Carlson wrote:
> The question I'm asking here is: "why?"
>
> Under what circumstances does it make sense to provide this failure
> mode for administrators? How does it help rather than hinder?

It provides the user the ability to preserve and enforce the assignment
of factory MAC addresses to virtual machines in a consolidated
environment. If the administrator specifically asks for a factory MAC
address but none are available, then the operation would fail. The
(default) automatic mode is also there for the users who don't care if a
random address is assigned to the VNIC instead.

>> I'm fine for not having a "force" option during the move operation.
>> This means that if a VNIC was created with a "factory" option, and the
>> destination NIC doesn't have a factory MAC address available, then the
>> move will fail. It also means that if a user explicitly specified a
>> factory MAC address slot, the move will fail.
>
> Unless there's some reason to believe that factory address slot
> numbers are allocated in a common way across NICs, I think that moving
> a VNIC and preserving the slot number is a sketchy idea.

I was not proposing preserving the slot number. The factory slot number
after the move could be different. But it will be a factory MAC address
if such an address was specifically requested on the source.

Nicolas.

--
Nicolas Droux - Solaris Core OS - Sun Microsystems, Inc.
droux at sun.com - http://blogs.sun.com/droux
James Carlson
2007-Sep-06 11:34 UTC
[crossbow-discuss] Updated Crossbow virtualization architecture document
Nicolas Droux writes:
> On Sep 4, 2007, at 8:45 AM, James Carlson wrote:
>
> > The question I'm asking here is: "why?"
> >
> > Under what circumstances does it make sense to provide this failure
> > mode for administrators? How does it help rather than hinder?
>
> It provides the user the ability to preserve and enforce the
> assignment of factory MAC addresses to virtual machines in a
> consolidated environment. If the administrator specifically asks for
> a factory MAC address but none are available, then the operation
> would fail. The (default) automatic mode is also there for the users
> who don't care if a random address is assigned to the VNIC instead.

Yes, I understand what it would do. I still don't see why that's a
helpful operation.

It clearly provides a special failure mode. What isn't clear is why
users of this feature would prefer to have the operation fail rather
than having the system provide a best attempt (perhaps with warnings)
instead.

What are the administrators actually doing with these MAC addresses
that causes them to prefer failure?

--
James Carlson, Solaris Networking              <james.d.carlson at sun.com>
Sun Microsystems / 1 Network Drive         71.232W   Vox +1 781 442 2084
MS UBUR02-212 / Burlington MA 01803-2757   42.496N   Fax +1 781 442 1677
Sunay Tripathi
2007-Sep-06 17:11 UTC
[crossbow-discuss] Updated Crossbow virtualization architecture document
James Carlson wrote:
> Nicolas Droux writes:
>> On Sep 4, 2007, at 8:45 AM, James Carlson wrote:
>>
>>> The question I'm asking here is: "why?"
>>>
>>> Under what circumstances does it make sense to provide this failure
>>> mode for administrators? How does it help rather than hinder?
>> It provides the user the ability to preserve and enforce the
>> assignment of factory MAC addresses to virtual machines in a
>> consolidated environment. If the administrator specifically asks for
>> a factory MAC address but none are available, then the operation
>> would fail. The (default) automatic mode is also there for the users
>> who don't care if a random address is assigned to the VNIC instead.
>
> Yes, I understand what it would do. I still don't see why that's a
> helpful operation.
>
> It clearly provides a special failure mode. What isn't clear is why
> users of this feature would prefer to have the operation fail rather
> than having the system provide a best attempt (perhaps with warnings)
> instead.
>
> What are the administrators actually doing with these MAC addresses
> that causes them to prefer failure?

Factory-assigned MAC addresses are inventoried entities in some
companies. They keep track of the MAC address(es) the machine has, along
with other information (like physical location, etc.). SPARC machines
have a hostid, but on x86 this is the only unique way to identify the
physical machine from a packet on the network.

Random MAC addresses are random at best and come with no guarantee that
they are unique across different machines. The virtualization crowd has
adopted random MAC addresses, but a sizable set of customers is still
skeptical about duplication, etc.

A user-assigned MAC address is cumbersome at best, and some customers
are not prepared to pay the overhead of assigning addresses and tracking
them across their data center(s).

As such, NICs which have multiple factory-assigned MAC addresses become
a very useful resource for the set of customers wanting to play in the
virtualization space (but not caring about live migration, especially
with zones). They don't have to deal with user-assigned or random
addresses, and they can inventory the factory-assigned MAC addresses
just as they used to before. Yes, they get limited in the number of
VNICs they can create, but 8-16 factory-assigned MAC addresses give them
sufficient headroom to play.

That's why you use the 'auto' flag when you don't care. But if the user
specified factory, then he does care, and if we can't get him a factory
MAC address, we fail the operation.

Perhaps there is a better administrative interface to express this, and
we are open to suggestions. But hopefully you get the idea of what we
are trying to achieve.

On a different note, having examples is always very helpful.

Cheers,
Sunay

--
Sunay Tripathi
Distinguished Engineer
Solaris Core Operating System
Sun MicroSystems Inc.

Solaris Networking: http://www.opensolaris.org/os/community/networking
Project Crossbow: http://www.opensolaris.org/os/project/crossbow
James Carlson
2007-Sep-06 17:17 UTC
[crossbow-discuss] Updated Crossbow virtualization architecture document
Sunay Tripathi writes:
> James Carlson wrote:
> > What are the administrators actually doing with these MAC addresses that causes them to prefer failure?
>
> Factory assigned MAC addresses are inventoried entities in some companies. They keep track of the MAC address(es) the machine has along with other information (like physical location etc). Sparcs have a hostid, but on x86 this is the only unique way to identify the physical machines from the packet on the network.

Sure. And you can tell which address you've got (if you care) by using the status command.

And I'd point out that after any move, you would *need* to look at the address on the interface, because the new physical interface likely has a different set of MAC addresses on it, and you're going to need to update those crufty tables (such as /etc/ethers).

I'd even see no problem with issuing a warning when the use-random-fallback event occurs:

    Warning: you asked for a factory address, but I couldn't get one. I've assigned a random address instead. If that's not ok, then you'll probably want to reconfigure this interface.

(Or perhaps something more professional-looking than that.)

The problem I have is with the failure mode. I don't see a purpose.

> Random MAC addresses are random at best and have no guarantees that they are unique across different machines. The virtualization crowd has adopted random MAC addresses, but a sizable set of customers are still skeptical about duplication etc.

I understand why users would want to prefer factory addresses. I wasn't questioning that at all.

I don't understand why they would prefer to see failure. It doesn't seem helpful.

Would users actually be inconvenienced if an interface worked because it fell back to a random address, where they'd actually have an advantage if it failed instead?

--
James Carlson, Solaris Networking <james.d.carlson at sun.com>
Sun Microsystems / 1 Network Drive 71.232W Vox +1 781 442 2084
MS UBUR02-212 / Burlington MA 01803-2757 42.496N Fax +1 781 442 1677
Sunay Tripathi
2007-Sep-06 18:00 UTC
[crossbow-discuss] Updated Crossbow virtualization architecture document
James Carlson wrote:
> Sunay Tripathi writes:
>> James Carlson wrote:
>> > What are the administrators actually doing with these MAC addresses that causes them to prefer failure?
>>
>> Factory assigned MAC addresses are inventoried entities in some companies. They keep track of the MAC address(es) the machine has along with other information (like physical location etc). Sparcs have a hostid, but on x86 this is the only unique way to identify the physical machines from the packet on the network.
>
> Sure. And you can tell which address you've got (if you care) by using the status command.
>
> And I'd point out that after any move, you would *need* to look at the address on the interface, because the new physical interface likely has a different set of MAC addresses on it, and you're going to need to update those crufty tables (such as /etc/ethers).
>
> I'd even see no problem with issuing a warning when the use-random-fallback event occurs:
>
>     Warning: you asked for a factory address, but I couldn't get one. I've assigned a random address instead. If that's not ok, then you'll probably want to reconfigure this interface.
>
> (Or perhaps something more professional-looking than that.)

Huh? The guy only wants to deal with factory assigned MAC addresses and you would still assign a random MAC address and create a VNIC? What does the guy do after that? Run delete-vnic since he didn't want it in the first place?

I was with you up to the earlier email, in that there might be a better way of expressing the requirement that I am only interested in factory assigned MAC addresses and *don't* want to deal with random or user created things. But assigning a random MAC address when he asked for factory is almost ignoring the request.

> The problem I have is with the failure mode. I don't see a purpose.

Perhaps if you try to understand the difference between a unique identifier (factory MAC) that is inventoried vs a randomly generated non-unique identifier, it will be clear to you.

>> Random MAC addresses are random at best and have no guarantees that they are unique across different machines. The virtualization crowd has adopted random MAC addresses, but a sizable set of customers are still skeptical about duplication etc.
>
> I understand why users would want to prefer factory addresses. I wasn't questioning that at all.
>
> I don't understand why they would prefer to see failure. It doesn't seem helpful.

Failures happen all the time when you run out of resources. We fail a process creation when we are out of memory. We fail a socket open when we run out of descriptors ...

> Would users actually be inconvenienced if an interface worked because it fell back to a random address, where they'd actually have an advantage if it failed instead?

Yes. And yes. When they see a packet on the wire, they need to know which physical machine is sending the packet and what its location is.

HTH.
Sunay

--
Sunay Tripathi
Distinguished Engineer
Solaris Core Operating System
Sun MicroSystems Inc.
Solaris Networking: http://www.opensolaris.org/os/community/networking
Project Crossbow: http://www.opensolaris.org/os/project/crossbow
James Carlson
2007-Sep-06 18:10 UTC
[crossbow-discuss] Updated Crossbow virtualization architecture document
Sunay Tripathi writes:
> James Carlson wrote:
> >     Warning: you asked for a factory address, but I couldn't get one. I've assigned a random address instead. If that's not ok, then you'll probably want to reconfigure this interface.
> >
> > (Or perhaps something more professional-looking than that.)
>
> Huh? The guy only wants to deal with factory assigned MAC addresses and you would still assign a random MAC address and create a VNIC? What does the guy do after that? Run delete-vnic since he didn't want it in the first place?

The original context of this was with a "move" operation, where failure seems quite strange.

But, yes, that's exactly what I'd expect to see as a user. In the scenario you cite, I've asked for two things. I've asked to have a VNIC created, and I've asked that it have a factory address.

Your assertion is that if I can't get one of those two things (the factory address), then I get nothing. You seem to be assuming that my request for "factory address" is more important than my request for a VNIC, such that my request for the VNIC can be ignored or rejected.

I'm not so sure that's a useful semantic, and I'm asking whether this sort of failure is what users *desire* to see.

> I was with you up to the earlier email, in that there might be a better way of expressing the requirement that I am only interested in factory assigned MAC addresses and *don't* want to deal with random or user created things. But assigning a random MAC address when he asked for factory is almost ignoring the request.

See above.

I'm not ignoring the request. I'm saying:

  A. Honor the request if you can.

  B. If you can't honor it, then at least create a usable interface.

  C. For bonus points, you can _always_ issue a warning message for users who somehow think "factory > random."

> > The problem I have is with the failure mode. I don't see a purpose.
>
> Perhaps if you try to understand the difference between a unique identifier (factory MAC) that is inventoried vs a randomly generated non-unique identifier, it will be clear to you.

That's still not the question I'm asking.

I know the difference between random and factory assignment.

I want to know why forcing failure is preferable to warning (if necessary) and driving on, particularly when forcing failure simply creates brand new points of annoyance.

If you're not interested in covering that in the document, then that's fine by me. Just say you're rejecting my comments. There's no need to assume that I'm ignorant.

> > I don't understand why they would prefer to see failure. It doesn't seem helpful.
>
> Failures happen all the time when you run out of resources. We fail a process creation when we are out of memory. We fail a socket open when we run out of descriptors ...

This is an administrative interface, not a dynamic failure mode.

> > Would users actually be inconvenienced if an interface worked because it fell back to a random address, where they'd actually have an advantage if it failed instead?
>
> Yes. And yes. When they see a packet on the wire, they need to know which physical machine is sending the packet and what its location is.

Specifying "factory" doesn't actually tell you what address you will necessarily have. You still must use the status interfaces to tell which address you've gotten.

As long as you're doing that anyway, I don't see much of a useful difference.

--
James Carlson, Solaris Networking <james.d.carlson at sun.com>
Sun Microsystems / 1 Network Drive 71.232W Vox +1 781 442 2084
MS UBUR02-212 / Burlington MA 01803-2757 42.496N Fax +1 781 442 1677
Sunay Tripathi
2007-Sep-06 20:50 UTC
[crossbow-discuss] Updated Crossbow virtualization architecture document
James Carlson wrote:
> Sunay Tripathi writes:
>> James Carlson wrote:
>>>     Warning: you asked for a factory address, but I couldn't get one. I've assigned a random address instead. If that's not ok, then you'll probably want to reconfigure this interface.
>>>
>>> (Or perhaps something more professional-looking than that.)
>> Huh? The guy only wants to deal with factory assigned MAC addresses and you would still assign a random MAC address and create a VNIC? What does the guy do after that? Run delete-vnic since he didn't want it in the first place?
>
> The original context of this was with a "move" operation, where failure seems quite strange.
>
> But, yes, that's exactly what I'd expect to see as a user. In the scenario you cite, I've asked for two things. I've asked to have a VNIC created, and I've asked that it have a factory address.
>
> Your assertion is that if I can't get one of those two things (the factory address), then I get nothing. You seem to be assuming that my request for "factory address" is more important than my request for a VNIC, such that my request for the VNIC can be ignored or rejected.
>
> I'm not so sure that's a useful semantic, and I'm asking whether this sort of failure is what users *desire* to see.

Yes, we have about a dozen big customers that do exactly this.

>> I was with you up to the earlier email, in that there might be a better way of expressing the requirement that I am only interested in factory assigned MAC addresses and *don't* want to deal with random or user created things. But assigning a random MAC address when he asked for factory is almost ignoring the request.
>
> See above.
>
> I'm not ignoring the request. I'm saying:
>
>   A. Honor the request if you can.
>
>   B. If you can't honor it, then at least create a usable interface.

If that was the intent, the user can use the auto flag and let the system decide. The fact that he specified factory means he cares about it. Your definition of usable doesn't match how this is done today. Users assert that if they can't match a packet to a physical machine, it creates more problems. It is the same as saying "I don't have an IP address, so let me snoop, see which address on the subnet is not in use, and use that instead." One could argue that did create a usable interface.

Understand that randomly created MAC addresses are just that: random. The probability of duplication in today's data center is real and nonzero; customers understand that, and it's an issue for them.

>   C. For bonus points, you can _always_ issue a warning message for users who somehow think "factory > random."
>
>>> The problem I have is with the failure mode. I don't see a purpose.
>> Perhaps if you try to understand the difference between a unique identifier (factory MAC) that is inventoried vs a randomly generated non-unique identifier, it will be clear to you.
>
> That's still not the question I'm asking.
>
> I know the difference between random and factory assignment.
>
> I want to know why forcing failure is preferable to warning (if necessary) and driving on, particularly when forcing failure simply creates brand new points of annoyance.

Use the *auto* flag, and don't specify a specific mode, if you don't care.

> If you're not interested in covering that in the document, then that's fine by me. Just say you're rejecting my comments. There's no need to assume that I'm ignorant.
>
>>> I don't understand why they would prefer to see failure. It doesn't seem helpful.
>> Failures happen all the time when you run out of resources. We fail a process creation when we are out of memory. We fail a socket open when we run out of descriptors ...
>
> This is an administrative interface, not a dynamic failure mode.
>
>>> Would users actually be inconvenienced if an interface worked because it fell back to a random address, where they'd actually have an advantage if it failed instead?
>> Yes. And yes. When they see a packet on the wire, they need to know which physical machine is sending the packet and what its location is.
>
> Specifying "factory" doesn't actually tell you what address you will necessarily have. You still must use the status interfaces to tell which address you've gotten.

But if you do your inventory properly, when you see a packet on the wire, you can map it to a physical machine. This is how it's done in real life by some of our customers.

Sunay

--
Sunay Tripathi
Distinguished Engineer
Solaris Core Operating System
Sun MicroSystems Inc.
Solaris Networking: http://www.opensolaris.org/os/community/networking
Project Crossbow: http://www.opensolaris.org/os/project/crossbow
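[Editorial aside] The duplication concern raised above can be given rough numbers with the standard birthday-problem approximation. This is only a sketch under stated assumptions: the thread does not say how Crossbow generates random addresses, so the two cases below (46 random bits, i.e. everything except the two fixed flag bits of the first octet, and 24 random bits behind a fixed 3-byte prefix) are hypothetical, as is the figure of 10,000 addresses on one broadcast domain.

```python
import math


def collision_probability(n_addresses: int, random_bits: int) -> float:
    """Approximate P(at least one duplicate) among n uniformly random
    addresses drawn from a space of 2**random_bits values, using the
    birthday bound p ~= 1 - exp(-n(n-1) / (2N))."""
    space = 2 ** random_bits
    exponent = n_addresses * (n_addresses - 1) / (2 * space)
    # -expm1(-x) computes 1 - exp(-x) without losing precision for tiny x.
    return -math.expm1(-exponent)


if __name__ == "__main__":
    # Hypothetical scenario: 10,000 randomly addressed VNICs on one segment.
    for bits in (24, 46):
        p = collision_probability(10_000, bits)
        print(f"{bits} random bits: p ~= {p:.3g}")
```

The point of the comparison is that the answer depends heavily on how many bits are actually random, which is one reason a site might prefer inventoried factory addresses over trusting the generator.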
Nicolas Droux
2007-Sep-07 03:58 UTC
[crossbow-discuss] Updated Crossbow virtualization architecture document
On Sep 6, 2007, at 12:10 PM, James Carlson wrote:
> But, yes, that's exactly what I'd expect to see as a user. In the scenario you cite, I've asked for two things. I've asked to have a VNIC created, and I've asked that it have a factory address.
>
> Your assertion is that if I can't get one of those two things (the factory address), then I get nothing. You seem to be assuming that my request for "factory address" is more important than my request for a VNIC, such that my request for the VNIC can be ignored or rejected.

If it's acceptable to you to have a random address assigned to your VNIC if no factory addresses are available, then don't use the "factory" option. Use the default "auto" option.

If the administrator specifically asks for a factory MAC address, I don't see why it would be a problem to fail that operation if no factory MAC addresses are available.

Nicolas.

--
Nicolas Droux - Solaris Core OS - Sun Microsystems, Inc.
droux at sun.com - http://blogs.sun.com/droux
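[Editorial aside] The two modes discussed in this thread can be summarized in a small sketch. It is a reading of the semantics described above, not Crossbow code: the names `Mode`, `allocate_mac`, and `NoFactoryAddress` are hypothetical, and the assumption that "auto" tries a factory address first and only then falls back to a random one is inferred from the messages rather than confirmed by the architecture document.

```python
import random
from enum import Enum


class Mode(Enum):
    AUTO = "auto"        # default: take a factory address if free, else random
    FACTORY = "factory"  # strict: a factory address or nothing


class NoFactoryAddress(Exception):
    """Raised when 'factory' was requested but the NIC has none left."""


def random_mac(rng: random.Random) -> str:
    # Set the locally-administered bit and clear the multicast bit so the
    # result is a valid unicast, non-factory address.
    first = (rng.randrange(256) | 0x02) & 0xFE
    octets = [first] + [rng.randrange(256) for _ in range(5)]
    return ":".join(f"{o:02x}" for o in octets)


def allocate_mac(mode, free_factory, rng=None):
    """Return (address, kind); free_factory is the NIC's unused factory pool."""
    rng = rng or random.Random()
    if free_factory:
        return free_factory.pop(0), "factory"
    if mode is Mode.FACTORY:
        # The behavior defended in the thread: fail rather than substitute.
        raise NoFactoryAddress("no factory MAC address available on this NIC")
    return random_mac(rng), "random"
```

Under this reading, an administrator who would accept a random address simply leaves the mode at its default, and the strict failure is only ever seen by someone who explicitly asked for "factory".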
Ben Rockwood
2007-Sep-13 09:09 UTC
[crossbow-discuss] Updated Crossbow virtualization architecture
To what extent will VNIC configuration be included with a Zone configuration? This is similar to Duckhorn: in order to make zones portable, there needs to be little to no pre-configuration required in the global zone. I'm unclear if this is implied by "Functional Specification Summary" item 7: "Allow VNICs to be plumbed by Solaris zones".

Ideally, within a Zone definition (/etc/zones/myzone.xml) all information required to configure the VNIC, including bandwidth attributes, MAC settings, VLAN tags, etc., would be present, so that if a zone was cloned or migrated to another system, pre-configuration of the VNICs needed to support the zone wouldn't be necessary.

benr.
David Edmondson
2007-Sep-13 09:51 UTC
[crossbow-discuss] Updated Crossbow virtualization architecture
On Thu, Sep 13, 2007 at 02:09:08AM -0700, Ben Rockwood wrote:
> Ideally, within a Zone definition (/etc/zones/myzone.xml) all information required to configure the VNIC, including bandwidth attributes, MAC settings, VLAN tags, etc., would be present, so that if a zone was cloned or migrated to another system, pre-configuration of the VNICs needed to support the zone wouldn't be necessary.

I agree that this would be ideal. Of course, you _might_ have to change the name of the underlying physical NIC if the two machines are different(ly connected).
Peter Memishian
2007-Sep-13 10:25 UTC
[crossbow-discuss] Updated Crossbow virtualization architecture
> > Ideally, within a Zone definition (/etc/zones/myzone.xml) all information required to configure the VNIC, including bandwidth attributes, MAC settings, VLAN tags, etc., would be present, so that if a zone was cloned or migrated to another system, pre-configuration of the VNICs needed to support the zone wouldn't be necessary.
>
> I agree that this would be ideal. Of course, you _might_ have to change the name of the underlying physical NIC if the two machines are different(ly connected).

Hopefully not post-Clearview-UV.

--
meem
Ben Rockwood
2007-Sep-14 10:15 UTC
[crossbow-discuss] Updated Crossbow virtualization architecture
If the physical NICs are different, that's a given, barring Clearview.
Nicolas Droux
2007-Sep-14 20:10 UTC
[crossbow-discuss] Updated Crossbow virtualization architecture
Ben,

Ben Rockwood wrote:
> To what extent will VNIC configuration be included with a Zone configuration? This is similar to Duckhorn: in order to make zones portable, there needs to be little to no pre-configuration required in the global zone. I'm unclear if this is implied by "Functional Specification Summary" item 7: "Allow VNICs to be plumbed by Solaris zones".

Our long-term goal is to fully integrate the configuration of VNICs with Zone configuration. For now they are done separately: you create a VNIC with its properties using dladm(1M), and then assign the VNIC to the zone via zonecfg(1M) (assuming the zone has its own IP instance); the zone can then plumb vnic<x> directly. This is what we will integrate as part of our first Crossbow putback.

For the long term, we'd like to do all of this from zonecfg(1M) itself directly, a la Duckhorn. For example, while the zone is being configured, you'll be able to simply specify that you want a VNIC created on top of data-link "x0", call it "y0", with VLAN id <i>, and by default assign it the same CPUs that are used for the zone. This will be done as a follow-on putback.

> Ideally, within a Zone definition (/etc/zones/myzone.xml) all information required to configure the VNIC, including bandwidth attributes, MAC settings, VLAN tags, etc., would be present, so that if a zone was cloned or migrated to another system, pre-configuration of the VNICs needed to support the zone wouldn't be necessary.

Yes, that would be the case once we more closely integrate VNIC and Zone configuration. The configuration information would be kept on a per-zone basis, and the VNIC would be instantiated dynamically when the zone is booted.

Nicolas.

--
Nicolas Droux - Solaris Core OS - Sun Microsystems, Inc.
droux at sun.com - http://blogs.sun.com/droux