Folks, The semantics of mac_disable() were modified by Crossbow such that dls_devnet_destroy() needs to be called before mac_disable() can succeed. This is because of the implicit reference that dls has on the mac_impl_t. This doesn''t work, because if mac_disable() fails, then the state of the world needs to be rewound to before the call to dls_devnet_destroy(), which is to say that dls_devnet_create() needs to be called to re-create the link. Of course dls_devnet_create() can itself fail. In that case, the system is irreparably broken. Prior to Crossbow, mac_disable() could be called prior to dls_devnet_destroy(). A successful call to mac_disable() would ensure that a subsequent call to dls_devnet_destroy() couldn''t fail since we no longer had references to the link. For an example of what I''m talking about, see the code in vnic_dev_delete(): if ((rc = dls_devnet_destroy(vnic->vn_mh, &tmpid, B_TRUE)) != 0) { rw_exit(&vnic_lock); return (rc); } ASSERT(vnic_id == tmpid); /* * We cannot unregister the MAC yet. Unregistering would * free up mac_impl_t which should not happen at this time. * So disable mac_impl_t by calling mac_disable(). This will prevent * any new claims on mac_impl_t. */ if ((rc = mac_disable(vnic->vn_mh)) != 0) { (void) dls_devnet_create(vnic->vn_mh, vnic_id); rw_exit(&vnic_lock); return (rc); } Similar code exists in aggr_grp_delete(). If the call to dls_devnet_create() fails above, the system is hosed. Was this problem considered, or is this an unintended side-effect of the way in which the mi_ref reference count was changed? -Seb
On Wed, Jan 07, 2009 at 05:51:51PM -0500, Sebastien Roy wrote:> > Similar code exists in aggr_grp_delete(). If the call to > dls_devnet_create() fails above, the system is hosed. Was this problem > considered, or is this an unintended side-effect of the way in which the > mi_ref reference count was changed? >this was an unintended side effect of the changes in dls_devnet_create. but the code snippet you pointed out above existed prior to crossbow, which means the system could still be hosed in pre-crossbow bits if dls_devnet_create() fails during the unwind. I understand that if we *could* call mac_disable() before dls_devnet_destroy(), the unwind would not be necessary. but given that this code has existed in nevada for so long, do you think there''s a need to fix this now? or do you think the risk of dls_devnet_create() failure has increased? eric
Hi Eric, On Wed, 2009-01-07 at 16:08 -0800, Eric Cheng wrote:> On Wed, Jan 07, 2009 at 05:51:51PM -0500, Sebastien Roy wrote: > > > > Similar code exists in aggr_grp_delete(). If the call to > > dls_devnet_create() fails above, the system is hosed. Was this problem > > considered, or is this an unintended side-effect of the way in which the > > mi_ref reference count was changed? > > > > this was an unintended side effect of the changes in dls_devnet_create. > > but the code snippet you pointed out above existed prior to crossbow, > which means the system could still be hosed in pre-crossbow bits if > dls_devnet_create() fails during the unwind.I understand that the vnic code I pointed out was broken prior to Crossbow, but I''m pointing out that we now have no choice but to use mac_disable() in this broken way in other modules. For example before Crossbow, in my development version of the iptun module, I was able to use mac_disable()/dls_devnet_destroy() in a safe way before by calling mac_disable() prior to calling dls_devnet_destroy(). Now, I''m unable to do that.> I understand that if we *could* call mac_disable() before > dls_devnet_destroy(), the unwind would not be necessary. > but given that this code has existed in nevada for so long, do you > think there''s a need to fix this now?I''m specifically talking about the ability to call mac_disable() before dls_devnet_destroy(). Before Crossbow, that was possible. After Crossbow, it''s no longer possible, and we''re forced to implement broken code like the code I pointed out in vnic_dev.c. So the behavior I''m referring to is new with Crossbow and has not existed in Nevada for very long.> or do you think the risk of dls_devnet_create() failure has increased?It hasn''t increased, but ignoring the possibillity that dls_devnet_create() could fail is a little bit like building a bridge that can only withstand 10 tons, and ignoring the possibility that five mac trucks full of bricks could one day cross it. -Seb
On Wed, Jan 07, 2009 at 08:12:44PM -0500, Sebastien Roy wrote:> > I understand that if we *could* call mac_disable() before > > dls_devnet_destroy(), the unwind would not be necessary. > > but given that this code has existed in nevada for so long, do you > > think there''s a need to fix this now? > > I''m specifically talking about the ability to call mac_disable() before > dls_devnet_destroy(). Before Crossbow, that was possible. After > Crossbow, it''s no longer possible, and we''re forced to implement broken > code like the code I pointed out in vnic_dev.c. So the behavior I''m > referring to is new with Crossbow and has not existed in Nevada for very > long. > > > or do you think the risk of dls_devnet_create() failure has increased? > > It hasn''t increased, but ignoring the possibillity that > dls_devnet_create() could fail is a little bit like building a bridge > that can only withstand 10 tons, and ignoring the possibility that five > mac trucks full of bricks could one day cross it. >I agree with you that this error case should not be ignored but what I don''t understand is why is there an urgency to fix this now given that this code existed since snv_83 in aggr/vnic and hasn''t given us problems. would it be possible for iptun to reuse this "broken" code for the time being? a bug could be opened to track the dls_devnet_create() failure issue and the fix will fix all modules (aggr/vnic/iptun) affected by this. eric
On Wed, 2009-01-07 at 18:09 -0800, Eric Cheng wrote:> On Wed, Jan 07, 2009 at 08:12:44PM -0500, Sebastien Roy wrote: > > > I understand that if we *could* call mac_disable() before > > > dls_devnet_destroy(), the unwind would not be necessary. > > > but given that this code has existed in nevada for so long, do you > > > think there''s a need to fix this now? > > > > I''m specifically talking about the ability to call mac_disable() before > > dls_devnet_destroy(). Before Crossbow, that was possible. After > > Crossbow, it''s no longer possible, and we''re forced to implement broken > > code like the code I pointed out in vnic_dev.c. So the behavior I''m > > referring to is new with Crossbow and has not existed in Nevada for very > > long. > > > > > or do you think the risk of dls_devnet_create() failure has increased? > > > > It hasn''t increased, but ignoring the possibillity that > > dls_devnet_create() could fail is a little bit like building a bridge > > that can only withstand 10 tons, and ignoring the possibility that five > > mac trucks full of bricks could one day cross it. > > > > I agree with you that this error case should not be ignored but what I > don''t understand is why is there an urgency to fix this now given that > this code existed since snv_83 in aggr/vnic and hasn''t given us problems.It''s not urgent. Although we haven''t experienced problems with this code in the field, there''s a probability that it could be problematic in the future. A bug should be filed at least.> would it be possible for iptun to reuse this "broken" code for the time > being?Yes, it could.> a bug could be opened to track the dls_devnet_create() failure > issue and the fix will fix all modules (aggr/vnic/iptun) affected by > this.Yes, that''s a fine resolution to this discussion. I wanted to be sure that there wasn''t anything I was missing regarding the design. Since there''s agreement that this isn''t right, I''ll file a bug. Thanks Eric, -Seb
On Wed, 2009-01-07 at 21:27 -0500, Sebastien Roy wrote:> On Wed, 2009-01-07 at 18:09 -0800, Eric Cheng wrote: > > a bug could be opened to track the dls_devnet_create() failure > > issue and the fix will fix all modules (aggr/vnic/iptun) affected by > > this. > > Yes, that''s a fine resolution to this discussion. I wanted to be sure > that there wasn''t anything I was missing regarding the design. Since > there''s agreement that this isn''t right, I''ll file a bug.I''ve filed the bug: 6791335 mac_disable() and dls_devnet_destroy() are mutually incompatible Thanks, -Seb
On Wed, Jan 07, 2009 at 10:10:44PM -0500, Sebastien Roy wrote:> > On Wed, 2009-01-07 at 21:27 -0500, Sebastien Roy wrote: > > On Wed, 2009-01-07 at 18:09 -0800, Eric Cheng wrote: > > > a bug could be opened to track the dls_devnet_create() failure > > > issue and the fix will fix all modules (aggr/vnic/iptun) affected by > > > this. > > > > Yes, that''s a fine resolution to this discussion. I wanted to be sure > > that there wasn''t anything I was missing regarding the design. Since > > there''s agreement that this isn''t right, I''ll file a bug. > > I''ve filed the bug: > > 6791335 mac_disable() and dls_devnet_destroy() are mutually incompatible >thanks, seb. eric