The primary development phase of the BPF/PF_PACKET project is now completed and I''d like to ask people for feedback and to review the code as it currently stands. The sockmod_pfp.c file has already been reviewed by Anders (thanks!) and changes according to his comments integrated. For reference, the PSARC case information relating to this work can be found here: http://arc.opensolaris.org/caselog/PSARC/2009/232/commitment.materials/ There are two gates that will require putbacks in the immediate future, ON and SFW. The timeout for review of these changes is the 12th of August, 2009. The changes required for onnv are here and include: - adaption of BPF from NetBSD to Solaris - delivery of PF_PACKET socket module - linking together dls, ipnet, mac and bpf - modification of ipnet to support delivery of packets via libpcap http://cr.opensolaris.org/~darrenr/onnv-pcapture/ A small number of changes are required to convert libpcap shipped with SFW to function with BPF as supplied with Solaris. The most important being allowing more than one DLT to be reported with the DLT list function. The change to using BPF for libpcap removes support for using DLPI with libpcap. The changes can be found here: http://cr.opensolaris.org/~darrenr/sfw-pcap/ Cheers, Darren
Sebastien Roy
2009-Aug-04 20:17 UTC
[crossbow-discuss] [networking-discuss] BPF/PF_PACKET Code review
Darren, On Mon, 2009-07-27 at 19:01 -0700, Darren Reed wrote:> http://cr.opensolaris.org/~darrenr/onnv-pcapture/ >Do you have a cscope database for this code-review repo somewhere to help with browsing the code? -Seb
Darren Reed
2009-Aug-04 20:47 UTC
[crossbow-discuss] [networking-discuss] BPF/PF_PACKET Code review
On 4/08/09 01:17 PM, Sebastien Roy wrote:> Darren, > > On Mon, 2009-07-27 at 19:01 -0700, Darren Reed wrote: > >> http://cr.opensolaris.org/~darrenr/onnv-pcapture/ >> >> > > Do you have a cscope database for this code-review repo somewhere to > help with browsing the code? >Is there a way to upload a cut-down version of that to opensolaris or does this remain a SWAN thing only? /net/nvtbld-x.sfbay/exportz/avalon/on_bpf_pfpacket_review/ btw, in reviewing the code myself, I had some questions about if I should do some renaming in ipnet.c... For example, there is "ipnetif_refhold/ipnetif_refrele" but then there is "ipnet_create_if" and "ipnet_free_if". Also, in some places "ipnetif" is the "local" variable for an ipnetif_t and in others it is "ifp". Whilst seemingly minor, it doesn''t feel like it is all the product of a single design? During working on that file, I got annoyed enough with the different code styles used for local variables that I converted them all to be one because otherwise it was not clear to which style new code should be added! And in review, I''ve come up against the same issue with adding new functions but it has bothered me less. Darren -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.opensolaris.org/pipermail/crossbow-discuss/attachments/20090804/23f9f07c/attachment.html>
Sebastien Roy
2009-Aug-04 20:53 UTC
[crossbow-discuss] [networking-discuss] BPF/PF_PACKET Code review
On Tue, 2009-08-04 at 13:47 -0700, Darren Reed wrote:> On 4/08/09 01:17 PM, Sebastien Roy wrote: > > Darren, > > > > On Mon, 2009-07-27 at 19:01 -0700, Darren Reed wrote: > > > > > http://cr.opensolaris.org/~darrenr/onnv-pcapture/ > > > > > > > > > > Do you have a cscope database for this code-review repo somewhere to > > help with browsing the code? > > > > Is there a way to upload a cut-down version of that to opensolaris > or does this remain a SWAN thing only?cscope needs to co-exist with an actual repo on a filesystem. I suppose you could post cscope databases on opensolaris in a repo and require people to pull that repo to a local filesystem to use it...> > /net/nvtbld-x.sfbay/exportz/avalon/on_bpf_pfpacket_review/ >Thanks. Could you run xref under usr/src/uts/ as well please?> btw, in reviewing the code myself, I had some questions about if > I should do some renaming in ipnet.c... > > For example, there is "ipnetif_refhold/ipnetif_refrele" but then > there is "ipnet_create_if" and "ipnet_free_if". Also, in some places > "ipnetif" is the "local" variable for an ipnetif_t and in others it is > "ifp".Feel free to fix those inconsistencies.> > Whilst seemingly minor, it doesn''t feel like it is all the product > of a single design?It''s not a design-level issue, but rather an implementation issue. It''s a product of having multiple people working on the file over time, and perhaps not synchronizing totally on nomenclature.> During working on that file, I got annoyed > enough with the different code styles used for local variables > that I converted them all to be one because otherwise it was not > clear to which style new code should be added!The style is, and should be (for multiple variables): <type> <name>; <type> <name>; Where each name is aligned at a common tab-stop. I don''t see many inconsistencies here, I''m not sure what you''re referring to. I''d refrain from making sweeping changes to the code style in the file, as that detracts from reviewing your actual substantive changes.> And in review, > I''ve come up against the same issue with adding new functions > but it has bothered me less.Okay, -Seb
Darren Reed
2009-Aug-04 21:09 UTC
[crossbow-discuss] [networking-discuss] BPF/PF_PACKET Code review
On 4/08/09 01:53 PM, Sebastien Roy wrote:> On Tue, 2009-08-04 at 13:47 -0700, Darren Reed wrote: > >> On 4/08/09 01:17 PM, Sebastien Roy wrote: >> >>> Darren, >>> >>> On Mon, 2009-07-27 at 19:01 -0700, Darren Reed wrote: >>> >>> >>>> http://cr.opensolaris.org/~darrenr/onnv-pcapture/ >>>> >>>> >>>> >>> Do you have a cscope database for this code-review repo somewhere to >>> help with browsing the code? >>> >>> >> Is there a way to upload a cut-down version of that to opensolaris >> or does this remain a SWAN thing only? >> > > cscope needs to co-exist with an actual repo on a filesystem. I suppose > you could post cscope databases on opensolaris in a repo and require > people to pull that repo to a local filesystem to use it... > > >> /net/nvtbld-x.sfbay/exportz/avalon/on_bpf_pfpacket_review/ >> >> > > Thanks. Could you run xref under usr/src/uts/ as well please? >I''ve previously only built cscope''s for usr/src with "make cscope.out". It looks like it should work for uts, but it fails for me?>> btw, in reviewing the code myself, I had some questions about if >> I should do some renaming in ipnet.c... >> >> For example, there is "ipnetif_refhold/ipnetif_refrele" but then >> there is "ipnet_create_if" and "ipnet_free_if". Also, in some places >> "ipnetif" is the "local" variable for an ipnetif_t and in others it is >> "ifp". >> > > Feel free to fix those inconsistencies. >ok.>> Whilst seemingly minor, it doesn''t feel like it is all the product >> of a single design? >> > > It''s not a design-level issue, but rather an implementation issue. It''s > a product of having multiple people working on the file over time, and > perhaps not synchronizing totally on nomenclature. >Yeah, I figured as much. C''est la vie.>> During working on that file, I got annoyed >> enough with the different code styles used for local variables >> that I converted them all to be one because otherwise it was not >> clear to which style new code should be added! >> > > The style is, and should be (for multiple variables): > > <type> <name>; > <type> <name>; > > Where each name is aligned at a common tab-stop. I don''t see many > inconsistencies here, I''m not sure what you''re referring to. I''d > refrain from making sweeping changes to the code style in the file, as > that detracts from reviewing your actual substantive changes. >There weren''t many ... just enough in the wrong places ... but if that is the style, then that''s what I''ll use. Darren -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.opensolaris.org/pipermail/crossbow-discuss/attachments/20090804/11af043f/attachment.html>
Sebastien Roy
2009-Aug-04 21:13 UTC
[crossbow-discuss] [networking-discuss] BPF/PF_PACKET Code review
On Tue, 2009-08-04 at 14:09 -0700, Darren Reed wrote:> On 4/08/09 01:53 PM, Sebastien Roy wrote: > > > > > /net/nvtbld-x.sfbay/exportz/avalon/on_bpf_pfpacket_review/ > > > > > > > > > > Thanks. Could you run xref under usr/src/uts/ as well please? > > > > I''ve previously only built cscope''s for usr/src with "make > cscope.out". > It looks like it should work for uts, but it fails for me?I usually just type "xref", and that works fine.> > > During working on that file, I got annoyed > > > enough with the different code styles used for local variables > > > that I converted them all to be one because otherwise it was not > > > clear to which style new code should be added! > > > > > > > The style is, and should be (for multiple variables): > > > > <type> <name>; > > <type> <name>; > > > > Where each name is aligned at a common tab-stop. I don''t see many > > inconsistencies here, I''m not sure what you''re referring to. I''d > > refrain from making sweeping changes to the code style in the file, as > > that detracts from reviewing your actual substantive changes. > > > > There weren''t many ... just enough in the wrong places ... > but if that is the style, then that''s what I''ll use.Great, thanks. -Seb
Sebastien Roy
2009-Aug-04 21:29 UTC
[crossbow-discuss] [networking-discuss] BPF/PF_PACKET Code review
On Tue, 2009-08-04 at 14:29 -0700, Darren Reed wrote:> It helped having $SRC set. > > Fixed.Thank you. -Seb
Darren Reed
2009-Aug-04 21:29 UTC
[crossbow-discuss] [networking-discuss] BPF/PF_PACKET Code review
On 4/08/09 02:13 PM, Sebastien Roy wrote:> On Tue, 2009-08-04 at 14:09 -0700, Darren Reed wrote: > >> On 4/08/09 01:53 PM, Sebastien Roy wrote: >> >>> >>> >>>> /net/nvtbld-x.sfbay/exportz/avalon/on_bpf_pfpacket_review/ >>>> >>>> >>>> >>> Thanks. Could you run xref under usr/src/uts/ as well please? >>> >>> >> I''ve previously only built cscope''s for usr/src with "make >> cscope.out". >> It looks like it should work for uts, but it fails for me? >> > > I usually just type "xref", and that works fine. >It helped having $SRC set. Fixed. Darren -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.opensolaris.org/pipermail/crossbow-discuss/attachments/20090804/a739519d/attachment.html>
Sebastien Roy
2009-Aug-12 13:37 UTC
[crossbow-discuss] [networking-discuss] BPF/PF_PACKET Code review
Darren, On Mon, 2009-07-27 at 19:01 -0700, Darren Reed wrote:> http://cr.opensolaris.org/~darrenr/onnv-pcapture/ >I reviewed the changes to ipnet and ip: General: There are quite a bit of changes related to the style of local variable declaration that was changed, thus adding to the volume of the review. In general, it would be preferable to restrict the set of changes to the functionality being added, and to cstyle and lint issues found while making these changes. Otherwise, sticking with the existing style of the file without making sweeping changes to style based on personal preference should be the modus operandi. I won''t comment on each instance, I''ve restricted my specific comments to things not related to local variable declarations. :-) uts/common/inet/ipnet.h * 58: This may need to be atomic (atomic_inc_64()) * 58: Wrap _y in parens uts/common/inet/ipnet/ipnet.c * 216: s/instnace/instance * 539: Please restore ipnet_sap code as discussed on clearview-discuss. * 572: This is a spurious change, please restore the code as it was. ips is derived from ns->netstack_ipnet at 532. Why go through ips to get at ns again when ns can ge used directly? * 787: As we discussed on clearview-discuss, the ipnet SAP space was fine. * 1168: What''s the purpose of this addition? This function only gets called for ipnet_t streams that are on the /dev/lo0 device. * 1327-1328,2300-2304: This is not correct. The IPNETIF_LOOPBACK code breaks the semantics defined in PSARC 2006/475 as we talked about on clearview-discuss. * 1959: ipobs_bounce_func() isn''t obvious as far as names go. How about ipobs_hook_cb()? The convention of _cb() for a callback is already used elsewhere in the file. * 1959: What is the reason for this callback having to duplicate every packet passed to it? Why wouldn''t the code that dispatches packets to callbacks always simply pass a duplicate? An observability hook should always have those semantics (it should never manipulate the original or block while processing it), so hard-coding such semantics for all NH_OBSERVE hooks would make sense to me. * 2203-2232: datalink_id_t is not the same as IP interface index. They don''t represent the same objects, and they don''t exist at the same layer. if_index is an IP interface index, and a datalink_id_t is a linkid as handled in the GLDv3 framework. This code has nothing to do with GLDv3, and is strictly contained at the IP-layer, so it shouldn''t be handling datalink_id_t. -Seb
Peter Memishian
2009-Aug-12 18:29 UTC
[crossbow-discuss] [clearview-discuss] [networking-discuss] BPF/PF_PACKET Code review
> General: There are quite a bit of changes related to the style of> local variable declaration that was changed, thus adding to the volume > of the review. In general, it would be preferable to restrict the set > of changes to the functionality being added, and to cstyle and lint > issues found while making these changes. Otherwise, sticking with the > existing style of the file without making sweeping changes to style > based on personal preference should be the modus operandi. I won''t > comment on each instance, I''ve restricted my specific comments to > things not related to local variable declarations. :-) I''d go a step further and demand that all changes that are related to separating initializers from declarations be reverted. For instance, for no apparent reason, this: 537 static int 538 ipnet_close(queue_t *rq) 539 { 540 ipnet_t *ipnet = rq->q_ptr; 541 ipnet_stack_t *ips = ipnet->ipnet_ns->netstack_ipnet; ... was expanded to: 588 static int 589 ipnet_close(queue_t *rq) 590 { 591 ipnet_stack_t *ips; 592 ipnet_t *ipnet; 593 594 ipnet = rq->q_ptr; 595 ips = ipnet->ipnet_ns->netstack_ipnet; The original code was correct and wholly unobjectionable -- and the latter form suggests to the reader that there is something subtle going on that required it, which is not true. -- meem
Sebastien Roy
2009-Aug-12 18:43 UTC
[crossbow-discuss] [clearview-discuss] [networking-discuss] BPF/PF_PACKET Code review
On Wed, 2009-08-12 at 11:29 -0700, Peter Memishian wrote:> > General: There are quite a bit of changes related to the style of > > local variable declaration that was changed, thus adding to the volume > > of the review. In general, it would be preferable to restrict the set > > of changes to the functionality being added, and to cstyle and lint > > issues found while making these changes. Otherwise, sticking with the > > existing style of the file without making sweeping changes to style > > based on personal preference should be the modus operandi. I won''t > > comment on each instance, I''ve restricted my specific comments to > > things not related to local variable declarations. :-) > > I''d go a step further and demand that all changes that are related to > separating initializers from declarations be reverted.Yes, I would also include such modifications in the category of style changes that I wish to be reverted. -Seb
Darren Reed
2009-Aug-12 20:54 UTC
[crossbow-discuss] [networking-discuss] BPF/PF_PACKET Code review
On 12/08/09 06:37 AM, Sebastien Roy wrote:> Darren, > > On Mon, 2009-07-27 at 19:01 -0700, Darren Reed wrote: >> http://cr.opensolaris.org/~darrenr/onnv-pcapture/ >> > > I reviewed the changes to ipnet and ip: > > General: There are quite a bit of changes related to the style of > local variable declaration that was changed, thus adding to the volume > of the review. In general, it would be preferable to restrict the set > of changes to the functionality being added, and to cstyle and lint > issues found while making these changes. Otherwise, sticking with the > existing style of the file without making sweeping changes to style > based on personal preference should be the modus operandi. I won''t > comment on each instance, I''ve restricted my specific comments to > things not related to local variable declarations. :-)As we discussed, i want there to be *1* style for the file. I have no interest in which style that is. I just don''t like working on source code with mixed styles because it isn''t clear which one should be used going forward... Is it to be: <type><tab><variable> or <type><n tabs><variables> .. where all the variables are aligned and an arbitrary number of tabs are inserted. The difference is (what I''ve currently done): struct ip ip; int foo; one tab between type and name, vs: struct ip ip int foo; where short type names have extra tabs. At present, compare _init()''s variables (space between type and variable name) with those in ipnet_if_init(). _init()''s use the latter of the above - multiple tabs to bring variables out to the same position, which is different to what I interpreted your earlier suggestion (single tab) as being.> > uts/common/inet/ipnet.h > > * 58: This may need to be atomic (atomic_inc_64())Discuss. In a similar vein to many of the counters in IP not being atomic increments, so to is this. There''s a real cost for using atomic_() ops and it seems that unless there is an absolute need for it to be 100% reliable (reference count, etc), the "danger" in just using "++" for general statistics seems to be accepted. Compare with MIB_BUMP() and other _BUMP() macros in $SRC/uts/common/inet/*.h.> * 58: Wrap _y in parensReject. Does not work - code will not compile like that, I wish it did. Not good, I know, but that''s life.> uts/common/inet/ipnet/ipnet.c > > * 216: s/instnace/instanceAccept.> * 539: Please restore ipnet_sap code as discussed on > clearview-discuss.Accept.> * 572: This is a spurious change, please restore the code as it was. > ips is derived from ns->netstack_ipnet at 532. Why go through ips > to get at ns again when ns can ge used directly?Accept.> > * 787: As we discussed on clearview-discuss, the ipnet SAP space was fine.Accept.> > * 1168: What''s the purpose of this addition? This function only gets > called for ipnet_t streams that are on the /dev/lo0 device.Discuss. ipnet_loaccept() is used twice: once in ipnet_open() and once in ipnet_promisc_add().> > * 1327-1328,2300-2304: This is not correct. The IPNETIF_LOOPBACK code > breaks the semantics defined in PSARC 2006/475 as we talked about on > clearview-discuss.Accept. IPNETIF_LOOPBACK is now only used by the BPF code path.> * 1959: ipobs_bounce_func() isn''t obvious as far as names go. How > about ipobs_hook_cb()? The convention of _cb() for a callback is > already used elsewhere in the file.Accept.> * 1959: What is the reason for this callback having to duplicate every > packet passed to it? Why wouldn''t the code that dispatches packets > to callbacks always simply pass a duplicate? An observability hook > should always have those semantics (it should never manipulate the > original or block while processing it), so hard-coding such > semantics for all NH_OBSERVE hooks would make sense to me.Discuss. The duplication here has been moved from ip.c`ipobs_hook() to here. Whereas ipobs_hook() was previously just walking a list of functions to call and doing the duplication there, as required, now the list walking is separated (pfhooks) from the packet duplication. Thus the dispatching comes out of pfhooks and jumps through ipobs_bounce_func_cb() to do the duplication for the real recipient (func). But, if your comments above are correct, then dupmsg() should be removed and only copymsg() used, otherwise the observe is presented with the real packet data (in dblk_t''s off of its own mblk_t.) There is a performance cost with this, though: it enforces a deep copy of all observed packets, copying all of the dblk_t''s, not just a shallow one (creating new mblk_t''s.) The current implementation only does a deep copy if the shallow one fails.> * 2203-2232: datalink_id_t is not the same as IP interface index. > They don''t represent the same objects, and they don''t exist at the > same layer. if_index is an IP interface index, and a datalink_id_t > is a linkid as handled in the GLDv3 framework. This code has > nothing to do with GLDv3, and is strictly contained at the IP-layer, > so it shouldn''t be handling datalink_id_t.Discuss. The index used is the index from IP, but seeing as there is no ipindex_t, I used datalink_id_t to put the index in. It would seem that using datalink_id_t as a type here is cause for confusion so it might be better to just use uint_t. Darren -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.opensolaris.org/pipermail/crossbow-discuss/attachments/20090812/e0adc720/attachment.html>
Sebastien Roy
2009-Aug-12 21:14 UTC
[crossbow-discuss] [networking-discuss] BPF/PF_PACKET Code review
On Wed, 2009-08-12 at 13:54 -0700, Darren Reed wrote:> On 12/08/09 06:37 AM, Sebastien Roy wrote: > > Darren, > > > > On Mon, 2009-07-27 at 19:01 -0700, Darren Reed wrote: > > > http://cr.opensolaris.org/~darrenr/onnv-pcapture/ > > > > > > > I reviewed the changes to ipnet and ip: > > > > General: There are quite a bit of changes related to the style of > > local variable declaration that was changed, thus adding to the volume > > of the review. In general, it would be preferable to restrict the set > > of changes to the functionality being added, and to cstyle and lint > > issues found while making these changes. Otherwise, sticking with the > > existing style of the file without making sweeping changes to style > > based on personal preference should be the modus operandi. I won''t > > comment on each instance, I''ve restricted my specific comments to > > things not related to local variable declarations. :-) > > > As we discussed, i want there to be *1* style for the file. > I have no interest in which style that is. I just don''t like > working on source code with mixed styles because it isn''t > clear which one should be used going forward... > > Is it to be: > <type><tab><variable> > or > <type><n tabs><variables> > .. where all the variables are aligned and an arbitrary number > of tabs are inserted. > > The difference is (what I''ve currently done): > > struct ip ip; > int foo; > > one tab between type and name, vs: > > struct ip ip > int foo; > > where short type names have extra tabs. > > At present, compare _init()''s variables (space between type and > variable name) with those in ipnet_if_init(). > _init()''s use the latter of the above - multiple tabs to bring > variables out to the same position, which is different to what > I interpreted your earlier suggestion (single tab) as being.All variable names should be aligned at a tab boundary, and initialized in the declaration where possible. That''s the style used throughout the file. There may be a couple of exceptions (i.e. mistakes), but that''s not reason enough to change the entire file to use an entirely new style. If there''s a function that isn''t using this style, then let''s fix it rather than fix every other function. e.g. netstack_t *ns = netstack_find_by_zoneid(ifp->if_zoneid); ip_stack_t *ipst = ns->netstack_ip; ipnet_t *ipnet; char name[32]; int error = 0;> > > > > uts/common/inet/ipnet.h > > > > * 58: This may need to be atomic (atomic_inc_64()) > > Discuss. > In a similar vein to many of the counters in IP not being > atomic increments, so to is this. There''s a real cost for > using atomic_() ops and it seems that unless there is an > absolute need for it to be 100% reliable (reference count, > etc), the "danger" in just using "++" for general statistics > seems to be accepted. Compare with MIB_BUMP() and > other _BUMP() macros in $SRC/uts/common/inet/*.h.I would think that the cost of the atomic_inc_*() APIs are quite minimal (and they''re relatively widely used in IP), but I can''t say that I care very much either way. Having an accurate error count would seem like an important thing, and performance when recording an error would seem like a secondary concern (and again, I doubt that atomic operations would affect performance of ipnet).> > > > * 1168: What''s the purpose of this addition? This function only gets > > called for ipnet_t streams that are on the /dev/lo0 device. > > Discuss. > ipnet_loaccept() is used twice: once in ipnet_open() and once in > ipnet_promisc_add().I don''t follow. ipnet_acceptfn should only be set to ipnet_loaccept() if /dev/lo0 is opened, and in no other case. Therefore the check you added in the function itself: "if (getminor(ipnet->ipnet_if->if_dev) =IPNET_MINOR_LO)" is redundant. If ipnet_loaccept() is getting called when it shouldn''t be, then something else is wrong.> > > > * 1327-1328,2300-2304: This is not correct. The IPNETIF_LOOPBACK code > > breaks the semantics defined in PSARC 2006/475 as we talked about on > > clearview-discuss. > > Accept. > IPNETIF_LOOPBACK is now only used by the BPF code path.In what case, and what does it mean? Remember that /dev/lo0 and /dev/ipnet/lo0 have different semantics, and that ipnet_loaccept() is only strictly implementing the semantics of /dev/lo0.> > * 1959: What is the reason for this callback having to duplicate every > > packet passed to it? Why wouldn''t the code that dispatches packets > > to callbacks always simply pass a duplicate? An observability hook > > should always have those semantics (it should never manipulate the > > original or block while processing it), so hard-coding such > > semantics for all NH_OBSERVE hooks would make sense to me. > > Discuss. > The duplication here has been moved from ip.c`ipobs_hook() to here. > Whereas ipobs_hook() was previously just walking a list of functions > to call and doing the duplication there, as required, now the list > walking is separated (pfhooks) from the packet duplication. Thus the > dispatching comes out of pfhooks and jumps through ipobs_bounce_func_cb() > to do the duplication for the real recipient (func).Ah, I see.> But, if your > comments above are correct, then dupmsg() should be removed and only > copymsg() used, otherwise the observe is presented with the real > packet data (in dblk_t''s off of its own mblk_t.) There is a performance > cost with this, though: it enforces a deep copy of all observed packets, > copying all of the dblk_t''s, not just a shallow one (creating new mblk_t''s.) > The current implementation only does a deep copy if the shallow one > fails.Okay.> > * 2203-2232: datalink_id_t is not the same as IP interface index. > > They don''t represent the same objects, and they don''t exist at the > > same layer. if_index is an IP interface index, and a datalink_id_t > > is a linkid as handled in the GLDv3 framework. This code has > > nothing to do with GLDv3, and is strictly contained at the IP-layer, > > so it shouldn''t be handling datalink_id_t. > > Discuss. > The index used is the index from IP, but seeing as there is no > ipindex_t, I used datalink_id_t to put the index in. It would > seem that using datalink_id_t as a type here is cause for confusion > so it might be better to just use uint_t.Yep, I think so. -Seb
Darren Reed
2009-Aug-12 21:58 UTC
[crossbow-discuss] [networking-discuss] BPF/PF_PACKET Code review
On 12/08/09 02:14 PM, Sebastien Roy wrote:> On Wed, 2009-08-12 at 13:54 -0700, Darren Reed wrote: > >> On 12/08/09 06:37 AM, Sebastien Roy wrote: >> >>> Darren, >>> >>> On Mon, 2009-07-27 at 19:01 -0700, Darren Reed wrote: >>> >>>> http://cr.opensolaris.org/~darrenr/onnv-pcapture/ >>>> >>>> >>> I reviewed the changes to ipnet and ip: >>> >>> General: There are quite a bit of changes related to the style of >>> local variable declaration that was changed, thus adding to the volume >>> of the review. In general, it would be preferable to restrict the set >>> of changes to the functionality being added, and to cstyle and lint >>> issues found while making these changes. Otherwise, sticking with the >>> existing style of the file without making sweeping changes to style >>> based on personal preference should be the modus operandi. I won''t >>> comment on each instance, I''ve restricted my specific comments to >>> things not related to local variable declarations. :-) >>> >> As we discussed, i want there to be *1* style for the file. >> I have no interest in which style that is. I just don''t like >> working on source code with mixed styles because it isn''t >> clear which one should be used going forward... >> >> Is it to be: >> <type><tab><variable> >> or >> <type><n tabs><variables> >> .. where all the variables are aligned and an arbitrary number >> of tabs are inserted. >> >> The difference is (what I''ve currently done): >> >> struct ip ip; >> int foo; >> >> one tab between type and name, vs: >> >> struct ip ip >> int foo; >> >> where short type names have extra tabs. >> >> At present, compare _init()''s variables (space between type and >> variable name) with those in ipnet_if_init(). >> _init()''s use the latter of the above - multiple tabs to bring >> variables out to the same position, which is different to what >> I interpreted your earlier suggestion (single tab) as being. >> > > All variable names should be aligned at a tab boundary, and initialized > in the declaration where possible. That''s the style used throughout the > file. There may be a couple of exceptions (i.e. mistakes), but that''s > not reason enough to change the entire file to use an entirely new > style. If there''s a function that isn''t using this style, then let''s > fix it rather than fix every other function. e.g. > > netstack_t *ns = netstack_find_by_zoneid(ifp->if_zoneid); > ip_stack_t *ipst = ns->netstack_ip; > ipnet_t *ipnet; > char name[32]; > int error = 0; >Not a problem.>>> uts/common/inet/ipnet.h >>> >>> * 58: This may need to be atomic (atomic_inc_64()) >>> >> Discuss. >> In a similar vein to many of the counters in IP not being >> atomic increments, so to is this. There''s a real cost for >> using atomic_() ops and it seems that unless there is an >> absolute need for it to be 100% reliable (reference count, >> etc), the "danger" in just using "++" for general statistics >> seems to be accepted. Compare with MIB_BUMP() and >> other _BUMP() macros in $SRC/uts/common/inet/*.h. >> > > I would think that the cost of the atomic_inc_*() APIs are quite minimal > (and they''re relatively widely used in IP),No, they''re not as inexpensive as we might think, especially on multi CPU systems. There was an internal talk on the cost of mutex''s and such earlier in the year: a much smaller number of them can be done than we might expect (the other point made from that talk was to avoid locking whenever possible because of their impact on the bus.)>>> * 1168: What''s the purpose of this addition? This function only gets >>> called for ipnet_t streams that are on the /dev/lo0 device. >>> >> Discuss. >> ipnet_loaccept() is used twice: once in ipnet_open() and once in >> ipnet_promisc_add(). >> > > I don''t follow. ipnet_acceptfn should only be set to ipnet_loaccept() > if /dev/lo0 is opened, and in no other case. Therefore the check you > added in the function itself: "if (getminor(ipnet->ipnet_if->if_dev) => IPNET_MINOR_LO)" is redundant. If ipnet_loaccept() is getting called > when it shouldn''t be, then something else is wrong. >So lets examine ipnet_accept vs ipnet_loaccept for the purpose of lo0 packets going through ipnet, into BPF, in a local zone with a shared stack. If we set ipnet_acceptfn to ipnet_accept for lo0 packets that are going to end up in BPF: - the IP address of the packet isn''t important, so what address type it is does not matter, thus calls to ipnet_get_addrtype() are not required; - the lo0 interface has no special behaviour for promiscuous mode, so that flag and the SAP one are meaningless; - thus there''s lots of code in that function that gets executed for seemingly no reason. The goal of using BPF in a local zone with a shared stack is to see all of the traffic in that zone which is associated with the lo0 "interface". The ipnet_loaccept function provides a filter that does exactly that - except for the filter on the hook type. I suppose i could add an ipnet_bpf_loaccept but it would be a carbon copy of ipnet_loaccept with that check removed. I think what I''m saying is that BPF on lo0 in a zone has the same semantics as /dev/lo0 and not /dev/ipnet/lo0.>>> * 1959: What is the reason for this callback having to duplicate every >>> packet passed to it? Why wouldn''t the code that dispatches packets >>> to callbacks always simply pass a duplicate? An observability hook >>> should always have those semantics (it should never manipulate the >>> original or block while processing it), so hard-coding such >>> semantics for all NH_OBSERVE hooks would make sense to me. >>> >> Discuss. >> The duplication here has been moved from ip.c`ipobs_hook() to here. >> Whereas ipobs_hook() was previously just walking a list of functions >> to call and doing the duplication there, as required, now the list >> walking is separated (pfhooks) from the packet duplication. Thus the >> dispatching comes out of pfhooks and jumps through ipobs_bounce_func_cb() >> to do the duplication for the real recipient (func). >> > > Ah, I see. > > >> But, if your >> comments above are correct, then dupmsg() should be removed and only >> copymsg() used, otherwise the observe is presented with the real >> packet data (in dblk_t''s off of its own mblk_t.) There is a performance >> cost with this, though: it enforces a deep copy of all observed packets, >> copying all of the dblk_t''s, not just a shallow one (creating new mblk_t''s.) >> The current implementation only does a deep copy if the shallow one >> fails. >> > > Okay. >Just to confirm, it''s ok to leave the code to do both dupmsg() and copymsg() for performance reasons.>>> * 2203-2232: datalink_id_t is not the same as IP interface index. >>> They don''t represent the same objects, and they don''t exist at the >>> same layer. if_index is an IP interface index, and a datalink_id_t >>> is a linkid as handled in the GLDv3 framework. This code has >>> nothing to do with GLDv3, and is strictly contained at the IP-layer, >>> so it shouldn''t be handling datalink_id_t. >>> >> Discuss. >> The index used is the index from IP, but seeing as there is no >> ipindex_t, I used datalink_id_t to put the index in. It would >> seem that using datalink_id_t as a type here is cause for confusion >> so it might be better to just use uint_t. >> > > Yep, I think so. >Ok, I''ll fix that. Darren -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.opensolaris.org/pipermail/crossbow-discuss/attachments/20090812/dacb2c87/attachment.html>
Sebastien Roy
2009-Aug-13 01:20 UTC
[crossbow-discuss] [networking-discuss] BPF/PF_PACKET Code review
On Wed, 2009-08-12 at 14:58 -0700, Darren Reed wrote:> > > > * 1168: What''s the purpose of this addition? This function only gets > > > > called for ipnet_t streams that are on the /dev/lo0 device. > > > > > > > Discuss. > > > ipnet_loaccept() is used twice: once in ipnet_open() and once in > > > ipnet_promisc_add(). > > > > > > > I don''t follow. ipnet_acceptfn should only be set to ipnet_loaccept() > > if /dev/lo0 is opened, and in no other case. Therefore the check you > > added in the function itself: "if (getminor(ipnet->ipnet_if->if_dev) => > IPNET_MINOR_LO)" is redundant. If ipnet_loaccept() is getting called > > when it shouldn''t be, then something else is wrong. > > > > So lets examine ipnet_accept vs ipnet_loaccept for the purpose of > lo0 packets going through ipnet, into BPF, in a local zone with a > shared stack. > > If we set ipnet_acceptfn to ipnet_accept for lo0 packets that are > going to end up in BPF: > - the IP address of the packet isn''t important, so what address type > it is does not matter, thus calls to ipnet_get_addrtype() are not > required;The address is important. A consumer of an ipnet device receives packets if they contain an IP address assigned to that interface (this is described in the IP Observability PSARC case). That''s the distinction between /dev/lo0 and /dev/ipnet/lo0 (and thus the difference between ipnet_accept() and ipnet_loaccept()).> - the lo0 interface has no special behaviour for promiscuous mode, > so that flag and the SAP one are meaningless; > - thus there''s lots of code in that function that gets executed for > seemingly no reason.ipnet_accept() implements the semantics of _all_ ipnet observability nodes including /dev/ipnet/lo0. ipnet_loaccept() only implements the semantics of /dev/lo0 (which are deliberately different as described in the PSARC case). If you feel that ipnet_accept() does not perform well enough for the special case of /dev/ipnet/lo0 in a shared stack zone, then the answer isn''t to break its semantic by using ipnet_loaccept(), but rather to address its performance issues (if indeed there are any).> The goal of using BPF in a local zone with a shared stack is to > see all of the traffic in that zone which is associated with the > lo0 "interface". The ipnet_loaccept function provides a filter > that does exactly that - except for the filter on the hook type. > I suppose i could add an ipnet_bpf_loaccept but it would be > a carbon copy of ipnet_loaccept with that check removed. > > I think what I''m saying is that BPF on lo0 in a zone has the > same semantics as /dev/lo0 and not /dev/ipnet/lo0.I don''t understand. I thought that there were different "levels" of observability for BPF, where if one enabled DLT_IPNET, one would get the ipnet semantics (/dev/ipnet/* for any device). I would then assume that for "lo0", if one does not use DLT_IPNET, then one gets the /dev/lo0 semantics, but if one enables DLT_IPNET, one gets /dev/ipnet/lo0 semantics. In any case, we''re currently talking about the setting of ipnet_loaccept as ipnet_acceptfn, which definitely needs to be worked out for BPF, but the comment I made was slightly different. The code change you made in ipnet_loaccept() itself seems to be tangential to that, and I still don''t know why you made that change. Returning B_FALSE if /dev/lo0 wasn''t the device opened still seems redundant regardless.> > > > * 1959: What is the reason for this callback having to duplicate every > > > > packet passed to it? Why wouldn''t the code that dispatches packets > > > > to callbacks always simply pass a duplicate? An observability hook > > > > should always have those semantics (it should never manipulate the > > > > original or block while processing it), so hard-coding such > > > > semantics for all NH_OBSERVE hooks would make sense to me. > > > > > > > Discuss. > > > The duplication here has been moved from ip.c`ipobs_hook() to here. > > > Whereas ipobs_hook() was previously just walking a list of functions > > > to call and doing the duplication there, as required, now the list > > > walking is separated (pfhooks) from the packet duplication. Thus the > > > dispatching comes out of pfhooks and jumps through ipobs_bounce_func_cb() > > > to do the duplication for the real recipient (func). > > > > > > > Ah, I see. > > > > > > > But, if your > > > comments above are correct, then dupmsg() should be removed and only > > > copymsg() used, otherwise the observe is presented with the real > > > packet data (in dblk_t''s off of its own mblk_t.) There is a performance > > > cost with this, though: it enforces a deep copy of all observed packets, > > > copying all of the dblk_t''s, not just a shallow one (creating new mblk_t''s.) > > > The current implementation only does a deep copy if the shallow one > > > fails. > > > > > > > Okay. > > > > Just to confirm, it''s ok to leave the code to do both dupmsg() and > copymsg() > for performance reasons.Sure, that''s fine. -Seb
Darren Reed
2009-Aug-13 04:40 UTC
[crossbow-discuss] [networking-discuss] BPF/PF_PACKET Code review
On 12/08/09 06:20 PM, Sebastien Roy wrote:> On Wed, 2009-08-12 at 14:58 -0700, Darren Reed wrote: > >>>>> * 1168: What''s the purpose of this addition? This function only gets >>>>> called for ipnet_t streams that are on the /dev/lo0 device. >>>>> >>>>> >>>> Discuss. >>>> ipnet_loaccept() is used twice: once in ipnet_open() and once in >>>> ipnet_promisc_add(). >>>> >>>> >>> I don''t follow. ipnet_acceptfn should only be set to ipnet_loaccept() >>> if /dev/lo0 is opened, and in no other case. Therefore the check you >>> added in the function itself: "if (getminor(ipnet->ipnet_if->if_dev) =>>> IPNET_MINOR_LO)" is redundant. If ipnet_loaccept() is getting called >>> when it shouldn''t be, then something else is wrong. >>> >>> >> So lets examine ipnet_accept vs ipnet_loaccept for the purpose of >> lo0 packets going through ipnet, into BPF, in a local zone with a >> shared stack. >> >> If we set ipnet_acceptfn to ipnet_accept for lo0 packets that are >> going to end up in BPF: >> - the IP address of the packet isn''t important, so what address type >> it is does not matter, thus calls to ipnet_get_addrtype() are not >> required; >> > > The address is important. A consumer of an ipnet device receives > packets if they contain an IP address assigned to that interface (this > is described in the IP Observability PSARC case). That''s the > distinction between /dev/lo0 and /dev/ipnet/lo0 (and thus the difference > between ipnet_accept() and ipnet_loaccept()). > > >> - the lo0 interface has no special behaviour for promiscuous mode, >> so that flag and the SAP one are meaningless; >> - thus there''s lots of code in that function that gets executed for >> seemingly no reason. >> > > ipnet_accept() implements the semantics of _all_ ipnet observability > nodes including /dev/ipnet/lo0. ipnet_loaccept() only implements the > semantics of /dev/lo0 (which are deliberately different as described in > the PSARC case). If you feel that ipnet_accept() does not perform well > enough for the special case of /dev/ipnet/lo0 in a shared stack zone, > then the answer isn''t to break its semantic by using ipnet_loaccept(), > but rather to address its performance issues (if indeed there are any). > > >> The goal of using BPF in a local zone with a shared stack is to >> see all of the traffic in that zone which is associated with the >> lo0 "interface". The ipnet_loaccept function provides a filter >> that does exactly that - except for the filter on the hook type. >> I suppose i could add an ipnet_bpf_loaccept but it would be >> a carbon copy of ipnet_loaccept with that check removed. >> >> I think what I''m saying is that BPF on lo0 in a zone has the >> same semantics as /dev/lo0 and not /dev/ipnet/lo0. >> > > I don''t understand. I thought that there were different "levels" of > observability for BPF, where if one enabled DLT_IPNET, one would get the > ipnet semantics (/dev/ipnet/* for any device). I would then assume that > for "lo0", if one does not use DLT_IPNET, then one gets the /dev/lo0 > semantics, but if one enables DLT_IPNET, one gets /dev/ipnet/lo0 > semantics. >Ah, I see what you''re driving at. So I suppose the task then is to implement both DLT_NULL and DLT_IPNET for lo0 as opposed to just DLT_IPNET. I hadn''t considered that... Darren -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.opensolaris.org/pipermail/crossbow-discuss/attachments/20090812/cfaa7e6f/attachment.html>
Sebastien Roy
2009-Aug-13 12:05 UTC
[crossbow-discuss] [networking-discuss] BPF/PF_PACKET Code review
On Wed, 2009-08-12 at 21:40 -0700, Darren Reed wrote:> Ah, I see what you''re driving at. > > So I suppose the task then is to implement both DLT_NULL and > DLT_IPNET for lo0 as opposed to just DLT_IPNET. > > I hadn''t considered that...Or DLT_RAW for /dev/lo0 (I think this implies that packets begin with an IP header). /dev/lo0 is simply there for compatibility with OSs that have a similar device, and those OSs certainly don''t use the ipnet header on /dev/lo0. Perhaps that would be acceptable. The BPF functionality that switches the device used based on DLT value has its drawbacks, and one of them is clear here. For one, selecting the "ipnet" lo0 (/dev/ipnet/lo0) with DLT_IPNET also implies that the packets received will have an ipnet header. Using DLT_RAW (or DLT_NULL) to select the /dev/lo0 semantics also implies that there isn''t an ipnet header, which is unfortunate. Perhaps another way to handle this would be to have a Solaris "loopback" DLT type that both implies /dev/lo0 _and_ ipnet headers, but I''m not sure that''s really necessary given the nature of /dev/lo0. Do others have alternate suggestions? -Seb
Darren Reed
2009-Aug-13 15:46 UTC
[crossbow-discuss] [networking-discuss] BPF/PF_PACKET Code review
Sebastien Roy wrote:> On Wed, 2009-08-12 at 21:40 -0700, Darren Reed wrote: >> Ah, I see what you''re driving at. >> >> So I suppose the task then is to implement both DLT_NULL and >> DLT_IPNET for lo0 as opposed to just DLT_IPNET. >> >> I hadn''t considered that... > > Or DLT_RAW for /dev/lo0 (I think this implies that packets begin with an > IP header). /dev/lo0 is simply there for compatibility with OSs that > have a similar device, and those OSs certainly don''t use the ipnet > header on /dev/lo0. Perhaps that would be acceptable. > > The BPF functionality that switches the device used based on DLT value > has its drawbacks, and one of them is clear here. For one, selecting > the "ipnet" lo0 (/dev/ipnet/lo0) with DLT_IPNET also implies that the > packets received will have an ipnet header. Using DLT_RAW (or DLT_NULL) > to select the /dev/lo0 semantics also implies that there isn''t an ipnet > header, which is unfortunate.Well, it is up to us to decide what the behaviour should be. My take on this is that there is no point in us having another DLT for /dev/lo0. The reason that other OS''s use DLT_NULL is that the only useful information that needs to be prepended is the address family of the packet. Maybe a different way to look at this is if OpenSolaris did not have zones or IPNET then the choice would be whether or not to provide a DLT_NULL packet type for packets captured from lo0 (assuming that the choice was made to prsent lo0 packets through BPF as well as normal packets from below IP.) I think that there is a reasonably good chance that this decision would be made. Thus if zones and later ipnet was delivered on top of that, we would end up with DLT_NULL and DLT_IPNET for ipnet on lo0.> Perhaps another way to handle this would be to have a Solaris "loopback" > DLT type that both implies /dev/lo0 _and_ ipnet headers, but I''m not > sure that''s really necessary given the nature of /dev/lo0.Right. Darren
Eric Cheng
2009-Aug-14 03:44 UTC
[crossbow-discuss] [networking-discuss] BPF/PF_PACKET Code review
On Mon, Jul 27, 2009 at 07:01:58PM -0700, Darren Reed wrote:> http://cr.opensolaris.org/~darrenr/onnv-pcapture/ >Hi darren, Here are my comments for the mac and sockmod_pfp components: dls_link.c: 930: dls_link_getzid() seems to be unused. mac_client.c: 3253-3254: I think you can''t assume that the mp_copy is still valid after invoking mpip->mpi_fn(). mp_copy could already be freed. to fix this, you''ll probably need to move the mpi_no_copy logic to mac_promisc_dispatch() and mac_promisc_client_dispatch(). the b_next should be saved before calling mac_promisc_dispatch_one(). also, if mpi_no_copy is set, the mp_chain has to be freed right after mac_promisc_dispatch()/mac_promisc_client_dispatch() because you can''t let the same mp_chain come up two different paths. bpf.c: 1710-1720: if I read this correctly, this is passing the mac name to mac_client_open() which will result in the mac name being copied to bif_ifname. I think you want the real link name right? to get the real link name, the name arg to mac_client_open() should be NULL and flags should include MAC_OPEN_FLAGS_USE_DATALINK_NAME. also, in several places in this file you have something like: if (bpf_debug) cmn_err(CE_WARN, "..."); it may be better to put this into a macro. bpf_mac.c: 102: do you need to return the correct size for non-ether macs? the bpr_hdr_length entry point also seems to be unused. sockmod_pfp.c: 1211: you should consider adding a pfp_close() function symmetrical to pfp_open_index() for closing mh/mch. 1223: similar problem as bpf.c. you should pass a NULL name and MAC_OPEN_FLAGS_USE_DATALINK_NAME flag to mac_client_open(), then use mac_client_name() to get the link name and pass it to zone_access_datalink(). 716,719: if you did the pfp_open_index() at 660, these conditions must be true. you could set a boolean at 660 and use it instead of checking mh/mch. something like: if (new_open) { ASSERT(mch != ps->ps_mch); ASSERT(mh != ps->ps_mh); pfp_close(mch, mh); } 722: mp could be uninitialized if you jumped from 674 and this would free invalid memory. 330: you need to specify MAC_PROMISC_FLAGS_VLAN_TAG_STRIP in mac_promisc_add() otherwise the mhi_bindsap here will be incorrect for vlan packets. 356: shouldn''t you use msgdsize() instead of MBLKL? 408: mhi_saddr could possibly be NULL (for IB links) 976: why doesn''t this function use dls_mgmt_get_linkid()? 955,982: should this be copyin() instead of bcopy()? 1150: if possible, try to limit bf_len/bf_insns to some reasonable number so you don''t allocate too much kernel memory. 1180: need to return an error if bpf_validate() fails. 1160: maybe pfp_release_bpf() should be done after bpf_validate(). eric
Peter Memishian
2009-Aug-14 04:06 UTC
[crossbow-discuss] [networking-discuss] BPF/PF_PACKET Code review
> 1710-1720:> if I read this correctly, this is passing the mac name to > mac_client_open() which will result in the mac name being copied > to bif_ifname. I think you want the real link name right? > > to get the real link name, the name arg to mac_client_open() > should be NULL and flags should include > MAC_OPEN_FLAGS_USE_DATALINK_NAME. Given that DLS also uses MAC_OPEN_FLAGS_USE_DATALINK_NAME, would we end up with multiple MAC clients that have the same name? -- meem
Darren Reed
2009-Aug-14 05:04 UTC
[crossbow-discuss] [networking-discuss] BPF/PF_PACKET Code review
On 13/08/09 08:44 PM, Eric Cheng wrote:> On Mon, Jul 27, 2009 at 07:01:58PM -0700, Darren Reed wrote: > >> http://cr.opensolaris.org/~darrenr/onnv-pcapture/ >> >> > > Hi darren, > > Here are my comments for the mac and sockmod_pfp components: > > dls_link.c: > > 930: > dls_link_getzid() seems to be unused. >Discuss. In testing, zone_access_datalink() didn''t work as expected, thus it has been thrown to /dev/null and dls_link_getzid() used instead (code path now works as expected.)> mac_client.c: > > 3253-3254: > I think you can''t assume that the mp_copy is still valid after > invoking mpip->mpi_fn(). mp_copy could already be freed. >Discuss. The basis of "mp_copy == mp" is the mpi_no_copy being true. Users of the interface that do not request the mblk to be copied (by using MAC_PROMISC_FLAGS_NO_COPY) are not entitled to modify the mblk_t that is passed in. That includes prohibition from free''ing them. Or is there some other code path that will free the mblk_t whilst mpi_fn() is being called?> to fix this, you''ll probably need to move the mpi_no_copy logic > to mac_promisc_dispatch() and mac_promisc_client_dispatch(). the > b_next should be saved before calling mac_promisc_dispatch_one(). > > also, if mpi_no_copy is set, the mp_chain has to be freed > right after mac_promisc_dispatch()/mac_promisc_client_dispatch() > because you can''t let the same mp_chain come up two different paths. > > > bpf.c: > > 1710-1720: > if I read this correctly, this is passing the mac name to > mac_client_open() which will result in the mac name being copied > to bif_ifname. I think you want the real link name right? >Yes.> to get the real link name, the name arg to mac_client_open() > should be NULL and flags should include > MAC_OPEN_FLAGS_USE_DATALINK_NAME. >Accept.> also, in several places in this file you have something like: > if (bpf_debug) > cmn_err(CE_WARN, "..."); > > it may be better to put this into a macro. >Reject. Putting debugging things in a macro is not friendly to having multiple arguments to the "printf" function without ugliness like BPF_DEBUG1(), BPF_DEBUG2(), etc. At least as far as I know, there are no stdargs macro variations.> bpf_mac.c: > > 102: > do you need to return the correct size for non-ether macs? > the bpr_hdr_length entry point also seems to be unused. >Accept. (It will be removed.)> sockmod_pfp.c: > > 1211: > you should consider adding a pfp_close() function symmetrical > to pfp_open_index() for closing mh/mch. >Reject. There would only be one use of pfp_close(), whilst there are multiple places that need pfp_open_index()> 1223: > similar problem as bpf.c. > you should pass a NULL name and MAC_OPEN_FLAGS_USE_DATALINK_NAME > flag to mac_client_open(), then use mac_client_name() to get > the link name and pass it to zone_access_datalink(). >Accept.> 716,719: > if you did the pfp_open_index() at 660, these conditions must > be true. you could set a boolean at 660 and use it instead > of checking mh/mch. something like: > > if (new_open) { > ASSERT(mch != ps->ps_mch); > ASSERT(mh != ps->ps_mh); > pfp_close(mch, mh); > } >Discuss. So you only wanted pfp_close() to do the mac_client_close and the mac_close? That seems hardly worth it... but the new_open change is worthwhile.> 722: > mp could be uninitialized if you jumped from 674 and this would > free invalid memory. >Accept. (I''m surprised neither lint nor the compiler found this, well done!)> 330: > you need to specify MAC_PROMISC_FLAGS_VLAN_TAG_STRIP in > mac_promisc_add() otherwise the mhi_bindsap here will be > incorrect for vlan packets. >Discuss. Does this comment apply to all uses of mac_promisc_add()? I couldn''t find any that are at line 330...I think you mean 1330?> 356: > shouldn''t you use msgdsize() instead of MBLKL? >Reject. No. The code wants to know if the entire packet is in one mblk_t or not so that it can optimise later functions - for example, if it is, then it is safe to use bcopy() as the "copy function" for packet data and if it is not, then you need to use a function that can walk mblk_t''s.> 408: > mhi_saddr could possibly be NULL (for IB links) >Accept.> 976: > why doesn''t this function use dls_mgmt_get_linkid()? >Accept.> 955,982: > should this be copyin() instead of bcopy()? >Accept.> 1150: > if possible, try to limit bf_len/bf_insns to some reasonable > number so you don''t allocate too much kernel memory. >Reject. This is enforced by bpf_validate.> 1180: > need to return an error if bpf_validate() fails. >Accept.> 1160: > maybe pfp_release_bpf() should be done after bpf_validate(). >Accept. And in addition, there needs to be a lock used to guard against pfp_release_bpf() being called whilst bpf_filter() is executing using that buffer. Darren -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.opensolaris.org/pipermail/crossbow-discuss/attachments/20090813/0baea59e/attachment.html>
Peter Memishian
2009-Aug-14 05:20 UTC
[crossbow-discuss] [clearview-discuss] [networking-discuss] BPF/PF_PACKET Code review
> Or DLT_RAW for /dev/lo0 (I think this implies that packets begin with an> IP header). /dev/lo0 is simply there for compatibility with OSs that > have a similar device, and those OSs certainly don''t use the ipnet > header on /dev/lo0. Perhaps that would be acceptable. > > The BPF functionality that switches the device used based on DLT value > has its drawbacks, and one of them is clear here. For one, selecting > the "ipnet" lo0 (/dev/ipnet/lo0) with DLT_IPNET also implies that the > packets received will have an ipnet header. Using DLT_RAW (or DLT_NULL) > to select the /dev/lo0 semantics also implies that there isn''t an ipnet > header, which is unfortunate. > > Perhaps another way to handle this would be to have a Solaris "loopback" > DLT type that both implies /dev/lo0 _and_ ipnet headers, but I''m not > sure that''s really necessary given the nature of /dev/lo0. > > Do others have alternate suggestions? While I agree that /dev/lo0 provides semantic compatibility with other systems, it seems a shame to lose the ipnet header since that renders it of limited use as a way to monitor zone loopback activity -- and monitoring zones was of course one of our core objectives. -- meem
Eric Cheng
2009-Aug-14 07:01 UTC
[crossbow-discuss] [networking-discuss] BPF/PF_PACKET Code review
On Thu, Aug 13, 2009 at 09:06:16PM -0700, Peter Memishian wrote:> > 1710-1720: > > if I read this correctly, this is passing the mac name to > > mac_client_open() which will result in the mac name being copied > > to bif_ifname. I think you want the real link name right? > > > > to get the real link name, the name arg to mac_client_open() > > should be NULL and flags should include > > MAC_OPEN_FLAGS_USE_DATALINK_NAME. > > Given that DLS also uses MAC_OPEN_FLAGS_USE_DATALINK_NAME, would we end up > with multiple MAC clients that have the same name? >I checked the code. looks like its safe to allow this. mci_name need not be unique. we identify the primary mac client (the one used by DLS) using the flag MAC_CLIENT_FLAGS_PRIMARY, not mci_name. pf_packet doesn''t use this flag. when someone does a rename, these pf_packet mac clients will get renamed as well (by mac_rename_primary()). also, pf_packet mac clients do not create kstats (since they do not call mac_unicast_add()) so there won''t be kstat collisions. eric
Peter Memishian
2009-Aug-14 07:19 UTC
[crossbow-discuss] [networking-discuss] BPF/PF_PACKET Code review
> I checked the code. looks like its safe to allow this. mci_name need not> be unique. we identify the primary mac client (the one used by DLS) using > the flag MAC_CLIENT_FLAGS_PRIMARY, not mci_name. So what is the intended purpose of mci_name? Clearly it is not to uniquely identify an object. -- meem
Eric Cheng
2009-Aug-14 07:58 UTC
[crossbow-discuss] [networking-discuss] BPF/PF_PACKET Code review
On Thu, Aug 13, 2009 at 10:04:01PM -0700, Darren Reed wrote:> >dls_link.c: > > > >930: > >dls_link_getzid() seems to be unused. > > > > Discuss. > In testing, zone_access_datalink() didn''t work as expected, thus it has > been thrown to /dev/null and dls_link_getzid() used instead (code path > now works as expected.) >I''m fine with keeping dls_link_getzid() if it''s needed but do you know if the zone_access_datalink() problem is related to the macname issue I mentioned in sockmod_pfp:1223?> > >mac_client.c: > > > >3253-3254: > >I think you can''t assume that the mp_copy is still valid after > >invoking mpip->mpi_fn(). mp_copy could already be freed. > > > > Discuss. > The basis of "mp_copy == mp" is the mpi_no_copy being true. > Users of the interface that do not request the mblk to be copied > (by using MAC_PROMISC_FLAGS_NO_COPY) are not entitled > to modify the mblk_t that is passed in. That includes prohibition > from free''ing them. Or is there some other code path that will > free the mblk_t whilst mpi_fn() is being called? >no, it won''t get freed. I just think it''s better to not change the expected behavior of mpi_fn() because someone passed in a certain flag. I think it''s more consistent with other mac callbacks to let it consume the packet. If you''re concerned about copying cost, you could do dupmsg() instead of copymsg() if mpi_no_copy is set.> > >sockmod_pfp.c: > > > >1211: > >you should consider adding a pfp_close() function symmetrical > >to pfp_open_index() for closing mh/mch. > > > > Reject. > There would only be one use of pfp_close(), whilst there are multiple > places that need pfp_open_index() >I don''t mind if you keep it this way. but from cscope, I see two uses of pfp_open_index(), in both these cases I see mac_client_close()/mac_close() as well. so you should have more than one use of pfp_close().> > >716,719: > >if you did the pfp_open_index() at 660, these conditions must > >be true. you could set a boolean at 660 and use it instead > >of checking mh/mch. something like: > > > >if (new_open) { > > ASSERT(mch != ps->ps_mch); > > ASSERT(mh != ps->ps_mh); > > pfp_close(mch, mh); > >} > > > > Discuss. > So you only wanted pfp_close() to do the mac_client_close and the > mac_close? That seems hardly worth it... but the new_open change > is worthwhile. >I found two cases but as I said I don''t mind either way.> >330: > >you need to specify MAC_PROMISC_FLAGS_VLAN_TAG_STRIP in > >mac_promisc_add() otherwise the mhi_bindsap here will be > >incorrect for vlan packets. > > > > Discuss. > Does this comment apply to all uses of mac_promisc_add()? > I couldn''t find any that are at line 330...I think you mean 1330? >I was talking about this: if ((ps->ps_proto != 0) && (ps->ps_proto != hdr.mhi_bindsap)) { } e.g.: if someone binds to the ip sap, all inbound packets with a vlan header will be dropped. yes, this affects all calls to mac_promisc_add(). an alternative is to not strip the vlan tag, and skip to the real sap yourself, like dls_link_header_info(). I''m not sure what''s the correct semantics for pf_packet. it maybe better to pass up a full raw header. eric
Eric Cheng
2009-Aug-14 08:06 UTC
[crossbow-discuss] [networking-discuss] BPF/PF_PACKET Code review
On Fri, Aug 14, 2009 at 12:19:30AM -0700, Peter Memishian wrote:> > > I checked the code. looks like its safe to allow this. mci_name need not > > be unique. we identify the primary mac client (the one used by DLS) using > > the flag MAC_CLIENT_FLAGS_PRIMARY, not mci_name. > > So what is the intended purpose of mci_name? Clearly it is not to uniquely > identify an object. >right now there is a kstat and flow name that''s tied with this. all of these get renamed together. as long as the mac client doesn''t call mac_unicast_add(), the kstat won''t be created. the flow name doesn''t matter since this is an internal, not user-defined, flow. eric
Peter Memishian
2009-Aug-14 08:29 UTC
[crossbow-discuss] [networking-discuss] BPF/PF_PACKET Code review
> > So what is the intended purpose of mci_name? Clearly it is not to uniquely> > identify an object. > > right now there is a kstat and flow name that''s tied with this. all of > these get renamed together. as long as the mac client doesn''t call > mac_unicast_add(), the kstat won''t be created. the flow name doesn''t > matter since this is an internal, not user-defined, flow. This seems pretty fragile -- e.g., it is not at all obvious that introducing a mac_unicast_add() call can cause kstat problems because of previously-overlapping mac client names. Also, if the flow name doesn''t matter, why are we assigning a name at all? It also seems odd that this name is serving double-duty, as it''s alternately either provided by the consumer of the client API or set by mac_client_open() to something that may be of interest to the consumer (such as the associated datalink). One alternative would be to have the name always identify the consumer (mci_client_user or whatever) and have mci_name always identify the associated datalink, and internally use the concatenation of mci_client_user and mci_name name the various related objects like kstats and flows. -- meem
Eric Cheng
2009-Aug-14 09:08 UTC
[crossbow-discuss] [networking-discuss] BPF/PF_PACKET Code review
On Fri, Aug 14, 2009 at 01:29:53AM -0700, Peter Memishian wrote:> > > > So what is the intended purpose of mci_name? Clearly it is not to uniquely > > > identify an object. > > > > right now there is a kstat and flow name that''s tied with this. all of > > these get renamed together. as long as the mac client doesn''t call > > mac_unicast_add(), the kstat won''t be created. the flow name doesn''t > > matter since this is an internal, not user-defined, flow. > > This seems pretty fragile -- e.g., it is not at all obvious that > introducing a mac_unicast_add() call can cause kstat problems because > of previously-overlapping mac client names. Also, if the flow name > doesn''t matter, why are we assigning a name at all? >the flow name is there because it will be used for creating the kstat in case mac_unicast_add() gets called. yes, this sounds a bit convoluted. (the code is: mac_client_open()->mac_flow_create() creates an empty flow at some point later mac_unicast_add()->mac_client_datapath_setup()->... mac_flow_create() is called again to create the kstat)> It also seems odd that this name is serving double-duty, as it''s > alternately either provided by the consumer of the client API or set by > mac_client_open() to something that may be of interest to the consumer > (such as the associated datalink). One alternative would be to have the > name always identify the consumer (mci_client_user or whatever) and have > mci_name always identify the associated datalink, and internally use the > concatenation of mci_client_user and mci_name name the various related > objects like kstats and flows. >sounds like this could work. eric
Darren Reed
2009-Aug-14 17:58 UTC
[crossbow-discuss] [networking-discuss] BPF/PF_PACKET Code review
On 14/08/09 12:58 AM, Eric Cheng wrote:> On Thu, Aug 13, 2009 at 10:04:01PM -0700, Darren Reed wrote: >>> dls_link.c: >>> >>> 930: >>> dls_link_getzid() seems to be unused. >>> >> Discuss. >> In testing, zone_access_datalink() didn''t work as expected, thus it has >> been thrown to /dev/null and dls_link_getzid() used instead (code path >> now works as expected.) >> > > I''m fine with keeping dls_link_getzid() if it''s needed but do you know > if the zone_access_datalink() problem is related to the macname issue > I mentioned in sockmod_pfp:1223? > >>> mac_client.c: >>> >>> 3253-3254: >>> I think you can''t assume that the mp_copy is still valid after >>> invoking mpip->mpi_fn(). mp_copy could already be freed. >>> >> Discuss. >> The basis of "mp_copy == mp" is the mpi_no_copy being true. >> Users of the interface that do not request the mblk to be copied >> (by using MAC_PROMISC_FLAGS_NO_COPY) are not entitled >> to modify the mblk_t that is passed in. That includes prohibition >> from free''ing them. Or is there some other code path that will >> free the mblk_t whilst mpi_fn() is being called? >> > > no, it won''t get freed. > I just think it''s better to not change the expected behavior of > mpi_fn() because someone passed in a certain flag. I think it''s > more consistent with other mac callbacks to let it consume the > packet. > > If you''re concerned about copying cost, you could do dupmsg() instead > of copymsg() if mpi_no_copy is set.No. The entire point of the flag is to avoid any duplication, be it dupmsg or copymsg. They''re simply not required with BPF and do nothing except making things run slower. When using BPF to capture packets, there should be no need to allocate any new data structures in the process of capturing the packet and the only time packet data is copied is when it matches the filter being applied to the network traffic.>>> sockmod_pfp.c: >>> >>> 1211: >>> you should consider adding a pfp_close() function symmetrical >>> to pfp_open_index() for closing mh/mch. >>> >> Reject. >> There would only be one use of pfp_close(), whilst there are multiple >> places that need pfp_open_index() >> > > I don''t mind if you keep it this way. > but from cscope, I see two uses of pfp_open_index(), in both these > cases I see mac_client_close()/mac_close() as well. so you should > have more than one use of pfp_close(). > >>> 716,719: >>> if you did the pfp_open_index() at 660, these conditions must >>> be true. you could set a boolean at 660 and use it instead >>> of checking mh/mch. something like: >>> >>> if (new_open) { >>> ASSERT(mch != ps->ps_mch); >>> ASSERT(mh != ps->ps_mh); >>> pfp_close(mch, mh); >>> } >>> >> Discuss. >> So you only wanted pfp_close() to do the mac_client_close and the >> mac_close? That seems hardly worth it... but the new_open change >> is worthwhile. >> > > I found two cases but as I said I don''t mind either way.Accept. So with the above change to use "new_open", it does become worthwhile to have pfp_close().>>> 330: >>> you need to specify MAC_PROMISC_FLAGS_VLAN_TAG_STRIP in >>> mac_promisc_add() otherwise the mhi_bindsap here will be >>> incorrect for vlan packets. >>> >> Discuss. >> Does this comment apply to all uses of mac_promisc_add()? >> I couldn''t find any that are at line 330...I think you mean 1330? >> > > I was talking about this: > > if ((ps->ps_proto != 0) && (ps->ps_proto != hdr.mhi_bindsap)) { > > } > > e.g.: > if someone binds to the ip sap, all inbound packets with a > vlan header will be dropped. > > yes, this affects all calls to mac_promisc_add(). > > an alternative is to not strip the vlan tag, and skip to the real > sap yourself, like dls_link_header_info(). > > I''m not sure what''s the correct semantics for pf_packet. it maybe > better to pass up a full raw header.Hmmm! What would be nice is a way to tell mac_header_info() to optionally iterate through layer 2 headers until the last one is found and extract the "real" SAP into mhi_bindsap. Looking at dls_link_header_info(), I can recall looking at it. In its present form, it isn''t useful because it requires a dls_link_t *. But changing it to use mac_handle_t is quite easy (see attached) and that would make it possible to use that function rather than writing another version of it. But changing the first parameter of dls_link_header_info() implies that the function name and location should also change...which is getting outside of the scope of what this project is about. The functional requirement is to provide the raw ethernet packet. So I think what I''ll do is: 1) change the code to mimic dls_link_header_info() 2) leave the code as is (without the MAC_PROMISC_FLAGS_VLAN_TAG_STRIP flag) 3) file a bug that mentions dls_link_header_info() can work with a mac_handle_t instead of a dls_link_t * and when the mac/dls cleanup gets into gear, it can take care of both pieces of code then. See CR#6872007 Darren -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.opensolaris.org/pipermail/crossbow-discuss/attachments/20090814/ab833238/attachment.html> -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: dlp_to_mh.patch.txt URL: <http://mail.opensolaris.org/pipermail/crossbow-discuss/attachments/20090814/ab833238/attachment.txt>
Peter Memishian
2009-Aug-14 18:29 UTC
[crossbow-discuss] [networking-discuss] BPF/PF_PACKET Code review
> (the code is:> mac_client_open()->mac_flow_create() creates an empty flow > at some point later mac_unicast_add()->mac_client_datapath_setup()->... > mac_flow_create() is called again to create the kstat) > > > It also seems odd that this name is serving double-duty, as it''s > > alternately either provided by the consumer of the client API or set by > > mac_client_open() to something that may be of interest to the consumer > > (such as the associated datalink). One alternative would be to have the > > name always identify the consumer (mci_client_user or whatever) and have > > mci_name always identify the associated datalink, and internally use the > > concatenation of mci_client_user and mci_name name the various related > > objects like kstats and flows. > > > > sounds like this could work. Shall I file a CR? -- meem
Nicolas Droux
2009-Aug-14 18:46 UTC
[crossbow-discuss] [clearview-discuss] [networking-discuss] BPF/PF_PACKET Code review
Eric Cheng wrote:> On Fri, Aug 14, 2009 at 01:29:53AM -0700, Peter Memishian wrote: >> > > So what is the intended purpose of mci_name? Clearly it is not to uniquely >> > > identify an object. >> > >> > right now there is a kstat and flow name that''s tied with this. all of >> > these get renamed together. as long as the mac client doesn''t call >> > mac_unicast_add(), the kstat won''t be created. the flow name doesn''t >> > matter since this is an internal, not user-defined, flow. >> >> This seems pretty fragile -- e.g., it is not at all obvious that >> introducing a mac_unicast_add() call can cause kstat problems because >> of previously-overlapping mac client names. Also, if the flow name >> doesn''t matter, why are we assigning a name at all? >> > > the flow name is there because it will be used for creating the kstat > in case mac_unicast_add() gets called. yes, this sounds a bit convoluted. > > (the code is: > mac_client_open()->mac_flow_create() creates an empty flow > at some point later mac_unicast_add()->mac_client_datapath_setup()->... > mac_flow_create() is called again to create the kstat)This may no longer be the case in the near future. BPF can send packets, and these packets should be accounted by dlstat(1M) when it displays per MAC-client statistics. So we will have to maintain statistics which are derived from the MAC client name, which could conflict with the dls client if simply using MAC_OPEN_FLAGS_USE_DATALINK_NAME.>> It also seems odd that this name is serving double-duty, as it''s >> alternately either provided by the consumer of the client API or set by >> mac_client_open() to something that may be of interest to the consumer >> (such as the associated datalink). One alternative would be to have the >> name always identify the consumer (mci_client_user or whatever) and have >> mci_name always identify the associated datalink, and internally use the >> concatenation of mci_client_user and mci_name name the various related >> objects like kstats and flows. >> > > sounds like this could work.+1 Nicolas.
Eric Cheng
2009-Aug-14 20:23 UTC
[crossbow-discuss] [networking-discuss] BPF/PF_PACKET Code review
On Fri, Aug 14, 2009 at 11:29:50AM -0700, Peter Memishian wrote:> > > (the code is: > > mac_client_open()->mac_flow_create() creates an empty flow > > at some point later mac_unicast_add()->mac_client_datapath_setup()->... > > mac_flow_create() is called again to create the kstat) > > > > > It also seems odd that this name is serving double-duty, as it''s > > > alternately either provided by the consumer of the client API or set by > > > mac_client_open() to something that may be of interest to the consumer > > > (such as the associated datalink). One alternative would be to have the > > > name always identify the consumer (mci_client_user or whatever) and have > > > mci_name always identify the associated datalink, and internally use the > > > concatenation of mci_client_user and mci_name name the various related > > > objects like kstats and flows. > > > > > > > sounds like this could work. > > Shall I file a CR? >yes, thanks! eric
Eric Cheng
2009-Aug-14 21:10 UTC
[crossbow-discuss] [networking-discuss] BPF/PF_PACKET Code review
On Fri, Aug 14, 2009 at 10:58:04AM -0700, Darren Reed wrote:> >no, it won''t get freed. > >I just think it''s better to not change the expected behavior of > >mpi_fn() because someone passed in a certain flag. I think it''s > >more consistent with other mac callbacks to let it consume the > >packet. > > > >If you''re concerned about copying cost, you could do dupmsg() instead > >of copymsg() if mpi_no_copy is set. > > No. The entire point of the flag is to avoid any duplication, > be it dupmsg or copymsg. They''re simply not required with BPF > and do nothing except making things run slower. When using BPF > to capture packets, there should be no need to allocate any > new data structures in the process of capturing the packet > and the only time packet data is copied is when it matches the > filter being applied to the network traffic. >I understand that with this flag the mpi_fn() has read-only semantics. my worry is that at some point some code (maybe not bpf) may try to queue the mp instead of completely processing it in the context of mpi_fn(). this is definitely not allowed because mp could be freed from the real datapath. the danger with callbacks is you can''t tell how deep they run and you can''t guarantee they obey certain retrictions throughout. do you know what''s the cost of dupmsg() versus not doing it? if it''s not significant, I think it''s worthwhile keeping callback semantics consistent in the mac.> Hmmm! > What would be nice is a way to tell mac_header_info() to optionally > iterate through layer 2 headers until the last one is found and > extract the "real" SAP into mhi_bindsap. Looking at > dls_link_header_info(), I can recall looking at it. In its present > form, it isn''t useful because it requires a dls_link_t *. But > changing it to use mac_handle_t is quite easy (see attached) and > that would make it possible to use that function rather than writing > another version of it. But changing the first parameter of > dls_link_header_info() implies that the function name and location > should also change...which is getting outside of the scope of what > this project is about. The functional requirement is to provide the > raw ethernet packet. > > So I think what I''ll do is: > 1) change the code to mimic dls_link_header_info() > 2) leave the code as is (without the MAC_PROMISC_FLAGS_VLAN_TAG_STRIP > flag) > 3) file a bug that mentions dls_link_header_info() can work with > a mac_handle_t instead of a dls_link_t * and when the mac/dls > cleanup gets into gear, it can take care of both pieces of > code then. See CR#6872007 >sounds fine to me. eric
Darren Reed
2009-Aug-14 21:53 UTC
[crossbow-discuss] [networking-discuss] BPF/PF_PACKET Code review
On 14/08/09 02:10 PM, Eric Cheng wrote:> On Fri, Aug 14, 2009 at 10:58:04AM -0700, Darren Reed wrote: > >>> no, it won''t get freed. >>> I just think it''s better to not change the expected behavior of >>> mpi_fn() because someone passed in a certain flag. I think it''s >>> more consistent with other mac callbacks to let it consume the >>> packet. >>> >>> If you''re concerned about copying cost, you could do dupmsg() instead >>> of copymsg() if mpi_no_copy is set. >>> >> No. The entire point of the flag is to avoid any duplication, >> be it dupmsg or copymsg. They''re simply not required with BPF >> and do nothing except making things run slower. When using BPF >> to capture packets, there should be no need to allocate any >> new data structures in the process of capturing the packet >> and the only time packet data is copied is when it matches the >> filter being applied to the network traffic. >> >> > > I understand that with this flag the mpi_fn() has read-only semantics. > my worry is that at some point some code (maybe not bpf) may try > to queue the mp instead of completely processing it in the context > of mpi_fn(). this is definitely not allowed because mp could be freed > from the real datapath. the danger with callbacks is you can''t tell > how deep they run and you can''t guarantee they obey certain retrictions > throughout. >If the mac API''s were a committed set of API''s to use, then the thing to do would be to document the behaviour required by the flag so that it would be clear when it was appropriate to use it. The way I read your comment, you are arguing for the code to be slower because someone might write buggy code that uses this flag. Darren -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.opensolaris.org/pipermail/crossbow-discuss/attachments/20090814/98177c38/attachment.html>
Eric Cheng
2009-Aug-14 22:08 UTC
[crossbow-discuss] [networking-discuss] BPF/PF_PACKET Code review
On Fri, Aug 14, 2009 at 02:53:03PM -0700, Darren Reed wrote:> > > >I understand that with this flag the mpi_fn() has read-only semantics. > >my worry is that at some point some code (maybe not bpf) may try > >to queue the mp instead of completely processing it in the context > >of mpi_fn(). this is definitely not allowed because mp could be freed > >from the real datapath. the danger with callbacks is you can''t tell > >how deep they run and you can''t guarantee they obey certain retrictions > >throughout. > > > > If the mac API''s were a committed set of API''s to use, then the thing > to do would be to document the behaviour required by the flag so > that it would be clear when it was appropriate to use it. > > The way I read your comment, you are arguing for the code to be > slower because someone might write buggy code that uses this flag. >no, that''s not it. my opinion is in certain cases consistency is more important than performance, especially framework level code. what''s being introduced is inconsistent not only with the mac, but with many other callback related code throughout the ip stack. that''s my main concern. anyway if you really want to keep it I''m fine. if you have some ballpark numbers about the performance cost I''d like to have them. eric
Peter Memishian
2009-Aug-14 23:37 UTC
[crossbow-discuss] [clearview-discuss] [networking-discuss] BPF/PF_PACKET Code review
> >> It also seems odd that this name is serving double-duty, as it''s> >> alternately either provided by the consumer of the client API or set by > >> mac_client_open() to something that may be of interest to the consumer > >> (such as the associated datalink). One alternative would be to have the > >> name always identify the consumer (mci_client_user or whatever) and have > >> mci_name always identify the associated datalink, and internally use the > >> concatenation of mci_client_user and mci_name name the various related > >> objects like kstats and flows. > >> > > > > sounds like this could work. > > +1 Filed 6872170. -- meem