thr3ads.net - crossbow discuss - [crossbow-discuss] BPF/PF

If this information is useful, please help other people find it:
Share via:

Darren Reed

2009-Jul-28 02:01 UTC

[crossbow-discuss] BPF/PF_PACKET Code review

The primary development phase of the BPF/PF_PACKET project is
now completed and I''d like to ask people for feedback and to review
the code as it currently stands. The sockmod_pfp.c file has already
been reviewed by Anders (thanks!) and changes according to his
comments integrated.

For reference, the PSARC case information relating to this work can
be found here:
http://arc.opensolaris.org/caselog/PSARC/2009/232/commitment.materials/

There are two gates that will require putbacks in the immediate future,
ON and SFW.

The timeout for review of these changes is the 12th of August, 2009.

The changes required for onnv are here and include:
- adaption of BPF from NetBSD to Solaris
- delivery of PF_PACKET socket module
- linking together dls, ipnet, mac and bpf
- modification of ipnet to support delivery of packets via libpcap

http://cr.opensolaris.org/~darrenr/onnv-pcapture/

A small number of changes are required to convert libpcap shipped
with SFW to function with BPF as supplied with Solaris. The most
important being allowing more than one DLT to be reported with
the DLT list function. The change to using BPF for libpcap removes
support for using DLPI with libpcap. The changes can be found here:

http://cr.opensolaris.org/~darrenr/sfw-pcap/

Cheers,
Darren

Sebastien Roy

2009-Aug-04 20:17 UTC

head link

[crossbow-discuss] [networking-discuss] BPF/PF_PACKET Code review

Darren,

On Mon, 2009-07-27 at 19:01 -0700, Darren Reed wrote:> http://cr.opensolaris.org/~darrenr/onnv-pcapture/
> 
Do you have a cscope database for this code-review repo somewhere to
help with browsing the code?

-Seb

Darren Reed

2009-Aug-04 20:47 UTC

head link

[crossbow-discuss] [networking-discuss] BPF/PF_PACKET Code review

On  4/08/09 01:17 PM, Sebastien Roy wrote:> Darren,
>
> On Mon, 2009-07-27 at 19:01 -0700, Darren Reed wrote:
>   
>> http://cr.opensolaris.org/~darrenr/onnv-pcapture/
>>
>>     
>
> Do you have a cscope database for this code-review repo somewhere to
> help with browsing the code?
>   
Is there a way to upload a cut-down version of that to opensolaris
or does this remain a SWAN thing only?

/net/nvtbld-x.sfbay/exportz/avalon/on_bpf_pfpacket_review/

btw, in reviewing the code myself, I had some questions about if
I should do some renaming in ipnet.c...

For example, there is "ipnetif_refhold/ipnetif_refrele" but then
there is "ipnet_create_if" and "ipnet_free_if". Also, in
some places
"ipnetif" is the "local" variable for an ipnetif_t and in
others it is
"ifp".

Whilst seemingly minor, it doesn''t feel like it is all the product
of a single design? During working on that file, I got annoyed
enough with the different code styles used for local variables
that I converted them all to be one because otherwise it was not
clear to which style new code should be added! And in review,
I''ve come up against the same issue with adding new functions
but it has bothered me less.

Darren

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://mail.opensolaris.org/pipermail/crossbow-discuss/attachments/20090804/23f9f07c/attachment.html>

Sebastien Roy

2009-Aug-04 20:53 UTC

head link

[crossbow-discuss] [networking-discuss] BPF/PF_PACKET Code review

On Tue, 2009-08-04 at 13:47 -0700, Darren Reed wrote:> On  4/08/09 01:17 PM, Sebastien Roy wrote: 
> > Darren,
> > 
> > On Mon, 2009-07-27 at 19:01 -0700, Darren Reed wrote:
> >   
> > > http://cr.opensolaris.org/~darrenr/onnv-pcapture/
> > > 
> > >     
> > 
> > Do you have a cscope database for this code-review repo somewhere to
> > help with browsing the code?
> >   
> 
> Is there a way to upload a cut-down version of that to opensolaris
> or does this remain a SWAN thing only?
cscope needs to co-exist with an actual repo on a filesystem.  I suppose
you could post cscope databases on opensolaris in a repo and require
people to pull that repo to a local filesystem to use it...
> 
> /net/nvtbld-x.sfbay/exportz/avalon/on_bpf_pfpacket_review/
> 
Thanks.  Could you run xref under usr/src/uts/ as well please?
> btw, in reviewing the code myself, I had some questions about if
> I should do some renaming in ipnet.c...
> 
> For example, there is "ipnetif_refhold/ipnetif_refrele" but then
> there is "ipnet_create_if" and "ipnet_free_if". Also,
in some places
> "ipnetif" is the "local" variable for an ipnetif_t and
in others it is
> "ifp".
Feel free to fix those inconsistencies.
> 
> Whilst seemingly minor, it doesn''t feel like it is all the product
> of a single design?
It''s not a design-level issue, but rather an implementation issue. 
It''s
a product of having multiple people working on the file over time, and
perhaps not synchronizing totally on nomenclature.
> During working on that file, I got annoyed
> enough with the different code styles used for local variables
> that I converted them all to be one because otherwise it was not
> clear to which style new code should be added!
The style is, and should be (for multiple variables):

	<type>	<name>;
	<type>	<name>;

Where each name is aligned at a common tab-stop.  I don''t see many
inconsistencies here, I''m not sure what you''re referring to. 
I''d
refrain from making sweeping changes to the code style in the file, as
that detracts from reviewing your actual substantive changes.
> And in review,
> I''ve come up against the same issue with adding new functions
> but it has bothered me less.
Okay,
-Seb

Darren Reed

2009-Aug-04 21:09 UTC

head link

[crossbow-discuss] [networking-discuss] BPF/PF_PACKET Code review

On  4/08/09 01:53 PM, Sebastien Roy wrote:> On Tue, 2009-08-04 at 13:47 -0700, Darren Reed wrote:
>   
>> On  4/08/09 01:17 PM, Sebastien Roy wrote: 
>>     
>>> Darren,
>>>
>>> On Mon, 2009-07-27 at 19:01 -0700, Darren Reed wrote:
>>>   
>>>       
>>>> http://cr.opensolaris.org/~darrenr/onnv-pcapture/
>>>>
>>>>     
>>>>         
>>> Do you have a cscope database for this code-review repo somewhere
to
>>> help with browsing the code?
>>>   
>>>       
>> Is there a way to upload a cut-down version of that to opensolaris
>> or does this remain a SWAN thing only?
>>     
>
> cscope needs to co-exist with an actual repo on a filesystem.  I suppose
> you could post cscope databases on opensolaris in a repo and require
> people to pull that repo to a local filesystem to use it...
>
>   
>> /net/nvtbld-x.sfbay/exportz/avalon/on_bpf_pfpacket_review/
>>
>>     
>
> Thanks.  Could you run xref under usr/src/uts/ as well please?
>   
I''ve previously only built cscope''s for usr/src with
"make cscope.out".
It looks like it should work for uts, but it fails for me?
>> btw, in reviewing the code myself, I had some questions about if
>> I should do some renaming in ipnet.c...
>>
>> For example, there is "ipnetif_refhold/ipnetif_refrele" but
then
>> there is "ipnet_create_if" and "ipnet_free_if".
Also, in some places
>> "ipnetif" is the "local" variable for an ipnetif_t
and in others it is
>> "ifp".
>>     
>
> Feel free to fix those inconsistencies.
>   
ok.

>> Whilst seemingly minor, it doesn''t feel like it is all the
product
>> of a single design?
>>     
>
> It''s not a design-level issue, but rather an implementation issue.
It''s
> a product of having multiple people working on the file over time, and
> perhaps not synchronizing totally on nomenclature.
>   
Yeah, I figured as much. C''est la vie.

>> During working on that file, I got annoyed
>> enough with the different code styles used for local variables
>> that I converted them all to be one because otherwise it was not
>> clear to which style new code should be added!
>>     
>
> The style is, and should be (for multiple variables):
>
> 	<type>	<name>;
> 	<type>	<name>;
>
> Where each name is aligned at a common tab-stop.  I don''t see many
> inconsistencies here, I''m not sure what you''re referring
to.  I''d
> refrain from making sweeping changes to the code style in the file, as
> that detracts from reviewing your actual substantive changes.
>   
There weren''t many ... just enough in the wrong places ...
but if that is the style, then that''s what I''ll use.

Darren

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://mail.opensolaris.org/pipermail/crossbow-discuss/attachments/20090804/11af043f/attachment.html>

Sebastien Roy

2009-Aug-04 21:13 UTC

head link

[crossbow-discuss] [networking-discuss] BPF/PF_PACKET Code review

On Tue, 2009-08-04 at 14:09 -0700, Darren Reed wrote:> On  4/08/09 01:53 PM, Sebastien Roy wrote: 
> >   
> > > /net/nvtbld-x.sfbay/exportz/avalon/on_bpf_pfpacket_review/
> > > 
> > >     
> > 
> > Thanks.  Could you run xref under usr/src/uts/ as well please?
> >   
> 
> I''ve previously only built cscope''s for usr/src with
"make
> cscope.out".
> It looks like it should work for uts, but it fails for me?
I usually just type "xref", and that works fine.
> > > During working on that file, I got annoyed
> > > enough with the different code styles used for local variables
> > > that I converted them all to be one because otherwise it was not
> > > clear to which style new code should be added!
> > >     
> > 
> > The style is, and should be (for multiple variables):
> > 
> > 	<type>	<name>;
> > 	<type>	<name>;
> > 
> > Where each name is aligned at a common tab-stop.  I don''t see
many
> > inconsistencies here, I''m not sure what you''re
referring to.  I''d
> > refrain from making sweeping changes to the code style in the file, as
> > that detracts from reviewing your actual substantive changes.
> >   
> 
> There weren''t many ... just enough in the wrong places ...
> but if that is the style, then that''s what I''ll use.
Great, thanks.

-Seb

Sebastien Roy

2009-Aug-04 21:29 UTC

head link

[crossbow-discuss] [networking-discuss] BPF/PF_PACKET Code review

On Tue, 2009-08-04 at 14:29 -0700, Darren Reed wrote:> It helped having $SRC set.
> 
> Fixed.
Thank you.
-Seb

Darren Reed

2009-Aug-04 21:29 UTC

head link

[crossbow-discuss] [networking-discuss] BPF/PF_PACKET Code review

On  4/08/09 02:13 PM, Sebastien Roy wrote:> On Tue, 2009-08-04 at 14:09 -0700, Darren Reed wrote:
>   
>> On  4/08/09 01:53 PM, Sebastien Roy wrote: 
>>     
>>>   
>>>       
>>>> /net/nvtbld-x.sfbay/exportz/avalon/on_bpf_pfpacket_review/
>>>>
>>>>     
>>>>         
>>> Thanks.  Could you run xref under usr/src/uts/ as well please?
>>>   
>>>       
>> I''ve previously only built cscope''s for usr/src with
"make
>> cscope.out".
>> It looks like it should work for uts, but it fails for me?
>>     
>
> I usually just type "xref", and that works fine.
>   
It helped having $SRC set.

Fixed.

Darren

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://mail.opensolaris.org/pipermail/crossbow-discuss/attachments/20090804/a739519d/attachment.html>

Sebastien Roy

2009-Aug-12 13:37 UTC

head link

[crossbow-discuss] [networking-discuss] BPF/PF_PACKET Code review

Darren,

On Mon, 2009-07-27 at 19:01 -0700, Darren Reed wrote:> http://cr.opensolaris.org/~darrenr/onnv-pcapture/
>
I reviewed the changes to ipnet and ip:

General: There are quite a bit of changes related to the style of
local variable declaration that was changed, thus adding to the volume
of the review. In general, it would be preferable to restrict the set
of changes to the functionality being added, and to cstyle and lint
issues found while making these changes. Otherwise, sticking with the
existing style of the file without making sweeping changes to style
based on personal preference should be the modus operandi. I won''t
comment on each instance, I''ve restricted my specific comments to
things not related to local variable declarations. :-)

uts/common/inet/ipnet.h

* 58: This may need to be atomic (atomic_inc_64())

* 58: Wrap _y in parens

uts/common/inet/ipnet/ipnet.c

* 216: s/instnace/instance

* 539: Please restore ipnet_sap code as discussed on
clearview-discuss.

* 572: This is a spurious change, please restore the code as it was.
ips is derived from ns->netstack_ipnet at 532. Why go through ips
to get at ns again when ns can ge used directly?

* 787: As we discussed on clearview-discuss, the ipnet SAP space was fine.

* 1168: What''s the purpose of this addition? This function only gets
called for ipnet_t streams that are on the /dev/lo0 device.

* 1327-1328,2300-2304: This is not correct. The IPNETIF_LOOPBACK code
breaks the semantics defined in PSARC 2006/475 as we talked about on
clearview-discuss.

* 1959: ipobs_bounce_func() isn''t obvious as far as names go. How
about ipobs_hook_cb()? The convention of _cb() for a callback is
already used elsewhere in the file.

* 1959: What is the reason for this callback having to duplicate every
packet passed to it? Why wouldn''t the code that dispatches packets
to callbacks always simply pass a duplicate? An observability hook
should always have those semantics (it should never manipulate the
original or block while processing it), so hard-coding such
semantics for all NH_OBSERVE hooks would make sense to me.

* 2203-2232: datalink_id_t is not the same as IP interface index.
They don''t represent the same objects, and they don''t exist
at the
same layer. if_index is an IP interface index, and a datalink_id_t
is a linkid as handled in the GLDv3 framework. This code has
nothing to do with GLDv3, and is strictly contained at the IP-layer,
so it shouldn''t be handling datalink_id_t.

-Seb

Peter Memishian

2009-Aug-12 18:29 UTC

head link

[crossbow-discuss] [clearview-discuss] [networking-discuss] BPF/PF_PACKET Code review

> General: There are quite a bit of changes related to the style of > local variable declaration that was changed, thus adding to the volume
 > of the review.  In general, it would be preferable to restrict the set
 > of changes to the functionality being added, and to cstyle and lint
 > issues found while making these changes.  Otherwise, sticking with the
 > existing style of the file without making sweeping changes to style
 > based on personal preference should be the modus operandi.  I
won''t
 > comment on each instance, I''ve restricted my specific comments to
 > things not related to local variable declarations. :-)

I''d go a step further and demand that all changes that are related to
separating initializers from declarations be reverted.  For instance,
for no apparent reason, this:

 537 static int
 538 ipnet_close(queue_t *rq)
 539 {
 540         ipnet_t         *ipnet = rq->q_ptr;
 541         ipnet_stack_t   *ips = ipnet->ipnet_ns->netstack_ipnet;

... was expanded to:

 588 static int
 589 ipnet_close(queue_t *rq)
 590 {
 591         ipnet_stack_t *ips;
 592         ipnet_t *ipnet;
 593 
 594         ipnet = rq->q_ptr;
 595         ips = ipnet->ipnet_ns->netstack_ipnet;

The original code was correct and wholly unobjectionable -- and the latter
form suggests to the reader that there is something subtle going on that
required it, which is not true.

--
meem

Sebastien Roy

2009-Aug-12 18:43 UTC

head link

[crossbow-discuss] [clearview-discuss] [networking-discuss] BPF/PF_PACKET Code review

On Wed, 2009-08-12 at 11:29 -0700, Peter Memishian
wrote:> > General: There are quite a bit of changes related to the style of
>  > local variable declaration that was changed, thus adding to the
volume
>  > of the review.  In general, it would be preferable to restrict the
set
>  > of changes to the functionality being added, and to cstyle and lint
>  > issues found while making these changes.  Otherwise, sticking with
the
>  > existing style of the file without making sweeping changes to style
>  > based on personal preference should be the modus operandi.  I
won''t
>  > comment on each instance, I''ve restricted my specific
comments to
>  > things not related to local variable declarations. :-)
> 
> I''d go a step further and demand that all changes that are related
to
> separating initializers from declarations be reverted.
Yes, I would also include such modifications in the category of style
changes that I wish to be reverted.

-Seb

Darren Reed

2009-Aug-12 20:54 UTC

head link

[crossbow-discuss] [networking-discuss] BPF/PF_PACKET Code review

On 12/08/09 06:37 AM, Sebastien Roy wrote:> Darren,
>
> On Mon, 2009-07-27 at 19:01 -0700, Darren Reed wrote:
>> http://cr.opensolaris.org/~darrenr/onnv-pcapture/
>>
>
> I reviewed the changes to ipnet and ip:
>
> General: There are quite a bit of changes related to the style of
> local variable declaration that was changed, thus adding to the volume
> of the review.  In general, it would be preferable to restrict the set
> of changes to the functionality being added, and to cstyle and lint
> issues found while making these changes.  Otherwise, sticking with the
> existing style of the file without making sweeping changes to style
> based on personal preference should be the modus operandi.  I
won''t
> comment on each instance, I''ve restricted my specific comments to
> things not related to local variable declarations. :-)

As we discussed, i want there to be *1* style for the file.
I have no interest in which style that is. I just don''t like
working on source code with mixed styles because it isn''t
clear which one should be used going forward...

Is it to be:
<type><tab><variable>
or
<type><n tabs><variables>
.. where all the variables are aligned and an arbitrary number
of tabs are inserted.

The difference is (what I''ve currently done):

struct ip        ip;
int     foo;

one tab between type and name, vs:

struct ip        ip
int              foo;

where short type names have extra tabs.

At present, compare _init()''s variables (space between type and
variable name) with those in ipnet_if_init().
_init()''s use the latter of the above - multiple tabs to bring
variables out to the same position, which is different to what
I interpreted your earlier suggestion (single tab) as being.
>
> uts/common/inet/ipnet.h
>
> * 58: This may need to be atomic (atomic_inc_64())
Discuss.
In a similar vein to many of the counters in IP not being
atomic increments, so to is this. There''s a real cost for
using atomic_() ops and it seems that unless there is an
absolute need for it to be 100% reliable (reference count,
etc), the "danger" in just using "++" for general statistics
seems to be accepted. Compare with MIB_BUMP() and
other _BUMP() macros in $SRC/uts/common/inet/*.h.
> * 58: Wrap _y in parens
Reject.
Does not work - code will not compile like that,
I wish it did. Not good, I know, but that''s life.
> uts/common/inet/ipnet/ipnet.c
>
> * 216: s/instnace/instance
Accept.
> * 539: Please restore ipnet_sap code as discussed on
>   clearview-discuss.
Accept.
> * 572: This is a spurious change, please restore the code as it was.
>   ips is derived from ns->netstack_ipnet at 532.  Why go through ips
>   to get at ns again when ns can ge used directly?
Accept.
>
> * 787: As we discussed on clearview-discuss, the ipnet SAP space was fine.
Accept.
>
> * 1168: What''s the purpose of this addition?  This function only
gets
>   called for ipnet_t streams that are on the /dev/lo0 device.
Discuss.
ipnet_loaccept() is used twice: once in ipnet_open() and once in
ipnet_promisc_add().
>
> * 1327-1328,2300-2304: This is not correct.  The IPNETIF_LOOPBACK code
>   breaks the semantics defined in PSARC 2006/475 as we talked about on
>   clearview-discuss.
Accept.
IPNETIF_LOOPBACK is now only used by the BPF code path.
> * 1959: ipobs_bounce_func() isn''t obvious as far as names go.  How
>   about ipobs_hook_cb()?  The convention of _cb() for a callback is
>   already used elsewhere in the file.
Accept.
> * 1959: What is the reason for this callback having to duplicate every
>  packet passed to it?  Why wouldn''t the code that dispatches
packets
>  to callbacks always simply pass a duplicate?  An observability hook
>  should always have those semantics (it should never manipulate the
>  original or block while processing it), so hard-coding such
>  semantics for all NH_OBSERVE hooks would make sense to me.
Discuss.
The duplication here has been moved from ip.c`ipobs_hook() to here.
Whereas ipobs_hook() was previously just walking a list of functions
to call and doing the duplication there, as required, now the list
walking is separated (pfhooks) from the packet duplication. Thus the
dispatching comes out of pfhooks and jumps through ipobs_bounce_func_cb()
to do the duplication for the real recipient (func). But, if your
comments above are correct, then dupmsg() should be removed and only
copymsg() used, otherwise the observe is presented with the real
packet data (in dblk_t''s off of its own mblk_t.) There is a performance
cost with this, though: it enforces a deep copy of all observed packets,
copying all of the dblk_t''s, not just a shallow one (creating new
mblk_t''s.)
The current implementation only does a deep copy if the shallow one
fails.

> * 2203-2232: datalink_id_t is not the same as IP interface index.
>   They don''t represent the same objects, and they don''t
exist at the
>   same layer.  if_index is an IP interface index, and a datalink_id_t
>   is a linkid as handled in the GLDv3 framework.  This code has
>   nothing to do with GLDv3, and is strictly contained at the IP-layer,
>   so it shouldn''t be handling datalink_id_t.
Discuss.
The index used is the index from IP, but seeing as there is no
ipindex_t, I used datalink_id_t to put the index in. It would
seem that using datalink_id_t as a type here is cause for confusion
so it might be better to just use uint_t.

Darren

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://mail.opensolaris.org/pipermail/crossbow-discuss/attachments/20090812/e0adc720/attachment.html>

Sebastien Roy

2009-Aug-12 21:14 UTC

head link

[crossbow-discuss] [networking-discuss] BPF/PF_PACKET Code review

On Wed, 2009-08-12 at 13:54 -0700, Darren Reed wrote:> On 12/08/09 06:37 AM, Sebastien Roy wrote: 
> > Darren,
> > 
> > On Mon, 2009-07-27 at 19:01 -0700, Darren Reed wrote:
> > > http://cr.opensolaris.org/~darrenr/onnv-pcapture/
> > > 
> > 
> > I reviewed the changes to ipnet and ip:
> > 
> > General: There are quite a bit of changes related to the style of
> > local variable declaration that was changed, thus adding to the volume
> > of the review.  In general, it would be preferable to restrict the set
> > of changes to the functionality being added, and to cstyle and lint
> > issues found while making these changes.  Otherwise, sticking with the
> > existing style of the file without making sweeping changes to style
> > based on personal preference should be the modus operandi.  I
won''t
> > comment on each instance, I''ve restricted my specific
comments to
> > things not related to local variable declarations. :-)
> 
> 
> As we discussed, i want there to be *1* style for the file.
> I have no interest in which style that is. I just don''t like
> working on source code with mixed styles because it isn''t
> clear which one should be used going forward...
> 
> Is it to be:
> <type><tab><variable>
> or
> <type><n tabs><variables> 
> .. where all the variables are aligned and an arbitrary number
> of tabs are inserted.
> 
> The difference is (what I''ve currently done):
> 
> struct ip        ip;
> int     foo;
> 
> one tab between type and name, vs:
> 
> struct ip        ip
> int              foo;
> 
> where short type names have extra tabs.
> 
> At present, compare _init()''s variables (space between type and
> variable name) with those in ipnet_if_init().
> _init()''s use the latter of the above - multiple tabs to bring
> variables out to the same position, which is different to what
> I interpreted your earlier suggestion (single tab) as being.
All variable names should be aligned at a tab boundary, and initialized
in the declaration where possible.  That''s the style used throughout
the
file.  There may be a couple of exceptions (i.e. mistakes), but that''s
not reason enough to change the entire file to use an entirely new
style.  If there''s a function that isn''t using this style,
then let''s
fix it rather than fix every other function.  e.g.

	netstack_t	*ns = netstack_find_by_zoneid(ifp->if_zoneid);
	ip_stack_t	*ipst = ns->netstack_ip;
	ipnet_t		*ipnet;
	char		name[32];
	int		error = 0;
> 
> > 
> > uts/common/inet/ipnet.h
> > 
> > * 58: This may need to be atomic (atomic_inc_64())
> 
> Discuss.
> In a similar vein to many of the counters in IP not being
> atomic increments, so to is this. There''s a real cost for
> using atomic_() ops and it seems that unless there is an
> absolute need for it to be 100% reliable (reference count,
> etc), the "danger" in just using "++" for general
statistics
> seems to be accepted. Compare with MIB_BUMP() and
> other _BUMP() macros in $SRC/uts/common/inet/*.h.
I would think that the cost of the atomic_inc_*() APIs are quite minimal
(and they''re relatively widely used in IP), but I can''t say
that I care
very much either way.  Having an accurate error count would seem like an
important thing, and performance when recording an error would seem like
a secondary concern (and again, I doubt that atomic operations would
affect performance of ipnet).
> > 
> > * 1168: What''s the purpose of this addition?  This function
only gets
> >   called for ipnet_t streams that are on the /dev/lo0 device.
> 
> Discuss.
> ipnet_loaccept() is used twice: once in ipnet_open() and once in
> ipnet_promisc_add().
I don''t follow.  ipnet_acceptfn should only be set to ipnet_loaccept()
if /dev/lo0 is opened, and in no other case.  Therefore the check you
added in the function itself: "if (getminor(ipnet->ipnet_if->if_dev)
=IPNET_MINOR_LO)" is redundant.  If ipnet_loaccept() is getting called
when it shouldn''t be, then something else is wrong.
> > 
> > * 1327-1328,2300-2304: This is not correct.  The IPNETIF_LOOPBACK code
> >   breaks the semantics defined in PSARC 2006/475 as we talked about on
> >   clearview-discuss.
> 
> Accept.
> IPNETIF_LOOPBACK is now only used by the BPF code path.
In what case, and what does it mean?  Remember that /dev/lo0
and /dev/ipnet/lo0 have different semantics, and that ipnet_loaccept()
is only strictly implementing the semantics of /dev/lo0.
> > * 1959: What is the reason for this callback having to duplicate every
> >  packet passed to it?  Why wouldn''t the code that dispatches
packets
> >  to callbacks always simply pass a duplicate?  An observability hook
> >  should always have those semantics (it should never manipulate the
> >  original or block while processing it), so hard-coding such
> >  semantics for all NH_OBSERVE hooks would make sense to me.
> 
> Discuss.
> The duplication here has been moved from ip.c`ipobs_hook() to here.
> Whereas ipobs_hook() was previously just walking a list of functions
> to call and doing the duplication there, as required, now the list
> walking is separated (pfhooks) from the packet duplication. Thus the
> dispatching comes out of pfhooks and jumps through ipobs_bounce_func_cb()
> to do the duplication for the real recipient (func).
Ah, I see.
> But, if your
> comments above are correct, then dupmsg() should be removed and only
> copymsg() used, otherwise the observe is presented with the real
> packet data (in dblk_t''s off of its own mblk_t.) There is a
performance
> cost with this, though: it enforces a deep copy of all observed packets,
> copying all of the dblk_t''s, not just a shallow one (creating new
mblk_t''s.)
> The current implementation only does a deep copy if the shallow one
> fails.
Okay.
> > * 2203-2232: datalink_id_t is not the same as IP interface index.
> >   They don''t represent the same objects, and they
don''t exist at the
> >   same layer.  if_index is an IP interface index, and a datalink_id_t
> >   is a linkid as handled in the GLDv3 framework.  This code has
> >   nothing to do with GLDv3, and is strictly contained at the IP-layer,
> >   so it shouldn''t be handling datalink_id_t.
> 
> Discuss.
> The index used is the index from IP, but seeing as there is no
> ipindex_t, I used datalink_id_t to put the index in. It would
> seem that using datalink_id_t as a type here is cause for confusion
> so it might be better to just use uint_t.
Yep, I think so.

-Seb

Darren Reed

2009-Aug-12 21:58 UTC

head link

[crossbow-discuss] [networking-discuss] BPF/PF_PACKET Code review

On 12/08/09 02:14 PM, Sebastien Roy wrote:> On Wed, 2009-08-12 at 13:54 -0700, Darren Reed wrote:
>   
>> On 12/08/09 06:37 AM, Sebastien Roy wrote: 
>>     
>>> Darren,
>>>
>>> On Mon, 2009-07-27 at 19:01 -0700, Darren Reed wrote:
>>>       
>>>> http://cr.opensolaris.org/~darrenr/onnv-pcapture/
>>>>
>>>>         
>>> I reviewed the changes to ipnet and ip:
>>>
>>> General: There are quite a bit of changes related to the style of
>>> local variable declaration that was changed, thus adding to the
volume
>>> of the review.  In general, it would be preferable to restrict the
set
>>> of changes to the functionality being added, and to cstyle and lint
>>> issues found while making these changes.  Otherwise, sticking with
the
>>> existing style of the file without making sweeping changes to style
>>> based on personal preference should be the modus operandi.  I
won''t
>>> comment on each instance, I''ve restricted my specific
comments to
>>> things not related to local variable declarations. :-)
>>>       
>> As we discussed, i want there to be *1* style for the file.
>> I have no interest in which style that is. I just don''t like
>> working on source code with mixed styles because it isn''t
>> clear which one should be used going forward...
>>
>> Is it to be:
>> <type><tab><variable>
>> or
>> <type><n tabs><variables> 
>> .. where all the variables are aligned and an arbitrary number
>> of tabs are inserted.
>>
>> The difference is (what I''ve currently done):
>>
>> struct ip        ip;
>> int     foo;
>>
>> one tab between type and name, vs:
>>
>> struct ip        ip
>> int              foo;
>>
>> where short type names have extra tabs.
>>
>> At present, compare _init()''s variables (space between type
and
>> variable name) with those in ipnet_if_init().
>> _init()''s use the latter of the above - multiple tabs to bring
>> variables out to the same position, which is different to what
>> I interpreted your earlier suggestion (single tab) as being.
>>     
>
> All variable names should be aligned at a tab boundary, and initialized
> in the declaration where possible.  That''s the style used
throughout the
> file.  There may be a couple of exceptions (i.e. mistakes), but
that''s
> not reason enough to change the entire file to use an entirely new
> style.  If there''s a function that isn''t using this
style, then let''s
> fix it rather than fix every other function.  e.g.
>
> 	netstack_t	*ns = netstack_find_by_zoneid(ifp->if_zoneid);
> 	ip_stack_t	*ipst = ns->netstack_ip;
> 	ipnet_t		*ipnet;
> 	char		name[32];
> 	int		error = 0;
>   
Not a problem.

>>> uts/common/inet/ipnet.h
>>>
>>> * 58: This may need to be atomic (atomic_inc_64())
>>>       
>> Discuss.
>> In a similar vein to many of the counters in IP not being
>> atomic increments, so to is this. There''s a real cost for
>> using atomic_() ops and it seems that unless there is an
>> absolute need for it to be 100% reliable (reference count,
>> etc), the "danger" in just using "++" for general
statistics
>> seems to be accepted. Compare with MIB_BUMP() and
>> other _BUMP() macros in $SRC/uts/common/inet/*.h.
>>     
>
> I would think that the cost of the atomic_inc_*() APIs are quite minimal
> (and they''re relatively widely used in IP),
No, they''re not as inexpensive as we might think, especially on multi
CPU systems. There was an internal talk on the cost of mutex''s and
such earlier in the year: a much smaller number of them can be done
than we might expect (the other point made from that talk was to
avoid locking whenever possible because of their impact on the bus.)


>>> * 1168: What''s the purpose of this addition?  This
function only gets
>>>   called for ipnet_t streams that are on the /dev/lo0 device.
>>>       
>> Discuss.
>> ipnet_loaccept() is used twice: once in ipnet_open() and once in
>> ipnet_promisc_add().
>>     
>
> I don''t follow.  ipnet_acceptfn should only be set to
ipnet_loaccept()
> if /dev/lo0 is opened, and in no other case.  Therefore the check you
> added in the function itself: "if
(getminor(ipnet->ipnet_if->if_dev) => IPNET_MINOR_LO)" is
redundant.  If ipnet_loaccept() is getting called
> when it shouldn''t be, then something else is wrong.
>   
So lets examine ipnet_accept vs ipnet_loaccept for the purpose of
lo0 packets going through ipnet, into BPF, in a local zone with a
shared stack.

If we set ipnet_acceptfn to ipnet_accept for lo0 packets that are
going to end up in BPF:
- the IP address of the packet isn''t important, so what address type
  it is does not matter, thus calls to ipnet_get_addrtype() are not
  required;
- the lo0 interface has no special behaviour for promiscuous mode,
  so that flag and the SAP one are meaningless;
- thus there''s lots of code in that function that gets executed for
  seemingly no reason.

The goal of using BPF in a local zone with a shared stack is to
see all of the traffic in that zone which is associated with the
lo0 "interface". The ipnet_loaccept function provides a filter
that does exactly that - except for the filter on the hook type.
I suppose i could add an ipnet_bpf_loaccept but it would be
a carbon copy of ipnet_loaccept with that check removed.

I think what I''m saying is that BPF on lo0 in a zone has the
same semantics as /dev/lo0 and not /dev/ipnet/lo0.

>>> * 1959: What is the reason for this callback having to duplicate
every
>>>  packet passed to it?  Why wouldn''t the code that
dispatches packets
>>>  to callbacks always simply pass a duplicate?  An observability
hook
>>>  should always have those semantics (it should never manipulate the
>>>  original or block while processing it), so hard-coding such
>>>  semantics for all NH_OBSERVE hooks would make sense to me.
>>>       
>> Discuss.
>> The duplication here has been moved from ip.c`ipobs_hook() to here.
>> Whereas ipobs_hook() was previously just walking a list of functions
>> to call and doing the duplication there, as required, now the list
>> walking is separated (pfhooks) from the packet duplication. Thus the
>> dispatching comes out of pfhooks and jumps through
ipobs_bounce_func_cb()
>> to do the duplication for the real recipient (func).
>>     
>
> Ah, I see.
>
>   
>> But, if your
>> comments above are correct, then dupmsg() should be removed and only
>> copymsg() used, otherwise the observe is presented with the real
>> packet data (in dblk_t''s off of its own mblk_t.) There is a
performance
>> cost with this, though: it enforces a deep copy of all observed
packets,
>> copying all of the dblk_t''s, not just a shallow one (creating
new mblk_t''s.)
>> The current implementation only does a deep copy if the shallow one
>> fails.
>>     
>
> Okay.
>   
Just to confirm, it''s ok to leave the code to do both dupmsg() and
copymsg()
for performance reasons.

>>> * 2203-2232: datalink_id_t is not the same as IP interface index.
>>>   They don''t represent the same objects, and they
don''t exist at the
>>>   same layer.  if_index is an IP interface index, and a
datalink_id_t
>>>   is a linkid as handled in the GLDv3 framework.  This code has
>>>   nothing to do with GLDv3, and is strictly contained at the
IP-layer,
>>>   so it shouldn''t be handling datalink_id_t.
>>>       
>> Discuss.
>> The index used is the index from IP, but seeing as there is no
>> ipindex_t, I used datalink_id_t to put the index in. It would
>> seem that using datalink_id_t as a type here is cause for confusion
>> so it might be better to just use uint_t.
>>     
>
> Yep, I think so.
>   
Ok, I''ll fix that.

Darren

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://mail.opensolaris.org/pipermail/crossbow-discuss/attachments/20090812/dacb2c87/attachment.html>

Sebastien Roy

2009-Aug-13 01:20 UTC

head link

[crossbow-discuss] [networking-discuss] BPF/PF_PACKET Code review

On Wed, 2009-08-12 at 14:58 -0700, Darren Reed wrote:> > > > * 1168: What''s the purpose of this addition?  This
function only gets
> > > >   called for ipnet_t streams that are on the /dev/lo0
device.
> > > >       
> > > Discuss.
> > > ipnet_loaccept() is used twice: once in ipnet_open() and once in
> > > ipnet_promisc_add().
> > >     
> > 
> > I don''t follow.  ipnet_acceptfn should only be set to
ipnet_loaccept()
> > if /dev/lo0 is opened, and in no other case.  Therefore the check you
> > added in the function itself: "if
(getminor(ipnet->ipnet_if->if_dev) => > IPNET_MINOR_LO)" is
redundant.  If ipnet_loaccept() is getting called
> > when it shouldn''t be, then something else is wrong.
> >   
> 
> So lets examine ipnet_accept vs ipnet_loaccept for the purpose of
> lo0 packets going through ipnet, into BPF, in a local zone with a
> shared stack.
> 
> If we set ipnet_acceptfn to ipnet_accept for lo0 packets that are
> going to end up in BPF:
> - the IP address of the packet isn''t important, so what address
type
>   it is does not matter, thus calls to ipnet_get_addrtype() are not
>   required;
The address is important.  A consumer of an ipnet device receives
packets if they contain an IP address assigned to that interface (this
is described in the IP Observability PSARC case).  That''s the
distinction between /dev/lo0 and /dev/ipnet/lo0 (and thus the difference
between ipnet_accept() and ipnet_loaccept()).
> - the lo0 interface has no special behaviour for promiscuous mode,
>   so that flag and the SAP one are meaningless;
> - thus there''s lots of code in that function that gets executed
for
>   seemingly no reason.
ipnet_accept() implements the semantics of _all_ ipnet observability
nodes including /dev/ipnet/lo0.  ipnet_loaccept() only implements the
semantics of /dev/lo0 (which are deliberately different as described in
the PSARC case).  If you feel that ipnet_accept() does not perform well
enough for the special case of /dev/ipnet/lo0 in a shared stack zone,
then the answer isn''t to break its semantic by using ipnet_loaccept(),
but rather to address its performance issues (if indeed there are any).
> The goal of using BPF in a local zone with a shared stack is to
> see all of the traffic in that zone which is associated with the
> lo0 "interface". The ipnet_loaccept function provides a filter
> that does exactly that - except for the filter on the hook type.
> I suppose i could add an ipnet_bpf_loaccept but it would be
> a carbon copy of ipnet_loaccept with that check removed.
> 
> I think what I''m saying is that BPF on lo0 in a zone has the
> same semantics as /dev/lo0 and not /dev/ipnet/lo0.
I don''t understand.  I thought that there were different
"levels" of
observability for BPF, where if one enabled DLT_IPNET, one would get the
ipnet semantics (/dev/ipnet/* for any device).  I would then assume that
for "lo0", if one does not use DLT_IPNET, then one gets the /dev/lo0
semantics, but if one enables DLT_IPNET, one gets /dev/ipnet/lo0
semantics.

In any case, we''re currently talking about the setting of
ipnet_loaccept
as ipnet_acceptfn, which definitely needs to be worked out for BPF, but
the comment I made was slightly different.  The code change you made in
ipnet_loaccept() itself seems to be tangential to that, and I still
don''t know why you made that change.  Returning B_FALSE if /dev/lo0
wasn''t the device opened still seems redundant regardless.
> > > > * 1959: What is the reason for this callback having to
duplicate every
> > > >  packet passed to it?  Why wouldn''t the code that
dispatches packets
> > > >  to callbacks always simply pass a duplicate?  An
observability hook
> > > >  should always have those semantics (it should never
manipulate the
> > > >  original or block while processing it), so hard-coding such
> > > >  semantics for all NH_OBSERVE hooks would make sense to me.
> > > >       
> > > Discuss.
> > > The duplication here has been moved from ip.c`ipobs_hook() to
here.
> > > Whereas ipobs_hook() was previously just walking a list of
functions
> > > to call and doing the duplication there, as required, now the
list
> > > walking is separated (pfhooks) from the packet duplication. Thus
the
> > > dispatching comes out of pfhooks and jumps through
ipobs_bounce_func_cb()
> > > to do the duplication for the real recipient (func).
> > >     
> > 
> > Ah, I see.
> > 
> >   
> > > But, if your
> > > comments above are correct, then dupmsg() should be removed and
only
> > > copymsg() used, otherwise the observe is presented with the real
> > > packet data (in dblk_t''s off of its own mblk_t.) There
is a performance
> > > cost with this, though: it enforces a deep copy of all observed
packets,
> > > copying all of the dblk_t''s, not just a shallow one
(creating new mblk_t''s.)
> > > The current implementation only does a deep copy if the shallow
one
> > > fails.
> > >     
> > 
> > Okay.
> >   
> 
> Just to confirm, it''s ok to leave the code to do both dupmsg() and
> copymsg()
> for performance reasons.
Sure, that''s fine.

-Seb

Darren Reed

2009-Aug-13 04:40 UTC

head link

[crossbow-discuss] [networking-discuss] BPF/PF_PACKET Code review

On 12/08/09 06:20 PM, Sebastien Roy wrote:> On Wed, 2009-08-12 at 14:58 -0700, Darren Reed wrote:
>   
>>>>> * 1168: What''s the purpose of this addition?  This
function only gets
>>>>>   called for ipnet_t streams that are on the /dev/lo0
device.
>>>>>       
>>>>>           
>>>> Discuss.
>>>> ipnet_loaccept() is used twice: once in ipnet_open() and once
in
>>>> ipnet_promisc_add().
>>>>     
>>>>         
>>> I don''t follow.  ipnet_acceptfn should only be set to
ipnet_loaccept()
>>> if /dev/lo0 is opened, and in no other case.  Therefore the check
you
>>> added in the function itself: "if
(getminor(ipnet->ipnet_if->if_dev) =>>> IPNET_MINOR_LO)" is
redundant.  If ipnet_loaccept() is getting called
>>> when it shouldn''t be, then something else is wrong.
>>>   
>>>       
>> So lets examine ipnet_accept vs ipnet_loaccept for the purpose of
>> lo0 packets going through ipnet, into BPF, in a local zone with a
>> shared stack.
>>
>> If we set ipnet_acceptfn to ipnet_accept for lo0 packets that are
>> going to end up in BPF:
>> - the IP address of the packet isn''t important, so what
address type
>>   it is does not matter, thus calls to ipnet_get_addrtype() are not
>>   required;
>>     
>
> The address is important.  A consumer of an ipnet device receives
> packets if they contain an IP address assigned to that interface (this
> is described in the IP Observability PSARC case).  That''s the
> distinction between /dev/lo0 and /dev/ipnet/lo0 (and thus the difference
> between ipnet_accept() and ipnet_loaccept()).
>
>   
>> - the lo0 interface has no special behaviour for promiscuous mode,
>>   so that flag and the SAP one are meaningless;
>> - thus there''s lots of code in that function that gets
executed for
>>   seemingly no reason.
>>     
>
> ipnet_accept() implements the semantics of _all_ ipnet observability
> nodes including /dev/ipnet/lo0.  ipnet_loaccept() only implements the
> semantics of /dev/lo0 (which are deliberately different as described in
> the PSARC case).  If you feel that ipnet_accept() does not perform well
> enough for the special case of /dev/ipnet/lo0 in a shared stack zone,
> then the answer isn''t to break its semantic by using
ipnet_loaccept(),
> but rather to address its performance issues (if indeed there are any).
>
>   
>> The goal of using BPF in a local zone with a shared stack is to
>> see all of the traffic in that zone which is associated with the
>> lo0 "interface". The ipnet_loaccept function provides a
filter
>> that does exactly that - except for the filter on the hook type.
>> I suppose i could add an ipnet_bpf_loaccept but it would be
>> a carbon copy of ipnet_loaccept with that check removed.
>>
>> I think what I''m saying is that BPF on lo0 in a zone has the
>> same semantics as /dev/lo0 and not /dev/ipnet/lo0.
>>     
>
> I don''t understand.  I thought that there were different
"levels" of
> observability for BPF, where if one enabled DLT_IPNET, one would get the
> ipnet semantics (/dev/ipnet/* for any device).  I would then assume that
> for "lo0", if one does not use DLT_IPNET, then one gets the
/dev/lo0
> semantics, but if one enables DLT_IPNET, one gets /dev/ipnet/lo0
> semantics.
>   
Ah, I see what you''re driving at.

So I suppose the task then is to implement both DLT_NULL and
DLT_IPNET for lo0 as opposed to just DLT_IPNET.

I hadn''t considered that...

Darren

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://mail.opensolaris.org/pipermail/crossbow-discuss/attachments/20090812/cfaa7e6f/attachment.html>

Sebastien Roy

2009-Aug-13 12:05 UTC

head link

[crossbow-discuss] [networking-discuss] BPF/PF_PACKET Code review

On Wed, 2009-08-12 at 21:40 -0700, Darren Reed wrote:> Ah, I see what you''re driving at.
> 
> So I suppose the task then is to implement both DLT_NULL and
> DLT_IPNET for lo0 as opposed to just DLT_IPNET.
> 
> I hadn''t considered that...
Or DLT_RAW for /dev/lo0 (I think this implies that packets begin with an
IP header).  /dev/lo0 is simply there for compatibility with OSs that
have a similar device, and those OSs certainly don''t use the ipnet
header on /dev/lo0.  Perhaps that would be acceptable.

The BPF functionality that switches the device used based on DLT value
has its drawbacks, and one of them is clear here.  For one, selecting
the "ipnet" lo0 (/dev/ipnet/lo0) with DLT_IPNET also implies that the
packets received will have an ipnet header.  Using DLT_RAW (or DLT_NULL)
to select the /dev/lo0 semantics also implies that there isn''t an ipnet
header, which is unfortunate.

Perhaps another way to handle this would be to have a Solaris
"loopback"
DLT type that both implies /dev/lo0 _and_ ipnet headers, but I''m not
sure that''s really necessary given the nature of /dev/lo0.

Do others have alternate suggestions?

-Seb

Darren Reed

2009-Aug-13 15:46 UTC

head link

[crossbow-discuss] [networking-discuss] BPF/PF_PACKET Code review

Sebastien Roy wrote:> On Wed, 2009-08-12 at 21:40 -0700, Darren Reed wrote:
>> Ah, I see what you''re driving at.
>>
>> So I suppose the task then is to implement both DLT_NULL and
>> DLT_IPNET for lo0 as opposed to just DLT_IPNET.
>>
>> I hadn''t considered that...
>
> Or DLT_RAW for /dev/lo0 (I think this implies that packets begin with an
> IP header).  /dev/lo0 is simply there for compatibility with OSs that
> have a similar device, and those OSs certainly don''t use the ipnet
> header on /dev/lo0.  Perhaps that would be acceptable.
>
> The BPF functionality that switches the device used based on DLT value
> has its drawbacks, and one of them is clear here.  For one, selecting
> the "ipnet" lo0 (/dev/ipnet/lo0) with DLT_IPNET also implies that
the
> packets received will have an ipnet header.  Using DLT_RAW (or DLT_NULL)
> to select the /dev/lo0 semantics also implies that there isn''t an
ipnet
> header, which is unfortunate.
Well, it is up to us to decide what the behaviour should be.

My take on this is that there is no point in us having
another DLT for /dev/lo0. The reason that other OS''s use
DLT_NULL is that the only useful information that needs
to be prepended is the address family of the packet.


Maybe a different way to look at this is if OpenSolaris
did not have zones or IPNET then the choice would be
whether or not to provide a DLT_NULL packet type for
packets captured from lo0 (assuming that the choice was
made to prsent lo0 packets through BPF as well as normal
packets from below IP.)

I think that there is a reasonably good chance that this
decision would be made. Thus if zones and later ipnet was
delivered on top of that, we would end up with DLT_NULL
and DLT_IPNET for ipnet on lo0.
> Perhaps another way to handle this would be to have a Solaris
"loopback"
> DLT type that both implies /dev/lo0 _and_ ipnet headers, but I''m
not
> sure that''s really necessary given the nature of /dev/lo0.
Right.

Darren

Eric Cheng

2009-Aug-14 03:44 UTC

head link

[crossbow-discuss] [networking-discuss] BPF/PF_PACKET Code review

On Mon, Jul 27, 2009 at 07:01:58PM -0700, Darren Reed
wrote:> http://cr.opensolaris.org/~darrenr/onnv-pcapture/
>
Hi darren,

Here are my comments for the mac and sockmod_pfp components:

dls_link.c:

930:
dls_link_getzid() seems to be unused.


mac_client.c:

3253-3254:
I think you can''t assume that the mp_copy is still valid after
invoking mpip->mpi_fn(). mp_copy could already be freed.

to fix this, you''ll probably need to move the mpi_no_copy logic
to mac_promisc_dispatch() and mac_promisc_client_dispatch(). the
b_next should be saved before calling mac_promisc_dispatch_one().

also, if mpi_no_copy is set, the mp_chain has to be freed
right after mac_promisc_dispatch()/mac_promisc_client_dispatch()
because you can''t let the same mp_chain come up two different paths.


bpf.c:

1710-1720:
if I read this correctly, this is passing the mac name to
mac_client_open() which will result in the mac name being copied
to bif_ifname. I think you want the real link name right?

to get the real link name, the name arg to mac_client_open()
should be NULL and flags should include
MAC_OPEN_FLAGS_USE_DATALINK_NAME.

also, in several places in this file you have something like:
if (bpf_debug)
	cmn_err(CE_WARN, "...");

it may be better to put this into a macro.


bpf_mac.c:

102:
do you need to return the correct size for non-ether macs?
the bpr_hdr_length entry point also seems to be unused.


sockmod_pfp.c:

1211:
you should consider adding a pfp_close() function symmetrical
to pfp_open_index() for closing mh/mch.

1223:
similar problem as bpf.c.
you should pass a NULL name and MAC_OPEN_FLAGS_USE_DATALINK_NAME
flag to mac_client_open(), then use mac_client_name() to get
the link name and pass it to zone_access_datalink().

716,719:
if you did the pfp_open_index() at 660, these conditions must
be true. you could set a boolean at 660 and use it instead
of checking mh/mch. something like:

if (new_open) {
	ASSERT(mch != ps->ps_mch);
	ASSERT(mh != ps->ps_mh);
	pfp_close(mch, mh);
}

722:
mp could be uninitialized if you jumped from 674 and this would
free invalid memory.

330:
you need to specify MAC_PROMISC_FLAGS_VLAN_TAG_STRIP in
mac_promisc_add() otherwise the mhi_bindsap here will be
incorrect for vlan packets.

356:
shouldn''t you use msgdsize() instead of MBLKL?

408:
mhi_saddr could possibly be NULL (for IB links)

976:
why doesn''t this function use dls_mgmt_get_linkid()?

955,982:
should this be copyin() instead of bcopy()?

1150:
if possible, try to limit bf_len/bf_insns to some reasonable
number so you don''t allocate too much kernel memory.

1180:
need to return an error if bpf_validate() fails.

1160:
maybe pfp_release_bpf() should be done after bpf_validate().


eric

Peter Memishian

2009-Aug-14 04:06 UTC

head link

[crossbow-discuss] [networking-discuss] BPF/PF_PACKET Code review

> 1710-1720: > if I read this correctly, this is passing the mac name to
 > mac_client_open() which will result in the mac name being copied
 > to bif_ifname. I think you want the real link name right?
 > 
 > to get the real link name, the name arg to mac_client_open()
 > should be NULL and flags should include
 > MAC_OPEN_FLAGS_USE_DATALINK_NAME.

Given that DLS also uses MAC_OPEN_FLAGS_USE_DATALINK_NAME, would we end up
with multiple MAC clients that have the same name?

--
meem

Darren Reed

2009-Aug-14 05:04 UTC

head link

[crossbow-discuss] [networking-discuss] BPF/PF_PACKET Code review

On 13/08/09 08:44 PM, Eric Cheng wrote:> On Mon, Jul 27, 2009 at 07:01:58PM -0700, Darren Reed wrote:
>   
>> http://cr.opensolaris.org/~darrenr/onnv-pcapture/
>>
>>     
>
> Hi darren,
>
> Here are my comments for the mac and sockmod_pfp components:
>
> dls_link.c:
>
> 930:
> dls_link_getzid() seems to be unused.
>   
Discuss.
In testing, zone_access_datalink() didn''t work as expected, thus it has
been thrown to /dev/null and dls_link_getzid() used instead (code path
now works as expected.)

> mac_client.c:
>
> 3253-3254:
> I think you can''t assume that the mp_copy is still valid after
> invoking mpip->mpi_fn(). mp_copy could already be freed.
>   
Discuss.
The basis of "mp_copy == mp" is the mpi_no_copy being true.
Users of the interface that do not request the mblk to be copied
(by using MAC_PROMISC_FLAGS_NO_COPY) are not entitled
to modify the mblk_t that is passed in. That includes prohibition
from free''ing them. Or is there some other code path that will
free the mblk_t whilst mpi_fn() is being called?
> to fix this, you''ll probably need to move the mpi_no_copy logic
> to mac_promisc_dispatch() and mac_promisc_client_dispatch(). the
> b_next should be saved before calling mac_promisc_dispatch_one().
>
> also, if mpi_no_copy is set, the mp_chain has to be freed
> right after mac_promisc_dispatch()/mac_promisc_client_dispatch()
> because you can''t let the same mp_chain come up two different
paths.
>
>
> bpf.c:
>
> 1710-1720:
> if I read this correctly, this is passing the mac name to
> mac_client_open() which will result in the mac name being copied
> to bif_ifname. I think you want the real link name right?
>   
Yes.
> to get the real link name, the name arg to mac_client_open()
> should be NULL and flags should include
> MAC_OPEN_FLAGS_USE_DATALINK_NAME.
>   
Accept.
> also, in several places in this file you have something like:
> if (bpf_debug)
> 	cmn_err(CE_WARN, "...");
>
> it may be better to put this into a macro.
>   
Reject.
Putting debugging things in a macro is not friendly to having multiple
arguments to the "printf" function without ugliness like BPF_DEBUG1(),
BPF_DEBUG2(), etc. At least as far as I know, there are no stdargs
macro variations.
> bpf_mac.c:
>
> 102:
> do you need to return the correct size for non-ether macs?
> the bpr_hdr_length entry point also seems to be unused.
>   
Accept.
(It will be removed.)
> sockmod_pfp.c:
>
> 1211:
> you should consider adding a pfp_close() function symmetrical
> to pfp_open_index() for closing mh/mch.
>   
Reject.
There would only be one use of pfp_close(), whilst there are multiple
places that need pfp_open_index()
> 1223:
> similar problem as bpf.c.
> you should pass a NULL name and MAC_OPEN_FLAGS_USE_DATALINK_NAME
> flag to mac_client_open(), then use mac_client_name() to get
> the link name and pass it to zone_access_datalink().
>   
Accept.
> 716,719:
> if you did the pfp_open_index() at 660, these conditions must
> be true. you could set a boolean at 660 and use it instead
> of checking mh/mch. something like:
>
> if (new_open) {
> 	ASSERT(mch != ps->ps_mch);
> 	ASSERT(mh != ps->ps_mh);
> 	pfp_close(mch, mh);
> }
>   
Discuss.
So you only wanted pfp_close() to do the mac_client_close and the
mac_close? That seems hardly worth it... but the new_open change
is worthwhile.
> 722:
> mp could be uninitialized if you jumped from 674 and this would
> free invalid memory.
>   
Accept.
(I''m surprised neither lint nor the compiler found this, well done!)
> 330:
> you need to specify MAC_PROMISC_FLAGS_VLAN_TAG_STRIP in
> mac_promisc_add() otherwise the mhi_bindsap here will be
> incorrect for vlan packets.
>   
Discuss.
Does this comment apply to all uses of mac_promisc_add()?
I couldn''t find any that are at line 330...I think you mean 1330?
> 356:
> shouldn''t you use msgdsize() instead of MBLKL?
>   
Reject.
No. The code wants to know if the entire packet is in one mblk_t or not
so that it can optimise later functions - for example, if it is, then it is
safe to use bcopy() as the "copy function" for packet data and if it
is
not, then you need to use a function that can walk mblk_t''s.
> 408:
> mhi_saddr could possibly be NULL (for IB links)
>   
Accept.
> 976:
> why doesn''t this function use dls_mgmt_get_linkid()?
>   
Accept.
> 955,982:
> should this be copyin() instead of bcopy()?
>   
Accept.
> 1150:
> if possible, try to limit bf_len/bf_insns to some reasonable
> number so you don''t allocate too much kernel memory.
>   
Reject.
This is enforced by bpf_validate.
> 1180:
> need to return an error if bpf_validate() fails.
>   
Accept.
> 1160:
> maybe pfp_release_bpf() should be done after bpf_validate().
>   
Accept.
And in addition, there needs to be a lock used to guard against
pfp_release_bpf() being called whilst bpf_filter() is executing
using that buffer.

Darren

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://mail.opensolaris.org/pipermail/crossbow-discuss/attachments/20090813/0baea59e/attachment.html>

Peter Memishian

2009-Aug-14 05:20 UTC

head link

[crossbow-discuss] [clearview-discuss] [networking-discuss] BPF/PF_PACKET Code review

> Or DLT_RAW for /dev/lo0 (I think this implies that packets begin with an > IP header).  /dev/lo0 is simply there for compatibility with OSs that
 > have a similar device, and those OSs certainly don''t use the
ipnet
 > header on /dev/lo0.  Perhaps that would be acceptable.
 > 
 > The BPF functionality that switches the device used based on DLT value
 > has its drawbacks, and one of them is clear here.  For one, selecting
 > the "ipnet" lo0 (/dev/ipnet/lo0) with DLT_IPNET also implies
that the
 > packets received will have an ipnet header.  Using DLT_RAW (or DLT_NULL)
 > to select the /dev/lo0 semantics also implies that there isn''t an
ipnet
 > header, which is unfortunate.
 > 
 > Perhaps another way to handle this would be to have a Solaris
"loopback"
 > DLT type that both implies /dev/lo0 _and_ ipnet headers, but I''m
not
 > sure that''s really necessary given the nature of /dev/lo0.
 > 
 > Do others have alternate suggestions?

While I agree that /dev/lo0 provides semantic compatibility with other
systems, it seems a shame to lose the ipnet header since that renders it
of limited use as a way to monitor zone loopback activity -- and
monitoring zones was of course one of our core objectives.

-- 
meem

Eric Cheng

2009-Aug-14 07:01 UTC

head link

[crossbow-discuss] [networking-discuss] BPF/PF_PACKET Code review

On Thu, Aug 13, 2009 at 09:06:16PM -0700, Peter Memishian
wrote:>  > 1710-1720:
>  > if I read this correctly, this is passing the mac name to
>  > mac_client_open() which will result in the mac name being copied
>  > to bif_ifname. I think you want the real link name right?
>  > 
>  > to get the real link name, the name arg to mac_client_open()
>  > should be NULL and flags should include
>  > MAC_OPEN_FLAGS_USE_DATALINK_NAME.
> 
> Given that DLS also uses MAC_OPEN_FLAGS_USE_DATALINK_NAME, would we end up
> with multiple MAC clients that have the same name?
>
I checked the code. looks like its safe to allow this. mci_name need not
be unique. we identify the primary mac client (the one used by DLS) using
the flag MAC_CLIENT_FLAGS_PRIMARY, not mci_name. pf_packet doesn''t use
this flag. when someone does a rename, these pf_packet mac clients will get
renamed as well (by mac_rename_primary()). also, pf_packet mac clients do not
create kstats (since they do not call mac_unicast_add()) so there won''t
be
kstat collisions.

eric

Peter Memishian

2009-Aug-14 07:19 UTC

head link

[crossbow-discuss] [networking-discuss] BPF/PF_PACKET Code review

> I checked the code. looks like its safe to allow this. mci_name need not > be unique. we identify the primary mac client (the one used by DLS) using
 > the flag MAC_CLIENT_FLAGS_PRIMARY, not mci_name.

So what is the intended purpose of mci_name?  Clearly it is not to uniquely
identify an object.

-- 
meem

Eric Cheng

2009-Aug-14 07:58 UTC

head link

[crossbow-discuss] [networking-discuss] BPF/PF_PACKET Code review

On Thu, Aug 13, 2009 at 10:04:01PM -0700, Darren Reed
wrote:> >dls_link.c:
> >
> >930:
> >dls_link_getzid() seems to be unused.
> >  
> 
> Discuss.
> In testing, zone_access_datalink() didn''t work as expected, thus
it has
> been thrown to /dev/null and dls_link_getzid() used instead (code path
> now works as expected.)
>
I''m fine with keeping dls_link_getzid() if it''s needed but do
you know
if the zone_access_datalink() problem is related to the macname issue
I mentioned in sockmod_pfp:1223?
> 
> >mac_client.c:
> >
> >3253-3254:
> >I think you can''t assume that the mp_copy is still valid after
> >invoking mpip->mpi_fn(). mp_copy could already be freed.
> >  
> 
> Discuss.
> The basis of "mp_copy == mp" is the mpi_no_copy being true.
> Users of the interface that do not request the mblk to be copied
> (by using MAC_PROMISC_FLAGS_NO_COPY) are not entitled
> to modify the mblk_t that is passed in. That includes prohibition
> from free''ing them. Or is there some other code path that will
> free the mblk_t whilst mpi_fn() is being called?
> 
no, it won''t get freed.
I just think it''s better to not change the expected behavior of
mpi_fn() because someone passed in a certain flag. I think it''s
more consistent with other mac callbacks to let it consume the
packet.

If you''re concerned about copying cost, you could do dupmsg() instead
of copymsg() if mpi_no_copy is set.
> 
> >sockmod_pfp.c:
> >
> >1211:
> >you should consider adding a pfp_close() function symmetrical
> >to pfp_open_index() for closing mh/mch.
> >  
> 
> Reject.
> There would only be one use of pfp_close(), whilst there are multiple
> places that need pfp_open_index()
> 
I don''t mind if you keep it this way.
but from cscope, I see two uses of pfp_open_index(), in both these
cases I see mac_client_close()/mac_close() as well. so you should
have more than one use of pfp_close().
> 
> >716,719:
> >if you did the pfp_open_index() at 660, these conditions must
> >be true. you could set a boolean at 660 and use it instead
> >of checking mh/mch. something like:
> >
> >if (new_open) {
> >	ASSERT(mch != ps->ps_mch);
> >	ASSERT(mh != ps->ps_mh);
> >	pfp_close(mch, mh);
> >}
> >  
> 
> Discuss.
> So you only wanted pfp_close() to do the mac_client_close and the
> mac_close? That seems hardly worth it... but the new_open change
> is worthwhile.
>
I found two cases but as I said I don''t mind either way.

 > >330:
> >you need to specify MAC_PROMISC_FLAGS_VLAN_TAG_STRIP in
> >mac_promisc_add() otherwise the mhi_bindsap here will be
> >incorrect for vlan packets.
> >  
> 
> Discuss.
> Does this comment apply to all uses of mac_promisc_add()?
> I couldn''t find any that are at line 330...I think you mean 1330?
>
I was talking about this:

if ((ps->ps_proto != 0) && (ps->ps_proto != hdr.mhi_bindsap)) {

}

e.g.:
if someone binds to the ip sap, all inbound packets with a
vlan header will be dropped.

yes, this affects all calls to mac_promisc_add().

an alternative is to not strip the vlan tag, and skip to the real
sap yourself, like dls_link_header_info().

I''m not sure what''s the correct semantics for pf_packet. it
maybe
better to pass up a full raw header.

eric

Eric Cheng

2009-Aug-14 08:06 UTC

head link

[crossbow-discuss] [networking-discuss] BPF/PF_PACKET Code review

On Fri, Aug 14, 2009 at 12:19:30AM -0700, Peter Memishian
wrote:> 
>  > I checked the code. looks like its safe to allow this. mci_name need
not
>  > be unique. we identify the primary mac client (the one used by DLS)
using
>  > the flag MAC_CLIENT_FLAGS_PRIMARY, not mci_name.
> 
> So what is the intended purpose of mci_name?  Clearly it is not to uniquely
> identify an object.
>
right now there is a kstat and flow name that''s tied with this. all of
these get renamed together. as long as the mac client doesn''t call
mac_unicast_add(), the kstat won''t be created. the flow name
doesn''t
matter since this is an internal, not user-defined, flow.

eric

Peter Memishian

2009-Aug-14 08:29 UTC

head link

[crossbow-discuss] [networking-discuss] BPF/PF_PACKET Code review

> > So what is the intended purpose of mci_name?  Clearly it is not to
uniquely > > identify an object.
 > 
 > right now there is a kstat and flow name that''s tied with this.
all of
 > these get renamed together. as long as the mac client doesn''t
call
 > mac_unicast_add(), the kstat won''t be created. the flow name
doesn''t
 > matter since this is an internal, not user-defined, flow.

This seems pretty fragile -- e.g., it is not at all obvious that
introducing a mac_unicast_add() call can cause kstat problems because
of previously-overlapping mac client names.  Also, if the flow name
doesn''t matter, why are we assigning a name at all?

It also seems odd that this name is serving double-duty, as it''s
alternately either provided by the consumer of the client API or set by
mac_client_open() to something that may be of interest to the consumer
(such as the associated datalink).  One alternative would be to have the
name always identify the consumer (mci_client_user or whatever) and have
mci_name always identify the associated datalink, and internally use the
concatenation of mci_client_user and mci_name name the various related
objects like kstats and flows.

-- 
meem

Eric Cheng

2009-Aug-14 09:08 UTC

head link

[crossbow-discuss] [networking-discuss] BPF/PF_PACKET Code review

On Fri, Aug 14, 2009 at 01:29:53AM -0700, Peter Memishian
wrote:> 
>  > > So what is the intended purpose of mci_name?  Clearly it is not
to uniquely
>  > > identify an object.
>  > 
>  > right now there is a kstat and flow name that''s tied with
this. all of
>  > these get renamed together. as long as the mac client
doesn''t call
>  > mac_unicast_add(), the kstat won''t be created. the flow name
doesn''t
>  > matter since this is an internal, not user-defined, flow.
> 
> This seems pretty fragile -- e.g., it is not at all obvious that
> introducing a mac_unicast_add() call can cause kstat problems because
> of previously-overlapping mac client names.  Also, if the flow name
> doesn''t matter, why are we assigning a name at all?
>
the flow name is there because it will be used for creating the kstat
in case mac_unicast_add() gets called. yes, this sounds a bit convoluted.

(the code is:
 mac_client_open()->mac_flow_create() creates an empty flow
 at some point later mac_unicast_add()->mac_client_datapath_setup()->...
 mac_flow_create() is called again to create the kstat)
> It also seems odd that this name is serving double-duty, as it''s
> alternately either provided by the consumer of the client API or set by
> mac_client_open() to something that may be of interest to the consumer
> (such as the associated datalink).  One alternative would be to have the
> name always identify the consumer (mci_client_user or whatever) and have
> mci_name always identify the associated datalink, and internally use the
> concatenation of mci_client_user and mci_name name the various related
> objects like kstats and flows.
>
sounds like this could work.

eric

Darren Reed

2009-Aug-14 17:58 UTC

head link

[crossbow-discuss] [networking-discuss] BPF/PF_PACKET Code review

On 14/08/09 12:58 AM, Eric Cheng wrote:> On Thu, Aug 13, 2009 at 10:04:01PM -0700, Darren Reed wrote:
>>> dls_link.c:
>>>
>>> 930:
>>> dls_link_getzid() seems to be unused.
>>>  
>> Discuss.
>> In testing, zone_access_datalink() didn''t work as expected,
thus it has
>> been thrown to /dev/null and dls_link_getzid() used instead (code path
>> now works as expected.)
>>
>
> I''m fine with keeping dls_link_getzid() if it''s needed
but do you know
> if the zone_access_datalink() problem is related to the macname issue
> I mentioned in sockmod_pfp:1223?
>
>>> mac_client.c:
>>>
>>> 3253-3254:
>>> I think you can''t assume that the mp_copy is still valid
after
>>> invoking mpip->mpi_fn(). mp_copy could already be freed.
>>>  
>> Discuss.
>> The basis of "mp_copy == mp" is the mpi_no_copy being true.
>> Users of the interface that do not request the mblk to be copied
>> (by using MAC_PROMISC_FLAGS_NO_COPY) are not entitled
>> to modify the mblk_t that is passed in. That includes prohibition
>> from free''ing them. Or is there some other code path that will
>> free the mblk_t whilst mpi_fn() is being called?
>>
>
> no, it won''t get freed.
> I just think it''s better to not change the expected behavior of
> mpi_fn() because someone passed in a certain flag. I think it''s
> more consistent with other mac callbacks to let it consume the
> packet.
>
> If you''re concerned about copying cost, you could do dupmsg()
instead
> of copymsg() if mpi_no_copy is set.
No. The entire point of the flag is to avoid any duplication,
be it dupmsg or copymsg. They''re simply not required with BPF
and do nothing except making things run slower. When using BPF
to capture packets, there should be no need to allocate any
new data structures in the process of capturing the packet
and the only time packet data is copied is when it matches the
filter being applied to the network traffic.

>>> sockmod_pfp.c:
>>>
>>> 1211:
>>> you should consider adding a pfp_close() function symmetrical
>>> to pfp_open_index() for closing mh/mch.
>>>  
>> Reject.
>> There would only be one use of pfp_close(), whilst there are multiple
>> places that need pfp_open_index()
>>
>
> I don''t mind if you keep it this way.
> but from cscope, I see two uses of pfp_open_index(), in both these
> cases I see mac_client_close()/mac_close() as well. so you should
> have more than one use of pfp_close().
>
>>> 716,719:
>>> if you did the pfp_open_index() at 660, these conditions must
>>> be true. you could set a boolean at 660 and use it instead
>>> of checking mh/mch. something like:
>>>
>>> if (new_open) {
>>> 	ASSERT(mch != ps->ps_mch);
>>> 	ASSERT(mh != ps->ps_mh);
>>> 	pfp_close(mch, mh);
>>> }
>>>  
>> Discuss.
>> So you only wanted pfp_close() to do the mac_client_close and the
>> mac_close? That seems hardly worth it... but the new_open change
>> is worthwhile.
>>
>
> I found two cases but as I said I don''t mind either way.
Accept.
So with the above change to use "new_open", it does become
worthwhile to have pfp_close().

>>> 330:
>>> you need to specify MAC_PROMISC_FLAGS_VLAN_TAG_STRIP in
>>> mac_promisc_add() otherwise the mhi_bindsap here will be
>>> incorrect for vlan packets.
>>>  
>> Discuss.
>> Does this comment apply to all uses of mac_promisc_add()?
>> I couldn''t find any that are at line 330...I think you mean
1330?
>>
>
> I was talking about this:
>
> if ((ps->ps_proto != 0) && (ps->ps_proto != hdr.mhi_bindsap))
{
>
> }
>
> e.g.:
> if someone binds to the ip sap, all inbound packets with a
> vlan header will be dropped.
>
> yes, this affects all calls to mac_promisc_add().
>
> an alternative is to not strip the vlan tag, and skip to the real
> sap yourself, like dls_link_header_info().
>
> I''m not sure what''s the correct semantics for pf_packet.
it maybe
> better to pass up a full raw header.
Hmmm!
What would be nice is a way to tell mac_header_info() to optionally
iterate through layer 2 headers until the last one is found and
extract the "real" SAP into mhi_bindsap. Looking at
dls_link_header_info(), I can recall looking at it. In its present
form, it isn''t useful because it requires a dls_link_t *. But
changing it to use mac_handle_t is quite easy (see attached) and
that would make it possible to use that function rather than writing
another version of it. But changing the first parameter of
dls_link_header_info() implies that the function name and location
should also change...which is getting outside of the scope of what
this project is about. The functional requirement is to provide the
raw ethernet packet.

So I think what I''ll do is:
1) change the code to mimic dls_link_header_info()
2) leave the code as is (without the MAC_PROMISC_FLAGS_VLAN_TAG_STRIP
   flag)
3) file a bug that mentions dls_link_header_info() can work with
   a mac_handle_t instead of a dls_link_t * and when the mac/dls
   cleanup gets into gear, it can take care of both pieces of
   code then. See CR#6872007

Darren

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://mail.opensolaris.org/pipermail/crossbow-discuss/attachments/20090814/ab833238/attachment.html>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: dlp_to_mh.patch.txt
URL:
<http://mail.opensolaris.org/pipermail/crossbow-discuss/attachments/20090814/ab833238/attachment.txt>

Peter Memishian

2009-Aug-14 18:29 UTC

head link

[crossbow-discuss] [networking-discuss] BPF/PF_PACKET Code review

> (the code is: >  mac_client_open()->mac_flow_create() creates an empty flow
 >  at some point later
mac_unicast_add()->mac_client_datapath_setup()->...
 >  mac_flow_create() is called again to create the kstat)
 > 
 > > It also seems odd that this name is serving double-duty, as
it''s
 > > alternately either provided by the consumer of the client API or set
by
 > > mac_client_open() to something that may be of interest to the
consumer
 > > (such as the associated datalink).  One alternative would be to have
the
 > > name always identify the consumer (mci_client_user or whatever) and
have
 > > mci_name always identify the associated datalink, and internally use
the
 > > concatenation of mci_client_user and mci_name name the various
related
 > > objects like kstats and flows.
 > >
 > 
 > sounds like this could work.

Shall I file a CR?

-- 
meem

Nicolas Droux

2009-Aug-14 18:46 UTC

head link

[crossbow-discuss] [clearview-discuss] [networking-discuss] BPF/PF_PACKET Code review

Eric Cheng wrote:> On Fri, Aug 14, 2009 at 01:29:53AM -0700, Peter Memishian wrote:
>>  > > So what is the intended purpose of mci_name?  Clearly it is
not to uniquely
>>  > > identify an object.
>>  > 
>>  > right now there is a kstat and flow name that''s tied
with this. all of
>>  > these get renamed together. as long as the mac client
doesn''t call
>>  > mac_unicast_add(), the kstat won''t be created. the flow
name doesn''t
>>  > matter since this is an internal, not user-defined, flow.
>>
>> This seems pretty fragile -- e.g., it is not at all obvious that
>> introducing a mac_unicast_add() call can cause kstat problems because
>> of previously-overlapping mac client names.  Also, if the flow name
>> doesn''t matter, why are we assigning a name at all?
>>
> 
> the flow name is there because it will be used for creating the kstat
> in case mac_unicast_add() gets called. yes, this sounds a bit convoluted.
> 
> (the code is:
>  mac_client_open()->mac_flow_create() creates an empty flow
>  at some point later
mac_unicast_add()->mac_client_datapath_setup()->...
>  mac_flow_create() is called again to create the kstat)
This may no longer be the case in the near future. BPF can send packets, 
and these packets should be accounted by dlstat(1M) when it displays per 
MAC-client statistics.

So we will have to maintain statistics which are derived from the MAC 
client name, which could conflict with the dls client if simply using 
MAC_OPEN_FLAGS_USE_DATALINK_NAME.
>> It also seems odd that this name is serving double-duty, as
it''s
>> alternately either provided by the consumer of the client API or set by
>> mac_client_open() to something that may be of interest to the consumer
>> (such as the associated datalink).  One alternative would be to have
the
>> name always identify the consumer (mci_client_user or whatever) and
have
>> mci_name always identify the associated datalink, and internally use
the
>> concatenation of mci_client_user and mci_name name the various related
>> objects like kstats and flows.
>>
> 
> sounds like this could work.
+1

Nicolas.

Eric Cheng

2009-Aug-14 20:23 UTC

head link

[crossbow-discuss] [networking-discuss] BPF/PF_PACKET Code review

On Fri, Aug 14, 2009 at 11:29:50AM -0700, Peter Memishian
wrote:> 
>  > (the code is:
>  >  mac_client_open()->mac_flow_create() creates an empty flow
>  >  at some point later
mac_unicast_add()->mac_client_datapath_setup()->...
>  >  mac_flow_create() is called again to create the kstat)
>  > 
>  > > It also seems odd that this name is serving double-duty, as
it''s
>  > > alternately either provided by the consumer of the client API or
set by
>  > > mac_client_open() to something that may be of interest to the
consumer
>  > > (such as the associated datalink).  One alternative would be to
have the
>  > > name always identify the consumer (mci_client_user or whatever)
and have
>  > > mci_name always identify the associated datalink, and internally
use the
>  > > concatenation of mci_client_user and mci_name name the various
related
>  > > objects like kstats and flows.
>  > >
>  > 
>  > sounds like this could work.
> 
> Shall I file a CR?
>
yes, thanks!

eric

Eric Cheng

2009-Aug-14 21:10 UTC

head link

[crossbow-discuss] [networking-discuss] BPF/PF_PACKET Code review

On Fri, Aug 14, 2009 at 10:58:04AM -0700, Darren Reed
wrote:> >no, it won''t get freed.
> >I just think it''s better to not change the expected behavior
of
> >mpi_fn() because someone passed in a certain flag. I think
it''s
> >more consistent with other mac callbacks to let it consume the
> >packet.
> >
> >If you''re concerned about copying cost, you could do dupmsg()
instead
> >of copymsg() if mpi_no_copy is set.
> 
> No. The entire point of the flag is to avoid any duplication,
> be it dupmsg or copymsg. They''re simply not required with BPF
> and do nothing except making things run slower. When using BPF
> to capture packets, there should be no need to allocate any
> new data structures in the process of capturing the packet
> and the only time packet data is copied is when it matches the
> filter being applied to the network traffic.
> 
I understand that with this flag the mpi_fn() has read-only semantics.
my worry is that at some point some code (maybe not bpf) may try
to queue the mp instead of completely processing it in the context
of mpi_fn(). this is definitely not allowed because mp could be freed
from the real datapath. the danger with callbacks is you can''t tell
how deep they run and you can''t guarantee they obey certain retrictions
throughout.

do you know what''s the cost of dupmsg() versus not doing it?
if it''s not significant, I think it''s worthwhile keeping
callback
semantics consistent in the mac.

 > Hmmm!
> What would be nice is a way to tell mac_header_info() to optionally
> iterate through layer 2 headers until the last one is found and
> extract the "real" SAP into mhi_bindsap. Looking at
> dls_link_header_info(), I can recall looking at it. In its present
> form, it isn''t useful because it requires a dls_link_t *. But
> changing it to use mac_handle_t is quite easy (see attached) and
> that would make it possible to use that function rather than writing
> another version of it. But changing the first parameter of
> dls_link_header_info() implies that the function name and location
> should also change...which is getting outside of the scope of what
> this project is about. The functional requirement is to provide the
> raw ethernet packet.
> 
> So I think what I''ll do is:
> 1) change the code to mimic dls_link_header_info()
> 2) leave the code as is (without the MAC_PROMISC_FLAGS_VLAN_TAG_STRIP
>   flag)
> 3) file a bug that mentions dls_link_header_info() can work with
>   a mac_handle_t instead of a dls_link_t * and when the mac/dls
>   cleanup gets into gear, it can take care of both pieces of
>   code then. See CR#6872007
>
sounds fine to me.

eric

Darren Reed

2009-Aug-14 21:53 UTC

head link

[crossbow-discuss] [networking-discuss] BPF/PF_PACKET Code review

On 14/08/09 02:10 PM, Eric Cheng wrote:> On Fri, Aug 14, 2009 at 10:58:04AM -0700, Darren Reed wrote:
>   
>>> no, it won''t get freed.
>>> I just think it''s better to not change the expected
behavior of
>>> mpi_fn() because someone passed in a certain flag. I think
it''s
>>> more consistent with other mac callbacks to let it consume the
>>> packet.
>>>
>>> If you''re concerned about copying cost, you could do
dupmsg() instead
>>> of copymsg() if mpi_no_copy is set.
>>>       
>> No. The entire point of the flag is to avoid any duplication,
>> be it dupmsg or copymsg. They''re simply not required with BPF
>> and do nothing except making things run slower. When using BPF
>> to capture packets, there should be no need to allocate any
>> new data structures in the process of capturing the packet
>> and the only time packet data is copied is when it matches the
>> filter being applied to the network traffic.
>>
>>     
>
> I understand that with this flag the mpi_fn() has read-only semantics.
> my worry is that at some point some code (maybe not bpf) may try
> to queue the mp instead of completely processing it in the context
> of mpi_fn(). this is definitely not allowed because mp could be freed
> from the real datapath. the danger with callbacks is you can''t
tell
> how deep they run and you can''t guarantee they obey certain
retrictions
> throughout.
>   
If the mac API''s were a committed set of API''s to use, then
the thing
to do would be to document the behaviour required by the flag so
that it would be clear when it was appropriate to use it.

The way I read your comment, you are arguing for the code to be
slower because someone might write buggy code that uses this flag.

Darren

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://mail.opensolaris.org/pipermail/crossbow-discuss/attachments/20090814/98177c38/attachment.html>

Eric Cheng

2009-Aug-14 22:08 UTC

head link

[crossbow-discuss] [networking-discuss] BPF/PF_PACKET Code review

On Fri, Aug 14, 2009 at 02:53:03PM -0700, Darren Reed
wrote:> >
> >I understand that with this flag the mpi_fn() has read-only semantics.
> >my worry is that at some point some code (maybe not bpf) may try
> >to queue the mp instead of completely processing it in the context
> >of mpi_fn(). this is definitely not allowed because mp could be freed
> >from the real datapath. the danger with callbacks is you can''t
tell
> >how deep they run and you can''t guarantee they obey certain
retrictions
> >throughout.
> >  
> 
> If the mac API''s were a committed set of API''s to use,
then the thing
> to do would be to document the behaviour required by the flag so
> that it would be clear when it was appropriate to use it.
> 
> The way I read your comment, you are arguing for the code to be
> slower because someone might write buggy code that uses this flag.
>
no, that''s not it. my opinion is in certain cases consistency is more
important than performance, especially framework level code.
what''s being introduced is inconsistent not only with the mac, but
with many other callback related code throughout the ip stack. that''s
my main concern.

anyway if you really want to keep it I''m fine. if you have some
ballpark
numbers about the performance cost I''d like to have them. 

eric

Peter Memishian

2009-Aug-14 23:37 UTC

head link

[crossbow-discuss] [clearview-discuss] [networking-discuss] BPF/PF_PACKET Code review

> >> It also seems odd that this name is serving double-duty, as
it''s > >> alternately either provided by the consumer of the client API or
set by
 > >> mac_client_open() to something that may be of interest to the
consumer
 > >> (such as the associated datalink).  One alternative would be to
have the
 > >> name always identify the consumer (mci_client_user or whatever)
and have
 > >> mci_name always identify the associated datalink, and internally
use the
 > >> concatenation of mci_client_user and mci_name name the various
related
 > >> objects like kstats and flows.
 > >>
 > > 
 > > sounds like this could work.
 > 
 > +1

Filed 6872170.

-- 
meem

crossbow discuss - Jul 2009 - BPF/PF_PACKET Code review

[crossbow-discuss] BPF/PF_PACKET Code review

[crossbow-discuss] [networking-discuss] BPF/PF_PACKET Code review

[crossbow-discuss] [networking-discuss] BPF/PF_PACKET Code review

[crossbow-discuss] [networking-discuss] BPF/PF_PACKET Code review

[crossbow-discuss] [networking-discuss] BPF/PF_PACKET Code review

[crossbow-discuss] [networking-discuss] BPF/PF_PACKET Code review

[crossbow-discuss] [networking-discuss] BPF/PF_PACKET Code review

[crossbow-discuss] [networking-discuss] BPF/PF_PACKET Code review

[crossbow-discuss] [networking-discuss] BPF/PF_PACKET Code review

[crossbow-discuss] [clearview-discuss] [networking-discuss] BPF/PF_PACKET Code review

[crossbow-discuss] [clearview-discuss] [networking-discuss] BPF/PF_PACKET Code review

[crossbow-discuss] [networking-discuss] BPF/PF_PACKET Code review

[crossbow-discuss] [networking-discuss] BPF/PF_PACKET Code review

[crossbow-discuss] [networking-discuss] BPF/PF_PACKET Code review

[crossbow-discuss] [networking-discuss] BPF/PF_PACKET Code review

[crossbow-discuss] [networking-discuss] BPF/PF_PACKET Code review

[crossbow-discuss] [networking-discuss] BPF/PF_PACKET Code review

[crossbow-discuss] [networking-discuss] BPF/PF_PACKET Code review

[crossbow-discuss] [networking-discuss] BPF/PF_PACKET Code review

[crossbow-discuss] [networking-discuss] BPF/PF_PACKET Code review

[crossbow-discuss] [networking-discuss] BPF/PF_PACKET Code review

[crossbow-discuss] [clearview-discuss] [networking-discuss] BPF/PF_PACKET Code review

[crossbow-discuss] [networking-discuss] BPF/PF_PACKET Code review

[crossbow-discuss] [networking-discuss] BPF/PF_PACKET Code review

[crossbow-discuss] [networking-discuss] BPF/PF_PACKET Code review

[crossbow-discuss] [networking-discuss] BPF/PF_PACKET Code review

[crossbow-discuss] [networking-discuss] BPF/PF_PACKET Code review

[crossbow-discuss] [networking-discuss] BPF/PF_PACKET Code review

[crossbow-discuss] [networking-discuss] BPF/PF_PACKET Code review

[crossbow-discuss] [networking-discuss] BPF/PF_PACKET Code review

[crossbow-discuss] [clearview-discuss] [networking-discuss] BPF/PF_PACKET Code review

[crossbow-discuss] [networking-discuss] BPF/PF_PACKET Code review

[crossbow-discuss] [networking-discuss] BPF/PF_PACKET Code review

[crossbow-discuss] [networking-discuss] BPF/PF_PACKET Code review

[crossbow-discuss] [networking-discuss] BPF/PF_PACKET Code review

[crossbow-discuss] [clearview-discuss] [networking-discuss] BPF/PF_PACKET Code review