thr3ads.net - crossbow discuss - [crossbow-discuss] inadequate flow hashing algorithm [Jan 2009]

If this information is useful, please help other people find it:
Share via:

Sebastien Roy

2009-Jan-08 16:40 UTC

[crossbow-discuss] inadequate flow hashing algorithm

Hi Crossbow team,

I''m making some progress with the Crossbow/Clearview iptun merge, but
am
running into a problem receiving packets.

When the first IP interface above a tunnel binds, a flow is created, and
that flow is inserted into the flow hash table using the
flow_l2_hash_fe() hashing function.  When a packet is received, it
attempts to match a flow by calling flow_l2_hash().  Both of these
functions use the HASH_MAC_VID() hashing macro, which is passed the
destination MAC address as its first argument.  None of this is news to
you, every l2 flow behaves this way.

This HASH_MAC_VID() macro looks like this:

#define	HASH_MAC_VID(a, v, s) \
	((((uint32_t)(a)[3] + (a)[4] + (a)[5]) ^ (v)) % (s))

I''m sure you see the problem.  It accesses data that may be off of the
end of the MAC address is the MAC address is less than 6 bytes, and it
also assumes that this 3-byte sample of the MAC address will represent
an adequately variable sample.  Perhaps this macro is the reason why
there''s a check that MAC addresses must be greater than or equal to 6
bytes long...

In the interest of removing this MAC address length limitation (because
I have to to get iptun to work), this hashing algorithm needs to be more
flexible and take into account the length of the MAC address.  Perhaps
the MAC-type plugins could provide their hashing function?  Alternately,
the hashing function could take the MAC address length into account and
only use a sample of bytes within the actual address.

Any thoughts on how you want to see this fixed?

Thanks,
-Seb

Sebastien Roy

2009-Jan-08 20:14 UTC

head link

[crossbow-discuss] inadequate flow hashing algorithm

On Thu, 2009-01-08 at 11:40 -0500, Sebastien Roy wrote:> This HASH_MAC_VID() macro looks like this:
> 
> #define	HASH_MAC_VID(a, v, s) \
> 	((((uint32_t)(a)[3] + (a)[4] + (a)[5]) ^ (v)) % (s))
A followup question:  Assuming that we keep this macro for
Ethernet-based flow hashing (non-Ethernet flows have no business hashing
using a VLAN-id anyway), wouldn''t it be simpler to treat the last 4
bytes of the Ethernet address as an integer, xor it with the VLAN id and
be done with it?  Why do the addition there?

-Seb

Eric Cheng

2009-Jan-08 22:20 UTC

head link

[crossbow-discuss] inadequate flow hashing algorithm

On Thu, Jan 08, 2009 at 11:40:31AM -0500, Sebastien Roy
wrote:> Hi Crossbow team,
> 
> I''m making some progress with the Crossbow/Clearview iptun merge,
but am
> running into a problem receiving packets.
> 
> When the first IP interface above a tunnel binds, a flow is created, and
> that flow is inserted into the flow hash table using the
> flow_l2_hash_fe() hashing function.  When a packet is received, it
> attempts to match a flow by calling flow_l2_hash().  Both of these
> functions use the HASH_MAC_VID() hashing macro, which is passed the
> destination MAC address as its first argument.  None of this is news to
> you, every l2 flow behaves this way.
> 
> This HASH_MAC_VID() macro looks like this:
> 
> #define	HASH_MAC_VID(a, v, s) \
> 	((((uint32_t)(a)[3] + (a)[4] + (a)[5]) ^ (v)) % (s))
> 
> I''m sure you see the problem.  It accesses data that may be off of
the
> end of the MAC address is the MAC address is less than 6 bytes, and it
> also assumes that this 3-byte sample of the MAC address will represent
> an adequately variable sample.  Perhaps this macro is the reason why
> there''s a check that MAC addresses must be greater than or equal
to 6
> bytes long...
>
sorry I forgot about this case when I said the address length
doesn''t matter.

 > In the interest of removing this MAC address length limitation (because
> I have to to get iptun to work), this hashing algorithm needs to be more
> flexible and take into account the length of the MAC address.  Perhaps
> the MAC-type plugins could provide their hashing function?  Alternately,
> the hashing function could take the MAC address length into account and
> only use a sample of bytes within the actual address.
> 
> Any thoughts on how you want to see this fixed?
>
I would prefer to keep the existing macro and use it in flow_ether_hash().
for flow_l2_hash() and flow_l2_hash_fe(), we could do something like this:

#define	HASH_L2(a, v, l, s, h) {
	uint32_t i, sum = 0, nbytes = ((l) < 3) ? (l) : 3;	\
	for (i = (l) - nbytes; i < (l); i++)			\
		sum += (a)[i];					\
	h = (sum ^ (v)) % (s);					\
}

static uint32_t
flow_l2_hash(flow_tab_t *ft, flow_state_t *s)
{
	flow_l2info_t	*l2 = &s->fs_l2info;
	uint32_t	h;

	HASH_l2(l2->l2_daddr, l2->l2_vid, l2->l2_addrlen, ft->ft_size, h);
	return (h);
}

note that the new l2_addrlen will have to be filled in at
flow_l2_accept.


regarding why we do a sum instead of using the last 4 bytes as an integer,
I think the reason was we can''t be sure of the alignment of the mac
address
so we can''t just use *((uint32_t *)&l2->l2_daddr[2]) directly.

eric

Sebastien Roy

2009-Jan-08 22:49 UTC

head link

[crossbow-discuss] inadequate flow hashing algorithm

Hi Eric,

On Thu, 2009-01-08 at 14:20 -0800, Eric Cheng wrote:> On Thu, Jan 08, 2009 at 11:40:31AM -0500, Sebastien Roy wrote:
> > In the interest of removing this MAC address length limitation
(because
> > I have to to get iptun to work), this hashing algorithm needs to be
more
> > flexible and take into account the length of the MAC address.  Perhaps
> > the MAC-type plugins could provide their hashing function? 
Alternately,
> > the hashing function could take the MAC address length into account
and
> > only use a sample of bytes within the actual address.
> > 
> > Any thoughts on how you want to see this fixed?
> >
> 
> I would prefer to keep the existing macro and use it in flow_ether_hash().
> for flow_l2_hash() and flow_l2_hash_fe(), we could do something like this:...

That''s what I ended up doing.  See the changes in mac_flow.c in the
clearview webrev below.  Note that I had to implement a
flow_ether_hash_fe() in order to keep the hash functions the same in
both directions for Ethernet (i.e., you can''t have flow_ether_hash()
use
one hash function, but have Ethernet use flow_l2_hash_fe() which uses a
different function):

http://clearview.east/ws/clearview/clearview/webrev/
> #define HASH_L2(a, v, l, s, h) {
>         uint32_t i, sum = 0, nbytes = ((l) < 3) ? (l) : 3;      \
>         for (i = (l) - nbytes; i < (l); i++)                    \
>                 sum += (a)[i];                                  \
>         h = (sum ^ (v)) % (s);                                  \
> }
I ended up doing something similar, except using a function instead of a
macro (I guess I could use a macro if need be, but I''m not convinced
that the macro helps performance given compiler optimization).  Note
that there is no point in hashing using a VLAN id, since only Ethernet
can have VLANs (right?), and it has it''s own specialized hashing
function.  In any case, please check out what I came up with, and I''ll
be happy to apply any changes based on feedback you may have.
> note that the new l2_addrlen will have to be filled in at
> flow_l2_accept.
We have the MAC address length in the flow table itself, since it''s in
the mac_impl_t''s mi_type->mt_address_length.
> regarding why we do a sum instead of using the last 4 bytes as an integer,
> I think the reason was we can''t be sure of the alignment of the
mac address
> so we can''t just use *((uint32_t *)&l2->l2_daddr[2])
directly.
Okay, that makes sense.

Thanks,
-Seb

Eric Cheng

2009-Jan-08 23:43 UTC

head link

[crossbow-discuss] inadequate flow hashing algorithm

On Thu, Jan 08, 2009 at 05:49:26PM -0500, Sebastien Roy
wrote:> Hi Eric,
> 
> On Thu, 2009-01-08 at 14:20 -0800, Eric Cheng wrote:
> > On Thu, Jan 08, 2009 at 11:40:31AM -0500, Sebastien Roy wrote:
> > > In the interest of removing this MAC address length limitation
(because
> > > I have to to get iptun to work), this hashing algorithm needs to
be more
> > > flexible and take into account the length of the MAC address. 
Perhaps
> > > the MAC-type plugins could provide their hashing function? 
Alternately,
> > > the hashing function could take the MAC address length into
account and
> > > only use a sample of bytes within the actual address.
> > > 
> > > Any thoughts on how you want to see this fixed?
> > >
> > 
> > I would prefer to keep the existing macro and use it in
flow_ether_hash().
> > for flow_l2_hash() and flow_l2_hash_fe(), we could do something like
this:
> ...
> 
> That''s what I ended up doing.  See the changes in mac_flow.c in
the
> clearview webrev below.  Note that I had to implement a
> flow_ether_hash_fe() in order to keep the hash functions the same in
> both directions for Ethernet (i.e., you can''t have
flow_ether_hash() use
> one hash function, but have Ethernet use flow_l2_hash_fe() which uses a
> different function):
> 
> http://clearview.east/ws/clearview/clearview/webrev/
>
I''m fine with your changes.
thanks for fixing this.

eric
 > > #define HASH_L2(a, v, l, s, h) {
> >         uint32_t i, sum = 0, nbytes = ((l) < 3) ? (l) : 3;      \
> >         for (i = (l) - nbytes; i < (l); i++)                    \
> >                 sum += (a)[i];                                  \
> >         h = (sum ^ (v)) % (s);                                  \
> > }
> 
> I ended up doing something similar, except using a function instead of a
> macro (I guess I could use a macro if need be, but I''m not
convinced
> that the macro helps performance given compiler optimization).  Note
> that there is no point in hashing using a VLAN id, since only Ethernet
> can have VLANs (right?), and it has it''s own specialized hashing
> function.  In any case, please check out what I came up with, and
I''ll
> be happy to apply any changes based on feedback you may have.
> 
> > note that the new l2_addrlen will have to be filled in at
> > flow_l2_accept.
> 
> We have the MAC address length in the flow table itself, since
it''s in
> the mac_impl_t''s mi_type->mt_address_length.
> 
> > regarding why we do a sum instead of using the last 4 bytes as an
integer,
> > I think the reason was we can''t be sure of the alignment of
the mac address
> > so we can''t just use *((uint32_t *)&l2->l2_daddr[2])
directly.
> 
> Okay, that makes sense.
> 
> Thanks,
> -Seb
> 
>

crossbow discuss - Jan 2009 - inadequate flow hashing algorithm

[crossbow-discuss] inadequate flow hashing algorithm

[crossbow-discuss] inadequate flow hashing algorithm

[crossbow-discuss] inadequate flow hashing algorithm

[crossbow-discuss] inadequate flow hashing algorithm

[crossbow-discuss] inadequate flow hashing algorithm