I've been thinking about hardware that has multiple transmit rings ("tx
resources").

We really should have a way to expose this up to the stack. And
ideally, the stack should guarantee that a given flow will always be
sent down using the same hardware tx resource.

I've heard that crossbow will deliver this, but I can't find evidence of
it in the crossbow gate. Am I missing something? Is it functionality
yet to be added, or is it not planned?

The other problem I've heard about from PAE is that one potential
approach drivers could use today, which is to map the flow by hashing
the sending CPU (which one would expect not to change for a given flow),
is doomed to suffer packet reordering. Apparently the problem is that
application threads can get bounced around between CPUs by the
scheduler pretty freely (more so than one would think), and the result
is that you can't assume that the sending CPU will be reasonably static
for a given flow. (I gotta think this wreaks havoc on the caches
involved... but that's a different problem.)

_If_ transmitted packets are sent to the stack and always land in a
delivery queue, then perhaps the outbound queue (squeue?) can have a
worker thread that doesn't migrate around. But in order for that to
happen, we have to stop having sending threads deliver right to the
driver when intervening queues are empty.

I _think_ this will work better for throughput. It may hurt latency
slightly though. I haven't measured the latencies involved with queuing
as opposed to direct delivery through the driver's xxx_send/xxx_start
routine, but I'd be curious to know if others here have.

Anyway, let me know your thoughts.

-- Garrett
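To make the two approaches concrete, here is a minimal C sketch (illustrative
only; the flow_t structure and helper names are hypothetical, not Solaris
interfaces) contrasting ring selection by sending CPU with ring selection by
a hash of the flow's addresses and ports, which stays stable even when the
scheduler migrates the sending thread:

#include <sys/types.h>

#define	NUM_TX_RINGS	8

/* Hypothetical connection state; a real stack would use its own conn_t. */
typedef struct flow {
	uint32_t	f_saddr;	/* source IPv4 address */
	uint32_t	f_daddr;	/* destination IPv4 address */
	uint16_t	f_sport;	/* source port */
	uint16_t	f_dport;	/* destination port */
} flow_t;

/*
 * CPU-hash approach: pick the ring from the CPU the sender happens to be
 * running on.  If the scheduler migrates the thread mid-stream, packets of
 * the same flow can land on different rings and go out of order.
 */
static uint_t
tx_ring_by_cpu(uint_t cpu_id)
{
	return (cpu_id % NUM_TX_RINGS);
}

/*
 * Flow-hash approach: derive the ring from the flow's addresses and ports.
 * The hash is a property of the flow itself, so every packet of that flow
 * maps to the same ring no matter which CPU the sending thread runs on.
 */
static uint_t
tx_ring_by_flow(const flow_t *fp)
{
	uint32_t hash;

	hash = fp->f_saddr ^ fp->f_daddr ^
	    (((uint32_t)fp->f_sport << 16) | fp->f_dport);
	hash ^= hash >> 16;		/* fold high bits down */

	return (hash % NUM_TX_RINGS);
}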
Garrett,

Garrett D'Amore wrote:
> I've been thinking about hardware that has multiple transmit rings ("tx
> resources").
>
> We really should have a way to expose this up to the stack. And
> ideally, the stack should guarantee that a given flow will always be
> sent down using the same hardware tx resource.
>
> I've heard that crossbow will deliver this, but I can't find evidence of
> it in the crossbow gate. Am I missing something? Is it functionality
> yet to be added, or is it not planned?

It's designed in, but the code has yet to make it into the Crossbow gate.
I think parts of it are sitting in Roamer and Gopi's workspaces.

> The other problem I've heard about from PAE is that one potential
> approach drivers could use today, which is to map the flow by hashing
> the sending CPU (which one would expect not to change for a given flow),
> is doomed to suffer packet reordering. Apparently the problem is that
> application threads can get bounced around between CPUs by the
> scheduler pretty freely (more so than one would think), and the result
> is that you can't assume that the sending CPU will be reasonably static
> for a given flow. (I gotta think this wreaks havoc on the caches
> involved... but that's a different problem.)
>
> _If_ transmitted packets are sent to the stack and always land in a
> delivery queue, then perhaps the outbound queue (squeue?) can have a
> worker thread that doesn't migrate around. But in order for that to
> happen, we have to stop having sending threads deliver right to the
> driver when intervening queues are empty.

This doesn't really apply to forwarding traffic, and in the case of
traffic terminating on the host, the application thread is very rarely
able to reach the driver directly (it's about 17-18% of the time on web
workloads). The times it does, it means that there was nothing else to
do anyway, and it's better to let the thread go through instead of doing
a context switch.

> I _think_ this will work better for throughput. It may hurt latency
> slightly though. I haven't measured the latencies involved with queuing
> as opposed to direct delivery through the driver's xxx_send/xxx_start
> routine, but I'd be curious to know if others here have.

Yes, you are discussing the FireEngine design here. The ARC case has a
detailed document which discusses all these things. Can't remember the
case number, but search for FireEngine.

Cheers,
Sunay

> Anyway, let me know your thoughts.
>
> -- Garrett

--
Sunay Tripathi
Distinguished Engineer
Solaris Core Operating System
Sun MicroSystems Inc.

Solaris Networking: http://www.opensolaris.org/os/community/networking
Project Crossbow: http://www.opensolaris.org/os/project/crossbow
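As a rough illustration of the drain-or-queue decision Sunay describes, here
is a simplified sketch; the real squeue machinery is considerably more
involved, and sq_queue_t and the helper routines below are hypothetical
stand-ins rather than actual kernel interfaces:

#include <stdbool.h>
#include <stddef.h>

typedef struct packet packet_t;

typedef struct sq_queue {
	packet_t	*sq_head;	/* packets awaiting the worker */
	bool		sq_worker_busy;	/* worker thread currently draining */
} sq_queue_t;

void	driver_tx(packet_t *);		/* hand packet to xxx_send/xxx_start */
void	sq_enqueue(sq_queue_t *, packet_t *);
void	sq_wake_worker(sq_queue_t *);

void
stack_send(sq_queue_t *sqp, packet_t *pkt)
{
	/*
	 * If nothing is queued and no worker is active, the sending thread
	 * goes straight to the driver: no context switch, lowest latency.
	 * Sunay's point is that this path is taken only a minority of the
	 * time (about 17-18% on web workloads), and when it is taken the
	 * link was idle anyway, so ordering is not at risk.
	 */
	if (sqp->sq_head == NULL && !sqp->sq_worker_busy) {
		driver_tx(pkt);
		return;
	}

	/*
	 * Otherwise queue the packet and let the (non-migrating) worker
	 * thread drain it, which is the path Garrett prefers for keeping
	 * a flow on a single tx ring.
	 */
	sq_enqueue(sqp, pkt);
	sq_wake_worker(sqp);
}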
Sunay Tripathi wrote:
> Garrett,
>
> Garrett D'Amore wrote:
>> I've been thinking about hardware that has multiple transmit rings
>> ("tx resources").
>>
>> We really should have a way to expose this up to the stack. And
>> ideally, the stack should guarantee that a given flow will always be
>> sent down using the same hardware tx resource.
>>
>> I've heard that crossbow will deliver this, but I can't find evidence
>> of it in the crossbow gate. Am I missing something? Is it
>> functionality yet to be added, or is it not planned?
>
> It's designed in, but the code has yet to make it into the Crossbow gate.
> I think parts of it are sitting in Roamer and Gopi's workspaces.

Okay. Are there any design documents which provide the overall view of
this? I've read bits and pieces of crossbow, and the marketing
literature, but I'd really like to have details all the way down to the
driver API level.

>> The other problem I've heard about from PAE is that one potential
>> approach drivers could use today, which is to map the flow by hashing
>> the sending CPU (which one would expect not to change for a given
>> flow), is doomed to suffer packet reordering. Apparently the problem
>> is that application threads can get bounced around between CPUs
>> by the scheduler pretty freely (more so than one would think), and
>> the result is that you can't assume that the sending CPU will be
>> reasonably static for a given flow. (I gotta think this wreaks havoc
>> on the caches involved... but that's a different problem.)
>>
>> _If_ transmitted packets are sent to the stack and always land in a
>> delivery queue, then perhaps the outbound queue (squeue?) can have a
>> worker thread that doesn't migrate around. But in order for that to
>> happen, we have to stop having sending threads deliver right to the
>> driver when intervening queues are empty.
>
> This doesn't really apply to forwarding traffic

Agreed. Although if we use multiple rings for forwarding, we still have
to be careful to minimize reordering of the forwarded streams.

> and in the case of traffic
> terminating on the host, the application thread is very rarely able to
> reach the driver directly (it's about 17-18% of the time on web
> workloads). The times it does, it means that there was nothing else to
> do anyway, and it's better to let the thread go through instead of
> doing a context switch.

I think this is a fallacy, even if you have observed it, because it
ignores another potential location of queuing, which is the device
driver (and the hardware) itself. For example, some of the hardware
rings have fairly deep tx queues -- up to 1,000 packets or more in some
cases -- which can lead to incorrect assumptions about just how busy the
link really is. And if you have multiple such rings, it's really,
really important to get the ordering right.

I also fear that the attempt to "let the packet pass thru" is an
optimization for the case of a lightly loaded environment, without
regard to the impact it places upon the driver.

Essentially, what I'm saying is, I am concerned that a design that
requires the NIC driver to consider load balancing and flow management
is inherently busted. It's much, much better, I think, if the ordering
and ring scheduling considerations are handled by the stack, without
any brains whatsoever on the part of the driver. Anything else leads to
either a lot of wasted driver cycles, or drivers that make poor
decisions because they don't have sufficient information.
I think we can see a bit of both in at least two of the drivers that
support multiple tx rings: nxge and ce.

This also leads, I think, to some of the craziness that PAE has to do to
manually tune the device drivers. We really, I think, should be looking
at ways to remove driver tuning from the steps that customers have to
take to get good performance.

>> I _think_ this will work better for throughput. It may hurt latency
>> slightly though. I haven't measured the latencies involved with
>> queuing as opposed to direct delivery through the driver's
>> xxx_send/xxx_start routine, but I'd be curious to know if others here
>> have.
>
> Yes, you are discussing the FireEngine design here. The ARC case has a
> detailed document which discusses all these things. Can't remember the
> case number, but search for FireEngine.

Thanks, I'll investigate further.

-- Garrett

> Cheers,
> Sunay
>
>> Anyway, let me know your thoughts.
>>
>> -- Garrett
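To illustrate the division of labor Garrett is arguing for, here is a
hypothetical sketch of a "dumb" per-ring transmit interface in which the
stack owns the flow-to-ring mapping and the driver only exposes a transmit
entry point per ring; none of these structures correspond to the actual
GLDv3 or Crossbow driver APIs:

#include <sys/types.h>

typedef struct packet packet_t;

/* Per-ring transmit entry point registered by the driver. */
typedef int (*ring_tx_fn_t)(void *ring_private, packet_t *pkt);

typedef struct tx_ring {
	void		*tr_private;	/* driver's per-ring state */
	ring_tx_fn_t	tr_tx;		/* driver's xxx_send for this ring */
} tx_ring_t;

typedef struct nic {
	uint_t		n_nrings;	/* rings advertised by the driver */
	tx_ring_t	*n_rings;	/* array of n_nrings rings */
} nic_t;

/*
 * The stack computes the flow hash once (e.g. from the 5-tuple) and keeps
 * it in its per-connection state, so every packet of a flow lands on the
 * same ring regardless of which CPU the sending thread runs on.  The
 * driver never sees the hash and makes no scheduling decision.
 */
int
stack_tx(nic_t *nicp, uint32_t flow_hash, packet_t *pkt)
{
	tx_ring_t *ringp = &nicp->n_rings[flow_hash % nicp->n_nrings];

	return (ringp->tr_tx(ringp->tr_private, pkt));
}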
Sunay Tripathi wrote:
>> I _think_ this will work better for throughput. It may hurt latency
>> slightly though. I haven't measured the latencies involved with
>> queuing as opposed to direct delivery through the driver's
>> xxx_send/xxx_start routine, but I'd be curious to know if others here
>> have.
>
> Yes, you are discussing the FireEngine design here. The ARC case has a
> detailed document which discusses all these things. Can't remember the
> case number, but search for FireEngine.

PSARC/2002/433 FireEngine: A new architecture in Networking

Kais

> Cheers,
> Sunay