thr3ads.net - Shorewall users - hfsc question [May 2011]

Mr Dash Four

2011-May-10 21:30 UTC

hfsc question

I have the following skeleton of classes ("a" is the number of my eth0
device in tcdevices):

a:13, priority:2, guaranteed speed 320kbit to 320kbit - Main class for 
outgoing - to my ISP1 - traffic (much lower speeds than the rest of my 
nets: 40KB/s out, 1200KB/s in - that is KBytes - as in 1024 bytes, not 
kbits)!
a:13:14, priority 2, guaranteed speed 120kbit to full (40KB/s), dmax 
50ms, umax 1500b  - VOIP traffic routed outside
a:13:15, priority 3, guaranteed speed 80kbit to full, dmax ??, umax 
1500b - VPN 1 traffic going out
a:13:16, priority 4, guaranteed speed 80kbit to full, dmax ??, umax 
1500b - VPN 2 traffic going out
a:13:17, priority 5, guaranteed speed 40kbit to full, dmax ??, umax 
1500b - other unclassified traffic going out
a:18 ... (other internal GBit traffic)

My question is this: I have read the superbly written section on HFSC in 
the complex traffic shaping article 
(http://shorewall.net/traffic_shaping.htm), which deciphers the "HFSC 
Scheduling with Linux" article pretty well (thanks Tom!), and spent an 
hour calculating the subclasses' dmax times (yes, I managed to do it in 
the end, I think) following the example in that article.

What I am not entirely certain about is this: can I use more than one 
class on which to "boost" the dmax values, as I did with the VOIP class 
(13:14) above (on class 13:15, for example), and dump all the excesses 
from both classes on the remaining "leaves", classes 16 and 17? I know 
their dmax values could be in the hundreds of milliseconds, but I am 
prepared to take that hit.


Tom Eastep

2011-May-11 13:13 UTC


Re: hfsc question

On 05/10/2011 02:30 PM, Mr Dash Four wrote:
> I have the following skeleton of classes ("a" is the number of my
> eth0 device in tcdevices):
> 
> a:13, priority:2, guaranteed speed 320kbit to 320kbit - Main class
> for outgoing - to my ISP1 - traffic (much lower speeds than the rest
> of my nets: 40KB/s out, 1200KB/s in - that is KBytes - as in 1024
> bytes, not kbits)!
> a:13:14, priority 2, guaranteed speed 120kbit to full (40KB/s), dmax
> 50ms, umax 1500b  - VOIP traffic routed outside
> a:13:15, priority 3, guaranteed speed 80kbit to full, dmax ??, umax
> 1500b - VPN 1 traffic going out
> a:13:16, priority 4, guaranteed speed 80kbit to full, dmax ??, umax
> 1500b - VPN 2 traffic going out
> a:13:17, priority 5, guaranteed speed 40kbit to full, dmax ??, umax
> 1500b - other unclassified traffic going out
> a:18 ... (other internal GBit traffic)
> 
> My question is - I have read the superbly put section on HFSC in the
> complex traffic shaping article
> (http://shorewall.net/traffic_shaping.htm), which deciphers the "HFSC
> Scheduling with Linux" article pretty well (thanks Tom!),

You're welcome.

> and spent an hour calculating the subclasses' dmax times (yes, I
> managed to do it in the end, I think) following the example in that
> article.
> 
> What I am not entirely certain is this - can I use more than one
> class on which to "boost" the dmax values as I did with the VOIP
> class (13:14) above - on class 13:15 for example - and dump all the
> excesses from both classes on the remaining "leaves" - classes 16 and
> 17? I know their dmax values could be in their hundreds of
> milliseconds, but I am prepared to take that hit.

I don't know. I guess that you will have to experiment and see. Please
let us know what you learn, so I can update the Complex TC doc with your
findings.

-Tom
-- 
Tom Eastep        \ When I die, I want to go like my Grandfather who
Shoreline,         \ died peacefully in his sleep. Not screaming like
Washington, USA     \ all of the passengers in his car
http://shorewall.net \________________________________________________




Mr Dash Four

2011-May-11 14:46 UTC


Re: hfsc question

> I don't know. I guess that you will have to experiment and see. Please
> let us know what you learn, so I can update the Complex TC doc with your
> findings.

I think I figured it out - the tricky part was to determine the excesses 
and the additional delay I have to split between the two remaining 
classes - 16 & 17. So, here is what I think the final figures should be:

a:13, priority:2, guaranteed speed 320kbit to 320kbit - Main class for 
outgoing traffic
a:13:14, priority 2, guaranteed speed 120kbit to full (40KB/s), dmax 
60ms, umax 1500b  - VOIP traffic routed outside ("boosted" to 200kbits)
a:13:15, priority 3, guaranteed speed 80kbit to full, dmax 100ms, umax 
1500b - VPN 1 traffic going out ("boosted" to 120kbits)
a:13:16, priority 4, guaranteed speed 80kbit to full, dmax 224ms, umax 
1500b - VPN 2 traffic going out
a:13:17, priority 5, guaranteed speed 40kbit to full, dmax 374ms, umax 
1500b - other unclassified traffic going out

N.B. I had to revise my initial dmax value for class 14 up from 50ms to 
60ms, as I don't think the total speed of all *boosted* subclasses 
(14 & 15 in my case) can exceed the speed of the parent class (class 13). 
At 50ms the boost speed of class 14 would be 240kbits, and 240 + 120 is 
more than 320; at 60ms it is 200kbits, and 200 + 120 = 320. My 
calculations are as follows (strap yourselves in!):

1. Nominal delays (delay at which classes transmit packets at nominal 
speed):
14 - 100ms -> (8*1,500)/120,000
15 - 150ms -> (8*1,500)/80,000
16 - 150ms -> (8*1,500)/80,000
17 - 300ms -> (8*1,500)/40,000
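
The nominal delays above are the time needed to send one umax-sized 
(1,500-byte) packet at each class's guaranteed rate. A minimal Python 
sketch of that arithmetic, using the rates from the class list; the 
function and variable names are illustrative only and are not part of 
Shorewall or tc:

```python
# Time to transmit one umax-sized packet at a class's nominal (guaranteed) rate.
UMAX_BYTES = 1500

# Nominal rates in bit/s, copied from the class list in this message.
nominal_rate = {14: 120_000, 15: 80_000, 16: 80_000, 17: 40_000}


def packet_delay_ms(rate_bps, umax_bytes=UMAX_BYTES):
    """Milliseconds needed to send umax_bytes at rate_bps."""
    return umax_bytes * 8 * 1000 / rate_bps


nominal_delay = {cls: packet_delay_ms(rate) for cls, rate in nominal_rate.items()}

for cls, delay in nominal_delay.items():
    print(f"class {cls}: {delay:.0f} ms")
```

The same function gives the dmax for a "boost" speed, for example 
packet_delay_ms(200_000) for class 14.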

2. Calculating dmax and excess bits values for class 14:
2.1 The "boost" speed for class 14 is 200kbits, which translates to 
60ms -> (8*1,500)/200,000. Therefore dmax for class 14 = 60ms.
2.2 The total speed of all subclasses when class 14 has "boost" speed 
is 200,000 + 80,000 + 80,000 + 40,000 = 400kbits
2.3 The total amount of bits transmitted at that speed for 60ms is 
24,000 (400,000*60ms)
2.4 The total amount of bits transmitted at nominal speed by all classes 
for 60ms is 19,200 (320,000*60ms)
2.5 Therefore the amount of excess bits for class 14 is 4,800 bits 
(24,000 - 19,200)

3. Calculating dmax and excess bits values for class 15:
3.1 The "boost" speed for class 15 is 120kbits, which translates to 
100ms. Therefore dmax for class 15 = 100ms
3.2 The total speed of all subclasses when class 15 has "boost" speed 
is 120,000 + 120,000 + 80,000 + 40,000 = 360kbits
3.3 The total amount of bits transmitted at that speed for 100ms is 
36,000 (360,000*100ms)
3.4 The total amount of bits transmitted at nominal speed by all classes 
for 100ms is 32,000 (320,000*100ms)
3.5 Therefore the amount of excess bits for class 15 is 4,000 bits 
(36,000 - 32,000)
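
Steps 2 and 3 follow the same pattern, so both can be checked with one 
small function. This is a sketch of the arithmetic in this message only, 
not of how HFSC itself accounts for bandwidth, and every name in it is 
hypothetical. It also checks the assumption from the N.B. that the 
boosted rates together must not exceed the parent rate.

```python
UMAX_BITS = 1500 * 8      # one full-size packet, in bits
PARENT_RATE = 320_000     # class 13, bit/s

# Nominal and "boost" rates in bit/s, as given in this message.
nominal_rate = {14: 120_000, 15: 80_000, 16: 80_000, 17: 40_000}
boost_rate = {14: 200_000, 15: 120_000}


def dmax_ms(cls):
    """Delay for one umax-sized packet at the class's boost rate (steps 2.1, 3.1)."""
    return UMAX_BITS * 1000 / boost_rate[cls]


def excess_bits(cls):
    """Bits sent above the parent rate during dmax when only `cls` is boosted."""
    # Steps 2.2 / 3.2: every class at nominal rate, except `cls` at boost rate.
    boosted_total = sum(nominal_rate.values()) - nominal_rate[cls] + boost_rate[cls]
    # Steps 2.3-2.5 / 3.3-3.5: (boosted total - parent rate) * dmax, with
    # dmax expressed as UMAX_BITS / boost_rate to keep the arithmetic exact.
    return (boosted_total - PARENT_RATE) * UMAX_BITS / boost_rate[cls]


# Assumption from the N.B.: boosted classes together fit within the parent.
boosts_fit_parent = sum(boost_rate.values()) <= PARENT_RATE

for cls in boost_rate:
    print(f"class {cls}: dmax {dmax_ms(cls):.0f} ms, excess {excess_bits(cls):.0f} bits")
print("boosted rates fit within parent:", boosts_fit_parent)
```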

4. Determine dmax values for leaf classes 16 & 17, taking into account 
punitive excesses from classes 14 & 15.
4.1 The total number of excess bits is 8,800 (4,800 + 4,000). I have 
decided to split them up according to the nominal speed of classes 16 & 
17 so that each class takes roughly equal measure (in terms of time in 
milliseconds) of the punishment. In other words, class 16 takes 2/3 of 
the total amount of excess bits (as it has twice the speed of class 17) 
with class 17 taking the rest. So, 2/3 of 8,800 ~ 5,867 bits, while 
class 17 takes the rest - 2,933 bits.
4.2 The amount of time required to transmit 5,867 bits at speed 80kbit/s 
(the nominal speed of class 16) is 73.34ms (5,867/80,000) - rounded up 
to 74ms. Therefore dmax value for class 16 is 224ms (150ms + 74ms).
4.3 The amount of time required to transmit 2,933 bits at speed 40kbit/s 
(the nominal speed of class 17) is 73.325ms (2,933/40,000) - rounded up 
to 74ms. Therefore dmax value for class 17 is 374ms (300ms + 74ms).
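
Step 4 can be sketched the same way. The split below is the one chosen 
in this message (excess bits shared in proportion to each leaf's nominal 
rate); it is one possible policy, not something HFSC requires, and the 
names are illustrative. With a proportional split, both leaves end up 
with the same extra delay: the total excess divided by the combined 
rate of the two leaves.

```python
import math

TOTAL_EXCESS_BITS = 4_800 + 4_000        # from steps 2.5 and 3.5

# Nominal rates (bit/s) and nominal delays (ms) of the two remaining leaves.
leaf_rate = {16: 80_000, 17: 40_000}
nominal_delay_ms = {16: 150, 17: 300}

total_leaf_rate = sum(leaf_rate.values())

leaf_dmax_ms = {}
for cls, rate in leaf_rate.items():
    # Step 4.1: share of the excess, proportional to the leaf's nominal rate.
    share_bits = TOTAL_EXCESS_BITS * rate / total_leaf_rate
    # Steps 4.2 / 4.3: time to send that share at the leaf's nominal rate,
    # rounded up to a whole millisecond and added to the nominal delay.
    extra_ms = share_bits * 1000 / rate
    leaf_dmax_ms[cls] = nominal_delay_ms[cls] + math.ceil(extra_ms)
    print(f"class {cls}: share {share_bits:.0f} bits, "
          f"extra {extra_ms:.2f} ms, dmax {leaf_dmax_ms[cls]} ms")
```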

Does the above make sense? How do I actually "test" whether these 
calculations are correct?


Mr Dash Four

2011-May-11 14:54 UTC


Re: hfsc question

> a:13, priority:2, guaranteed speed 320kbit to 320kbit - Main class for 
> outgoing traffic
> a:13:14, priority 2, guaranteed speed 120kbit to full (40KB/s), dmax 
> 60ms, umax 1500b  - VOIP traffic routed outside ("boosted" to 200kbits)
> a:13:15, priority 3, guaranteed speed 80kbit to full, dmax 100ms, umax 
> 1500b - VPN 1 traffic going out ("boosted" to 120kbits)
> a:13:16, priority 4, guaranteed speed 80kbit to full, dmax 224ms, umax 
> 1500b - VPN 2 traffic going out
> a:13:17, priority 5, guaranteed speed 40kbit to full, dmax 374ms, umax 
> 1500b - other unclassified traffic going out

Forgot to post this question:

What happens when I only specify dmax and umax values on the boosted 
classes and leave out the rest? For example:

a:13, priority:2, guaranteed speed 320kbit to 320kbit - Main class for 
outgoing traffic
a:13:14, priority 2, guaranteed speed 120kbit to full (40KB/s), dmax 
60ms, umax 1500b  - VOIP traffic routed outside ("boosted" to 200kbits)
a:13:15, priority 3, guaranteed speed 80kbit to full, dmax 100ms, umax 
1500b - VPN 1 traffic going out ("boosted" to 120kbits)
a:13:16, priority 4, guaranteed speed 80kbit to full - VPN 2 traffic 
going out
a:13:17, priority 5, guaranteed speed 40kbit to full - other 
unclassified traffic going out


The reason I am asking is this:

Further down in my "a:XX" set of statements I have another set of 
leaf-structured classes, which are only used for my internal subnets. 
They all have vastly superior speeds (1GBit/s+), so I did boost the 
speed of the traffic I wanted "boosted", but left everything else 
without dmax and umax, as the excess delays calculated were 
absolutely minuscule (less than a millisecond!).


Tom Eastep

2011-May-11 15:33 UTC


Re: hfsc question

On 05/11/2011 07:46 AM, Mr Dash Four wrote:
>> I don't know. I guess that you will have to experiment and see. Please
>> let us know what you learn, so I can update the Complex TC doc with
>> your findings.
> 
> I think I figured it out - the tricky part was to determine the excesses 
> and the additional delay I have to split between the two remaining 
> classes - 16 & 17. So, here is what I think the final figures should
> be:
> 
> ...
> 
> Does the above make sense? How do I actually "test" whether these
> calculations are correct?

The calculations make sense. I think my first attempt to test this would
be to run parallel netperfs (one netperf per class) and look at the
output of 'shorewall show tc'. That will show you what the actual speeds
are. I don't recall if netperf provides any latency data.

-Tom




Tom Eastep

2011-May-11 15:34 UTC


Re: hfsc question

On 05/11/2011 07:54 AM, Mr Dash Four wrote:
>> a:13, priority:2, guaranteed speed 320kbit to 320kbit - Main class for 
>> outgoing traffic
>> a:13:14, priority 2, guaranteed speed 120kbit to full (40KB/s), dmax 
>> 60ms, umax 1500b  - VOIP traffic routed outside ("boosted" to 200kbits)
>> a:13:15, priority 3, guaranteed speed 80kbit to full, dmax 100ms, umax 
>> 1500b - VPN 1 traffic going out ("boosted" to 120kbits)
>> a:13:16, priority 4, guaranteed speed 80kbit to full, dmax 224ms, umax 
>> 1500b - VPN 2 traffic going out
>> a:13:17, priority 5, guaranteed speed 40kbit to full, dmax 374ms, umax 
>> 1500b - other unclassified traffic going out
> 
> Forgot to post this question:
> 
> What happens when I only specify dmax and umax values on the boosted 
> classes and leave out the rest?

I don't know.

-Tom




Mr Dash Four

2011-May-11 15:45 UTC


Re: hfsc question

> The calculations make sense. I think my first attempt to test this would
> be to run parallel netperfs (one netperf per class) and look at the
> output of 'shorewall show tc'. That will show you what the actual speeds
> are. I don't recall if netperf provides any latency data.

I haven't used netperf before, but I guess there is always time to try 
it out. If it works I will then test a configuration where I have only 
dmax:umax specified on "boosted" classes and see what happens.



Mr Dash Four

2011-May-12 02:39 UTC


Re: hfsc question

> The calculations make sense. I think my first attempt to test this would
> be to run parallel netperfs (one netperf per class) and look at the
> output of 'shorewall show tc'. That will show you what the actual speeds
> are. I don't recall if netperf provides any latency data.

Well, I did quite a bit of testing in the past couple of hours, but I 
am, quite frankly, unimpressed!

I used netperf's big brother - iperf - instead.

The values of dmax and umax do not seem to have any effect on the net 
speed whatsoever - at least that is what the tests seem to indicate. The 
speed limits are very strictly observed, so is the priority of each 
class, but the results are roughly the same regardless of whether 
dmax:umax have been specified in tcclasses (see netspeed-tests.txt 
attached).

I did run "shorewall show tc eth0" (as this was the device I was testing 
everything on) after each test, but there was no speed indication there 
- just the number of packets passed through each class, which the iperf 
results show anyway.



Tom Eastep

2011-May-12 02:54 UTC


Re: hfsc question

On 5/11/11 7:39 PM, Mr Dash Four wrote:
>> The calculations make sense. I think my first attempt to test this
>> would be to run parallel netperfs (one netperf per class) and look at
>> the output of 'shorewall show tc'. That will show you what the actual
>> speeds are. I don't recall if netperf provides any latency data.
> 
> Well, I did quite a bit of testing in the past couple of hours, but I
> am, quite frankly, unimpressed!
> 
> I used netperf's big brother - iperf - instead.

I consider iperf to be netperf's little sister.

-Tom




Tom Eastep

2011-May-12 03:03 UTC


Re: hfsc question

On 5/11/11 7:39 PM, Mr Dash Four wrote:
> I did run "shorewall show tc eth0" (as this was the device I was
> testing everything on) after each test

Uh -- 'shorewall show tc eth0' is a real-time command. You must execute
the command while eth0 is under load.

-Tom




Tom Eastep

2011-May-12 03:16 UTC


Re: hfsc question

On 5/11/11 7:54 PM, Tom Eastep wrote:
> On 5/11/11 7:39 PM, Mr Dash Four wrote:
>>> The calculations make sense. I think my first attempt to test this
>>> would be to run parallel netperfs (one netperf per class) and look at
>>> the output of 'shorewall show tc'. That will show you what the actual
>>> speeds are. I don't recall if netperf provides any latency data.
>>
>> Well, I did quite a bit of testing in the past couple of hours, but I
>> am, quite frankly, unimpressed!
>>
>> I used netperf's big brother - iperf - instead.
> 
> I consider iperf to be netperf's little sister.

I recommended netperf because I already knew that iperf only reports
throughput and not latency. When you vary TC parameters that only affect
latency, you can't expect throughput to change.

I still don't know if netperf reports latency information. I expected
you to try it and let us know.
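
The point about latency versus throughput can be illustrated with a 
simplified model. The sketch below assumes, as the "boost" figures in 
this thread do, that a dmax/umax pair acts as a steeper initial slope 
(umax/dmax) lasting dmax milliseconds, after which the class falls back 
to its guaranteed rate. It is a toy model of a two-piece linear service 
curve for class 14, not the kernel's HFSC implementation, and all names 
are hypothetical.

```python
def service_bits(t_ms, m1, d_ms, m2):
    """Bits guaranteed after t_ms: slope m1 (bit/s) for d_ms, then slope m2."""
    if t_ms <= d_ms:
        return m1 * t_ms / 1000
    return m1 * d_ms / 1000 + m2 * (t_ms - d_ms) / 1000


def time_to_serve_ms(bits, m1, d_ms, m2):
    """Earliest time in ms at which `bits` are guaranteed to have been sent."""
    first_piece_bits = m1 * d_ms / 1000
    if m1 > 0 and bits <= first_piece_bits:
        return bits * 1000 / m1
    return d_ms + (bits - first_piece_bits) * 1000 / m2


def long_run_rate(m1, d_ms, m2, horizon_ms):
    """Average guaranteed rate in bit/s over horizon_ms."""
    return service_bits(horizon_ms, m1, d_ms, m2) * 1000 / horizon_ms


PACKET_BITS = 1500 * 8

# Class 14 with dmax 60ms / umax 1500b (200 kbit/s for 60 ms, then 120 kbit/s)
boosted = (200_000, 60, 120_000)
# Class 14 with no dmax/umax: a single 120 kbit/s slope
plain = (120_000, 0, 120_000)

first_packet_boosted = time_to_serve_ms(PACKET_BITS, *boosted)
first_packet_plain = time_to_serve_ms(PACKET_BITS, *plain)
rate_boosted = long_run_rate(*boosted, horizon_ms=60_000)
rate_plain = long_run_rate(*plain, horizon_ms=60_000)

print(f"first packet: {first_packet_boosted:.0f} ms vs {first_packet_plain:.0f} ms")
print(f"60 s average: {rate_boosted:.0f} bit/s vs {rate_plain:.0f} bit/s")
```

In this model the first packet is guaranteed 40ms sooner with the boost, 
while the average rate over a minute differs by well under 0.1%, which 
is consistent with a throughput-only tool showing no change.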

-Tom




Mr Dash Four

2011-May-12 11:16 UTC


Re: hfsc question

> I still don't know if netperf reports latency information. I expected
> you to try it and let us know.

From the NetPerf web page:

Netperf is a benchmark that can be used to measure the performance of 
many different types of networking. It provides tests for both 
unidirectional throughput, and *end-to-end latency*. The environments 
currently measurable by netperf include:

    * TCP and UDP via BSD Sockets for both IPv4 and IPv6
    * DLPI
    * Unix Domain Sockets
    * SCTP for both IPv4 and IPv6

So, latency could be tested. According to the man page the tests 
available are as follows:

TCP_STREAM
TCP_SENDFILE
TCP_MAERTS
TCP_RR
TCP_CRR
UDP_STREAM
UDP_RR
DLCO_STREAM
DLCO_RR
DLCL_STREAM
DLCL_RR
STREAM_STREAM
STREAM_RR
TCPIPV6_STREAM
TCPIPV6_RR
TCPIPV6_CRR
UDPIPV6_STREAM
UDPIPV6_RR
DG_STREAM
DG_RR
LOC_CPU
REM_CPU

Of course I have absolutely no idea what all of that means!



Mr Dash Four

2011-May-12 19:19 UTC


Re: hfsc question

> So, latency could be tested. According to the man page the tests 
> available are as follows:
>
> TCP_RR
> TCP_CRR
> UDP_RR
> TCPIPV6_RR
> TCPIPV6_CRR
> UDPIPV6_RR

These are the tests I ran. I had to recompile the whole package to 
include the "histogram" option (which gives a pretty good idea of the 
latencies involved), though I am not sure whether I can believe the 
results - I am getting some really low latencies, about 6-10ms, even 
when I constrained the bandwidth and put the entire subnet under load. 
I also used uperf (where I could customise every aspect of the tests 
involved), but got very similar results.

One very annoying feature of netperf is that I cannot specify the source 
port for the data tests - only for the control connection, which is of 
no use to me. Without this, it is very difficult to "shove" the traffic 
generated from these tests into the classes I defined. uperf in this 
respect is much more flexible.
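
Where a test tool cannot fix the source port, a small hand-rolled probe 
can. The following is a minimal Python sketch, not a replacement for 
netperf or uperf: it sends UDP request/response probes from a chosen 
source port and records round-trip times. The port number 40014, the 
loopback echo server, and the function names are all assumptions made 
for illustration; in a real test the echo side would run on a host 
beyond the shaped interface, and the classification rule matching that 
source port would have to exist already.

```python
import socket
import threading
import time


def start_udp_echo(host="127.0.0.1"):
    """Start a UDP echo server thread on an ephemeral port; return (socket, port)."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    srv.bind((host, 0))

    def loop():
        while True:
            try:
                data, peer = srv.recvfrom(2048)
            except OSError:  # socket was closed
                return
            srv.sendto(data, peer)

    threading.Thread(target=loop, daemon=True).start()
    return srv, srv.getsockname()[1]


def measure_rtts(server, source_port, count=5, timeout=2.0):
    """Send `count` UDP probes from a fixed source port; return RTTs in ms."""
    rtts = []
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        # Binding before sending fixes the source port, so a port-based
        # classification rule can steer the probes into one class.
        s.bind(("", source_port))
        s.settimeout(timeout)
        for i in range(count):
            payload = str(i).encode()
            start = time.perf_counter()
            s.sendto(payload, server)
            s.recvfrom(2048)  # raises socket.timeout if the reply is lost
            rtts.append((time.perf_counter() - start) * 1000)
    return rtts


if __name__ == "__main__":
    # Loopback demo only; 40014 is a hypothetical port chosen for class 14.
    _srv, echo_port = start_udp_echo()
    samples = measure_rtts(("127.0.0.1", echo_port), source_port=40014)
    print("RTT ms: min %.3f, max %.3f" % (min(samples), max(samples)))
```

Running one such probe per class while the link is saturated by bulk 
traffic gives per-class round-trip figures that can be compared with and 
without dmax:umax.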


Mr Dash Four

2011-May-12 19:19 UTC


Re: hfsc question

>> I did run "shorewall show tc eth0" (as this was the device I was
>> testing everything on) after each test
>
> Uh -- 'shorewall show tc eth0' is a real-time command. You must execute
> the command while eth0 is under load.

Unless I am missing something obvious, it doesn't matter whether I run 
this command in real time or not, as it won't give any indication of the 
latencies involved - this command gives me, in various forms, the number 
of packets/bytes passed through the chain of classes, nothing more.
