I have the following skeleton of classes ("a" is the number of my eth0 device in tcdevices):

a:13, priority 2, guaranteed speed 320kbit to 320kbit - Main class for outgoing traffic to my ISP1 (much lower speeds than the rest of my nets: 40KB/s out, 1200KB/s in - that is KBytes, as in 1024 bytes, not kbits)
a:13:14, priority 2, guaranteed speed 120kbit to full (40KB/s), dmax 50ms, umax 1500b - VOIP traffic routed outside
a:13:15, priority 3, guaranteed speed 80kbit to full, dmax ??, umax 1500b - VPN 1 traffic going out
a:13:16, priority 4, guaranteed speed 80kbit to full, dmax ??, umax 1500b - VPN 2 traffic going out
a:13:17, priority 5, guaranteed speed 40kbit to full, dmax ??, umax 1500b - other unclassified traffic going out
a:18 ... (other internal GBit traffic)

My question is this: I have read the superbly put section on HFSC in the complex traffic shaping article (http://shorewall.net/traffic_shaping.htm), which deciphers the "HFSC Scheduling with Linux" article pretty well (thanks Tom!), and spent an hour calculating the subclasses' dmax times following the example in that article (yes, I managed to do it in the end, I think).

What I am not entirely certain about is this: can I "boost" the dmax values on more than one class - on class 13:15, for example, in addition to the VOIP class (13:14) above - and dump all the excesses from both classes on the remaining "leaves", classes 16 and 17? I know their dmax values could be in the hundreds of milliseconds, but I am prepared to take that hit.

------------------------------------------------------------------------------
Achieve unprecedented app performance and reliability
What every C/C++ and Fortran developer should know.
Learn how Intel has extended the reach of its next-generation tools
to help boost performance applications - including clusters.
http://p.sf.net/sfu/intel-dev2devmay
On 05/10/2011 02:30 PM, Mr Dash Four wrote:
> I have the following skeleton of classes ("a" is the number of my
> eth0 device in tcdevices):
>
> a:13, priority:2, guaranteed speed 320kbit to 320kbit - Main class
> for outgoing - to my ISP1 - traffic (much lower speeds than the rest
> of my nets: 40KB/s out, 1200KB/s in - that is KBytes - as in 1024
> bytes, not kbits)!
> a:13:14, priority 2, guaranteed speed 120kbit to full (40KB/s),
> dmax 50ms, umax 1500b - VOIP traffic routed outside
> a:13:15, priority 3, guaranteed speed 80kbit to full, dmax ??, umax
> 1500b - VPN 1 traffic going out
> a:13:16, priority 4, guaranteed speed 80kbit to full, dmax ??, umax
> 1500b - VPN 2 traffic going out
> a:13:17, priority 5, guaranteed speed 40kbit to full, dmax ??, umax
> 1500b - other unclassified traffic going out
> a:18 ... (other internal GBit traffic)
>
> My question is - I have read the superbly put section on HFSC in the
> complex traffic shaping article
> (http://shorewall.net/traffic_shaping.htm), which deciphers the "HFSC
> Scheduling with Linux" article pretty well (thanks Tom!),

You're welcome.

> and spend an hour calculating the subclasses' dmax times (yes, I
> managed to do it at the end, I think) following the example in that
> article.
>
> What I am not entirely certain is this - can I use more than one
> class on which to "boost" the dmax values as I did with the VOIP
> class (13:14) above - on class 13:15 for example - and dump all the
> excesses from both classes on the remaining "leaves" - classes 16 and
> 17? I know their dmax values could be in their hundreds of
> milliseconds, but I am prepared to take that hit.

I don't know. I guess that you will have to experiment and see. Please let us know what you learn, so I can update the Complex TC doc with your findings.

-Tom
--
Tom Eastep        \ When I die, I want to go like my Grandfather who
Shoreline,         \ died peacefully in his sleep. Not screaming like
Washington, USA     \ all of the passengers in his car
http://shorewall.net \________________________________________________
> I don't know. I guess that you will have to experiment and see. Please
> let us know what you learn, so I can update the Complex TC doc with your
> findings.

I think I figured it out - the tricky part was to determine the excesses and the additional delay I have to split between the two remaining classes, 16 & 17. So, here is what I think the final figures should be:

a:13, priority 2, guaranteed speed 320kbit to 320kbit - Main class for outgoing traffic
a:13:14, priority 2, guaranteed speed 120kbit to full (40KB/s), dmax 60ms, umax 1500b - VOIP traffic routed outside ("boosted" to 200kbits)
a:13:15, priority 3, guaranteed speed 80kbit to full, dmax 100ms, umax 1500b - VPN 1 traffic going out ("boosted" to 120kbits)
a:13:16, priority 4, guaranteed speed 80kbit to full, dmax 224ms, umax 1500b - VPN 2 traffic going out
a:13:17, priority 5, guaranteed speed 40kbit to full, dmax 374ms, umax 1500b - other unclassified traffic going out

N.B. I had to revise my initial dmax value for class 14 up from 50ms to 60ms, as I don't think the total speed of all *boosted* subclasses (14 & 15 in my case) can exceed the speed of the parent class (class 13).

My calculations are as follows (strap yourselves in!):

1. Nominal delays (the delay at which each class transmits one 1500-byte packet at its nominal speed):
   14 - 100ms -> (8*1,500)/120,000
   15 - 150ms -> (8*1,500)/80,000
   16 - 150ms -> (8*1,500)/80,000
   17 - 300ms -> (8*1,500)/40,000

2. Calculating dmax and excess bits values for class 14:
   2.1 The "boost" speed for class 14 is 200kbits, which translates to 60ms -> (8*1,500)/200,000. Therefore dmax for class 14 = 60ms.
   2.2 The total speed of all subclasses when class 14 has "boost" speed is 200,000 + 80,000 + 80,000 + 40,000 = 400kbits.
   2.3 The total amount of bits transmitted at that speed in 60ms is 24,000 (400,000 bit/s * 60ms).
   2.4 The total amount of bits transmitted at nominal speed by all classes in 60ms is 19,200 (320,000 bit/s * 60ms).
   2.5 Therefore the amount of excess bits for class 14 is 4,800 bits (24,000 - 19,200).

3. Calculating dmax and excess bits values for class 15:
   3.1 The "boost" speed for class 15 is 120kbits, which translates to 100ms -> (8*1,500)/120,000. Therefore dmax for class 15 = 100ms.
   3.2 The total speed of all subclasses when class 15 has "boost" speed (with class 14 at its nominal 120kbits) is 120,000 + 120,000 + 80,000 + 40,000 = 360kbits.
   3.3 The total amount of bits transmitted at that speed in 100ms is 36,000 (360,000 bit/s * 100ms).
   3.4 The total amount of bits transmitted at nominal speed by all classes in 100ms is 32,000 (320,000 bit/s * 100ms).
   3.5 Therefore the amount of excess bits for class 15 is 4,000 bits (36,000 - 32,000).

4. Determining dmax values for leaf classes 16 & 17, taking into account the punitive excesses from classes 14 & 15:
   4.1 The total number of excess bits is 8,800 (4,800 + 4,000). I have decided to split them according to the nominal speeds of classes 16 & 17, so that each class takes a roughly equal share of the punishment in terms of time (milliseconds). In other words, class 16 takes 2/3 of the excess bits (as it has twice the speed of class 17), about 5,867 bits, while class 17 takes the rest, 2,933 bits.
   4.2 The time required to transmit 5,867 bits at 80kbit/s (the nominal speed of class 16) is 73.34ms (5,867/80,000), rounded up to 74ms. Therefore the dmax value for class 16 is 224ms (150ms + 74ms).
   4.3 The time required to transmit 2,933 bits at 40kbit/s (the nominal speed of class 17) is 73.325ms (2,933/40,000), rounded up to 74ms. Therefore the dmax value for class 17 is 374ms (300ms + 74ms).

Does the above make sense? How do I actually "test" whether these calculations are correct?
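The arithmetic in steps 1 to 4 above can be re-checked mechanically. The following is only a sketch of the poster's own method (it does not show how HFSC itself schedules packets); the names `NOMINAL`, `BOOST`, `delay_ms` and `excess_bits` are made up for illustration, and the proportional split in step 4 is the poster's choice rather than anything HFSC prescribes.

```python
from fractions import Fraction
from math import ceil

UMAX_BITS = 8 * 1500          # umax of 1500 bytes, expressed in bits
PARENT_RATE = 320_000         # class 13, bit/s

# Nominal (guaranteed) rates and "boost" rates from the thread, in bit/s.
NOMINAL = {14: 120_000, 15: 80_000, 16: 80_000, 17: 40_000}
BOOST = {14: 200_000, 15: 120_000}


def delay_ms(bits, rate):
    """Time in milliseconds needed to send `bits` at `rate` bit/s (exact)."""
    return Fraction(bits * 1000, rate)


def excess_bits(cls, dmax):
    """Bits sent above the parent rate during `dmax` ms while `cls` is boosted."""
    total = sum(NOMINAL.values()) - NOMINAL[cls] + BOOST[cls]
    return (total - PARENT_RATE) * dmax / 1000


# Step 1: nominal delays. Steps 2.1 and 3.1: dmax of the boosted classes.
nominal_ms = {c: delay_ms(UMAX_BITS, r) for c, r in NOMINAL.items()}
dmax_ms = {c: delay_ms(UMAX_BITS, r) for c, r in BOOST.items()}

# Steps 2.5 and 3.5: excess bits produced by each boosted class.
excess = {c: excess_bits(c, dmax_ms[c]) for c in BOOST}
total_excess = sum(excess.values())

# Step 4: split the excess over the remaining leaves in proportion to their
# nominal rates, convert each share to time, and round up to a whole ms.
leaves = [c for c in NOMINAL if c not in BOOST]
leaf_rate_sum = sum(NOMINAL[c] for c in leaves)
for c in leaves:
    share = total_excess * NOMINAL[c] / leaf_rate_sum
    dmax_ms[c] = nominal_ms[c] + ceil(delay_ms(share, NOMINAL[c]))

for c in sorted(dmax_ms):
    print(f"class {c}: dmax {int(dmax_ms[c])}ms")
```

Using `Fraction` keeps the intermediate values exact, so the rounding up happens only once, at the same point where the post rounds 73.3ms up to 74ms.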
> a:13, priority:2, guaranteed speed 320kbit to 320kbit - Main class for
> outgoing traffic
> a:13:14, priority 2, guaranteed speed 120kbit to full (40KB/s), dmax
> 60ms, umax 1500b - VOIP traffic routed outside ("boosted" to 200kbits)
> a:13:15, priority 3, guaranteed speed 80kbit to full, dmax 100ms, umax
> 1500b - VPN 1 traffic going out ("boosted" to 120kbits)
> a:13:16, priority 4, guaranteed speed 80kbit to full, dmax 224ms, umax
> 1500b - VPN 2 traffic going out
> a:13:17, priority 5, guaranteed speed 40kbit to full, dmax 374ms, umax
> 1500b - other unclassified traffic going out

Forgot to post this question:

What happens when I only specify dmax and umax values on the boosted classes and leave them out on the rest? For example:

a:13, priority 2, guaranteed speed 320kbit to 320kbit - Main class for outgoing traffic
a:13:14, priority 2, guaranteed speed 120kbit to full (40KB/s), dmax 60ms, umax 1500b - VOIP traffic routed outside ("boosted" to 200kbits)
a:13:15, priority 3, guaranteed speed 80kbit to full, dmax 100ms, umax 1500b - VPN 1 traffic going out ("boosted" to 120kbits)
a:13:16, priority 4, guaranteed speed 80kbit to full - VPN 2 traffic going out
a:13:17, priority 5, guaranteed speed 40kbit to full - other unclassified traffic going out

The reason I am asking is this: further down in my "a:XX" set of statements I have another set of leaf-structured classes, which are only used for my internal subnets. They all have vastly superior speeds (1GBit/s+), so I boosted the speed of the traffic I wanted "boosted", but left everything else without dmax and umax, as the excess delays I calculated were absolutely minuscule (less than a millisecond!).
On 05/11/2011 07:46 AM, Mr Dash Four wrote:
>> I don't know. I guess that you will have to experiment and see. Please
>> let us know what you learn, so I can update the Complex TC doc with your
>> findings.
>>
> I think I figured it out - the tricky part was to determine the excesses
> and the additional delay I have to split between the two remaining
> classes - 16 & 17. So, here is what I think the final figures should be:
> ...
> Does the above makes sense? How do I actually "test" whether these
> calculations are correct?

The calculations make sense. My first attempt to test this would be to run parallel netperfs (one netperf per class) and look at the output of 'shorewall show tc'. That will show you what the actual speeds are. I don't recall if netperf provides any latency data.

-Tom
On 05/11/2011 07:54 AM, Mr Dash Four wrote:
>> a:13, priority:2, guaranteed speed 320kbit to 320kbit - Main class for
>> outgoing traffic
>> a:13:14, priority 2, guaranteed speed 120kbit to full (40KB/s), dmax
>> 60ms, umax 1500b - VOIP traffic routed outside ("boosted" to 200kbits)
>> a:13:15, priority 3, guaranteed speed 80kbit to full, dmax 100ms, umax
>> 1500b - VPN 1 traffic going out ("boosted" to 120kbits)
>> a:13:16, priority 4, guaranteed speed 80kbit to full, dmax 224ms, umax
>> 1500b - VPN 2 traffic going out
>> a:13:17, priority 5, guaranteed speed 40kbit to full, dmax 374ms, umax
>> 1500b - other unclassified traffic going out
> Forgot to post this question:
>
> What happens when I only specify dmax and umax values on the boosted
> classes and leave out the rest?

I don't know.

-Tom
> The calculations make sense. I think my first attempt to test this would
> be to run parallel netperfs (one netperf per class) and look at the
> output of 'shorewall show tc'. That will show you what the actual speeds
> are. I don't recall if netperf provides any latency data.

I haven't used netperf before, but I guess there is always time to try it out. If it works, I will then test a configuration where I have dmax:umax specified only on the "boosted" classes and see what happens.
> The calculations make sense. I think my first attempt to test this would
> be to run parallel netperfs (one netperf per class) and look at the
> output of 'shorewall show tc'. That will show you what the actual speeds
> are. I don't recall if netperf provides any latency data.

Well, I did quite a bit of testing in the past couple of hours, but I am, quite frankly, unimpressed!

I used netperf's biggest brother - iperf - instead. The values of dmax and umax do not seem to have any effect on the net speed whatsoever - at least that is what the tests seem to indicate. The speed limits are very strictly observed, and so is the priority of each class, but the results are roughly the same regardless of whether dmax:umax have been specified in tcclasses (see netspeed-tests.txt attached).

I did run "shorewall show tc eth0" (as this was the device I was testing everything on) after each test, but there was no speed indication there - just the number of packets passed through each class, which the iperf results show anyway.
On 5/11/11 7:39 PM, Mr Dash Four wrote:
>> The calculations make sense. I think my first attempt to test this would
>> be to run parallel netperfs (one netperf per class) and look at the
>> output of 'shorewall show tc'. That will show you what the actual speeds
>> are. I don't recall if netperf provides any latency data.
>>
> Well, I did quite a bit of testing in the past couple of hours, but I
> am, quite frankly, unimpressed!
>
> I used netperf's biggest brother - iperf - instead.

I consider iperf to be netperf's little sister.

-Tom
On 5/11/11 7:39 PM, Mr Dash Four wrote:
> I did run "shorewall show tc eth0" (as this was the device I was testing
> everything on) after each test

Uh -- 'shorewall show tc eth0' is a real-time command. You must execute the command while eth0 is under load.

-Tom
On 5/11/11 7:54 PM, Tom Eastep wrote:
> On 5/11/11 7:39 PM, Mr Dash Four wrote:
>>> The calculations make sense. I think my first attempt to test this would
>>> be to run parallel netperfs (one netperf per class) and look at the
>>> output of 'shorewall show tc'. That will show you what the actual speeds
>>> are. I don't recall if netperf provides any latency data.
>>>
>> Well, I did quite a bit of testing in the past couple of hours, but I
>> am, quite frankly, unimpressed!
>>
>> I used netperf's biggest brother - iperf - instead.
>
> I consider iperf to be netperf's little sister

I recommended netperf because I already knew that iperf only reports throughput and not latency. When you vary TC parameters that only affect latency, you can't expect throughput to change.

I still don't know if netperf reports latency information. I expected you to try it and let us know.

-Tom
> I still don't know if netperf reports latency information. I expected
> you to try it and let us know.

From the NetPerf web page:

Netperf is a benchmark that can be used to measure the performance of many different types of networking. It provides tests for both unidirectional throughput, and *end-to-end latency*. The environments currently measurable by netperf include:

* TCP and UDP via BSD Sockets for both IPv4 and IPv6
* DLPI
* Unix Domain Sockets
* SCTP for both IPv4 and IPv6

So, latency could be tested. According to the man page the tests available are as follows:

TCP_STREAM
TCP_SENDFILE
TCP_MAERTS
TCP_RR
TCP_CRR
UDP_STREAM
UDP_RR
DLCO_STREAM
DLCO_RR
DLCL_STREAM
DLCL_RR
STREAM_STREAM
STREAM_RR
TCPIPV6_STREAM
TCPIPV6_RR
TCPIPV6_CRR
UDPIPV6_STREAM
UDPIPV6_RR
DG_STREAM
DG_RR
LOC_CPU
REM_CPU

Of course I have absolutely no idea what all of that means!
> So, latency could be tested. According to the man page the tests
> available are as follows:
>
> TCP_RR
> TCP_CRR
> UDP_RR
> TCPIPV6_RR
> TCPIPV6_CRR
> UDPIPV6_RR

These are the tests I ran. I had to recompile the whole package to include the "histogram" option (which gives a pretty good idea of the latencies involved), though I am not sure whether I can believe the results: I am getting some really low latencies, about 6-10ms, even when I constrained the bandwidth and put the entire subnet under load.

I also used uperf (where I could customise every aspect of the tests involved), but got very similar results.

One very annoying feature of netperf is that I cannot specify the source port for the data tests - only for the control connection, which is of no use to me. Without this, it is very difficult to "shove" the traffic generated by these tests into the classes I defined. uperf is much more flexible in this respect.
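For anyone wanting a probe whose data traffic leaves from a known source port, so that it can be steered into a particular class, a request/response test can be sketched in a few lines. This is a rough illustration only and not a replacement for netperf or uperf; the function names, the 160-byte payload and the idea of classifying by source port are assumptions made for the example.

```python
import socket
import time


def udp_echo_server(sock, count):
    """Echo `count` datagrams back to their sender, then return."""
    for _ in range(count):
        data, addr = sock.recvfrom(2048)
        sock.sendto(data, addr)


def measure_rtt_ms(server_addr, source_port, count=5, size=160, timeout=2.0):
    """Send `count` UDP probes from a fixed source port; return RTTs in ms.

    `source_port` is the port a marking rule would match on (hypothetical
    setup); pass 0 to let the kernel choose an ephemeral port.
    """
    rtts = []
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.bind(("", source_port))      # fix the source port of the probes
        s.settimeout(timeout)          # a lost probe raises socket.timeout
        payload = b"x" * size
        for _ in range(count):
            start = time.perf_counter()
            s.sendto(payload, server_addr)
            s.recvfrom(2048)           # wait for the echo
            rtts.append((time.perf_counter() - start) * 1000.0)
    return rtts
```

The echo side would run on a host beyond the shaped interface, for example by binding a UDP socket there and calling `udp_echo_server(sock, count)`. A single idle probe stream will mostly show the unloaded path latency; the effect of dmax should only become visible while the sibling classes are saturated at the same time.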
>> I did run "shorewall show tc eth0" (as this was the device I was testing
>> everything on) after each test
>>
>
> Uh -- 'shorewall show tc eth0' is a real-time command. You must execute
> the command while eth0 is under load.

Unless I am missing something obvious, it doesn't matter whether I run this command in real time or not, as it won't give any indication of the latencies involved - this command gives me, in various forms, the number of packets/bytes passed through the chain of classes, nothing more.
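Byte counters alone can still be turned into per-class speeds by sampling them twice while the link is under load and dividing the difference by the interval. The sketch below assumes the counters appear as "Sent N bytes" lines under each "class" line, as in `tc -s class show dev eth0` output; the function names are made up, and it says nothing about latency.

```python
import re

# Matches a "class <kind> <id> ..." line followed by its "Sent N bytes" line.
CLASS_RE = re.compile(
    r"^class\s+\S+\s+(\S+).*\n\s*Sent\s+(\d+)\s+bytes", re.MULTILINE
)


def sent_bytes(tc_output):
    """Map class id -> cumulative bytes sent, parsed from one snapshot."""
    return {cid: int(n) for cid, n in CLASS_RE.findall(tc_output)}


def rates_kbit(before, after, interval_s):
    """Average rate per class in kbit/s between two snapshots."""
    b, a = sent_bytes(before), sent_bytes(after)
    return {
        cid: (a[cid] - b[cid]) * 8 / interval_s / 1000
        for cid in a
        if cid in b
    }
```

In practice the two snapshots would be captured a known number of seconds apart during a test run, and the resulting rates compared against the guaranteed speed of each class.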