I'm interested in all of
- opinions about why this is a good or bad idea
- pointers to similar proposals or products that already exist
- implementation suggestions

This is meant for real-time applications that have small available bandwidth and so have to consider carefully what's the best way to use that bandwidth. I imagine that things happen that cause them to continually reevaluate what's the most important/urgent thing to send next. I want to make it possible for them to delay that choice until the OS is actually ready to send the next packet.

The reason they can't do this now is that the OS enqueues packets. Suppose an application uses UDP or TCP to tell the OS to send some data. It then discovers that the data is obsolete. The old data might still be in the queue to be sent, but it's too late to recall it. One way to avoid that is to always delay telling the OS to send something until the OS is almost ready to send the next packet from the queue that your data will enter. But that's not so easy to do, and there's a big penalty if you wait just a little too long.

What I want, at least conceptually, is that the application maintains its own queue of data to be sent, ordered by priority. Whenever the OS is ready to send the next packet for that application, it removes the highest priority packet (if any) from the queue and sends it.
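For concreteness, here is a rough user-space approximation of the pattern Don is asking for, assuming Linux and a connected, non-blocking UDP socket. The priority queue and app_pop_best() are hypothetical names, not an existing API. The send buffer is shrunk so that POLLOUT roughly means "the kernel can actually take another packet now", and the choice of payload is deferred until that moment:

    #include <poll.h>
    #include <string.h>
    #include <sys/socket.h>

    struct msg { char buf[512]; size_t len; };

    /* Hypothetical application-side priority queue: returns the most
     * valuable unsent message right now, or NULL if there is nothing. */
    struct msg *app_pop_best(void);

    void send_loop(int fd)      /* fd: connected, non-blocking UDP socket */
    {
        /* Keep the kernel's share of the queue as small as it allows,
         * so "writable" means "the next packet will go out soon". */
        int sndbuf = 2048;
        setsockopt(fd, SOL_SOCKET, SO_SNDBUF, &sndbuf, sizeof sndbuf);

        for (;;) {
            struct pollfd p = { .fd = fd, .events = POLLOUT };
            if (poll(&p, 1, -1) <= 0)
                continue;
            /* Decide what to send only now, at the last possible moment. */
            struct msg *m = app_pop_best();
            if (m)
                send(fd, m->buf, m->len, 0);
        }
    }

This only approximates "about to transmit": POLLOUT on a UDP socket blocks only while the socket's send buffer is full, so the trick works best when a local rate limit keeps that buffer backed up.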
On Saturday 09 July 2005 05:55, Don Cohen wrote:
> What I want, at least conceptually, is that the application maintains its
> own queue of data to be sent, ordered by priority. Whenever the OS is
> ready to send the next packet for that application, it removes the
> highest priority packet (if any) from the queue and sends it.

Doesn't every QDisc work that way? When the kernel wants to send a packet, it calls the appropriate dequeue() function in the QDisc. I'm not a kernel developer, so this guess might be wrong. But still, I don't think the queueing is the main problem with your idea... the main problem is: how do you decide what's important and what's not, and what's obsolete?

Andreas
On Fri, Jul 08, 2005 at 08:55:08PM -0700, Don Cohen wrote:
> I'm interested in all of
> - opinions about why this is a good or bad idea
> - pointers to similar proposals or products that already exist
> - implementation suggestions
>
> This is meant for real time applications that have small available
> bandwidth and so they have to consider carefully what's the best way
> to use that bandwidth. I imagine that things happen that cause them
> to continually reevaluate what's the most important/urgent thing to
> send next. I want to make it possible for them to delay the choice
> until the OS is actually ready to send that next packet. The reason
> they can't do this now is that the OS enqueues packets. Suppose an
> application uses UDP or TCP to tell the OS to send some data. It then
> discovers that data is obsolete. The old data might still be in the
> queue to be sent but it's too late to recall it. One way to avoid
> that is to always delay telling the OS to send something until the OS
> is almost ready to send the next packet from the queue that your data
> will enter. But that's not so easy to do, and there's a big penalty
> if you wait just a little too long. What I want, at least
> conceptually, is that the application maintains its own queue of data
> to be sent, ordered by priority. Whenever the OS is ready to send the
> next packet for that application, it removes the highest priority
> packet (if any) from the queue and sends it.

I believe the general solution to this is to use UDP, and make sure your source machine doesn't queue up packets locally (eg. ethernet network contention), and let the best-effort nature of UDP deal with dropping stuff that gets delayed.

I'm not sure there's any way to have an 'I changed my mind about sending that' interface into your network stack... And generally it wouldn't be useful; data spends longer in transit than it does in your queues.

--
Paul "TBBle" Hampson, on an alternate email client.
> From: Andreas Klauer <Andreas.Klauer@metamorpher.de>
> Doesn't every QDisc work that way? When the kernel wants to send a packet,
> it calls the appropriate dequeue() function in the QDisc. I'm not a kernel
> developer so this guess might be wrong.

That's correct, but this operation takes a packet from an OS queue, and the only control the application has over that queue is to put something into it. One way to view the idea is that I want to make it convenient for the application to decide what to put into the queue at the latest possible time, without losing any of its available bandwidth. Think in terms of an OS callback to the application saying "I'm ready to send your data now, what should I send?"

> But still, I don't think that the queueing is the main problem with your
> idea... the main problem is, how do you decide what's important and what
> not, and what's obsolete?

This is up to the application, of course. See below.

> From: Paul.Hampson@PObox.com (Paul Hampson)
> I believe the general solution to this is to use UDP, and make sure

The scheme I describe wouldn't make a lot of sense for TCP, which after all specifies congestion control, retransmission, etc. But UDP still goes through the queuing that I want to optimize.

> your source machine doesn't queue up packets locally (eg. ethernet
> network contention) and let the best-effort nature of UDP deal with
> dropping stuff that gets delayed.

The problem is that the OS is not helpful in avoiding queuing up packets locally. That's part of what I'm trying to fix. For instance, a relatively cheap approximation would be to give the application a way to see how many packets it has in the queue. Then it could at least delay its decision about what to put into the queue until the queue was short. Even better would be to see an estimate of how long it will be before the next packet it enqueues will be sent - like "your call will be answered in approximately 4 minutes".

> I'm not sure there's any way to have an 'I changed my mind about
> sending that' interface into your network stack... And generally
> it wouldn't be useful, data spends longer in transit than it does
> in your queues.

That depends on the rate at which the queue is emptied. If your queue has a rate limit of 10bps then your packets can spend a long time in the queue.
- There are slow links.
  (For instance, I recall hearing that submarines have very low rates.)
- The application might be allocated a small part of the bandwidth
  shared with other applications.

It occurs to me that an example where this would be helpful is transmitting voice data over a low bandwidth link (like a cell phone). Suppose you know that the actual transit time is .1 sec and you want the listener to always hear what the speaker was saying .2 sec ago at the best possible quality.

Suppose the available bandwidth is shared with other applications. The voice application doesn't know when they will want to send or how urgent their data might be. Someone else decides that. It just wants to send the best possible data in the bandwidth allocated to it. I imagine it continually sampling the input and revising what it considers to be the most valuable unsent data for the last .1 sec. Whenever the OS decides it's time to send the next voice packet, I want it to send the latest idea of what's most valuable. I don't want to have to put data into the queue to wait for times that might depend on what urgent communication might be required by other applications.
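As an aside, part of the "cheap approximation" Don mentions does exist on Linux already: the SIOCOUTQ ioctl reports how many bytes (not packets) are still sitting in a TCP or UDP socket's send queue, so an application can at least hold its decision back until that number is small. A minimal sketch, noting that it says nothing about queues below the socket that it is not charged for:

    #include <sys/ioctl.h>
    #include <linux/sockios.h>   /* SIOCOUTQ */

    /* Bytes still queued locally against this socket, not yet freed
     * after transmission.  Returns -1 if the ioctl is unsupported. */
    int unsent_bytes(int fd)
    {
        int queued;
        if (ioctl(fd, SIOCOUTQ, &queued) < 0)
            return -1;
        return queued;
    }

The second half of the wish, an estimated time-to-wire ("your call will be answered in approximately 4 minutes"), has no counterpart in the stack.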
On Sat, Jul 09, 2005 at 08:25:39AM -0700, Don Cohen wrote:
> > From: Paul.Hampson@PObox.com (Paul Hampson)
> > I believe the general solution to this is to use UDP, and make sure
> The scheme I describe wouldn't make a lot of sense for TCP, which
> after all specifies congestion control, retransmission, etc.
> But UDP still goes through the queuing that I want to optimize.
> > your source machine doesn't queue up packets locally (eg. ethernet
> > network contention) and let the best-effort nature of UDP deal with
> > dropping stuff that gets delayed.
> The problem is that the OS is not helpful in avoiding queuing up
> packets locally. That's part of what I'm trying to fix.
> For instance, a relatively cheap approximation would be to give
> the application a way to see how many packets it has in the queue.
> Then it could at least delay its decision about what to put into
> the queue until the queue was short. Even better would be to
> see an estimate of how long it will be before the next packet it
> enqueues will be sent - like "your call will be answered in
> approximately 4 minutes".
> That depends on the rate at which the queue is emptied.
> If your queue has a rate limit of 10bps then your packets can spend
> a long time in the queue.
> - There are slow links
>   (For instance, I recall hearing that submarines have very low rates.)
> - The application might be allocated a small part of the bandwidth
>   shared with other applications.

Wait, you're trying to send more data than the link can take? Then send UDP, throttle it at the local end with a drop-oldest qdisc. Then you get the effect of 'most recent data is best'. Anything more complicated in terms of priority either needs a custom qdisc, or your application needs to not try and send more than the link can take.

> It occurs to me that an example where this would be helpful is
> transmitting voice data over a low bandwidth link (like a cell phone).
> Suppose you know that the actual transit time is .1 sec and you want
> the listener to always hear what the speaker was saying .2 sec ago at
> the best possible quality.
>
> Suppose the available bandwidth is shared with other applications.
> The voice application doesn't know when they will want to send or how
> urgent their data might be. Someone else decides that. It just wants
> to send the best possible data in the bandwidth allocated to it. I
> imagine it continually sampling the input and revising what it
> considers to be the most valuable unsent data for the last .1 sec.
> Whenever the OS decides it's time to send the next voice packet I want
> it to send the latest idea of what's most valuable. I don't want to
> have to put data into the queue to wait for times that might depend on
> what urgent communication might be required by other applications.

You gotta prioritise your data, using TOS or diffserv or something. Set your voice to real-time, so it always gets sent, and then your other applications can use unused packet-times. Use a dropping qdisc for traffic where 'most-recent' is more important than 'all, in order', as described above, and you're set.

I have a vague recollection that this sort of thing is discussed in Tanenbaum's Computer Networks textbook, to do with positional data of satellites or something. (eg. if the positional data is delayed, we write it off; we don't want to delay the data about where we are _now_ in order to know where we were _then_.)

--
Paul "TBBle" Hampson, on an alternate email client.
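A minimal sketch of the TOS marking Paul suggests, assuming the qdisc on the sending box actually classifies on the mark (pfifo_fast, the Linux default, already maps the low-delay TOS bit to its highest-priority band); the marking itself is a single setsockopt:

    #include <netinet/in.h>
    #include <netinet/ip.h>      /* IPTOS_LOWDELAY */
    #include <sys/socket.h>

    /* Ask for low-delay treatment for everything sent on this socket. */
    int mark_low_delay(int fd)
    {
        int tos = IPTOS_LOWDELAY;
        return setsockopt(fd, IPPROTO_IP, IP_TOS, &tos, sizeof tos);
    }

As Don notes later in the thread, this only sets relative priority between applications; it does not let the sender revise a packet once it is queued.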
> Wait, you're trying to send more data than the link can take? Then
> send UDP, throttle it at the local end with a drop-oldest qdisc. Then
> you get the effect of 'most recent data is best'. Anything more
> complicated in terms of priority either needs a custom qdisc, or your
> application needs to not try and send more than the link can take.

The situation described is real and complex. For example, I run an email service which caters for people using satellite phones (1,200 baud on a good day), but the whole point is that they don't need to change any settings when they jump on a 10Mbit leased line connection...

This is a total pain to optimise. Ideally I would like an API to be able to limit the congestion window on the local machine for a particular connection (which I don't think exists on either Windows or Linux?). This way the OS will report that the queue is full quickly to the local program, without buffering up a ton of data.

The issue in my case is that you have two simultaneous streams in transit for email: one to receive new mail and one to send mail out. In the case of the sat phone it's possible to have net buffers which are 20 secs or so long, and so when you send out a status message to say "email received successfully, send me the next one", it can end up queued behind a bunch of lower priority data for a VERY long time. Often these buffers are on the remote ISP end, where you have very little control. This is a serious slowdown on a link which is costing you $1.50/min.

My main focus has been adjusting the protocol to be less interactive, but it would be nice to have more operating system support for these fringe cases.

Ed W
On Sun, Jul 10, 2005 at 08:49:13AM +0100, Ed W wrote:
> > Wait, you're trying to send more data than the link can take? Then
> > send UDP, throttle it at the local end with a drop-oldest qdisc. Then
> > you get the effect of 'most recent data is best'. Anything more
> > complicated in terms of priority either needs a custom qdisc, or your
> > application needs to not try and send more than the link can take.
>
> The situation described is real and complex. For example I run an email
> service which caters for people using satellite phones (1,200 baud on a
> good day), but the whole point is that they don't need to change any
> settings when they jump on a 10Mbit leased line connection...

Ah, I was picturing voice over a low-latency, low-speed link. Now I can understand what you're trying to achieve. Is that 1,200 baud each way? Or do you have to alternate up and down somehow?

> This is a total pain to optimise. Ideally I would like an API to be
> able to limit the congestion window on the local machine for a
> particular connection (which I don't think exists on either Windows or
> Linux?). This way the OS will report that the queue is full quickly to
> the local program without buffering up a ton of data.

Indeed. For TCP, you could use setsockopt with SO_SNDBUF, maybe? However, I'm not sure this is what you want.

> The issue in my case is that you have two simultaneous streams in
> transit for email, one to receive new mail and one to send mail out. In
> the case of the sat phone it's possible to have net buffers which are 20
> secs or so long and so when you send out a status message to say "email
> received successfully, send me the next one", it can end up queued
> behind a bunch of lower priority data for a VERY long time. Often these
> buffers are on the remote ISP end where you have very little control.
> This is a serious slowdown on a link which is costing you $1.50/min.

Assuming you can send both ways simultaneously, or at least guarantee some traffic time outbound no matter how busy the inbound traffic, you would surely have to pipeline your commands in order to get any kind of efficient use out of a high-latency link like a satellite link. Otherwise you lose 2x round-trip-time of incoming data stream while you request the next item.

In this situation, you would want the OS buffers to be as full as possible so the minimal time possible is spent receiving, but using a traffic-shaping solution so that the most important stuff (acks and commands) goes out in preference to whatever else you're sending. eg. If you're doing POP3 and SMTP, you make sure any to-tcp-110 or tcp-ack-only packet is dequeued before any from-tcp-25 packets. You'd also need to jack the receive window right up, or wait for TCP to figure that out for itself.

> My main focus has been adjusting the protocol to be less interactive,
> but it would be nice to have more operating system support for these
> fringe cases.

This is actually a common case, and often cited as a great big hole in TCP/IP's traffic algorithms. I know, it was a question on the exam. ^_^

--
Paul "TBBle" Hampson, on an alternate email client.
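For what the SO_SNDBUF route would look like, a sketch, with the usual Linux caveats: the kernel doubles the requested value (to account for bookkeeping overhead) and enforces a floor, so it pays to read the result back. This bounds local socket buffering; it does not clamp the congestion window Ed actually asked about:

    #include <sys/socket.h>

    /* Request a small send buffer; return the size actually granted
     * (typically twice the request), or -1 on error. */
    int clamp_send_buffer(int fd, int bytes)
    {
        socklen_t len = sizeof bytes;
        if (setsockopt(fd, SOL_SOCKET, SO_SNDBUF, &bytes, sizeof bytes) < 0)
            return -1;
        if (getsockopt(fd, SOL_SOCKET, SO_SNDBUF, &bytes, &len) < 0)
            return -1;
        return bytes;
    }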
Ed W wrote:
> This is a total pain to optimise. Ideally I would like an API to be
> able to limit the congestion window on the local machine for a
> particular connection (which I don't think exists on either Windows or
> Linux?).

It looks like you could do it per route in the past - don't know about now.

http://www.linux-ip.net/gl/ip-cref/node77.html

Andy.
> From: Paul.Hampson@PObox.com (Paul Hampson)
> Wait, you're trying to send more data than the link can take? Then

No, of course I don't expect to send more than the limit.

> send UDP, throttle it at the local end with a drop-oldest qdisc. Then
> you get the effect of 'most recent data is best'. Anything more

Yes, that gives me "most recent is best", but that does not do what I want except in a few weird cases. If every packet is independent, perhaps it would suffice to always send the newest, e.g., if I were trying to tell the other side what's the latest clock time. (In that case I'd also limit the queue length to one.)

> You gotta prioritise your data, using TOS or diffserv or something.
> Set your voice to real-time, so it always gets sent, and then your
> other applications can use unused packet-times. Use a dropping qdisc

This may be the best I can do in the current world, where the facility I described does not exist. It does not solve the problem I described. TOS/diffserv etc. are more for use by the intervening infrastructure, and this problem applies even in the case where there is no congestion or delay at all in that infrastructure, but only in the link from the sending machine. Using "real time" is just a matter of giving one application priority over others. First, the link itself may have varying bandwidth, and second, the other applications might also have urgent data to send. Dropping packets can be disastrous if they happen to contain critical data that is not duplicated in other packets. At the very least I have to be able to find out which ones were dropped. But better than all of that is the ability to decide what to send at the last moment.

> I have a vague recollection that this sort of thing is discussed in
> Tanenbaum's Computer Networks textbook, to do with positional data
> of satellites or something. (eg. if the positional data is delayed,
> we write it off, we don't want to delay the data about where we are
> _now_ in order to know where we were _then_)

If the goal is to listen to the sound from .2 sec ago and it takes .1 sec to get there, then clearly it's a waste of time to send data that's older than .1 sec. But the packet in the queue might have some data that's older and some that's newer. I can't drop part of it. Instead I'd like to know that the packet is about to be sent now, and respond by finding the best data to send now.

> From: Ed W <lists@wildgooses.com>
> This is a total pain to optimise. Ideally I would like an API to be
> able to limit the congestion window on the local machine for a
> particular connection (which I don't think exists on either Windows or
> Linux?). This way the OS will report that the queue is full quickly to
> the local program without buffering up a ton of data.
>
> The issue in my case is that you have two simultaneous streams in
> transit for email, one to receive new mail and one to send mail out. In
> the case of the sat phone it's possible to have net buffers which are 20
> secs or so long and so when you send out a status message to say "email
> received successfully, send me the next one", it can end up queued
> behind a bunch of lower priority data for a VERY long time. Often these
> buffers are on the remote ISP end where you have very little control.
> This is a serious slowdown on a link which is costing you $1.50/min.

I'm not sure I follow the problem, but if you're saying that one stream should have priority over the other, it seems you could do that with two different queues, one with priority over the other. Or something like sfq could at least prevent one connection from waiting for the other to send a lot of data.
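On "find out which ones were dropped": nothing in the stack reports this to a UDP sender, but an application can detect losses end-to-end by numbering its datagrams. A sketch, where the 512-byte payload cap is an arbitrary assumption:

    #include <arpa/inet.h>       /* htonl */
    #include <stdint.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <sys/types.h>

    static uint32_t next_seq;

    /* Prefix each datagram with a 32-bit sequence number; a gap in
     * the numbers on the receiving side identifies exactly which
     * packets died, so the sender can regenerate what mattered. */
    ssize_t send_numbered(int fd, const void *data, size_t len)
    {
        char pkt[4 + 512];                  /* assumed payload cap */
        uint32_t seq = htonl(next_seq++);

        if (len > 512)
            return -1;
        memcpy(pkt, &seq, 4);
        memcpy(pkt + 4, data, len);
        return send(fd, pkt, 4 + len, 0);
    }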
> Assuming you can send both ways simultaneously, or at least guarantee
> some traffic time outbound no matter how busy the inbound traffic,
> you would surely have to pipeline your commands in order to get any
> kind of efficient use out of a high-latency link like a satellite link.
> Otherwise you lose 2x round-trip-time of incoming data stream while
> you request the next item.
>
> In this situation, you would want the OS buffers to be as full as
> possible so the minimal time possible is spent receiving, but using
> a traffic-shaping solution so that the most important stuff (acks
> and commands) goes out in preference to whatever else you're sending.

Yes, you do want to pipeline, but you still don't want the OS buffers as full as possible. Consider that you might want to know a message was sent successfully before sending the next message, but at the same time you have the pipe full with downloading new messages. The OK which says the message was sent successfully can be behind 15-20 seconds' worth of downloads - hence you have to wait a long time before you can start sending the next message! Also, you can't use any kind of QOS here, because the hypothetical 15-20 second buffer is at the remote ISP end. (Who are not cooperative.)

It's a tricky situation; all you can do is figure out how to keep changing your protocol so that you don't ever need to hear a reply before you continue sending. Anyone who wants to buy it, drop me a line! :-)

Ed W
> I'm not sure I follow the problem, but if you're saying that one
> stream should have priority over the other, it seems you could do
> that with two different queues, one with priority over the other.
> Or something like sfq could at least prevent one connection from
> waiting for the other to send a lot of data.

You could, if you had control over the queues. But they are on the remote ISP end... So the problem is similar to the one you describe - once the data is in flight you lose control, but you want to limit how much data is in flight so that you have as much control as possible...

Ed W