Chris Rapier <rapier at psc.edu> wrote on Thu, 27 Jul 2023
at 18:04:47 EDT in <CA+8-fDLPnL7vSWn1aHB=SZDMVAqix_c9RY8+zmC0=Y8EHr_U2A at
mail.gmail.com>:
> Also, keep in mind that point uses ICMP which often uses a slow path
> through network hardware. So a ping implemented in SSH will likely be
This claim about ICMP and slow paths is often bandied about, but it is
misleading and not really true in any way that is relevant here.
But I want to take a moment to address it here, even if it's a bit
off-topic.
Although one can always find some example of some device that behaves
differently, for the most part, it is only *generation* (not transiting or routing)
of ICMP that may use a slow path. So that means if host A pings host B across 8
intermediate hops, the ping measures the fast-path network latency of the 8 hops
plus the slow path latency of host B in responding. This is not generally a big
deal.
A related source of confusion is a traceroute, where again it is the GENERATION
of the response (an ICMP time exceeded message in response to what is usually a
UDP probe packet, although not always) that is sometimes in the slow path
(again, sometimes).
In a case where the latter (generating the time exceeded message) is in the fast
path, as it sometimes is, and the former (generating the echo reply) is in the
slow path, it means that if you run an ICMP ping from host A to the 7th hop
between A and B, you'll see longer round-trip times than the traceroute will
show for the 7th hop from host A to host B (which is the time to transmit a UDP
packet from host A towards host B with a time-to-live of 7 hops, followed by the
time for the 7th hop to generate the ICMP time exceeded message and send it back
over the network).
(And yet another related source of confusion is where the routing is asymmetric
such that the path from A to B traverses intermediate nodes
A->I1->I2->I3->I4->I5->I6->I7->I8->B but the path
from A to I4 traverses a different path, say
A->I1->I2->J1->J2->I4 such that a ping from A to B and a ping
from A to I4 do not take the same path, so the round trip measured to B may not
be the sum of the apparent successive round trips to the nodes along the path.)
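To put made-up numbers on that last point, here is a toy model in Python. The
per-link latencies are purely hypothetical, and it assumes each reply returns
along the reverse of its forward path, which real networks do not guarantee.
It only illustrates that a ping to I4 via the J nodes can show a longer round
trip than a ping all the way to B.

```python
# Hypothetical one-way link latencies in milliseconds (invented for illustration).
LINK_MS = {
    ("A", "I1"): 1, ("I1", "I2"): 2, ("I2", "I3"): 1, ("I3", "I4"): 1,
    ("I4", "I5"): 3, ("I5", "I6"): 2, ("I6", "I7"): 2, ("I7", "I8"): 1,
    ("I8", "B"): 1,
    # The detour that packets addressed to I4 happen to take.
    ("I2", "J1"): 10, ("J1", "J2"): 10, ("J2", "I4"): 5,
}

A_TO_B = ["A", "I1", "I2", "I3", "I4", "I5", "I6", "I7", "I8", "B"]
A_TO_I4 = ["A", "I1", "I2", "J1", "J2", "I4"]


def rtt_ms(path):
    """Round trip over a path, assuming the reply retraces the same links."""
    one_way = sum(LINK_MS[(a, b)] for a, b in zip(path, path[1:]))
    return 2 * one_way


if __name__ == "__main__":
    print("ping A -> B :", rtt_ms(A_TO_B), "ms")
    print("ping A -> I4:", rtt_ms(A_TO_I4), "ms")
```

With these invented numbers the round trip to I4 comes out at twice the round
trip to B, even though I4 sits in the middle of the A-to-B path.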
None of this is relevant when measuring the latency at the ssh layer, which has
no meaningful "gotchas" with the network hardware when compared to
measuring the latency of actual data transmitted over ssh, since the whole
question of "fast paths" is about network nodes that don't look
inside ssh packets. (Although we don't have an ssh timing infrastructure to
give misleading results anyhow.)
Moving back towards the original question, at least a little bit, it is probably
the case that Roland (the OP) does not want to measure the startup overhead of
an SSH connection, so notional "time echo test | ssh user@host"
solutions are unhelpful and actively misleading. The measurement needs to be
done after the SSH connection is set up. Also, because of how TCP works,
it's probably wise to think about how much data is being transferred. The
timing for an RPC that transfers 100-byte query/responses will come out quite
differently than one that transfers 100-megabyte query/responses (or asymmetric
variants thereof), even when averaged over time, because of slow start and
congestion windows and similar issues (and also because path/packet loss
characteristics may vary with packet size). So any measurement framework needs to
keep that in mind and understand what the data size requirements are for
whatever the test is simulating.
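
As a rough sketch of what "measure after setup, with a chosen data size" could
look like, here is some Python. It is not an existing ssh feature: the function
name measure_echo_rtt and its parameters are my own invention, and it assumes
the remote side runs something that echoes bytes back unchanged (for example
"cat") and that ssh can authenticate without prompting. The warm-up rounds are
discarded so that connection startup does not pollute the samples.

```python
import subprocess
import threading
import time


def measure_echo_rtt(cmd, payload_size=100, rounds=5, warmup=1):
    """Time echo round trips of payload_size bytes through one long-lived child.

    cmd is an argv list for a process that echoes stdin to stdout, e.g.
    ["ssh", "user@host", "cat"] (hypothetical host). Returns seconds per round.
    """
    proc = subprocess.Popen(cmd, stdin=subprocess.PIPE, stdout=subprocess.PIPE)
    payload = b"x" * (payload_size - 1) + b"\n"
    samples = []

    def send():
        proc.stdin.write(payload)
        proc.stdin.flush()

    try:
        for i in range(warmup + rounds):
            start = time.perf_counter()
            # Write from a separate thread so large payloads cannot deadlock
            # on full pipe buffers while we wait to read the echo.
            writer = threading.Thread(target=send)
            writer.start()
            echoed = proc.stdout.read(len(payload))
            writer.join()
            elapsed = time.perf_counter() - start
            if len(echoed) != len(payload):
                raise RuntimeError("echo stream closed early")
            if i >= warmup:  # skip rounds that include connection startup
                samples.append(elapsed)
    finally:
        proc.stdin.close()
        proc.wait()
    return samples
```

Running it once with payload_size=100 and once with a payload of many megabytes
over the same ["ssh", "user@host", "cat"] command should make the size
dependence visible. Whether one warm-up round is enough to get past slow start
is itself something the test design has to decide.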
--
jhawk at alum.mit.edu
John Hawkinson