thr3ads.net - openssh unix dev - Packets Sizes and Information Leakage [Jan 2011]

Mansour Moufid

2011-Jan-26 21:15 UTC

Packets Sizes and Information Leakage

This message is a few years old so I cannot reply to the original, but
it is still of current research interest.
> So one of my coworkers is doing a little research on SSH usage in the
> wild using netflow data. One of the things he's trying to do is
> determine a way to differentiate between data transfers and interactive
> sessions. We thought of a couple of ways but we wanted to float them
> here and see if there are methods incorporated to defeat this sort of
> traffic analysis.
>
> The first idea is to look at the average number of packets per second
> over the length of the flow. The idea is that a data transfer would have
> a significantly higher number of PPS than an interactive session. If
> we analyze a few thousand ssh flows and build a histogram we expect to see
> two (or maybe three) peaks corresponding to various connection types. I
> think this probably has the best chance of statistically significant
> results.
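As a rough illustration of the histogram idea quoted above, here is a
minimal sketch in Python. The flow records are made up, and the function
names and the choice of one bin per decade of packet rate are assumptions
for the example, not something netflow itself prescribes.

```python
import math
from collections import Counter

def packets_per_second(packets, duration_s):
    """Average packet rate of one flow; zero-length flows count as 0.0."""
    return packets / duration_s if duration_s > 0 else 0.0

def decade_histogram(rates):
    """Count flows per decade of rate: bin k covers [10**k, 10**(k+1))."""
    bins = Counter()
    for rate in rates:
        if rate > 0:
            bins[math.floor(math.log10(rate))] += 1
    return dict(bins)

# Hypothetical flow records: (packet count, duration in seconds).
flows = [(120, 60.0), (90, 45.0), (50000, 20.0), (80000, 40.0), (300, 200.0)]
rates = [packets_per_second(p, d) for p, d in flows]
hist = decade_histogram(rates)
print(hist)
```

With these invented records the slow flows land in the 1-10 PPS bin and
the fast ones in the 1000-10000 PPS bin. Whether real SSH traffic
separates this cleanly is exactly what the measurement would have to show.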
Inter-packet timing is another measure. A useful (and efficient) way
to distinguish between interactive sessions and bulk transfers would
be a power spectral density (PSD) estimate (e.g. a maximum entropy
periodogram). The PSD of a bulk transfer would be significantly skewed
toward the higher frequencies. People can only type so fast, so that
provides a convenient upper bound on the frequencies to consider
representative of interactive sessions, as opposed to bulk transfers.
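A minimal sketch of the spectral idea, under loud assumptions: it uses a
plain DFT periodogram rather than a maximum entropy estimator, the packet
trains are idealized (perfectly evenly spaced, no jitter), and the 20 Hz
typing limit is an invented threshold. For an evenly spaced train every
harmonic carries the same power, so the sketch reports the lowest strong
peak, which is the packet rate.

```python
import cmath
import math

def bin_counts(timestamps_ms, bin_ms, duration_ms):
    """Count packets per fixed-width time bin (integer milliseconds)."""
    counts = [0] * (duration_ms // bin_ms)
    for t in timestamps_ms:
        i = t // bin_ms
        if 0 <= i < len(counts):
            counts[i] += 1
    return counts

def periodogram(x, sample_rate_hz):
    """One-sided periodogram of x with the mean (DC term) removed."""
    n = len(x)
    mean = sum(x) / n
    centred = [v - mean for v in x]
    freqs, power = [], []
    for k in range(1, n // 2 + 1):
        w = -2j * math.pi * k / n
        s = sum(v * cmath.exp(w * i) for i, v in enumerate(centred))
        freqs.append(k * sample_rate_hz / n)
        power.append(abs(s) ** 2 / n)
    return freqs, power

def lowest_strong_peak_hz(timestamps_ms, duration_ms, bin_ms=5):
    """Lowest frequency holding at least half of the maximum power."""
    counts = bin_counts(timestamps_ms, bin_ms, duration_ms)
    freqs, power = periodogram(counts, 1000.0 / bin_ms)
    peak = max(power)
    if peak < 1e-9:          # flat or empty series: no periodicity found
        return 0.0
    for f, p in zip(freqs, power):
        if p >= 0.5 * peak:
            return f
    return 0.0

TYPING_LIMIT_HZ = 20.0                      # assumed bound on keystroke rate
interactive = list(range(0, 2000, 200))     # one packet every 200 ms
bulk = list(range(0, 2000, 20))             # one packet every 20 ms

for name, ts in (("interactive", interactive), ("bulk", bulk)):
    f = lowest_strong_peak_hz(ts, 2000)
    kind = "bulk" if f > TYPING_LIMIT_HZ else "interactive"
    print(name, round(f, 3), "Hz ->", kind)
```

Real traffic is jittery and bursty, so a practical version would need a
smoothed or parametric spectral estimate and a threshold derived from
measured data rather than the guess used here.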
> The second method would be to look at the packet size. The idea being
> that interactive packets would end up being significantly smaller than
> full size data packets. I know that some padding is used to protect
> against plaintext attacks according to the RFC but I didn't know if
> there was any additional padding on top of that to protect against
> traffic analysis. Are interactive packets coalesced or padded to the
> known MTU? I'm going to run some tcpdumps but I wanted to ask here as
> well.
If I understand correctly, the padding in SSH packets is not meant for
this type of (flow-based) traffic analysis [1]:

         Arbitrary-length padding, such that the total length of
         (packet_length || padding_length || payload || random padding)
         is a multiple of the cipher block size or 8, whichever is
         larger.  There MUST be at least four bytes of padding.  The
         padding SHOULD consist of random bytes.  The maximum amount of
         padding is 255 bytes.

So the random padding length is always set to its minimum allowed value
for each packet (as opposed to being a random value between minimum and
maximum).
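To make the consequence for packet sizes concrete, here is a small sketch
of the length rule quoted from the RFC. It assumes an implementation that
picks the smallest padding the rule allows, a 16-byte cipher block (as
with AES), and it ignores the MAC that follows the packet on the wire. The
function names are made up for the example.

```python
def min_padding(payload_len, block_size=16):
    """Smallest padding (>= 4 bytes) that aligns the packet per RFC 4253."""
    align = max(block_size, 8)
    # 4 bytes of packet_length + 1 byte of padding_length precede the payload.
    pad = align - (5 + payload_len) % align
    if pad < 4:
        pad += align
    return pad

def packet_size(payload_len, block_size=16):
    """Length of packet_length || padding_length || payload || padding."""
    return 5 + payload_len + min_padding(payload_len, block_size)

for n in (1, 7, 8, 1000):
    print(n, "->", packet_size(n))
```

Under these assumptions a one-byte keystroke payload and a 1000-byte data
payload produce packets of 16 and 1024 bytes, so the padding hides the
payload length only to within one cipher block.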
> The other method would be to use packet arrival times but we only have
> flow data and putting a packet sniffer on 10G link is prohibitively
> expensive for work like this.
>
> Please note: If there aren't any countermeasures for this type of
> traffic analysis I'm not saying that is a problem at all. Knowing a flow
> is interactive versus a bulk data transfer really doesn't help out an
> attacker all that much. I'm just curious at this time and my coworker
> needs the data for a presentation to a center director here.
A real problem is that the type of traffic analysis developed for
multi-hop stream encryption protocols (e.g. Tor) becomes trivial and
very efficient when applied to OpenSSH streams.

[1] <http://tools.ietf.org/html/rfc4253#page-7>

Howard Chu

2011-Jan-26 22:09 UTC

Packets Sizes and Information Leakage

Mansour Moufid wrote:
> This message is a few years old so I cannot reply to the original, but
> it is still of current research interest.
>
>> So one of my coworkers is doing a little research on SSH usage in the
>> wild using netflow data. One of the things he's trying to do is
>> determine a way to differentiate between data transfers and interactive
>> sessions. We thought of a couple of ways but we wanted to float them
>> here and see if there are methods incorporated to defeat this sort of
>> traffic analysis.
>>
>> The first idea is to look at the average number of packets per second
>> over the length of the flow. The idea is that a data transfer would have
>> a significantly higher number of PPS than an interactive session. If
>> we analyze a few thousand ssh flows and build a histogram we expect to see
>> two (or maybe three) peaks corresponding to various connection types. I
>> think this probably has the best chance of statistically significant
>> results.
>
> Inter-packet timing is another measure. A useful (and efficient) way
> to distinguish between interactive sessions and bulk transfers would
> be a power spectral density (PSD) estimate (e.g. a maximum entropy
> periodogram). The PSD of a bulk transfer would be significantly skewed
> toward the higher frequencies. People can only type so fast, so that
> provides a convenient upper bound on the frequencies to consider
> representative of interactive sessions, as opposed to bulk transfers.
With my Linemode patch, the packet frequencies would drop even lower, since
character echoing and line editing would all be local to the client. But at
that point you would have to decide whether this is an actual interactive
session or simply an automated session with periodic refresh updates.
>> The second method would be to look at the packet size. The idea being
>> that interactive packets would end up being significantly smaller than
>> full size data packets. I know that some padding is used to protect
>> against plaintext attacks according to the RFC but I didn't know if
>> there was any additional padding on top of that to protect against
>> traffic analysis. Are interactive packets coalesced or padded to the
>> known MTU? I'm going to run some tcpdumps but I wanted to ask here
>> as well.
>
> If I understand correctly, the padding in SSH packets is not meant for
> this type of (flow-based) traffic analysis [1]:
>
>           Arbitrary-length padding, such that the total length of
>           (packet_length || padding_length || payload || random padding)
>           is a multiple of the cipher block size or 8, whichever is
>           larger.  There MUST be at least four bytes of padding.  The
>           padding SHOULD consist of random bytes.  The maximum amount of
>           padding is 255 bytes.
>
> So the random padding length is always set to its minimum allowed value
> for each packet (as opposed to being a random value between minimum and
> maximum).
>
>> The other method would be to use packet arrival times but we only have
>> flow data and putting a packet sniffer on 10G link is prohibitively
>> expensive for work like this.
>>
>> Please note: If there aren't any countermeasures for this type of
>> traffic analysis I'm not saying that is a problem at all. Knowing a flow
>> is interactive versus a bulk data transfer really doesn't help out an
>> attacker all that much. I'm just curious at this time and my coworker
>> needs the data for a presentation to a center director here.
>
> A real problem is that the type of traffic analysis developed for
> multi-hop stream encryption protocols (e.g. Tor) becomes trivial and
> very efficient when applied to OpenSSH streams.
>
> [1]<http://tools.ietf.org/html/rfc4253#page-7>
-- 
   -- Howard Chu
   CTO, Symas Corp.           http://www.symas.com
   Director, Highland Sun     http://highlandsun.com/hyc/
   Chief Architect, OpenLDAP  http://www.openldap.org/project/
