Håkon,
At a guess, your socket callbacks and socket buffer metrics are not
doing the same thing as the Linux TCP/IP stack. We only ever
do non-blocking I/O on our sockets and we replace the write_space and
data_ready callbacks with our own.
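For illustration only, here is a minimal userspace sketch of that callback swap. struct model_sock, struct conn and the helper functions are hypothetical stand-ins (not kernel APIs) for struct sock and the transport's own connection state. The point is that the transport saves the stack's data_ready and write_space pointers, installs its own, and restores the originals on close, so a protocol implementation that expects its own callbacks to run will not see them called in between.

```c
/* Hypothetical userspace model of replacing a socket's callbacks.
 * struct model_sock only mimics the callback fields of struct sock. */
#include <assert.h>
#include <stddef.h>

struct model_sock;
typedef void (*sk_callback_t)(struct model_sock *sk);

struct model_sock {
    sk_callback_t data_ready;   /* invoked by the "stack" when data arrives */
    sk_callback_t write_space;  /* invoked by the "stack" when send space frees */
    void *user_data;            /* owner's private connection state */
};

/* Connection state owned by the transport that takes over the socket. */
struct conn {
    sk_callback_t saved_data_ready;   /* stack's originals, restored on close */
    sk_callback_t saved_write_space;
    int rx_events;                    /* number of data_ready invocations */
    int tx_events;                    /* number of write_space invocations */
};

static void my_data_ready(struct model_sock *sk)
{
    struct conn *c = sk->user_data;
    c->rx_events++;   /* real code would schedule a non-blocking read here */
}

static void my_write_space(struct model_sock *sk)
{
    struct conn *c = sk->user_data;
    c->tx_events++;   /* real code would re-check send space here */
}

/* Install our callbacks, remembering the stack's originals. */
static void install_callbacks(struct model_sock *sk, struct conn *c)
{
    assert(sk != NULL && c != NULL);
    c->saved_data_ready = sk->data_ready;
    c->saved_write_space = sk->write_space;
    sk->user_data = c;
    sk->data_ready = my_data_ready;
    sk->write_space = my_write_space;
}

/* Put the stack's callbacks back before releasing the socket. */
static void restore_callbacks(struct model_sock *sk, struct conn *c)
{
    assert(sk != NULL && c != NULL);
    sk->data_ready = c->saved_data_ready;
    sk->write_space = c->saved_write_space;
    sk->user_data = NULL;
}
```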
On the incoming side, we simply expect the data_ready callback to
occur any time new data is available, so unless you rely on data_ready
being your own callback, that's probably OK.
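With non-blocking sockets, the reader typically drains whatever is already buffered each time it is notified. A rough userspace analogue using POSIX read(2) on a descriptor in O_NONBLOCK mode is sketched below; set_nonblocking() and drain_nonblocking() are illustrative names, and this is not the socknal receive path.

```c
/* Userspace analogue of servicing a "data ready" notification:
 * read until the kernel reports EAGAIN, so bytes that are already
 * buffered do not wait for another readiness event. */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/* Put fd into non-blocking mode; returns 0 on success, -1 on error. */
static int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Consume everything currently readable. Returns the byte count read,
 * or -1 on a read error. On EOF it returns the count so far; a real
 * handler would also treat a closed peer separately. */
static long drain_nonblocking(int fd, char *buf, size_t len)
{
    long total = 0;
    assert(buf != NULL && len > 0);
    for (;;) {
        ssize_t n = read(fd, buf, len);
        if (n > 0) {
            total += n;            /* real code would hand buf to a parser */
            continue;
        }
        if (n == 0)
            return total;          /* EOF: writer closed */
        if (errno == EINTR)
            continue;              /* interrupted: just retry */
        if (errno == EAGAIN || errno == EWOULDBLOCK)
            return total;          /* buffer empty: wait for next event */
        return -1;
    }
}
```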
However, on the outgoing side we are a bit more demanding. Firstly,
when the outgoing socket buffer gets full, we need the SOCK_NOSPACE flag
to get set. We've seen that some stacks (e.g. some vendor's SDP
implementations) don't do this for non-blocking writes. Then we expect
the write_space callback to occur whenever some space in the outgoing
socket buffer gets freed up.
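To make the send-side contract concrete, here is a hypothetical model of what the stack is expected to do. struct tx_sock and its fields only mimic sk->sndbuf, sk->wmem_queued and the SOCK_NOSPACE bit; they are not kernel definitions. A non-blocking send that does not fully fit sets the flag, and freeing queued bytes triggers a write_space notification. The model never clears the flag itself, because the owner of the socket is assumed to do that.

```c
/* Hypothetical model of the send-side behaviour a stack must provide.
 * Field names mimic, but are not, the kernel's struct sock members. */
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

struct tx_sock {
    int sndbuf;             /* send buffer capacity, bytes */
    int wmem_queued;        /* bytes queued but not yet ACKed */
    bool nospace;           /* models the SOCK_NOSPACE flag */
    int write_space_calls;  /* how often write_space would have fired */
};

/* Non-blocking send: queue what fits and set nospace if anything
 * did not fit. Returns bytes queued, or -EAGAIN if nothing fit. */
static int model_send(struct tx_sock *sk, int len)
{
    int room = sk->sndbuf - sk->wmem_queued;
    assert(len >= 0);
    if (room <= 0) {
        sk->nospace = true;   /* stacks that skip this break the caller */
        return -EAGAIN;
    }
    if (len > room) {
        sk->nospace = true;   /* partial write: buffer is now full */
        len = room;
    }
    sk->wmem_queued += len;
    return len;
}

/* The peer ACKed `acked` bytes: free them and notify the owner.
 * nospace is deliberately left alone; the owner decides when to clear it. */
static void model_ack(struct tx_sock *sk, int acked)
{
    assert(acked >= 0 && acked <= sk->wmem_queued);
    sk->wmem_queued -= acked;
    if (acked > 0)
        sk->write_space_calls++;   /* stands in for calling write_space */
}
```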
Both outgoing message flow control and dead peer detection rely on
sk->sndbuf and sk->wmem_queued being reasonably accurate. We only resume
output (and clear SOCK_NOSPACE) when wmem_queued compared with sndbuf shows
that significant space has been freed up in the socket's output buffer.
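One way to express "significant space has been freed up" is sketched below. The threshold (free space at least half of what is still queued) is an assumption chosen for illustration, similar in spirit to the check in the Linux TCP stack's sk_stream_write_space(); struct flow_sock and the function names are hypothetical.

```c
/* Hypothetical write_space handler deciding when to resume output.
 * The "half of queued" threshold is an illustrative assumption. */
#include <assert.h>
#include <stdbool.h>

struct flow_sock {
    int sndbuf;        /* send buffer capacity, bytes */
    int wmem_queued;   /* bytes queued but not yet ACKed */
    bool nospace;      /* models SOCK_NOSPACE, set when we stalled */
};

/* Free space in the send buffer. */
static int wspace(const struct flow_sock *sk)
{
    return sk->sndbuf - sk->wmem_queued;
}

/* Minimum free space that counts as "significant". */
static int min_wspace(const struct flow_sock *sk)
{
    return sk->wmem_queued / 2;
}

/* Returns true if output should resume now. */
static bool on_write_space(struct flow_sock *sk)
{
    assert(sk->wmem_queued >= 0);
    if (!sk->nospace)
        return false;   /* never stalled, so nothing to resume */
    if (wspace(sk) < min_wspace(sk))
        return false;   /* too little freed; wait for more ACKs */
    sk->nospace = false;   /* clear the flag only when resuming */
    return true;
}
```

Waiting for a sizeable gap, rather than resuming on every freed byte, avoids waking the sender repeatedly to queue tiny fragments.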
Also, we look at wmem_queued to determine whether anything got ACKed by
our peer, so if that isn't being updated promptly, we could time out and
believe that our peer has died.
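The dead-peer check can be sketched as a watchdog that treats any drop in wmem_queued as evidence of ACKs. This is a hypothetical model, not the socknal code: struct peer_watch, peer_check() and the 60-second timeout in the test are illustrative, and sampling only the queue length can miss ACKs that coincide with new data being queued. It also shows why a stack that does not update wmem_queued promptly would make a live peer look dead.

```c
/* Hypothetical dead-peer watchdog driven by wmem_queued samples. */
#include <assert.h>
#include <stdbool.h>

struct peer_watch {
    int  last_wmem;      /* wmem_queued at the previous check */
    long last_progress;  /* time (seconds) progress was last seen */
    long timeout;        /* seconds without progress before giving up */
};

/* Call periodically with the current wmem_queued and time.
 * Returns true if the peer should be presumed dead. */
static bool peer_check(struct peer_watch *w, int wmem_queued, long now)
{
    assert(wmem_queued >= 0 && w->timeout > 0);
    if (wmem_queued == 0 || wmem_queued < w->last_wmem) {
        /* Nothing outstanding, or bytes left the queue:
         * assume the peer ACKed something. */
        w->last_progress = now;
    }
    w->last_wmem = wmem_queued;
    return now - w->last_progress >= w->timeout;
}
```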
Hope that helps.
Cheers,
Eric
---------------------------------------------------
|Eric Barton Barton Software |
|9 York Gardens Tel: +44 (117) 330 1575 |
|Clifton Mobile: +44 (7909) 680 356 |
|Bristol BS8 4LL Fax: call first |
|United Kingdom E-Mail: eeb@bartonsoftware.com|
---------------------------------------------------
> -----Original Message-----
> From: lustre-devel-admin@lists.clusterfs.com
> [mailto:lustre-devel-admin@lists.clusterfs.com] On Behalf Of
> Håkon Kvale Stensland
> Sent: Wednesday, July 14, 2004 8:44 AM
> To: lustre-devel@lists.clusterfs.com
> Subject: [lustre-devel] Fw: Lustre over sci_socket
>
>
> Hello, we are trying to test the performance of Lustre over the AF_SCI
> socket address family implementation over SCI. This is working without
> any problems on other distributed file systems (like PVFS) and gives good
> results. What we do is to register a socket family (see sock_register
> kernel function). So we provide the generic protocol calls (release,
> bind, connect, getsockopt, sendmsg, recvmsg, etc.). In order to make this
> work we replaced the family parameter in the ksocknal_cb.c (AF_INET to
> AF_SCI). In userspace this is handled by a preload wrapper for socket
> call. So everything looked good but it did not work.
>
> We suspect you are relying on some additional functionality that
> prevents our implementation from working. We see some code dealing with
> the device IRQ and with registering callback functions, so we would like
> to get some details on that.
>
> We observe decent behaviour (three connections are established and some
> data is sent successfully - see wrapper_log_meta.txt and
> ethernet_log_meta.txt) upon initialization until both the client (meta
> server) and server (storage server) stop any socket activity (there are
> no pending calls), and after some time (1 min.) the client exits with
> timeout error. We saw there were some extra hooks added to the patched
> kernel (tcp_recvpackets, tcp_sendpage_zccd), but grep-ing did not show us
> where they were used. Can you give us a suggestion what could be wrong?
> We have an empty wrapper socket family which only calls the AF_INET
> protocol functions and prints input and output. We usually use this to
> find out what socket calls are done, but even with it we are having the
> same result (see wrapper_log_meta.txt). It is obvious that the first
> "RPC" times out, and we have no clue what it is waiting for.
>
> Regards,
>
> Dolphin Team
>
> =============================================================
> Håkon Kvale Stensland                 | E-mail:
> Dolphin Interconnect Solutions AS     | hks@dolphinics.com
> P.O. Box 150 Oppsal                   | Web:
> N-0619 Oslo, Norway                   | http://www.dolphinics.com
> Tel: +47 23 16 70 42                  |
> Fax: +47 23 16 71 80                  |
> Visiting Address: Olaf Helsets vei 6  |