Vilobh Meshram
2010-Oct-12 17:55 UTC
[Lustre-devel] Query to understand the Lustre request/reply message
I want to understand the message encoding and decoding logic in Lustre. I am planning to send a request to the MDS and, based on the reply from the MDS, want to populate:

        struct lustre_msg *rq_repbuf; /* client only, buf may be bigger than msg */
        struct lustre_msg *rq_repmsg;

I am trying this with a simple "Hello" message but am not seeing the expected output; sometimes I even see a kernel crash. If you could give me some insight into the way the Lustre file system encodes and decodes the messages sent across nodes, it would be helpful.

Thanks,
Vilobh
Graduate Research Associate
Department of Computer Science
The Ohio State University, Columbus, Ohio
Alexey Lyashkov
2010-Oct-12 18:21 UTC
[Lustre-devel] Query to understand the Lustre request/reply message
Hi Vilobh,

ldlm_cli_cancel_req is a good example of using the old PtlRPC API:
- first, allocate the request buffer via ptlrpc_prep_req;
- next, allocate the reply buffer via ptlrpc_req_set_repsize;
- then call ptlrpc_queue_wait to send the message and wait for the reply.

osc_getattr_async is a good example of the new PtlRPC API and asynchronous RPC processing.

If that doesn't help, please show us your code so we can find the error.
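A minimal sketch of the old-API sequence Alexey describes, assuming a 1.8-era tree (ptlrpc_prep_req, ptlrpc_req_set_repsize, ptlrpc_queue_wait, and ptlrpc_req_finished are the real entry points; MY_OPC and struct my_payload are placeholders, not real Lustre definitions):

        /* Hypothetical synchronous client call using the old PtlRPC API. */
        static int my_simple_rpc(struct obd_import *imp)
        {
                __u32 size[2] = { [MSG_PTLRPC_BODY_OFF] = sizeof(struct ptlrpc_body),
                                  [REQ_REC_OFF]         = sizeof(struct my_payload) };
                struct ptlrpc_request *req;
                int rc;

                /* Allocate and pack the request buffers. */
                req = ptlrpc_prep_req(imp, LUSTRE_MDS_VERSION, MY_OPC,
                                      2, size, NULL);
                if (req == NULL)
                        return -ENOMEM;

                /* Declare the maximum reply size before sending. */
                ptlrpc_req_set_repsize(req, 2, size);

                /* Send synchronously and wait for the reply (or an error). */
                rc = ptlrpc_queue_wait(req);

                ptlrpc_req_finished(req);
                return rc;
        }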
Vilobh Meshram
2010-Oct-12 22:17 UTC
[Lustre-devel] Query to understand the Lustre request/reply message
Thanks Alexey, that was helpful.

I have one more question: if we want to add a new RPC with a new opcode, are there any guidelines to be followed in the Lustre file system?

Also:
1) How does the MDS process the ptlrpc_request, i.e., how does the MDS extract the buffer information from the ptlrpc message?
2) For every new RPC, is the message length to be sent on the wire (including the fixed header size plus the buffer sizes) dependent on the number of buffers in the Lustre request message, i.e., the count field in ptlrpc_prep_req(), or on the size of the size[] array?

Thanks,
Vilobh
Graduate Research Associate
Department of Computer Science
The Ohio State University, Columbus, Ohio
Alexey Lyashkov
2010-Oct-13 03:46 UTC
[Lustre-devel] Query to understand the Lustre request/reply message
That depends on the RPC type: whether the RPC needs to return a lock to the caller, and whether it needs special code to reconstruct the request in the replay phase.

In general, look at mdt/mdt_handler.c. mdt_get_info is a good example of simple RPC processing, but it uses the new PtlRPC API. That API hides the low-level request structures and provides access to message buffers by identifier. To use it, you define the structure of your own message in ptlrpc/layout.c, add your own command to enum mds_cmd_t, adjust the array of commands, and write your own handler (a rough sketch follows below).
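A rough sketch of how those pieces fit together in the 2.0 tree (req_capsule_set() and req_capsule_client_get() are the real capsule calls, and mdt_body_only is a real field array in layout.c; the opcode MY_NEW_CMD, the format RQF_MY_NEW_CMD, and the handler are made up for illustration):

        /* Hypothetical format in ptlrpc/layout.c: one mdt_body in the
         * request and one in the reply. */
        struct req_format RQF_MY_NEW_CMD =
                DEFINE_REQ_FMT0("MY_NEW_CMD", mdt_body_only, mdt_body_only);

        /* Hypothetical server-side handler for the new command. */
        static int my_new_cmd_handler(struct ptlrpc_request *req)
        {
                struct req_capsule *pill = &req->rq_pill;
                struct mdt_body *body;

                /* Bind the capsule to the expected message format. */
                req_capsule_set(pill, &RQF_MY_NEW_CMD);

                /* Pull the request-side MDT_BODY field out of the message;
                 * the capsule takes care of swabbing. */
                body = req_capsule_client_get(pill, &RMF_MDT_BODY);
                if (body == NULL)
                        return -EFAULT;

                /* ... act on the request, then fill the reply ... */
                return 0;
        }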
Vilobh Meshram
2010-Oct-13 04:06 UTC
[Lustre-devel] Query to understand the Lustre request/reply message
Thanks a lot for the reply, Alexey.

I will try out the steps you mentioned and see if I can add a new RPC for the task I am thinking of implementing in Lustre. The RPC I have in mind will not return a lock to the caller; it will, however, need special code to reconstruct the request in the replay phase.

One last question: starting from which release of Lustre can we make use of the new API? Is there any documentation that describes the use of the new API? If yes, can you please point me to it?

Thanks again.

Thanks,
Vilobh
Graduate Research Associate
Department of Computer Science
The Ohio State University, Columbus, Ohio
Alexey Lyashkov
2010-Oct-13 04:20 UTC
[Lustre-devel] Query to understand the Lustre request/reply message
On Oct 13, 2010, at 07:06, Vilobh Meshram wrote:

> The RPC I have in mind will not return a lock to the caller; it will, however, need special code to reconstruct the request in the replay phase.

In that case you probably need to look at the 'setattr' functions: setattr has its own reconstructor, mdt_reconstruct_setattr, but that needs support on the client side. Typically this means you need a special field in the message to copy data from the server reply.

> Starting from which release of Lustre can we make use of the new API? Is there any documentation that describes the use of the new API?

Lustre 1.x has the old PtlRPC API; Lustre 2.0 has a mix of new and old, but is migrating to the new API.
Vilobh Meshram
2010-Oct-13 04:35 UTC
[Lustre-devel] Query to understand the Lustre request/reply message
Thanks a lot for the reply, Alexey. The information will be really useful.

Since I am using 1.8.1.1 for my research project, I will have to rely on the old API. Since the source tree prior to 2.0 does not have mdt/mdt_handler.c or layout.c, I will have to work with the low-level buffer-management structures (ptlrpc_request, lustre_msg_v2, etc.). Do you know of a place or a function that makes use of the old API which I could use as a reference for writing the RPC for my task?

Thanks,
Vilobh
Graduate Research Associate
Department of Computer Science
The Ohio State University, Columbus, Ohio
Alexey Lyashkov
2010-Oct-13 04:41 UTC
[Lustre-devel] Query to understand the Lustre request/reply message
mds_handle (the start of request processing), and the MDS_CHECK_RESENT() macro for handling reconstruction. mds_set_info_rpc for simple RPC processing; possibly also mds_setxattr (mds_setxattr_internal), which uses the generic reconstruction function.
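For reference, a rough sketch of the buffer handling inside such a 1.8 handler (lustre_swab_reqbuf, lustre_pack_reply, REQ_REC_OFF, and REPLY_REC_OFF are the real 1.8 primitives; struct my_body and its swabber are placeholders):

        /* Hypothetical fragment of a 1.8-style server handler: extract
         * (and swab, if needed) a request buffer by index, then allocate
         * the reply message. */
        static int my_handler_body(struct ptlrpc_request *req)
        {
                __u32 repsize[2] = { [MSG_PTLRPC_BODY_OFF] = sizeof(struct ptlrpc_body),
                                     [REPLY_REC_OFF]       = sizeof(struct my_body) };
                struct my_body *body;

                /* lustre_swab_reqbuf() returns the buffer at the given
                 * offset, byte-swapped at most once; struct my_body and
                 * lustre_swab_my_body are made-up names. */
                body = lustre_swab_reqbuf(req, REQ_REC_OFF, sizeof(*body),
                                          lustre_swab_my_body);
                if (body == NULL)
                        return -EFAULT;

                /* Allocate the reply message with the declared sizes. */
                return lustre_pack_reply(req, 2, repsize, NULL);
        }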
Nicolas Williams
2010-Oct-13 05:42 UTC
[Lustre-devel] Query to understand the Lustre request/reply message
On Wed, Oct 13, 2010 at 12:35:13AM -0400, Vilobh Meshram wrote:
> Since I am using 1.8.1.1 for my research project, I will have to rely on the old API. [...] Do you know of a place or a function that makes use of the old API which I could use as a reference for writing the RPC for my task?

The new API is _much_ easier to use than the old API.

To add an RPC you must:

- decide what it looks like

  Every PtlRPC message has an opcode and one or more "buffers", with each buffer containing a C struct, a string, whatever. If a buffer contains a C struct, then it has to be fixed-sized. The first buffer is struct ptlrpc_body.

  A single RPC opcode can denote multiple different layouts, depending on the contents of various buffers. See below.

- add any struct, enum, and other C types you need to lustre_idl.h

  You must make sure to use the base types we use in lustre_idl.h, such as __u64.

- create swabber functions for your data, if necessary

- add handlers for the new RPC to mdt_handler.c (for the MDS) or ost_handler.c (for the OST), and so on

  The handlers are responsible for knowing which buffers contain what, and for swabbing them. You have to make sure that you don't swab a buffer more than once.

The new API allows you to define formats quite nicely, and it takes care of calling swabbers and ensuring that no buffer is swabbed more than once. The formats are defined in lustre/ptlrpc/layout.c and look like this:

        struct req_format RQF_MDS_SYNC =
                DEFINE_REQ_FMT0("MDS_SYNC", mdt_body_capa, mdt_body_only);
        ...
        static const struct req_msg_field *mdt_body_capa[] = {
                &RMF_PTLRPC_BODY,
                &RMF_MDT_BODY,
                &RMF_CAPA1
        };
        static const struct req_msg_field *mdt_body_only[] = {
                &RMF_PTLRPC_BODY,
                &RMF_MDT_BODY
        };
        ...

An RPC consists of a request and a reply, with their formats given in the DEFINE_REQ_FMT0() macro (there are other macros). Each message format defines a layout of buffers or, as we call them now, "fields", and each field has a format definition as well, such as:

        struct req_msg_field RMF_PTLRPC_BODY =
                DEFINE_MSGF("ptlrpc_body", 0,
                            sizeof(struct ptlrpc_body),
                            lustre_swab_ptlrpc_body, NULL);

for a struct buffer. Other types of RMFs are possible (e.g., strings); see layout.c.

So an MDS_SYNC RPC consists of a three-field (buffer) request and a two-field reply. The request's fields are PTLRPC_BODY, MDT_BODY, and CAPA1. The reply's fields are PTLRPC_BODY and MDT_BODY. PTLRPC_BODY is a fixed-sized field containing a C structure, and the swabber for this field is lustre_swab_ptlrpc_body(). And so on.

If you look at Lustre 2.0's mdt_handler.c and ost_handler.c you'll find that one of the first things done is to initialize a "capsule", and that the expected message format of a request is decided based on its opcode. That is, the mapping of opcode to RQF is not given by some array, but decided as we go. Indeed, the RQF of a capsule can be changed mid-stream, with some constraints.

So, with the new API you:

- add C types to lustre_idl.h for on-the-wire data
- add any swabbers to lustre/ptlrpc/pack_generic.c (declare them in lustre_idl.h)
- add RQFs and, possibly, RMFs to layout.c
- declare the RQFs/RMFs in lustre/include/lustre_req_layout.h

- on the server side:
  - Modify the relevant handler to add an arm to the existing switch on the request's opcode, call req_capsule_set() to set the capsule's format, then call a function that uses req_capsule_*get*() to get at the fields (buffers) -- both request and reply buffers -- to read from (request) or write to (reply).

- on the client side:
  - You'll do something very similar, except that there's no handler function -- the pattern is less consistent, so you'll have to read mdc*.c and so on to get a flavor for this... Typically you'll allocate a request using ptlrpc_request_alloc_pack(), fill in its fields (again, using req_capsule_client_get() and friends), then send it using, for example, ptlrpc_queue_wait(). A minimal sketch follows below.

  Take a good look at mdc_request.c in 2.0 to get a better idea of how to build client stubs for your new RPCs.

I haven't described the wirecheck part -- I can do that later, once you've made enough progress. (We have a wirecheck/wiretest program pair to check that only backwards-interoperable changes are made to lustre_idl.h.)

I hope that helps. Yes, it'd be nice to have something closer to an actual IDL. The RQF/RMF/wirecheck/wiretest stuff could be extended to:

- auto-generate swabbers from lustre_idl.h structs
- provide a default opcode->RQF mapping
- provide more static type safety (by having req_capsule_*get() be macros that cast the buffer address to the right type)
- auto-generate simple request constructors (that take pointers to values of an RQF's correct request field C types)

Compared to the old thing, the new API is much closer to an IDL. It's a good thing. I strongly recommend that you use it,

Nico
--
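As an illustration of that client-side pattern, a minimal stub reusing the MDS_SYNC format from above (ptlrpc_request_alloc_pack, req_capsule_client_get, ptlrpc_request_set_replen, and ptlrpc_queue_wait are the real 2.0 entry points; error handling is simplified and the FID is assumed to come from the caller):

        /* Hypothetical synchronous client stub using the new PtlRPC API. */
        static int my_mds_sync(struct obd_import *imp, const struct lu_fid *fid)
        {
                struct ptlrpc_request *req;
                struct mdt_body *body;
                int rc;

                /* Allocate a request laid out per RQF_MDS_SYNC and pack
                 * its request-side buffers. */
                req = ptlrpc_request_alloc_pack(imp, &RQF_MDS_SYNC,
                                                LUSTRE_MDS_VERSION, MDS_SYNC);
                if (req == NULL)
                        return -ENOMEM;

                /* Fill in the request-side MDT_BODY field. */
                body = req_capsule_client_get(&req->rq_pill, &RMF_MDT_BODY);
                body->fid1 = *fid;

                /* Size the expected reply from the format, then send
                 * and wait. */
                ptlrpc_request_set_replen(req);
                rc = ptlrpc_queue_wait(req);

                ptlrpc_req_finished(req);
                return rc;
        }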
Alexey Lyashkov
2010-Oct-13 05:54 UTC
[Lustre-devel] Query to understand the Lustre request/reply message
On Oct 13, 2010, at 08:42, Nicolas Williams wrote:
> Compared to the old thing, the new API is much closer to an IDL. It's a good thing. I strongly recommend that you use it.

The main problem: Lustre 1.8.1 doesn't have the new API :)

--------------------------------------
Alexey Lyashkov
alexey.lyashkov at clusterstor.com
Vilobh Meshram
2010-Oct-13 06:07 UTC
[Lustre-devel] Query to understand the Lustre request/reply message
Amazing... Thanks, Nicolas and Alexey, for your time and the detailed replies.

I will try out the new API to create a new RPC, following the steps you described for Lustre 2.0 (since I am using 1.8.1.1 right now).

Thanks again.

Thanks,
Vilobh
Graduate Research Associate
Department of Computer Science
The Ohio State University, Columbus, Ohio
Alexey Lyashkov
2010-Oct-13 06:25 UTC
[Lustre-devel] Query to understand the Lustre request/reply message
On Oct 13, 2010, at 08:42, Nicolas Williams wrote:
> - add handlers for the new RPC to mdt_handler.c (for the MDS) or ost_handler.c (for the OST), and so on
>
> The handlers are responsible for knowing which buffers contain what, and for swabbing them. You have to make sure that you don't swab a buffer more than once.

BTW, that is not enough. Some operations need their own reconstructors for replay/resend. Some operations need to return a lock -- for example, the MDS_GETATTR and MDS_REINT commands.

--------------------------------------
Alexey Lyashkov
alexey.lyashkov at clusterstor.com
Nicolas Williams
2010-Oct-13 07:12 UTC
[Lustre-devel] Query to understand the Lustre request/reply message
On Wed, Oct 13, 2010 at 09:25:00AM +0300, Alexey Lyashkov wrote:
> BTW, that is not enough. Some operations need their own reconstructors for replay/resend.

I glossed over replay/resend, mostly because I know little about them, but also because they are completely orthogonal to the message-format details. If you want to add an RPC, the first step should be to get the RPC format designed and the surrounding code up and running; then you can take care of replay/resend.

> Some operations need to return a lock -- for example, the MDS_GETATTR and MDS_REINT commands.

That too is orthogonal to the message formats. The message format has to have a buffer (field) declared to carry the lock (or capability, or whatever) bits, and some function has to be invoked to populate that buffer in the reply.

Nico
--
Nicolas Williams
2010-Oct-13 07:15 UTC
[Lustre-devel] Query to understand the Lustre request/reply message
On Wed, Oct 13, 2010 at 08:54:57AM +0300, Alexey Lyashkov wrote:
> The main problem: Lustre 1.8.1 doesn't have the new API :)

You'll note that Vilobh did not provide any rationale for his/her choice of Lustre version. Without having any other good reason for picking 1.8 or 2.0, I strongly recommend 2.0.

Now, perhaps Vilobh has a need to interoperate with an installed base of 1.8. That would be a good reason to do this in 1.8. But the work will have to be done for 2.0 as well.

Nico
--
Nicolas Williams
2010-Oct-13 07:17 UTC
[Lustre-devel] Query to understand the Lustre request/reply message
On Wed, Oct 13, 2010 at 02:07:06AM -0400, Vilobh Meshram wrote:
> I will try out the new API to create a new RPC, following the steps you described for Lustre 2.0 (since I am using 1.8.1.1 right now).

The new API, incidentally, uses the old API under the hood. That might help guide you. To understand usage patterns for the old API, it should help to look at 2.0 code, particularly the layout.c code, and then look at the corresponding 1.8 code.

Nico
--
Alexey Lyashkov
2010-Oct-13 07:27 UTC
[Lustre-devel] Query to understand the Lustre request/reply message
Eh, Nicolas,

The format of a message that needs to be reconstructed after a resend differs from that of one that doesn't. As a quick example, take the OPEN request (via the MDS_REINT command): that type of message needs an extra buffer to store the LOV EA, which is sent to the MDS in the replay case (with an additional flag in the header); the client copies the data from the MDS reply after PtlRPC finishes processing the request. That is why I brought up the reconstruct/replay case.

The format also differs depending on whether you use MDS_REINT plus sub-commands or something similar to MDS_SET_INFO. For MDS_SET_INFO you use a single format for all messages (just a simple key/value buffer), but for MDS_REINT you need two formats: one for the generic MDS_REINT code (get the opcode from the command, get locks, and possibly more) and one per sub-opcode, such as open, unlink, setxattr, and setattr. All of these have different numbers of buffers (fields).
Alexey Lyashkov
2010-Oct-13 07:32 UTC
[Lustre-devel] Query to understand the Lustre request/reply message
The MDS code in 1.8 is simpler, because it doesn't have the parts of the clustered-metadata project, aka CMD3 ;-) The same is true for the client: it lacks many of the newer features, such as FID assignment on the client side and the extra MD layer (LMV), and it doesn't have CLIO. So 1.8 is a good place to start learning :)

--------------------------------------
Alexey Lyashkov
alexey.lyashkov at clusterstor.com
Nicolas Williams
2010-Oct-13 07:43 UTC
[Lustre-devel] Query to understand the Lustre request/reply message
On Wed, Oct 13, 2010 at 10:27:55AM +0300, Alexey Lyashkov wrote:
> As a quick example, take the OPEN request (via the MDS_REINT command): that type of message needs an extra buffer to store the LOV EA, which is sent to the MDS in the replay case (with an additional flag in the header). That is why I brought up the reconstruct/replay case.

Sure, but this buffer needs to be declared a priori. If you won't know whether you'll need a buffer until later, that's OK: you declare it anyway and set its size to zero if you don't need it.

You can't change a capsule's format to add buffers; you can only set the size of unnecessary buffers to zero. This is because the header of a PtlRPC message (not the ptlrpc_body, mind you) has a count of buffers followed by a variable-length (64-bit-aligned) set of that many 32-bit buffer lengths (I'm going from memory here), and adding buffers can put a reply over the expected max size on the client side, leading to it being dropped.

You can change a capsule's format to change the definition of a field from one without a swabber to one with a swabber. You'll see in many cases that the presence of a field (meaning, whether it's checked for or whether it has a non-zero size) depends on a flag in the mdt or ost body, as you mention. Replays are not the only interesting case here; capabilities are another. Some of these flags could be removed and replaced with checks of buffer size (0 -> flag not set, >0 -> flag set).

> For MDS_SET_INFO you use a single format for all messages (just a simple key/value buffer), but for MDS_REINT you need two formats.

The SET_INFO RPCs are kinda gross. I should know, since I finished the conversion of ost_handler.c to the new API. You can see that I used req_capsule_extend() to handle some SET_INFO cases.

No, I didn't cover this detail, nor others, because I figured Vilobh needed a starting point, and that's all I was going to provide tonight.

Nico
--
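A minimal sketch of the declare-then-shrink idiom Nicolas describes, assuming the 2.0 capsule API (req_capsule_set_size and req_capsule_server_pack are real 2.0 calls; RMF_MY_OPTIONAL is a made-up field used only for illustration):

        /* Hypothetical server-side fragment: the reply format declares
         * RMF_MY_OPTIONAL up front; when this particular reply does not
         * need it, shrink it to zero before packing the reply buffers. */
        static int my_pack_reply(struct req_capsule *pill, int need_optional)
        {
                if (!need_optional)
                        req_capsule_set_size(pill, &RMF_MY_OPTIONAL,
                                             RCL_SERVER, 0);

                /* Allocates the reply message with the sizes now set. */
                return req_capsule_server_pack(pill);
        }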
Alexey Lyashkov
2010-Oct-13 07:51 UTC
[Lustre-devel] Query to understand the Lustre request/reply message
On Oct 13, 2010, at 10:43, Nicolas Williams wrote:
> You can't change a capsule's format to add buffers; you can only set the size of unnecessary buffers to zero.

But you can reassign the format of a message :) If you look at the MDT code, you can see that for MDS_REINT there is a first format for the command as a whole, and for the operation inside the REINT Lustre uses a second format:

        static int mdt_reint(struct mdt_thread_info *info)
        {
                long opc;
                int rc;

                static const struct req_format *reint_fmts[REINT_MAX] = {
                        [REINT_SETATTR]  = &RQF_MDS_REINT_SETATTR,
                        [REINT_CREATE]   = &RQF_MDS_REINT_CREATE,
                        [REINT_LINK]     = &RQF_MDS_REINT_LINK,
                        [REINT_UNLINK]   = &RQF_MDS_REINT_UNLINK,
                        [REINT_RENAME]   = &RQF_MDS_REINT_RENAME,
                        [REINT_OPEN]     = &RQF_MDS_REINT_OPEN,
                        [REINT_SETXATTR] = &RQF_MDS_REINT_SETXATTR
                };
                ...

Do you see what I mean?

--------------------------------------
Alexey Lyashkov
alexey.lyashkov at clusterstor.com
Vilobh Meshram
2010-Oct-13 23:51 UTC
[Lustre-devel] Query to understand the Lustre request/reply message
Hi Alexey/Nicolas/All,

I had a look at the 2.0 side of the code, and it seems there have been significant modifications on the MDS side; e.g., the request-processing part on the MDS has been redefined. I need to stick to 1.8.1.1, since most of my modifications are on the MDS side and the project demands it. So I will need to play around with the low-level message packing and unpacking, which from my previous experience is pretty complicated.

Here is my understanding of the way requests are processed; please correct me if I am wrong:

1) In the Lustre code base, for each RPC there is a static "size" array and a count which define the way the message is laid out (after all the rounding-off operations, etc.), i.e., the offsets at which the buffers are packed, and so on.
2) On the MDS side, the swabber functions plus some associated functions extract the buffer information.

What I need is:

1) Is it possible, without writing a new RPC in Lustre 1.8.1.1, to append some string such as "Hello" to an existing message sent by the client (with the buffer sizes set on the client side via the count and size fields)? I tried modifying the "size" array of the request for one of the RPCs built into Lustre:

        __u32 size[2] = { [MSG_PTLRPC_BODY_OFF] = sizeof(struct ptlrpc_body),
                          [DLM_LOCKREQ_OFF]     = sizeof(struct ldlm_request) };

changed to:

        __u32 size[3] = { [MSG_PTLRPC_BODY_OFF] = sizeof(struct ptlrpc_body),
                          [DLM_LOCKREQ_OFF]     = sizeof(struct ldlm_request),
                          /* how to add "char *str = "Hello""?  We have the size
                           * of str, but how do we choose a macro like
                           * DLM_LOCKREQ_OFF, since for a specific kind of RPC
                           * there is only a limited number of such macros? */ };

What I want to know is how I can send a buffer from the client side by modifying the static "size" array mentioned above. What are the main places I need to touch to make this work?

If appending a buffer to the "size" array is not possible, I can move on to writing a new RPC.

Thanks,
Vilobh
Graduate Research Associate
Department of Computer Science
The Ohio State University, Columbus, Ohio
Nicolas Williams
2010-Oct-14 00:28 UTC
[Lustre-devel] Query to understand the Lustre request/reply message
On Wed, Oct 13, 2010 at 07:51:37PM -0400, Vilobh Meshram wrote:
> 1) Is it possible, without writing a new RPC in Lustre 1.8.1.1, to append some string such as "Hello" to an existing message sent by the client (with the buffer sizes set on the client side via the count and size fields)? I tried modifying the "size" array of the request for one of the RPCs built into Lustre.

Yes, it's possible to add buffers to requests. It's not possible to add buffers to _replies_ to existing RPCs unless you know the client expects those additional buffers -- existing clients expect a given max size for each reply, and if your reply is bigger it will get dropped.

> [the size[2] -> size[3] snippet]

Add a buffer. Don't change the size of an existing buffer.

> What I want to know is how I can send a buffer from the client side by modifying the static "size" array mentioned above. What are the main places I need to touch to make this work?

Add an element to the size[] array, then set it to the correct size when you know the length of the string. Look at the SET_INFO RPCs.

> If appending a buffer to the "size" array is not possible, I can move on to writing a new RPC.

The size[] array is just a convenient place to store the sizes of the individual buffers while you construct them.

Nico
--
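A sketch of that suggestion against the 1.8 API (ptlrpc_prep_req and lustre_msg_string are real 1.8 functions; HELLO_OFF, the opcode argument, and the string itself are made up for illustration):

        /* Client side: declare one extra buffer after the existing ones
         * and let ptlrpc_prep_req() copy the string into it.  HELLO_OFF
         * is hypothetical; real code would add a proper *_OFF constant
         * for its RPC. */
        #define HELLO_OFF 2

        static struct ptlrpc_request *prep_req_with_hello(struct obd_import *imp,
                                                          int opcode)
        {
                char *hello = "Hello";
                __u32 size[3] = { [MSG_PTLRPC_BODY_OFF] = sizeof(struct ptlrpc_body),
                                  [DLM_LOCKREQ_OFF]     = sizeof(struct ldlm_request),
                                  [HELLO_OFF]           = strlen(hello) + 1 };
                char *bufs[3] = { NULL, NULL, hello };

                return ptlrpc_prep_req(imp, LUSTRE_DLM_VERSION, opcode,
                                       3, size, bufs);
        }

        /* Server side: pull the string back out of the request message;
         * lustre_msg_string() validates NUL-termination (max_len 0 means
         * "any length"). */
        static char *get_hello(struct ptlrpc_request *req)
        {
                return lustre_msg_string(req->rq_reqmsg, HELLO_OFF, 0);
        }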
Vilobh Meshram
2010-Oct-14 01:41 UTC
[Lustre-devel] Query to understand the Lustre request/reply message
Thanks Nicolas. I will try it out today or tomorrow. It seems it will touch a lot of places in the codebase :-) Thanks again.

Thanks,
Vilobh
Graduate Research Associate
Department of Computer Science
The Ohio State University Columbus Ohio

On Wed, Oct 13, 2010 at 8:28 PM, Nicolas Williams <Nicolas.Williams at oracle.com> wrote:
> Add an element to the size[] array, then set it to the correct size when
> you know the length of the string. Look at the SET_INFO RPCs.
> [...]
Alexey Lyashkov
2010-Oct-14 03:38 UTC
[Lustre-devel] Query to understand the Lustre request/reply message
On Oct 14, 2010, at 03:28, Nicolas Williams wrote:
> Yes, it's possible to add buffers to requests. It's not possible to add
> buffers to _replies_ to existing RPCs unless you know the client expects
> those additional buffers -- existing clients expect a given maxsize for
> each reply, and if your reply is bigger then it will get dropped.

That has been wrong for about a year now. Around a year ago I added code to the ptlrpc layer which adjusts the buffer for the reply and resends the request:

if (ev->mlength < ev->rlength ) {
        CDEBUG(D_RPCTRACE, "truncate req %p rpc %d - %d+%d\n", req,
               req->rq_replen, ev->rlength, ev->offset);
        req->rq_reply_truncate = 1;
        req->rq_replied = 1;
        req->rq_status = -EOVERFLOW;
        req->rq_nob_received = ev->rlength + ev->offset;
...
if (req->rq_reply_truncate) {
        if (ptlrpc_no_resend(req)) {
                DEBUG_REQ(D_ERROR, req, "reply buffer overflow,"
                          " expected: %d, actual size: %d",
                          req->rq_nob_received, req->rq_repbuf_len);
                RETURN(-EOVERFLOW);
        }

        sptlrpc_cli_free_repbuf(req);
        /* Pass the required reply buffer size (include
         * space for early reply).
         * NB: no need to roundup because alloc_repbuf
         * will roundup it */
        req->rq_replen       = req->rq_nob_received;
        req->rq_nob_received = 0;
        req->rq_resend       = 1;
        RETURN(0);
}

--------------------------------------
Alexey Lyashkov
alexey.lyashkov at clusterstor.com
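From a caller's point of view, under the 1.8 fields quoted above, opting into this grow-and-resend behaviour is just a matter of not forbidding resends on the request (a sketch, not code from the tree):

/* Leave rq_no_resend clear so a truncated reply takes the
 * reallocate-and-resend path above instead of failing with -EOVERFLOW. */
req->rq_no_resend = 0;
rc = ptlrpc_queue_wait(req);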
Nicolas Williams
2010-Oct-14 05:18 UTC
[Lustre-devel] Query to understand the Lustre request/reply message
On Thu, Oct 14, 2010 at 06:38:16AM +0300, Alexey Lyashkov wrote:
> That has been wrong for about a year now. Around a year ago I added code to
> the ptlrpc layer which adjusts the buffer for the reply and resends the
> request.

Ah, I didn't know that was in 1.8. Are there interop issues (with older
clients) though with sending larger replies than expected?
Alexey Lyashkov
2010-Oct-14 05:46 UTC
[Lustre-devel] Query to understand the Lustre request/reply message
On Oct 14, 2010, at 08:18, Nicolas Williams wrote:
> Ah, I didn't know that was in 1.8.

It was added around 1.8.1, and is easy to check with grep -rn rq_reply_truncate in the ptlrpc directory.

Severity   : normal
Bugzilla   : 19526
Description: can't stat file in some situations.
Details    : improve initialization of OSC data when a target is added to the
             MDS, and add the ability to resend a too-big getattr request when
             the client doesn't have info about the OST.

> Are there interop issues (with older
> clients) though with sending larger replies than expected?

I don't entirely understand that question, but the main purpose of that change was a problem with the LOV EA buffer size for files with ACLs (look at some conf-sanity tests). In some situations the MDS can have a larger LOV EA buffer than the client expected (some files with wide striping have a reference to an OST which was removed from the cluster, or the configuration was lost, or a new OST was added but is not yet connected to the client, or the reply buffer was shrunk by a bad call, or other cases -- you can find more references in bugzilla). In that case the MDS has to send a larger buffer to the client. An older client gets an infinite loop on connect or in the stat syscall (because of messages without the rq_no_resend flag); a new client resends the message and adjusts the maximal size of the LOV EA after getting a valid reply, to avoid the problem in the future.

--------------------------------------
Alexey Lyashkov
alexey.lyashkov at clusterstor.com
Alexey Lyashkov
2010-Oct-14 08:44 UTC
[Lustre-devel] Query to understand the Lustre request/reply message
On Oct 14, 2010, at 02:51, Vilobh Meshram wrote:
> 1) Is it possible that without writing a new RPC in Lustre 1.8.1.1 I can
> append some string such as "Hello" to the existing message sent by the
> Client (with the buffer size set at client side by the count,size fields)?
> I tried modifying the "size" of the request for one of the RPCs built into
> Lustre
> [...]

It would be better if you described completely what you want to do, because some requests can't be changed easily without losing compatibility -- like the ELC (early lock cancel) feature, which adds an extra buffer to messages and has a special connect flag to check for the request format change.
Andreas Dilger
2010-Oct-14 14:31 UTC
[Lustre-devel] Query to understand the Lustre request/reply message
On 2010-10-13, at 23:18, Nicolas Williams wrote:
> Ah, I didn't know that was in 1.8. Are there interop issues (with older
> clients) though with sending larger replies than expected?

Nico, it has always been possible in the past to increase the size of any buffer in a request, or in a reply (if the total reply size will fit into the pre-allocated reply buffer). An older peer would just ignore the bytes beyond the known part of the buffer.

Is that not true with the 2.x RPC handling?

Cheers, Andreas
--
Andreas Dilger
Lustre Technical Lead
Oracle Corporation Canada Inc.
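A sketch of the pattern Andreas describes, with hypothetical sizes: the client preallocates a generous reply buffer up front, and the server may then return anything that fits within that total, an older peer simply ignoring the tail it does not understand:

/* Client: reserve room for up to HELLO_MAX bytes of extra reply data
 * (HELLO_MAX is a made-up cap for this example). */
#define HELLO_MAX 64

__u32 repsize[3] = { [MSG_PTLRPC_BODY_OFF] = sizeof(struct ptlrpc_body),
                     [DLM_LOCKREPLY_OFF]   = sizeof(struct ldlm_reply),
                     [DLM_LOCKREPLY_OFF+1] = HELLO_MAX };
ptlrpc_req_set_repsize(req, 3, repsize);

/* Server: pack the actual (smaller) size; the total still fits within
 * what the client preallocated, so nothing is dropped. */
__u32 size[3] = { [MSG_PTLRPC_BODY_OFF] = sizeof(struct ptlrpc_body),
                  [DLM_LOCKREPLY_OFF]   = sizeof(struct ldlm_reply),
                  [DLM_LOCKREPLY_OFF+1] = strlen("hello") + 1 };
rc = lustre_pack_reply(req, 3, size, NULL);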
Alexey Lyashkov
2010-Oct-14 14:40 UTC
[Lustre-devel] Query to understand the Lustre request/reply message
Andreas,

On Oct 14, 2010, at 17:31, Andreas Dilger wrote:
> Nico, it has always been possible in the past to increase the size of any
> buffer in a request, or in a reply (if the total reply size will fit into
> the pre-allocated reply buffer). An older peer would just ignore the bytes
> beyond the known part of the buffer.

I think that question isn't about rebalancing buffer sizes within a message; I think it is about sending a large reply into a smaller reply buffer. LNet is not able to put a large reply into a small buffer (without the truncate flag, which does not exist in older ptlrpc versions). Without that flag you will see messages like

        CERROR("Matching packet from %s, match "LPU64
               " length %d too big: %d left, %d allowed\n",
               libcfs_id2str(src), match_bits, rlength,
               md->md_length - offset, mlength);

and LNet will drop the message without notifying PtlRPC.

> Is that not true with the 2.x RPC handling?

2.x is able to rebalance space between buffers (but it looks to be done by hand), and is able to adjust the reply buffer after a truncated reply.

--------------------------------------
Alexey Lyashkov
alexey.lyashkov at clusterstor.com
Vilobh Meshram
2010-Oct-14 15:04 UTC
[Lustre-devel] Query to understand the Lustre request/reply message
Hi Alexey,

Thanks again for your reply.

I am trying to embed a buffer in the RPC which will get filled in with some values which the MDS is aware of but the client calling the RPC is not. It has nothing to do with locking. I just want to fill the buffer I embed in the RPC with suitable data from the MDS end and then operate on that data at the client side. So I think the approach suggested by you and Nicolas of just including the sizeof(str) [the size of the expected information from the MDS] in the size[] array should be fine, as done below:

__u32 size[2] = { [MSG_PTLRPC_BODY_OFF] = sizeof(struct ptlrpc_body),
                  [DLM_LOCKREQ_OFF]     = sizeof(struct ldlm_request) };

---->>

__u32 size[3] = { [MSG_PTLRPC_BODY_OFF] = sizeof(struct ptlrpc_body),
                  [DLM_LOCKREQ_OFF]     = sizeof(struct ldlm_request),
                  /* how to add "char *str = Hello"? Of course we have
                   * sizeof(str), but how do we choose a macro like
                   * DLM_LOCKREQ_OFF, since a given RPC only defines a
                   * limited number of such macros? */ };

Please correct me if I am wrong, and please guide me if I need to consider any corner cases to handle this use case.

Thanks again.

Thanks,
Vilobh
Graduate Research Associate
Department of Computer Science
The Ohio State University Columbus Ohio

On Thu, Oct 14, 2010 at 10:40 AM, Alexey Lyashkov <alexey.lyashkov at clusterstor.com> wrote:
> I think that question isn't about rebalancing buffer sizes within a message;
> I think it is about sending a large reply into a smaller reply buffer.
> [...]
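A minimal sketch of that round trip, continuing the assumptions above: the server writes the string into its reply message (rq_repmsg, not the incoming rq_reqmsg) after the reply has been packed with the extra buffer, and the client reads the same buffer index back once the RPC completes:

/* Server side, after lustre_pack_reply() has sized the extra buffer: */
char *str = lustre_msg_buf(req->rq_repmsg, DLM_LOCKREPLY_OFF + 1, 0);
if (str != NULL)
        strcpy(str, "hello");   /* fits: the buffer was sized for it */

/* Client side, after ptlrpc_queue_wait() returns success: */
char *reply = lustre_msg_string(req->rq_repmsg, DLM_LOCKREPLY_OFF + 1, 0);
if (reply != NULL)
        CDEBUG(D_INFO, "MDS said \"%s\"\n", reply);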
Alexey Lyashkov
2010-Oct-14 15:10 UTC
[Lustre-devel] Query to understand the Lustre request/reply message
Hi Vilobh,

As I see it, you have touched code related to locking: struct ldlm_request is used in the lock enqueue process. That is why I mentioned the interop issue in the ELC code, which is solved with an export flag. For common MDC requests you can resolve the interop issue with flags in mdc_body (mdt_body), but that is not possible for LDLM requests.

On Oct 14, 2010, at 18:04, Vilobh Meshram wrote:
> I am trying to embed a buffer in the RPC which will get filled in with some
> values which the MDS is aware of but the client calling the RPC is not. It
> has nothing to do with locking.
> [...]
Vilobh Meshram
2010-Oct-14 15:29 UTC
[Lustre-devel] Query to understand the Lustre request/reply message
Hi Alexey,

Thanks again for the reply.

Can you briefly give me some pointers about this interop issue, and for which kinds of RPC it should arise? How should we resolve it, and what kind of flag needs to be set?

I went through the bugzilla entry you mentioned, and it seems that RPCs dealing with the LDLM may cause this issue. Please correct me if I am wrong.

Thanks,
Vilobh
Graduate Research Associate
Department of Computer Science
The Ohio State University Columbus Ohio

On Thu, Oct 14, 2010 at 11:10 AM, Alexey Lyashkov <alexey.lyashkov at clusterstor.com> wrote:
> As I see it, you have touched code related to locking: struct ldlm_request
> is used in the lock enqueue process. That is why I mentioned the interop
> issue in the ELC code, which is solved with an export flag.
> [...]
Alexey Lyashkov
2010-Oct-14 15:45 UTC
[Lustre-devel] Query to understand the Lustre request/reply message
Hi Vilobh,

interop == interoperability between nodes with different versions of the software.

In general we have two ways to solve that. For requests with an mdc_body, you can set a flag in the body and analyze that flag on the server/client side. If you want to add a new operation, the better way is to add a new flag into connect_data (look at the OBD_CONNECT_* macro handling); that flag can be checked via export->connect_flags on the client or server side to discover the remote side's features.

As an example, 1.x and 2.0 have a different format for setattr requests:

int mdc_setattr
...
        if (mdc_exp_is_2_0_server(exp)) {
                size[REQ_REC_OFF]     = sizeof(struct mdt_rec_setattr);
                size[REQ_REC_OFF + 1] = 0; /* capa */
                size[REQ_REC_OFF + 2] = 0; //sizeof (struct mdt_epoch);
                size[REQ_REC_OFF + 3] = ealen;
                size[REQ_REC_OFF + 4] = ea2len;
                size[REQ_REC_OFF + 5] = sizeof(struct ldlm_request);
                offset = REQ_REC_OFF + 5;
                bufcount = 6;
                replybufcount = 6;
        } else {
                bufcount = 4;
        }

An example of checking a client feature is the version-based recovery support check:

mds_version_get_check
...
        if (inode == NULL || !exp_connect_vbr(req->rq_export))

I hope that helps you.

On Oct 14, 2010, at 18:29, Vilobh Meshram wrote:
> Can you briefly give me some pointers about this interop issue, and for
> which kinds of RPC it should arise? How should we resolve it, and what kind
> of flag needs to be set?
> [...]
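A sketch of the connect-flag approach Alexey outlines; OBD_CONNECT_HELLO and its bit value are invented for illustration (a real patch would claim an unused bit in the OBD_CONNECT_* list), and the macro follows the exp_connect_vbr() style cited above:

/* Hypothetical feature bit, advertised in connect_data at connect time. */
#define OBD_CONNECT_HELLO  0x4000000000ULL

#define exp_connect_hello(exp) \
        ((exp)->exp_connect_flags & OBD_CONNECT_HELLO)

        /* Only pack the extra buffer when the peer advertised support,
         * so older peers keep seeing the old message format. */
        if (exp_connect_hello(req->rq_export))
                bufcount = 4;
        else
                bufcount = 3;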
Vilobh Meshram
2010-Oct-14 16:25 UTC
[Lustre-devel] Query to understand the Lustre request/reply message
Hi Alexey,

That surely helps. Thanks for all the help so far.

Thanks,
Vilobh
Graduate Research Associate
Department of Computer Science
The Ohio State University Columbus Ohio

On Thu, Oct 14, 2010 at 11:45 AM, Alexey Lyashkov <alexey.lyashkov at clusterstor.com> wrote:
> interop == interoperability between nodes with different versions of the
> software.
> [...]
Vilobh Meshram
2010-Oct-15 00:58 UTC
[Lustre-devel] Query to understand the Lustre request/reply message
Hi Alexey/Nicolas,

I modified the code in the way Nicolas suggested yesterday, in order to get some information filled into a fixed-size buffer sent from the client side. Here I am sending a buffer called "str" (whose size is 16) which is to be updated on the MDS side with the string "hello" (whose size is 7, much less than the original size of buffer "str", i.e. 16). But I am not able to perform the operation successfully, and I get the error

"LustreError: 4209:0:(file.c:3143:ll_inode_revalidate_fini()) failure -14 inode 31257"

which seems to be related to DLM_REPLY_REC_OFF, since I have modified this offset in my code. Can you please review my code and tell me if I am making any mistake? I will be done with my task if I can resolve this problem.

Following are the modifications (highlighted in bold/italic blue in the original mail) at the client and MDS side, for Lustre 1.8.1.1:

At the client side :- lustre/ldlm/ldlm_lockd.c

 655 int ldlm_cli_enqueue(.........)
 665         __u32 size[] = { [MSG_PTLRPC_BODY_OFF] = sizeof(struct ptlrpc_body),
 666                          [DLM_LOCKREQ_OFF] = sizeof(*body),
 667                          [DLM_REPLY_REC_OFF] = lvb_len ? lvb_len :
 668                                                sizeof(struct ost_lvb),
 669                          16 };

 717         if (reqp == NULL || *reqp == NULL) {
 718                 req = ldlm_prep_enqueue_req(exp, 4, size, NULL, 0);
                     |
                     |
                     v
 575 struct ptlrpc_request *ldlm_prep_elc_req(.......)
 584         void *str = NULL;
 585         char *bufs[4] = { NULL, NULL, NULL, str };
 616         req = ptlrpc_prep_req(class_exp2cliimp(exp), version,
 617                               opc, bufcount, size, bufs);

At the MDS side :- lustre/ldlm/ldlm_lockd.c

 992 int ldlm_handle_enqueue(.........)
 996 {
1000         void *str;
             __u32 size[4] = { [MSG_PTLRPC_BODY_OFF] = sizeof(struct ptlrpc_body),
                               [DLM_LOCKREPLY_OFF] = sizeof(*dlm_rep) };
1009         char *org = "hello";

1119 existing_lock:
1120
1121         if (flags & LDLM_FL_HAS_INTENT) {
1122                 /* In this case, the reply buffer is allocated deep in
1123                  * local_lock_enqueue by the policy function. */
1124                 cookie = req;
1125         } else {
1126                 int buffers = 4;
1127
1128                 lock_res_and_lock(lock);
1129                 if (lock->l_resource->lr_lvb_len) {
                             size[DLM_REPLY_REC_OFF] = lock->l_resource->lr_lvb_len;
                             buffers = 4;
1132                 }
1133                 unlock_res_and_lock(lock);
1134
1135                 if (OBD_FAIL_CHECK_ONCE(OBD_FAIL_LDLM_ENQUEUE_EXTENT_ERR))
1136                         GOTO(out, rc = -ENOMEM);
                     str = lustre_msg_buf(req->rq_reqmsg, DLM_REPLY_REC_OFF+1, 1);
                     memcpy(str, org, 7);
                     size[DLM_REPLY_REC_OFF + 1] = 16;

Thanks,
Vilobh
Graduate Research Associate
Department of Computer Science
The Ohio State University Columbus Ohio

On Thu, Oct 14, 2010 at 12:25 PM, Vilobh Meshram <vilobh.meshram at gmail.com> wrote:
> That surely helps. Thanks for all the help so far.
> [...]
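One observation on the fragment above, offered as a guess rather than a confirmed diagnosis of the -14: the server-side memcpy() writes into req->rq_reqmsg, the incoming request, so the string never travels back to the client. Returning it in the reply would mean writing into rq_repmsg once the reply has been packed with the extra buffer, roughly:

/* Sketch: fill the extra reply buffer after lustre_pack_reply(). */
str = lustre_msg_buf(req->rq_repmsg, DLM_REPLY_REC_OFF + 1, 0);
if (str != NULL)
        memcpy(str, org, strlen(org) + 1);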
Alexey Lyashkov
2010-Oct-15 07:39 UTC
[Lustre-devel] Query to understand the Lustre request/reply message
Can you please attach a diff file?

On Oct 15, 2010, at 03:58, Vilobh Meshram wrote:
> I modified the code in the way Nicolas suggested yesterday, in order to get
> some information filled into a fixed-size buffer sent from the client side.
> [...]
Vilobh Meshram
2010-Oct-15 16:25 UTC
[Lustre-devel] Query to understand the Lustre request/reply message
Hi Alexey,

I have attached the diff file. Please have a look at it and let me know your comments/suggestions. Thanks again.

Thanks,
Vilobh
Graduate Research Associate
Department of Computer Science
The Ohio State University Columbus Ohio

On Fri, Oct 15, 2010 at 3:39 AM, Alexey Lyashkov <alexey.lyashkov at clusterstor.com> wrote:
> Can you please attach a diff file?
> [...]
>
> Thanks,
> Vilobh
> *Graduate Research Associate
> Department of Computer Science
> The Ohio State University Columbus Ohio*
>
> On Thu, Oct 14, 2010 at 12:25 PM, Vilobh Meshram <vilobh.meshram at gmail.com> wrote:
>> Hi Alexey,
>>
>> That surely helps. Thanks for all the help till now.
>>
>> Thanks,
>> Vilobh
>> *Graduate Research Associate
>> Department of Computer Science
>> The Ohio State University Columbus Ohio*
>>
>> On Thu, Oct 14, 2010 at 11:45 AM, Alexey Lyashkov <alexey.lyashkov at clusterstor.com> wrote:
>>> Hi Vilobh,
>>>
>>> interop == interoperability between nodes running different versions of
>>> the software.
>>>
>>> In general we have two ways to solve that. For requests that carry an
>>> mdc_body, you can set a flag in the body and analyze that flag on the
>>> server/client side.
>>> If you want to add a new operation, the better way is to add a new flag
>>> into connect_data (look at the OBD_CONNECT_* macro handling).
>>> That flag can be checked via export->connect_flags on the client or
>>> server side to test the remote side's features.
>>> As an example, 1.x and 2.0 have a different format for setattr requests:
>>>
>>> int mdc_setattr
>>> ...
>>>         if (mdc_exp_is_2_0_server(exp)) {
>>>                 size[REQ_REC_OFF] = sizeof(struct mdt_rec_setattr);
>>>                 size[REQ_REC_OFF + 1] = 0; /* capa */
>>>                 size[REQ_REC_OFF + 2] = 0; //sizeof (struct mdt_epoch);
>>>                 size[REQ_REC_OFF + 3] = ealen;
>>>                 size[REQ_REC_OFF + 4] = ea2len;
>>>                 size[REQ_REC_OFF + 5] = sizeof(struct ldlm_request);
>>>                 offset = REQ_REC_OFF + 5;
>>>                 bufcount = 6;
>>>                 replybufcount = 6;
>>>         } else {
>>>                 bufcount = 4;
>>>         }
>>>
>>> An example of checking a client-side feature is the version-based
>>> recovery support check for the client:
>>>
>>> mds_version_get_check
>>> ...
>>>         if (inode == NULL || !exp_connect_vbr(req->rq_export))
>>>
>>> I hope that helps you.
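To make that idea concrete for the extra-buffer experiment, here is a hedged
sketch of gating the new buffer on a connect flag. OBD_CONNECT_HELLO and
exp_connect_hello() are invented names for illustration only; a real patch
would have to claim an unused OBD_CONNECT_* bit and negotiate it during the
connect handshake on both peers:

    /* hypothetical connect bit, for illustration only */
    #define OBD_CONNECT_HELLO       0x10000000000ULL
    #define exp_connect_hello(exp) \
            (!!((exp)->exp_connect_flags & OBD_CONNECT_HELLO))

    /* returns the reply buffer count this peer can handle */
    static int hello_reply_bufcount(struct obd_export *exp, __u32 *size)
    {
            if (!exp_connect_hello(exp))
                    return 3;                 /* old-format peer: no extra buffer */
            size[DLM_REPLY_REC_OFF + 1] = 16; /* new-format peer */
            return 4;
    }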
>>>
>>> On Oct 14, 2010, at 18:29, Vilobh Meshram wrote:
>>>
>>> Hi Alexey,
>>>
>>> Thanks again for the reply.
>>>
>>> Can you briefly give me some pointers about this interop issue, and
>>> with which kind of RPC does it arise? What kind of flag needs to be set
>>> to resolve it?
>>>
>>> I went through the bugzilla entry you mentioned; it seems that RPCs
>>> dealing with LDLM may cause this issue. Please correct me if I am wrong.
>>>
>>> Thanks,
>>> Vilobh
>>> *Graduate Research Associate
>>> Department of Computer Science
>>> The Ohio State University Columbus Ohio*
>>>
>>> On Thu, Oct 14, 2010 at 11:10 AM, Alexey Lyashkov <alexey.lyashkov at clusterstor.com> wrote:
>>>> Hi Vilobh,
>>>>
>>>> As I see it, you touched code related to locking. struct ldlm_request
>>>> is used in the lock enqueue process -- that is why I mentioned the
>>>> interop issue in the ELC code, which is solved with an export flag.
>>>> For common mdc requests you can resolve the interop issue with flags
>>>> in mdc_body (mdt_body), but that is not possible for ldlm requests.
>>>>
>>>> On Oct 14, 2010, at 18:04, Vilobh Meshram wrote:
>>>>
>>>> Hi Alexey,
>>>>
>>>> Thanks again for your reply.
>>>>
>>>> I am trying to embed a buffer in the RPC which will get filled in with
>>>> some values which the MDS knows and the client calling the RPC does
>>>> not. It has nothing to do with locking. I just want to fill the buffer
>>>> I embed in the RPC with suitable data at the MDS end and then operate
>>>> on that data at the client side. So I think the approach suggested by
>>>> you and Nicholas -- just including sizeof(str) [the size of the
>>>> expected information from the MDS] in the size[] array -- should be
>>>> fine, as done below:
>>>>
>>>>         __u32 size[2] = { [MSG_PTLRPC_BODY_OFF] = sizeof(struct ptlrpc_body),
>>>>                           [DLM_LOCKREQ_OFF] = sizeof(struct ldlm_request) };
>>>>
>>>> becomes
>>>>
>>>>         __u32 size[3] = { [MSG_PTLRPC_BODY_OFF] = sizeof(struct ptlrpc_body),
>>>>                           [DLM_LOCKREQ_OFF] = sizeof(struct ldlm_request),
>>>>                           /* how to add "char *str = Hello"? we have
>>>>                            * sizeof(str), but how to choose the macro
>>>>                            * (like DLM_LOCKREQ_OFF)? for a specific
>>>>                            * kind of RPC there are a limited number of
>>>>                            * such macros */ };
>>>>
>>>> Please correct me if I am wrong, and please point out any corner cases
>>>> I need to consider for this use case.
>>>>
>>>> Thanks again.
>>>>
>>>> Thanks,
>>>> Vilobh
>>>> *Graduate Research Associate
>>>> Department of Computer Science
>>>> The Ohio State University Columbus Ohio*
>>>>
>>>> On Thu, Oct 14, 2010 at 10:40 AM, Alexey Lyashkov <alexey.lyashkov at clusterstor.com> wrote:
>>>>> Andreas,
>>>>>
>>>>> On Oct 14, 2010, at 17:31, Andreas Dilger wrote:
>>>>> > On 2010-10-13, at 23:18, Nicolas Williams wrote:
>>>>> >> On Thu, Oct 14, 2010 at 06:38:16AM +0300, Alexey Lyashkov wrote:
>>>>> >>> On Oct 14, 2010, at 03:28, Nicolas Williams wrote:
>>>>> >>>> Yes, it's possible to add buffers to requests. It's not possible
>>>>> >>>> to add buffers to _replies_ to existing RPCs unless you know the
>>>>> >>>> client expects those additional buffers -- existing clients
>>>>> >>>> expect a given maxsize for each reply, and if your reply is
>>>>> >>>> bigger then it will get dropped.
>>>>> >>> That has been wrong for the last ~1 year. ~1 year ago I added
>>>>> >>> code to the ptlrpc layer which adjusts the buffer for the reply
>>>>> >>> and resends the request.
>>>>> >>
>>>>> >> Ah, I didn't know that was in 1.8. Are there interop issues (with
>>>>> >> older clients) though with sending larger replies than expected?
>>>>> >
>>>>> > Nico, it has always been possible in the past to increase the size
>>>>> > of any buffer in a request, or in a reply (if the total reply size
>>>>> > will fit into the pre-allocated reply buffer). An older peer would
>>>>> > just ignore the bytes beyond the known part of the buffer.
>>>>> >
>>>>> I think that question isn't about rebalancing buffer sizes within a
>>>>> message; I think it is about sending a large reply into a smaller
>>>>> reply buffer.
>>>>> LNet is not able to put a large reply into a small buffer (without
>>>>> the truncate flag, which does not exist in older ptlrpc versions).
>>>>> Without that flag you will see messages
>>>>>
>>>>>         CERROR("Matching packet from %s, match "LPU64
>>>>>                " length %d too big: %d left, %d allowed\n",
>>>>>                libcfs_id2str(src), match_bits, rlength,
>>>>>                md->md_length - offset, mlength);
>>>>>
>>>>> and LNet will drop the message without notifying PtlRPC.
>>>>>
>>>>> > Is that not true with the 2.x RPC handling?
>>>>> >
>>>>> 2.x is able to rebalance space between buffers (but it looks to be
>>>>> done by hand), and is able to adjust the reply buffer after a
>>>>> truncated reply.
>>>>>
>>>>> --------------------------------------
>>>>> Alexey Lyashkov
>>>>> alexey.lyashkov at clusterstor.com
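For completeness, a hedged sketch of the client end of that exchange: once
ptlrpc_queue_wait() returns success, the extra reply buffer can be read
back. This assumes the server packed buffer DLM_REPLY_REC_OFF + 1 as a
NUL-terminated string of at most 16 bytes:

    static int hello_read_reply(struct ptlrpc_request *req)
    {
            char *hello;
            int rc;

            rc = ptlrpc_queue_wait(req);
            if (rc != 0)
                    return rc;

            /* lustre_msg_string() checks that the buffer exists, fits the
             * length bound, and is NUL-terminated */
            hello = lustre_msg_string(req->rq_repmsg, DLM_REPLY_REC_OFF + 1, 16);
            if (hello == NULL)
                    return -EPROTO;
            CDEBUG(D_INFO, "server filled in: %s\n", hello);
            return 0;
    }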
-------------- next part --------------
Index: lustre.spec
===================================================================
--- lustre.spec (revision 8279)
+++ lustre.spec (working copy)
@@ -1,7 +1,7 @@
 # lustre.spec
 %{!?version: %define version 1.8.1.1}
-%{!?kversion: %define kversion }
-%{!?release: %define release }
+%{!?kversion: %define kversion 2.6.18-128.7.1.el5-lustre.1.8.1.1smp-cust}
+%{!?release: %define release 2.6.18_128.7.1.el5_lustre.1.8.1.1smp_cust_201010141227}
 %{!?lustre_name: %define lustre_name lustre}

 %define is_client %(bash -c "if [[ %{lustre_name} = *-client ]]; then echo -n '1'; else echo -n '0'; fi")
@@ -104,7 +104,7 @@
 # Set an explicit path to our Linux tree, if we can.
 cd $RPM_BUILD_DIR/lustre-%{version}
-./configure '--disable-modules' '--disable-utils' '--disable-liblustre' '--disable-tests' '--disable-doc' --with-lustre-hack --with-sockets %{?configure_flags:configure_flags} \
+./configure '--with-o2ib=/usr/local/ofed/src/ofa_kernel' '--with-linux=/lib/modules/2.6.18-128.7.1.el5-lustre.1.8.1.1smp-cust/build' --with-lustre-hack --with-sockets %{?configure_flags:configure_flags} \
 	--sysconfdir=%{_sysconfdir} \
 	--mandir=%{_mandir} \
 	--libdir=%{_libdir}
Index: lustre/mds/handler.c
===================================================================
--- lustre/mds/handler.c (revision 8279)
+++ lustre/mds/handler.c (working copy)
@@ -1687,7 +1687,7 @@
                           mds->mds_max_mdsize, mds->mds_max_cookiesize };
         int bufcount;
-
+        printk("Inside function %s a hit for case MDS_REINT",__func__);
         /* NB only peek inside req now; mds_reint() will swab it */
         if (opcp == NULL) {
                 CERROR ("Can't inspect opcode\n");
@@ -1704,15 +1704,18 @@
         switch (opc) {
         case REINT_CREATE:
+                printk("Inside function %s a hit for case REINT_CREATE",__func__);
                 op = PTLRPC_LAST_CNTR + MDS_REINT_CREATE;
                 break;
         case REINT_LINK:
                 op = PTLRPC_LAST_CNTR + MDS_REINT_LINK;
                 break;
         case REINT_OPEN:
+                printk("Inside function %s a hit for case REINT_OPEN",__func__);
                 op = PTLRPC_LAST_CNTR + MDS_REINT_OPEN;
                 break;
         case REINT_SETATTR:
+                printk("Inside function %s a hit for case REINT_SETATTR",__func__);
                 op = PTLRPC_LAST_CNTR + MDS_REINT_SETATTR;
                 break;
         case REINT_RENAME:
@@ -1745,8 +1748,9 @@
                 if (opc == REINT_UNLINK || opc == REINT_RENAME)
                         size[DLM_REPLY_REC_OFF + 1] = 0;
         }
-
+        printk("Inside function %s in case MDS_REINT before calling lustre_pack_reply",__func__);
         rc = lustre_pack_reply(req, bufcount, size, NULL);
+        printk("Inside function %s in case MDS_REINT after calling lustre_pack_reply",__func__);
         if (rc)
                 break;
@@ -1756,6 +1760,7 @@
         }

         case MDS_CLOSE:
+                printk("Inside function %s in case MDS_CLOSE",__func__);
                 DEBUG_REQ(D_INODE, req, "close");
                 OBD_FAIL_RETURN(OBD_FAIL_MDS_CLOSE_NET, 0);
                 rc = mds_close(req, REQ_REC_OFF);
@@ -1798,6 +1803,7 @@
                 break;
 #endif
         case OBD_PING:
+                printk("Inside function %s got a hit at case OBD_PING",__func__);
                 DEBUG_REQ(D_INODE, req, "ping");
                 rc = target_handle_ping(req);
                 if (req->rq_export->exp_delayed)
@@ -1811,6 +1817,7 @@
                 break;

         case LDLM_ENQUEUE:
+                printk("\n Inside function %s got a hit at case LDLM_ENQUEUE",__func__);
                 DEBUG_REQ(D_INODE, req, "enqueue");
                 OBD_FAIL_RETURN(OBD_FAIL_LDLM_ENQUEUE, 0);
                 rc = ldlm_handle_enqueue(req, ldlm_server_completion_ast,
Index: lustre/ldlm/ldlm_request.c
===================================================================
--- lustre/ldlm/ldlm_request.c (revision 8279)
+++ lustre/ldlm/ldlm_request.c (working copy)
@@ -581,6 +581,8 @@
         int flags, avail, to_free, pack = 0;
         struct ldlm_request *dlm = NULL;
         struct ptlrpc_request *req;
+        void *str=NULL;
+        char *bufs[4] = {NULL,NULL,NULL,str};
         CFS_LIST_HEAD(head);
         ENTRY;
@@ -609,8 +611,10 @@
                 size[bufoff] = ldlm_request_bufsize(pack, opc);
         }
+        printk("\n Inside function %s before calling ptlrpc_prep_req",__func__);
+        printk("\n OPC for LDLM_ENQUEUE is %d",opc);
         req = ptlrpc_prep_req(class_exp2cliimp(exp), version,
-                              opc, bufcount, size, NULL);
+                              opc, bufcount, size, bufs);
         req->rq_export = class_export_get(exp);
         if (exp_connect_cancelset(exp) && req) {
                 if (canceloff) {
@@ -658,10 +662,11 @@
         struct ldlm_lock *lock;
         struct ldlm_request *body;
         struct ldlm_reply *reply;
-        __u32 size[3] = { [MSG_PTLRPC_BODY_OFF] = sizeof(struct ptlrpc_body),
+        __u32 size[] = { [MSG_PTLRPC_BODY_OFF] = sizeof(struct ptlrpc_body),
                           [DLM_LOCKREQ_OFF] = sizeof(*body),
                           [DLM_REPLY_REC_OFF] = lvb_len ? lvb_len :
-                                                sizeof(struct ost_lvb) };
+                                                sizeof(struct ost_lvb),
+                          16};
         int is_replay = *flags & LDLM_FL_REPLAY;
         int req_passed_in = 1, rc, err;
         struct ptlrpc_request *req;
@@ -710,7 +715,7 @@
         /* lock not sent to server yet */
         if (reqp == NULL || *reqp == NULL) {
-                req = ldlm_prep_enqueue_req(exp, 2, size, NULL, 0);
+                req = ldlm_prep_enqueue_req(exp, 4, size, NULL, 0);
                 if (req == NULL) {
                         failed_lock_cleanup(ns, lock, lockh, einfo->ei_mode);
                         LDLM_LOCK_PUT(lock);
Index: lustre/ldlm/ldlm_lockd.c
===================================================================
--- lustre/ldlm/ldlm_lockd.c (revision 8279)
+++ lustre/ldlm/ldlm_lockd.c (working copy)
@@ -997,13 +997,17 @@
         struct obd_device *obddev = req->rq_export->exp_obd;
         struct ldlm_reply *dlm_rep;
         struct ldlm_request *dlm_req;
-        __u32 size[3] = { [MSG_PTLRPC_BODY_OFF] = sizeof(struct ptlrpc_body),
-                          [DLM_LOCKREPLY_OFF] = sizeof(*dlm_rep) };
+        void *str;
+        __u32 size[4] = { [MSG_PTLRPC_BODY_OFF] = sizeof(struct ptlrpc_body),
+                          [DLM_LOCKREPLY_OFF] = sizeof(*dlm_rep)
+                        };
         int rc = 0;
         __u32 flags;
         ldlm_error_t err = ELDLM_OK;
         struct ldlm_lock *lock = NULL;
         void *cookie = NULL;
+        char *org = "hello";
+
         ENTRY;

         LDLM_DEBUG_NOLOCK("server-side enqueue handler START");
@@ -1119,19 +1123,24 @@
          * local_lock_enqueue by the policy function. */
         cookie = req;
         } else {
-                int buffers = 2;
+                int buffers = 4;
                 lock_res_and_lock(lock);
                 if (lock->l_resource->lr_lvb_len) {
                         size[DLM_REPLY_REC_OFF] = lock->l_resource->lr_lvb_len;
-                        buffers = 3;
+                        buffers = 4;
                 }
                 unlock_res_and_lock(lock);
                 if (OBD_FAIL_CHECK_ONCE(OBD_FAIL_LDLM_ENQUEUE_EXTENT_ERR))
                         GOTO(out, rc = -ENOMEM);
+                str = lustre_msg_buf(req->rq_reqmsg, DLM_REPLY_REC_OFF+1, 1);
+                memcpy ( str , org , 7);
+                size[DLM_REPLY_REC_OFF + 1] = 16;
+                printk("\n Inside function %s before calling 1.LUSTRE_PACK_REPLY",__func__);
                 rc = lustre_pack_reply(req, buffers, size, NULL);
+                printk("\n Inside function %s after calling 1.LUSTRE_PACK_REPLY",__func__);
                 if (rc)
                         GOTO(out, rc);
         }
@@ -1215,7 +1224,9 @@
 out:
         req->rq_status = rc ?: err; /* return either error - bug 11190 */
         if (!req->rq_packed_final) {
+                printk("\n Inside function %s before calling 2.LUSTRE_PACK_REPLY",__func__);
                 err = lustre_pack_reply(req, 1, NULL, NULL);
+                printk("\n Inside function %s after calling 2.LUSTRE_PACK_REPLY",__func__);
                 if (rc == 0)
                         rc = err;
         }
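One more hedged observation on the ldlm_request.c hunk above: str is still
NULL at the point bufs[] is built, so ptlrpc_prep_req() has nothing to copy
and the fourth request buffer goes out zero-filled (request packing only
copies from non-NULL bufs entries). If the client is meant to send real data
in that buffer, it could be filled after the request is prepped; the payload
below is purely illustrative:

    static int hello_fill_req_buf(struct ptlrpc_request *req)
    {
            static const char payload[] = "ping"; /* illustrative payload */
            void *buf;

            /* buffer 3 was sized to 16 bytes by the caller's size[] array */
            buf = lustre_msg_buf(req->rq_reqmsg, DLM_REPLY_REC_OFF + 1,
                                 sizeof(payload));
            if (buf == NULL)
                    return -EPROTO;
            memcpy(buf, payload, sizeof(payload));
            return 0;
    }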
Alexey Lyashkov
2010-Oct-15 17:22 UTC
[Lustre-devel] Query to understand the Lustre request/reply message
First comment: please use diff -p, so it is visible which function has changed.
Second: please use CDEBUG() where needed :)  You can set the debug level via

    sysctl -w lnet.debug=-1
    sysctl -w lnet.debug_subsystem=-1

and after that you can get a very detailed log via lctl dk (dump_kernel) > $log-file.

I will send the other comments tomorrow.

On Oct 15, 2010, at 19:25, Vilobh Meshram wrote:

> Hi Alexey,
>
> I have attached the diff file. Please have a look at it and let me know
> your comments/suggestions.
>
> Thanks again.
>
> Thanks,
> Vilobh
> Graduate Research Associate
> Department of Computer Science
> The Ohio State University Columbus Ohio
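As a hedged illustration of that advice, here is one of the printk() calls
from the attached diff rewritten as a mask-filtered debug message
(D_RPCTRACE is one plausible mask choice for RPC dispatch tracing):

             case LDLM_ENQUEUE:
    -                printk("\n Inside function %s got a hit at case LDLM_ENQUEUE",__func__);
    +                CDEBUG(D_RPCTRACE, "%s: hit case LDLM_ENQUEUE\n", __func__);
                     DEBUG_REQ(D_INODE, req, "enqueue");

Unlike printk(), such lines reach the console only when the corresponding
debug bits are enabled, and they appear in the lctl dk dump together with
the rest of the RPC trace.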