Vilobh Meshram
2010-Oct-12 17:55 UTC
[Lustre-devel] Query to understand the Lustre request/reply message
I want to understand the message encoding and decoding logic in Lustre. I am planning to send a request to the MDS and, based on the reply from the MDS, want to populate:

        struct lustre_msg *rq_repbuf; /* client only, buf may be bigger than msg */
        struct lustre_msg *rq_repmsg;

I am trying this with a simple "Hello" message but am not seeing the expected output; sometimes I even see a kernel crash. If you could give me some insight into the way the Lustre file system encodes and decodes the messages sent across nodes, it would be helpful.

Thanks,
Vilobh
Graduate Research Associate
Department of Computer Science
The Ohio State University, Columbus, Ohio
Alexey Lyashkov
2010-Oct-12 18:21 UTC
[Lustre-devel] Query to understand the Lustre request/reply message
Hi Vilobh,

ldlm_cli_cancel_req is a good example of using the old PtlRPC API:
- first, allocate the request buffer via ptlrpc_prep_req;
- next, allocate the reply buffer via ptlrpc_req_set_repsize;
- then call ptlrpc_queue_wait to send the message and wait for the reply.

osc_getattr_async is a good example of the new PtlRPC API and asynchronous RPC processing.

If that doesn't help, please show us your code so we can find the error.
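A minimal sketch of the old-API sequence Alexey describes, assuming a 1.8-era tree (ptlrpc_prep_req, ptlrpc_req_set_repsize, ptlrpc_queue_wait, and ptlrpc_req_finished are the real entry points; MY_OPC and struct my_payload are placeholders, not real Lustre definitions):

        /* Hypothetical synchronous client call using the old PtlRPC API. */
        static int my_simple_rpc(struct obd_import *imp)
        {
                __u32 size[2] = { [MSG_PTLRPC_BODY_OFF] = sizeof(struct ptlrpc_body),
                                  [REQ_REC_OFF]         = sizeof(struct my_payload) };
                struct ptlrpc_request *req;
                int rc;

                /* Allocate and pack the request buffers. */
                req = ptlrpc_prep_req(imp, LUSTRE_MDS_VERSION, MY_OPC,
                                      2, size, NULL);
                if (req == NULL)
                        return -ENOMEM;

                /* Declare the maximum reply size before sending. */
                ptlrpc_req_set_repsize(req, 2, size);

                /* Send synchronously and wait for the reply (or an error). */
                rc = ptlrpc_queue_wait(req);

                ptlrpc_req_finished(req);
                return rc;
        }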
Vilobh Meshram
2010-Oct-12 22:17 UTC
[Lustre-devel] Query to understand the Lustre request/reply message
Thanks Alexey, that was helpful.

I have one more question: if we want to add a new RPC with a new opcode, are there any guidelines to be followed in the Lustre file system?

Also:
1) How does the MDS process the ptlrpc_request, i.e., how does the MDS extract the buffer information from the ptlrpc message?
2) For every new RPC, is the message length to be sent on the wire (including the fixed header size plus the buffer sizes) dependent on the number of buffers in the Lustre request message, i.e., the count field in ptlrpc_prep_req(), or on the size of the size[] array?

Thanks,
Vilobh
Graduate Research Associate
Department of Computer Science
The Ohio State University, Columbus, Ohio
Alexey Lyashkov
2010-Oct-13 03:46 UTC
[Lustre-devel] Query to understand the Lustre request/reply message
That depends on the RPC type: whether the RPC needs to return a lock to the caller, and whether it needs special code to reconstruct the request in the replay phase.

In general, look at mdt/mdt_handler.c. mdt_get_info is a good example of simple RPC processing, but it uses the new PtlRPC API. That API hides the low-level request structures and provides access to message buffers by identifier. To use it, you define the structure of your own message in ptlrpc/layout.c, add your own command to enum mds_cmd_t, adjust the array of commands, and write your own handler (a rough sketch follows below).
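A rough sketch of how those pieces fit together in the 2.0 tree (req_capsule_set() and req_capsule_client_get() are the real capsule calls, and mdt_body_only is a real field array in layout.c; the opcode MY_NEW_CMD, the format RQF_MY_NEW_CMD, and the handler are made up for illustration):

        /* Hypothetical format in ptlrpc/layout.c: one mdt_body in the
         * request and one in the reply. */
        struct req_format RQF_MY_NEW_CMD =
                DEFINE_REQ_FMT0("MY_NEW_CMD", mdt_body_only, mdt_body_only);

        /* Hypothetical server-side handler for the new command. */
        static int my_new_cmd_handler(struct ptlrpc_request *req)
        {
                struct req_capsule *pill = &req->rq_pill;
                struct mdt_body *body;

                /* Bind the capsule to the expected message format. */
                req_capsule_set(pill, &RQF_MY_NEW_CMD);

                /* Pull the request-side MDT_BODY field out of the message;
                 * the capsule takes care of swabbing. */
                body = req_capsule_client_get(pill, &RMF_MDT_BODY);
                if (body == NULL)
                        return -EFAULT;

                /* ... act on the request, then fill the reply ... */
                return 0;
        }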
Vilobh Meshram
2010-Oct-13 04:06 UTC
[Lustre-devel] Query to understand the Lustre request/reply message
Thanks a lot for the reply, Alexey.

I will try out the steps you mentioned and see if I can add a new RPC for the task I am thinking of implementing in Lustre. The RPC I have in mind will not return a lock to the caller; it will, however, need special code to reconstruct the request in the replay phase.

One last question: starting from which release of Lustre can we make use of the new API? Is there any documentation that describes the use of the new API? If yes, can you please point me to it?

Thanks again.

Thanks,
Vilobh
Graduate Research Associate
Department of Computer Science
The Ohio State University, Columbus, Ohio
Alexey Lyashkov
2010-Oct-13 04:20 UTC
[Lustre-devel] Query to understand the Lustre request/reply message
On Oct 13, 2010, at 07:06, Vilobh Meshram wrote:

> The RPC I have in mind will not return a lock to the caller; it will, however, need special code to reconstruct the request in the replay phase.

In that case you probably need to look at the 'setattr' functions: setattr has its own reconstructor, mdt_reconstruct_setattr, but that needs support on the client side. Typically this means you need a special field in the message to copy data from the server reply.

> Starting from which release of Lustre can we make use of the new API? Is there any documentation that describes the use of the new API?

Lustre 1.x has the old PtlRPC API; Lustre 2.0 has a mix of new and old, but is migrating to the new API.
Vilobh Meshram
2010-Oct-13 04:35 UTC
[Lustre-devel] Query to understand the Lustre request/reply message
Thanks a lot for the reply, Alexey. The information will be really useful.

Since I am using 1.8.1.1 for my research project, I will have to rely on the old API. Since the source tree prior to 2.0 does not have mdt/mdt_handler.c or layout.c, I will have to work with the low-level buffer-management structures (ptlrpc_request, lustre_msg_v2, etc.). Do you know of a place or a function that makes use of the old API which I could use as a reference for writing the RPC for my task?

Thanks,
Vilobh
Graduate Research Associate
Department of Computer Science
The Ohio State University, Columbus, Ohio
Alexey Lyashkov
2010-Oct-13 04:41 UTC
[Lustre-devel] Query to understand the Lustre request/reply message
mds_handle (the start of request processing), and the MDS_CHECK_RESENT() macro for handling reconstruction. mds_set_info_rpc for simple RPC processing; possibly also mds_setxattr (mds_setxattr_internal), which uses the generic reconstruction function.
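For reference, a rough sketch of the buffer handling inside such a 1.8 handler (lustre_swab_reqbuf, lustre_pack_reply, REQ_REC_OFF, and REPLY_REC_OFF are the real 1.8 primitives; struct my_body and its swabber are placeholders):

        /* Hypothetical fragment of a 1.8-style server handler: extract
         * (and swab, if needed) a request buffer by index, then allocate
         * the reply message. */
        static int my_handler_body(struct ptlrpc_request *req)
        {
                __u32 repsize[2] = { [MSG_PTLRPC_BODY_OFF] = sizeof(struct ptlrpc_body),
                                     [REPLY_REC_OFF]       = sizeof(struct my_body) };
                struct my_body *body;

                /* lustre_swab_reqbuf() returns the buffer at the given
                 * offset, byte-swapped at most once; struct my_body and
                 * lustre_swab_my_body are made-up names. */
                body = lustre_swab_reqbuf(req, REQ_REC_OFF, sizeof(*body),
                                          lustre_swab_my_body);
                if (body == NULL)
                        return -EFAULT;

                /* Allocate the reply message with the declared sizes. */
                return lustre_pack_reply(req, 2, repsize, NULL);
        }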
Nicolas Williams
2010-Oct-13 05:42 UTC
[Lustre-devel] Query to understand the Lustre request/reply message
On Wed, Oct 13, 2010 at 12:35:13AM -0400, Vilobh Meshram wrote:
> Since I am using 1.8.1.1 for my research project, I will have to rely on the old API. [...] Do you know of a place or a function that makes use of the old API which I could use as a reference for writing the RPC for my task?

The new API is _much_ easier to use than the old API.

To add an RPC you must:

- decide what it looks like

  Every PtlRPC message has an opcode and one or more "buffers", with each buffer containing a C struct, a string, whatever. If a buffer contains a C struct, then it has to be fixed-sized. The first buffer is struct ptlrpc_body.

  A single RPC opcode can denote multiple different layouts, depending on the contents of various buffers. See below.

- add any struct, enum, and other C types you need to lustre_idl.h

  You must make sure to use the base types we use in lustre_idl.h, such as __u64.

- create swabber functions for your data, if necessary

- add handlers for the new RPC to mdt_handler.c (for the MDS) or ost_handler.c (for the OST), and so on

  The handlers are responsible for knowing which buffers contain what, and for swabbing them. You have to make sure that you don't swab a buffer more than once.

The new API allows you to define formats quite nicely, and it takes care of calling swabbers and ensuring that no buffer is swabbed more than once. The formats are defined in lustre/ptlrpc/layout.c and look like this:

        struct req_format RQF_MDS_SYNC =
                DEFINE_REQ_FMT0("MDS_SYNC", mdt_body_capa, mdt_body_only);
        ...
        static const struct req_msg_field *mdt_body_capa[] = {
                &RMF_PTLRPC_BODY,
                &RMF_MDT_BODY,
                &RMF_CAPA1
        };
        static const struct req_msg_field *mdt_body_only[] = {
                &RMF_PTLRPC_BODY,
                &RMF_MDT_BODY
        };
        ...

An RPC consists of a request and a reply, with their formats given in the DEFINE_REQ_FMT0() macro (there are other macros). Each message format defines a layout of buffers or, as we call them now, "fields", and each field has a format definition as well, such as:

        struct req_msg_field RMF_PTLRPC_BODY =
                DEFINE_MSGF("ptlrpc_body", 0,
                            sizeof(struct ptlrpc_body),
                            lustre_swab_ptlrpc_body, NULL);

for a struct buffer. Other types of RMFs are possible (e.g., strings); see layout.c.

So an MDS_SYNC RPC consists of a three-field (buffer) request and a two-field reply. The request's fields are PTLRPC_BODY, MDT_BODY, and CAPA1. The reply's fields are PTLRPC_BODY and MDT_BODY. PTLRPC_BODY is a fixed-sized field containing a C structure, and the swabber for this field is lustre_swab_ptlrpc_body(). And so on.

If you look at Lustre 2.0's mdt_handler.c and ost_handler.c you'll find that one of the first things done is to initialize a "capsule", and that the expected message format of a request is decided based on its opcode. That is, the mapping of opcode to RQF is not given by some array, but decided as we go. Indeed, the RQF of a capsule can be changed mid-stream, with some constraints.

So, with the new API you:

- add C types to lustre_idl.h for on-the-wire data
- add any swabbers to lustre/ptlrpc/pack_generic.c (declare them in lustre_idl.h)
- add RQFs and, possibly, RMFs to layout.c
- declare the RQFs/RMFs in lustre/include/lustre_req_layout.h

- on the server side:
  - Modify the relevant handler to add an arm to the existing switch on the request's opcode, call req_capsule_set() to set the capsule's format, then call a function that uses req_capsule_*get*() to get at the fields (buffers) -- both request and reply buffers -- to read from (request) or write to (reply).

- on the client side:
  - You'll do something very similar, except that there's no handler function -- the pattern is less consistent, so you'll have to read mdc*.c and so on to get a flavor for this... Typically you'll allocate a request using ptlrpc_request_alloc_pack(), fill in its fields (again, using req_capsule_client_get() and friends), then send it using, for example, ptlrpc_queue_wait(). A minimal sketch follows below.

  Take a good look at mdc_request.c in 2.0 to get a better idea of how to build client stubs for your new RPCs.

I haven't described the wirecheck part -- I can do that later, once you've made enough progress. (We have a wirecheck/wiretest program pair to check that only backwards-interoperable changes are made to lustre_idl.h.)

I hope that helps. Yes, it'd be nice to have something closer to an actual IDL. The RQF/RMF/wirecheck/wiretest stuff could be extended to:

- auto-generate swabbers from lustre_idl.h structs
- provide a default opcode->RQF mapping
- provide more static type safety (by having req_capsule_*get() be macros that cast the buffer address to the right type)
- auto-generate simple request constructors (that take pointers to values of an RQF's correct request field C types)

Compared to the old thing, the new API is much closer to an IDL. It's a good thing. I strongly recommend that you use it,

Nico
--
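As an illustration of that client-side pattern, a minimal stub reusing the MDS_SYNC format from above (ptlrpc_request_alloc_pack, req_capsule_client_get, ptlrpc_request_set_replen, and ptlrpc_queue_wait are the real 2.0 entry points; error handling is simplified and the FID is assumed to come from the caller):

        /* Hypothetical synchronous client stub using the new PtlRPC API. */
        static int my_mds_sync(struct obd_import *imp, const struct lu_fid *fid)
        {
                struct ptlrpc_request *req;
                struct mdt_body *body;
                int rc;

                /* Allocate a request laid out per RQF_MDS_SYNC and pack
                 * its request-side buffers. */
                req = ptlrpc_request_alloc_pack(imp, &RQF_MDS_SYNC,
                                                LUSTRE_MDS_VERSION, MDS_SYNC);
                if (req == NULL)
                        return -ENOMEM;

                /* Fill in the request-side MDT_BODY field. */
                body = req_capsule_client_get(&req->rq_pill, &RMF_MDT_BODY);
                body->fid1 = *fid;

                /* Size the expected reply from the format, then send
                 * and wait. */
                ptlrpc_request_set_replen(req);
                rc = ptlrpc_queue_wait(req);

                ptlrpc_req_finished(req);
                return rc;
        }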
Alexey Lyashkov
2010-Oct-13 05:54 UTC
[Lustre-devel] Query to understand the Lustre request/reply message
On Oct 13, 2010, at 08:42, Nicolas Williams wrote:
> Compared to the old thing, the new API is much closer to an IDL. It's a good thing. I strongly recommend that you use it.

The main problem: Lustre 1.8.1 doesn't have the new API :)

--------------------------------------
Alexey Lyashkov
alexey.lyashkov at clusterstor.com
Vilobh Meshram
2010-Oct-13 06:07 UTC
[Lustre-devel] Query to understand the Lustre request/reply message
Amazing... Thanks, Nicolas and Alexey, for your time and the detailed replies.

I will try out the new API to create a new RPC, following the steps you described for Lustre 2.0 (since I am using 1.8.1.1 right now).

Thanks again.

Thanks,
Vilobh
Graduate Research Associate
Department of Computer Science
The Ohio State University, Columbus, Ohio
Alexey Lyashkov
2010-Oct-13 06:25 UTC
[Lustre-devel] Query to understand the Lustre request/reply message
On Oct 13, 2010, at 08:42, Nicolas Williams wrote:
> - add handlers for the new RPC to mdt_handler.c (for the MDS) or ost_handler.c (for the OST), and so on
>
> The handlers are responsible for knowing which buffers contain what, and for swabbing them. You have to make sure that you don't swab a buffer more than once.

BTW, that is not enough. Some operations need their own reconstructors for replay/resend. Some operations need to return a lock -- for example, the MDS_GETATTR and MDS_REINT commands.

--------------------------------------
Alexey Lyashkov
alexey.lyashkov at clusterstor.com
Nicolas Williams
2010-Oct-13 07:12 UTC
[Lustre-devel] Query to understand the Lustre request/reply message
On Wed, Oct 13, 2010 at 09:25:00AM +0300, Alexey Lyashkov wrote:
> BTW, that is not enough. Some operations need their own reconstructors for replay/resend.

I glossed over replay/resend, mostly because I know little about them, but also because they are completely orthogonal to the message-format details. If you want to add an RPC, the first step should be to get the RPC format designed and the surrounding code up and running; then you can take care of replay/resend.

> Some operations need to return a lock -- for example, the MDS_GETATTR and MDS_REINT commands.

That too is orthogonal to the message formats. The message format has to have a buffer (field) declared to carry the lock (or capability, or whatever) bits, and some function has to be invoked to populate that buffer in the reply.

Nico
--
Nicolas Williams
2010-Oct-13 07:15 UTC
[Lustre-devel] Query to understand the Lustre request/reply message
On Wed, Oct 13, 2010 at 08:54:57AM +0300, Alexey Lyashkov wrote:
> The main problem: Lustre 1.8.1 doesn't have the new API :)

You'll note that Vilobh did not provide any rationale for his/her choice of Lustre version. Without having any other good reason for picking 1.8 or 2.0, I strongly recommend 2.0.

Now, perhaps Vilobh has a need to interoperate with an installed base of 1.8. That would be a good reason to do this in 1.8. But the work will have to be done for 2.0 as well.

Nico
--
Nicolas Williams
2010-Oct-13 07:17 UTC
[Lustre-devel] Query to understand the Lustre request/reply message
On Wed, Oct 13, 2010 at 02:07:06AM -0400, Vilobh Meshram wrote:
> I will try out the new API to create a new RPC, following the steps you described for Lustre 2.0 (since I am using 1.8.1.1 right now).

The new API, incidentally, uses the old API under the hood. That might help guide you. To understand usage patterns for the old API, it should help to look at 2.0 code, particularly the layout.c code, and then look at the corresponding 1.8 code.

Nico
--
Alexey Lyashkov
2010-Oct-13 07:27 UTC
[Lustre-devel] Query to understand the Lustre request/reply message
Eh, Nicolas,

The format of a message that needs to be reconstructed after a resend differs from that of one that doesn't. As a quick example, take the OPEN request (via the MDS_REINT command): that type of message needs an extra buffer to store the LOV EA, which is sent to the MDS in the replay case (with an additional flag in the header); the client copies the data from the MDS reply after PtlRPC finishes processing the request. That is why I brought up the reconstruct/replay case.

The format also differs depending on whether you use MDS_REINT plus sub-commands or something similar to MDS_SET_INFO. For MDS_SET_INFO you use a single format for all messages (just a simple key/value buffer), but for MDS_REINT you need two formats: one for the generic MDS_REINT code (get the opcode from the command, get locks, and possibly more) and one per sub-opcode, such as open, unlink, setxattr, and setattr. All of these have different numbers of buffers (fields).
Alexey Lyashkov
2010-Oct-13 07:32 UTC
[Lustre-devel] Query to understand the Lustre request/reply message
The MDS code in 1.8 is simpler, because it doesn't have the parts of the clustered-metadata project, aka CMD3 ;-) The same is true for the client: it lacks many of the newer features, such as FID assignment on the client side and the extra MD layer (LMV), and it doesn't have CLIO. So 1.8 is a good place to start learning :)

--------------------------------------
Alexey Lyashkov
alexey.lyashkov at clusterstor.com
Nicolas Williams
2010-Oct-13 07:43 UTC
[Lustre-devel] Query to understand the Lustre request/reply message
On Wed, Oct 13, 2010 at 10:27:55AM +0300, Alexey Lyashkov wrote:
> As a quick example, take the OPEN request (via the MDS_REINT command): that type of message needs an extra buffer to store the LOV EA, which is sent to the MDS in the replay case (with an additional flag in the header). That is why I brought up the reconstruct/replay case.

Sure, but this buffer needs to be declared a priori. If you won't know whether you'll need a buffer until later, that's OK: you declare it anyway and set its size to zero if you don't need it.

You can't change a capsule's format to add buffers; you can only set the size of unnecessary buffers to zero. This is because the header of a PtlRPC message (not the ptlrpc_body, mind you) has a count of buffers followed by a variable-length (64-bit-aligned) set of that many 32-bit buffer lengths (I'm going from memory here), and adding buffers can put a reply over the expected max size on the client side, leading to it being dropped.

You can change a capsule's format to change the definition of a field from one without a swabber to one with a swabber. You'll see in many cases that the presence of a field (meaning, whether it's checked for or whether it has a non-zero size) depends on a flag in the mdt or ost body, as you mention. Replays are not the only interesting case here; capabilities are another. Some of these flags could be removed and replaced with checks of buffer size (0 -> flag not set, >0 -> flag set).

> For MDS_SET_INFO you use a single format for all messages (just a simple key/value buffer), but for MDS_REINT you need two formats.

The SET_INFO RPCs are kinda gross. I should know, since I finished the conversion of ost_handler.c to the new API. You can see that I used req_capsule_extend() to handle some SET_INFO cases.

No, I didn't cover this detail, nor others, because I figured Vilobh needed a starting point, and that's all I was going to provide tonight.

Nico
--
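A minimal sketch of the declare-then-shrink idiom Nicolas describes, assuming the 2.0 capsule API (req_capsule_set_size and req_capsule_server_pack are real 2.0 calls; RMF_MY_OPTIONAL is a made-up field used only for illustration):

        /* Hypothetical server-side fragment: the reply format declares
         * RMF_MY_OPTIONAL up front; when this particular reply does not
         * need it, shrink it to zero before packing the reply buffers. */
        static int my_pack_reply(struct req_capsule *pill, int need_optional)
        {
                if (!need_optional)
                        req_capsule_set_size(pill, &RMF_MY_OPTIONAL,
                                             RCL_SERVER, 0);

                /* Allocates the reply message with the sizes now set. */
                return req_capsule_server_pack(pill);
        }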
Alexey Lyashkov
2010-Oct-13 07:51 UTC
[Lustre-devel] Query to understand the Lustre request/reply message
On Oct 13, 2010, at 10:43, Nicolas Williams wrote:
> You can't change a capsule's format to add buffers; you can only set the size of unnecessary buffers to zero.

But you can reassign the format of a message :) If you look at the MDT code, you can see that for MDS_REINT there is a first format for the command as a whole, and for the operation inside the REINT Lustre uses a second format:

        static int mdt_reint(struct mdt_thread_info *info)
        {
                long opc;
                int rc;

                static const struct req_format *reint_fmts[REINT_MAX] = {
                        [REINT_SETATTR]  = &RQF_MDS_REINT_SETATTR,
                        [REINT_CREATE]   = &RQF_MDS_REINT_CREATE,
                        [REINT_LINK]     = &RQF_MDS_REINT_LINK,
                        [REINT_UNLINK]   = &RQF_MDS_REINT_UNLINK,
                        [REINT_RENAME]   = &RQF_MDS_REINT_RENAME,
                        [REINT_OPEN]     = &RQF_MDS_REINT_OPEN,
                        [REINT_SETXATTR] = &RQF_MDS_REINT_SETXATTR
                };
                ...

Do you see what I mean?

--------------------------------------
Alexey Lyashkov
alexey.lyashkov at clusterstor.com
Vilobh Meshram
2010-Oct-13 23:51 UTC
[Lustre-devel] Query to understand the Lustre request/reply message
Hi Alexey/Nicolas/All,

I had a look at the 2.0 side of the code, and it seems there have been significant modifications on the MDS side; e.g., the request-processing part on the MDS has been redefined. I need to stick to 1.8.1.1, since most of my modifications are on the MDS side and the project demands it. So I will need to play around with the low-level message packing and unpacking, which from my previous experience is pretty complicated.

Here is my understanding of the way requests are processed; please correct me if I am wrong:

1) In the Lustre code base, for each RPC there is a static "size" array and a count which define the way the message is laid out (after all the rounding-off operations, etc.), i.e., the offsets at which the buffers are packed, and so on.
2) On the MDS side, the swabber functions plus some associated functions extract the buffer information.

What I need is:

1) Is it possible, without writing a new RPC in Lustre 1.8.1.1, to append some string such as "Hello" to an existing message sent by the client (with the buffer sizes set on the client side via the count and size fields)? I tried modifying the "size" array of the request for one of the RPCs built into Lustre:

        __u32 size[2] = { [MSG_PTLRPC_BODY_OFF] = sizeof(struct ptlrpc_body),
                          [DLM_LOCKREQ_OFF]     = sizeof(struct ldlm_request) };

changed to:

        __u32 size[3] = { [MSG_PTLRPC_BODY_OFF] = sizeof(struct ptlrpc_body),
                          [DLM_LOCKREQ_OFF]     = sizeof(struct ldlm_request),
                          /* how to add "char *str = "Hello""?  We have the size
                           * of str, but how do we choose a macro like
                           * DLM_LOCKREQ_OFF, since for a specific kind of RPC
                           * there is only a limited number of such macros? */ };

What I want to know is how I can send a buffer from the client side by modifying the static "size" array mentioned above. What are the main places I need to touch to make this work?

If appending a buffer to the "size" array is not possible, I can move on to writing a new RPC.

Thanks,
Vilobh
Graduate Research Associate
Department of Computer Science
The Ohio State University, Columbus, Ohio
Nicolas Williams
2010-Oct-14 00:28 UTC
[Lustre-devel] Query to understand the Lustre request/reply message
On Wed, Oct 13, 2010 at 07:51:37PM -0400, Vilobh Meshram wrote:
> 1) Is it possible, without writing a new RPC in Lustre 1.8.1.1, to append some string such as "Hello" to an existing message sent by the client (with the buffer sizes set on the client side via the count and size fields)? I tried modifying the "size" array of the request for one of the RPCs built into Lustre.

Yes, it's possible to add buffers to requests. It's not possible to add buffers to _replies_ to existing RPCs unless you know the client expects those additional buffers -- existing clients expect a given max size for each reply, and if your reply is bigger it will get dropped.

> [the size[2] -> size[3] snippet]

Add a buffer. Don't change the size of an existing buffer.

> What I want to know is how I can send a buffer from the client side by modifying the static "size" array mentioned above. What are the main places I need to touch to make this work?

Add an element to the size[] array, then set it to the correct size when you know the length of the string. Look at the SET_INFO RPCs.

> If appending a buffer to the "size" array is not possible, I can move on to writing a new RPC.

The size[] array is just a convenient place to store the sizes of the individual buffers while you construct them.

Nico
--
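A sketch of that suggestion against the 1.8 API (ptlrpc_prep_req and lustre_msg_string are real 1.8 functions; HELLO_OFF, the opcode argument, and the string itself are made up for illustration):

        /* Client side: declare one extra buffer after the existing ones
         * and let ptlrpc_prep_req() copy the string into it.  HELLO_OFF
         * is hypothetical; real code would add a proper *_OFF constant
         * for its RPC. */
        #define HELLO_OFF 2

        static struct ptlrpc_request *prep_req_with_hello(struct obd_import *imp,
                                                          int opcode)
        {
                char *hello = "Hello";
                __u32 size[3] = { [MSG_PTLRPC_BODY_OFF] = sizeof(struct ptlrpc_body),
                                  [DLM_LOCKREQ_OFF]     = sizeof(struct ldlm_request),
                                  [HELLO_OFF]           = strlen(hello) + 1 };
                char *bufs[3] = { NULL, NULL, hello };

                return ptlrpc_prep_req(imp, LUSTRE_DLM_VERSION, opcode,
                                       3, size, bufs);
        }

        /* Server side: pull the string back out of the request message;
         * lustre_msg_string() validates NUL-termination (max_len 0 means
         * "any length"). */
        static char *get_hello(struct ptlrpc_request *req)
        {
                return lustre_msg_string(req->rq_reqmsg, HELLO_OFF, 0);
        }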
Vilobh Meshram
2010-Oct-14 01:41 UTC
[Lustre-devel] Query to understand the Lustre request/reply message
Thanks Nicolas. I will try it out today or tomorrow. It seems it will touch a lot of places in the codebase :-) Thanks again.

Thanks,
Vilobh
Graduate Research Associate
Department of Computer Science
The Ohio State University Columbus Ohio

On Wed, Oct 13, 2010 at 8:28 PM, Nicolas Williams <Nicolas.Williams at oracle.com> wrote:
> Add an element to the size[] array, then set it to the correct size when
> you know the length of the string. Look at the SET_INFO RPCs.
> [...]
Alexey Lyashkov
2010-Oct-14 03:38 UTC
[Lustre-devel] Query to understand the Lustre request/reply message
On Oct 14, 2010, at 03:28, Nicolas Williams wrote:
> Yes, it's possible to add buffers to requests. It's not possible to add
> buffers to _replies_ to existing RPCs unless you know the client expects
> those additional buffers -- existing clients expect a given maxsize for
> each reply, and if your reply is bigger then it will get dropped.

That has been wrong for about a year now. Around a year ago I added code to the ptlrpc layer which adjusts the buffer for the reply and resends the request:

if (ev->mlength < ev->rlength ) {
        CDEBUG(D_RPCTRACE, "truncate req %p rpc %d - %d+%d\n", req,
               req->rq_replen, ev->rlength, ev->offset);
        req->rq_reply_truncate = 1;
        req->rq_replied = 1;
        req->rq_status = -EOVERFLOW;
        req->rq_nob_received = ev->rlength + ev->offset;
...
if (req->rq_reply_truncate) {
        if (ptlrpc_no_resend(req)) {
                DEBUG_REQ(D_ERROR, req, "reply buffer overflow,"
                          " expected: %d, actual size: %d",
                          req->rq_nob_received, req->rq_repbuf_len);
                RETURN(-EOVERFLOW);
        }

        sptlrpc_cli_free_repbuf(req);
        /* Pass the required reply buffer size (include
         * space for early reply).
         * NB: no need to roundup because alloc_repbuf
         * will roundup it */
        req->rq_replen       = req->rq_nob_received;
        req->rq_nob_received = 0;
        req->rq_resend       = 1;
        RETURN(0);
}

--------------------------------------
Alexey Lyashkov
alexey.lyashkov at clusterstor.com
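From a caller's point of view, under the 1.8 fields quoted above, opting into this grow-and-resend behaviour is just a matter of not forbidding resends on the request (a sketch, not code from the tree):

/* Leave rq_no_resend clear so a truncated reply takes the
 * reallocate-and-resend path above instead of failing with -EOVERFLOW. */
req->rq_no_resend = 0;
rc = ptlrpc_queue_wait(req);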
Nicolas Williams
2010-Oct-14 05:18 UTC
[Lustre-devel] Query to understand the Lustre request/reply message
On Thu, Oct 14, 2010 at 06:38:16AM +0300, Alexey Lyashkov wrote:
> That has been wrong for about a year now. Around a year ago I added code to
> the ptlrpc layer which adjusts the buffer for the reply and resends the
> request.

Ah, I didn't know that was in 1.8. Are there interop issues (with older
clients) though with sending larger replies than expected?
Alexey Lyashkov
2010-Oct-14 05:46 UTC
[Lustre-devel] Query to understand the Lustre request/reply message
On Oct 14, 2010, at 08:18, Nicolas Williams wrote:
> Ah, I didn't know that was in 1.8.

It was added around 1.8.1, and is easy to check with grep -rn rq_reply_truncate in the ptlrpc directory.

Severity   : normal
Bugzilla   : 19526
Description: can't stat file in some situations.
Details    : improve initialization of OSC data when a target is added to the
             MDS, and add the ability to resend a too-big getattr request when
             the client doesn't have info about the OST.

> Are there interop issues (with older
> clients) though with sending larger replies than expected?

I don't entirely understand that question, but the main purpose of that change was a problem with the LOV EA buffer size for files with ACLs (look at some conf-sanity tests). In some situations the MDS can have a larger LOV EA buffer than the client expected (some files with wide striping have a reference to an OST which was removed from the cluster, or the configuration was lost, or a new OST was added but is not yet connected to the client, or the reply buffer was shrunk by a bad call, or other cases -- you can find more references in bugzilla). In that case the MDS has to send a larger buffer to the client. An older client gets an infinite loop on connect or in the stat syscall (because of messages without the rq_no_resend flag); a new client resends the message and adjusts the maximal size of the LOV EA after getting a valid reply, to avoid the problem in the future.

--------------------------------------
Alexey Lyashkov
alexey.lyashkov at clusterstor.com
Alexey Lyashkov
2010-Oct-14 08:44 UTC
[Lustre-devel] Query to understand the Lustre request/reply message
On Oct 14, 2010, at 02:51, Vilobh Meshram wrote:
> 1) Is it possible that without writing a new RPC in Lustre 1.8.1.1 I can
> append some string such as "Hello" to the existing message sent by the
> Client (with the buffer size set at client side by the count,size fields)?
> I tried modifying the "size" of the request for one of the RPCs built into
> Lustre
> [...]

It would be better if you described completely what you want to do, because some requests can't be changed easily without losing compatibility -- like the ELC (early lock cancel) feature, which adds an extra buffer to messages and has a special connect flag to check for the request format change.
Andreas Dilger
2010-Oct-14 14:31 UTC
[Lustre-devel] Query to understand the Lustre request/reply message
On 2010-10-13, at 23:18, Nicolas Williams wrote:
> Ah, I didn't know that was in 1.8. Are there interop issues (with older
> clients) though with sending larger replies than expected?

Nico, it has always been possible in the past to increase the size of any buffer in a request, or in a reply (if the total reply size will fit into the pre-allocated reply buffer). An older peer would just ignore the bytes beyond the known part of the buffer.

Is that not true with the 2.x RPC handling?

Cheers, Andreas
--
Andreas Dilger
Lustre Technical Lead
Oracle Corporation Canada Inc.
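A sketch of the pattern Andreas describes, with hypothetical sizes: the client preallocates a generous reply buffer up front, and the server may then return anything that fits within that total, an older peer simply ignoring the tail it does not understand:

/* Client: reserve room for up to HELLO_MAX bytes of extra reply data
 * (HELLO_MAX is a made-up cap for this example). */
#define HELLO_MAX 64

__u32 repsize[3] = { [MSG_PTLRPC_BODY_OFF] = sizeof(struct ptlrpc_body),
                     [DLM_LOCKREPLY_OFF]   = sizeof(struct ldlm_reply),
                     [DLM_LOCKREPLY_OFF+1] = HELLO_MAX };
ptlrpc_req_set_repsize(req, 3, repsize);

/* Server: pack the actual (smaller) size; the total still fits within
 * what the client preallocated, so nothing is dropped. */
__u32 size[3] = { [MSG_PTLRPC_BODY_OFF] = sizeof(struct ptlrpc_body),
                  [DLM_LOCKREPLY_OFF]   = sizeof(struct ldlm_reply),
                  [DLM_LOCKREPLY_OFF+1] = strlen("hello") + 1 };
rc = lustre_pack_reply(req, 3, size, NULL);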
Alexey Lyashkov
2010-Oct-14 14:40 UTC
[Lustre-devel] Query to understand the Lustre request/reply message
Andreas,

On Oct 14, 2010, at 17:31, Andreas Dilger wrote:
> Nico, it has always been possible in the past to increase the size of any
> buffer in a request, or in a reply (if the total reply size will fit into
> the pre-allocated reply buffer). An older peer would just ignore the bytes
> beyond the known part of the buffer.

I think that question isn't about rebalancing buffer sizes within a message; I think it is about sending a large reply into a smaller reply buffer. LNet is not able to put a large reply into a small buffer (without the truncate flag, which does not exist in older ptlrpc versions). Without that flag you will see messages like

        CERROR("Matching packet from %s, match "LPU64
               " length %d too big: %d left, %d allowed\n",
               libcfs_id2str(src), match_bits, rlength,
               md->md_length - offset, mlength);

and LNet will drop the message without notifying PtlRPC.

> Is that not true with the 2.x RPC handling?

2.x is able to rebalance space between buffers (but it looks to be done by hand), and is able to adjust the reply buffer after a truncated reply.

--------------------------------------
Alexey Lyashkov
alexey.lyashkov at clusterstor.com
Vilobh Meshram
2010-Oct-14 15:04 UTC
[Lustre-devel] Query to understand the Lustre request/reply message
Hi Alexey,

Thanks again for your reply.

I am trying to embed a buffer in the RPC which will get filled in with some values which the MDS is aware of but the client calling the RPC is not. It has nothing to do with locking. I just want to fill the buffer I embed in the RPC with suitable data from the MDS end and then operate on that data at the client side. So I think the approach suggested by you and Nicolas of just including the sizeof(str) [the size of the expected information from the MDS] in the size[] array should be fine, as done below:

__u32 size[2] = { [MSG_PTLRPC_BODY_OFF] = sizeof(struct ptlrpc_body),
                  [DLM_LOCKREQ_OFF]     = sizeof(struct ldlm_request) };

---->>

__u32 size[3] = { [MSG_PTLRPC_BODY_OFF] = sizeof(struct ptlrpc_body),
                  [DLM_LOCKREQ_OFF]     = sizeof(struct ldlm_request),
                  /* how to add "char *str = Hello"? Of course we have
                   * sizeof(str), but how do we choose a macro like
                   * DLM_LOCKREQ_OFF, since a given RPC only defines a
                   * limited number of such macros? */ };

Please correct me if I am wrong, and please guide me if I need to consider any corner cases to handle this use case.

Thanks again.

Thanks,
Vilobh
Graduate Research Associate
Department of Computer Science
The Ohio State University Columbus Ohio

On Thu, Oct 14, 2010 at 10:40 AM, Alexey Lyashkov <alexey.lyashkov at clusterstor.com> wrote:
> I think that question isn't about rebalancing buffer sizes within a message;
> I think it is about sending a large reply into a smaller reply buffer.
> [...]
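A minimal sketch of that round trip, continuing the assumptions above: the server writes the string into its reply message (rq_repmsg, not the incoming rq_reqmsg) after the reply has been packed with the extra buffer, and the client reads the same buffer index back once the RPC completes:

/* Server side, after lustre_pack_reply() has sized the extra buffer: */
char *str = lustre_msg_buf(req->rq_repmsg, DLM_LOCKREPLY_OFF + 1, 0);
if (str != NULL)
        strcpy(str, "hello");   /* fits: the buffer was sized for it */

/* Client side, after ptlrpc_queue_wait() returns success: */
char *reply = lustre_msg_string(req->rq_repmsg, DLM_LOCKREPLY_OFF + 1, 0);
if (reply != NULL)
        CDEBUG(D_INFO, "MDS said \"%s\"\n", reply);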
Alexey Lyashkov
2010-Oct-14 15:10 UTC
[Lustre-devel] Query to understand the Lustre request/reply message
Hi Vilobh,

As I see it, you have touched code related to locking: struct ldlm_request is used in the lock enqueue process. That is why I mentioned the interop issue in the ELC code, which is solved with an export flag. For common MDC requests you can resolve the interop issue with flags in mdc_body (mdt_body), but that is not possible for LDLM requests.

On Oct 14, 2010, at 18:04, Vilobh Meshram wrote:
> I am trying to embed a buffer in the RPC which will get filled in with some
> values which the MDS is aware of but the client calling the RPC is not. It
> has nothing to do with locking.
> [...]
Vilobh Meshram
2010-Oct-14 15:29 UTC
[Lustre-devel] Query to understand the Lustre request/reply message
Hi Alexey,

Thanks again for the reply.

Can you briefly give me some pointers about this interop issue, and for which kinds of RPC it should arise? How should we resolve it, and what kind of flag needs to be set?

I went through the bugzilla entry you mentioned, and it seems that RPCs dealing with the LDLM may cause this issue. Please correct me if I am wrong.

Thanks,
Vilobh
Graduate Research Associate
Department of Computer Science
The Ohio State University Columbus Ohio

On Thu, Oct 14, 2010 at 11:10 AM, Alexey Lyashkov <alexey.lyashkov at clusterstor.com> wrote:
> As I see it, you have touched code related to locking: struct ldlm_request
> is used in the lock enqueue process. That is why I mentioned the interop
> issue in the ELC code, which is solved with an export flag.
> [...]
Alexey Lyashkov
2010-Oct-14 15:45 UTC
[Lustre-devel] Query to understand the Lustre request/reply message
Hi Vilobh,

interop == interoperability between nodes with different versions of the software.

In general we have two ways to solve that. For requests with an mdc_body, you can set a flag in the body and analyze that flag on the server/client side. If you want to add a new operation, the better way is to add a new flag into connect_data (look at the OBD_CONNECT_* macro handling); that flag can be checked via export->connect_flags on the client or server side to discover the remote side's features.

As an example, 1.x and 2.0 have a different format for setattr requests:

int mdc_setattr
...
        if (mdc_exp_is_2_0_server(exp)) {
                size[REQ_REC_OFF]     = sizeof(struct mdt_rec_setattr);
                size[REQ_REC_OFF + 1] = 0; /* capa */
                size[REQ_REC_OFF + 2] = 0; //sizeof (struct mdt_epoch);
                size[REQ_REC_OFF + 3] = ealen;
                size[REQ_REC_OFF + 4] = ea2len;
                size[REQ_REC_OFF + 5] = sizeof(struct ldlm_request);
                offset = REQ_REC_OFF + 5;
                bufcount = 6;
                replybufcount = 6;
        } else {
                bufcount = 4;
        }

An example of checking a client feature is the version-based recovery support check:

mds_version_get_check
...
        if (inode == NULL || !exp_connect_vbr(req->rq_export))

I hope that helps you.

On Oct 14, 2010, at 18:29, Vilobh Meshram wrote:
> Can you briefly give me some pointers about this interop issue, and for
> which kinds of RPC it should arise? How should we resolve it, and what kind
> of flag needs to be set?
> [...]
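A sketch of the connect-flag approach Alexey outlines; OBD_CONNECT_HELLO and its bit value are invented for illustration (a real patch would claim an unused bit in the OBD_CONNECT_* list), and the macro follows the exp_connect_vbr() style cited above:

/* Hypothetical feature bit, advertised in connect_data at connect time. */
#define OBD_CONNECT_HELLO  0x4000000000ULL

#define exp_connect_hello(exp) \
        ((exp)->exp_connect_flags & OBD_CONNECT_HELLO)

        /* Only pack the extra buffer when the peer advertised support,
         * so older peers keep seeing the old message format. */
        if (exp_connect_hello(req->rq_export))
                bufcount = 4;
        else
                bufcount = 3;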
Vilobh Meshram
2010-Oct-14 16:25 UTC
[Lustre-devel] Query to understand the Lustre request/reply message
Hi Alexey,

That surely helps. Thanks for all the help so far.

Thanks,
Vilobh
Graduate Research Associate
Department of Computer Science
The Ohio State University Columbus Ohio

On Thu, Oct 14, 2010 at 11:45 AM, Alexey Lyashkov <alexey.lyashkov at clusterstor.com> wrote:
> interop == interoperability between nodes with different versions of the
> software.
> [...]
Vilobh Meshram
2010-Oct-15 00:58 UTC
[Lustre-devel] Query to understand the Lustre request/reply message
Hi Alexey/Nicolas,

I modified the code in the way Nicolas suggested yesterday, in order to get some information filled into a fixed-size buffer sent from the client side. Here I am sending a buffer called "str" (whose size is 16) which is to be updated on the MDS side with the string "hello" (whose size is 7, much less than the original size of buffer "str", i.e. 16). But I am not able to perform the operation successfully, and I get the error

"LustreError: 4209:0:(file.c:3143:ll_inode_revalidate_fini()) failure -14 inode 31257"

which seems to be related to DLM_REPLY_REC_OFF, since I have modified this offset in my code. Can you please review my code and tell me if I am making any mistake? I will be done with my task if I can resolve this problem.

Following are the modifications (highlighted in bold/italic blue in the original mail) at the client and MDS side, for Lustre 1.8.1.1:

At the client side :- lustre/ldlm/ldlm_lockd.c

 655 int ldlm_cli_enqueue(.........)
 665         __u32 size[] = { [MSG_PTLRPC_BODY_OFF] = sizeof(struct ptlrpc_body),
 666                          [DLM_LOCKREQ_OFF] = sizeof(*body),
 667                          [DLM_REPLY_REC_OFF] = lvb_len ? lvb_len :
 668                                                sizeof(struct ost_lvb),
 669                          16 };

 717         if (reqp == NULL || *reqp == NULL) {
 718                 req = ldlm_prep_enqueue_req(exp, 4, size, NULL, 0);
                     |
                     |
                     v
 575 struct ptlrpc_request *ldlm_prep_elc_req(.......)
 584         void *str = NULL;
 585         char *bufs[4] = { NULL, NULL, NULL, str };
 616         req = ptlrpc_prep_req(class_exp2cliimp(exp), version,
 617                               opc, bufcount, size, bufs);

At the MDS side :- lustre/ldlm/ldlm_lockd.c

 992 int ldlm_handle_enqueue(.........)
 996 {
1000         void *str;
             __u32 size[4] = { [MSG_PTLRPC_BODY_OFF] = sizeof(struct ptlrpc_body),
                               [DLM_LOCKREPLY_OFF] = sizeof(*dlm_rep) };
1009         char *org = "hello";

1119 existing_lock:
1120
1121         if (flags & LDLM_FL_HAS_INTENT) {
1122                 /* In this case, the reply buffer is allocated deep in
1123                  * local_lock_enqueue by the policy function. */
1124                 cookie = req;
1125         } else {
1126                 int buffers = 4;
1127
1128                 lock_res_and_lock(lock);
1129                 if (lock->l_resource->lr_lvb_len) {
                             size[DLM_REPLY_REC_OFF] = lock->l_resource->lr_lvb_len;
                             buffers = 4;
1132                 }
1133                 unlock_res_and_lock(lock);
1134
1135                 if (OBD_FAIL_CHECK_ONCE(OBD_FAIL_LDLM_ENQUEUE_EXTENT_ERR))
1136                         GOTO(out, rc = -ENOMEM);
                     str = lustre_msg_buf(req->rq_reqmsg, DLM_REPLY_REC_OFF+1, 1);
                     memcpy(str, org, 7);
                     size[DLM_REPLY_REC_OFF + 1] = 16;

Thanks,
Vilobh
Graduate Research Associate
Department of Computer Science
The Ohio State University Columbus Ohio

On Thu, Oct 14, 2010 at 12:25 PM, Vilobh Meshram <vilobh.meshram at gmail.com> wrote:
> That surely helps. Thanks for all the help so far.
> [...]
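One observation on the fragment above, offered as a guess rather than a confirmed diagnosis of the -14: the server-side memcpy() writes into req->rq_reqmsg, the incoming request, so the string never travels back to the client. Returning it in the reply would mean writing into rq_repmsg once the reply has been packed with the extra buffer, roughly:

/* Sketch: fill the extra reply buffer after lustre_pack_reply(). */
str = lustre_msg_buf(req->rq_repmsg, DLM_REPLY_REC_OFF + 1, 0);
if (str != NULL)
        memcpy(str, org, strlen(org) + 1);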
Alexey Lyashkov
2010-Oct-15 07:39 UTC
[Lustre-devel] Query to understand the Lustre request/reply message
Can you please attach a diff file?

On Oct 15, 2010, at 03:58, Vilobh Meshram wrote:
> I modified the code in the way Nicolas suggested yesterday, in order to get
> some information filled into a fixed-size buffer sent from the client side.
> [...]
Vilobh Meshram
2010-Oct-15 16:25 UTC
[Lustre-devel] Query to understand the Lustre request/reply message
Hi Alexey,

I have attached the diff file. Please have a look at it and let me know your comments/suggestions. Thanks again.

Thanks,
Vilobh
Graduate Research Associate
Department of Computer Science
The Ohio State University Columbus Ohio

On Fri, Oct 15, 2010 at 3:39 AM, Alexey Lyashkov <alexey.lyashkov at clusterstor.com> wrote:
> Can you please attach a diff file?
> [...]
>
> Thanks,
> Vilobh
> *Graduate Research Associate
> Department of Computer Science
> The Ohio State University Columbus Ohio*
>
> On Thu, Oct 14, 2010 at 12:25 PM, Vilobh Meshram <vilobh.meshram at gmail.com> wrote:
>> Hi Alexey,
>>
>> That surely helps. Thanks for all the help till now.
>>
>> Thanks,
>> Vilobh
>> *Graduate Research Associate
>> Department of Computer Science
>> The Ohio State University Columbus Ohio*
>>
>> On Thu, Oct 14, 2010 at 11:45 AM, Alexey Lyashkov <alexey.lyashkov at clusterstor.com> wrote:
>>> Hi Vilobh,
>>>
>>> interop == interoperability between nodes running different versions of
>>> the software.
>>>
>>> In general we have two ways to solve that. For requests that carry an
>>> mdc_body, you can set a flag in the body and analyze that flag on the
>>> server/client side.
>>> If you want to add a new operation, the better way is to add a new flag
>>> into connect_data (look at the OBD_CONNECT_* macro handling).
>>> That flag can be checked via export->connect_flags on the client or
>>> server side to test the remote side's features.
>>> As an example, 1.x and 2.0 have a different format for setattr requests:
>>>
>>> int mdc_setattr
>>> ...
>>>         if (mdc_exp_is_2_0_server(exp)) {
>>>                 size[REQ_REC_OFF] = sizeof(struct mdt_rec_setattr);
>>>                 size[REQ_REC_OFF + 1] = 0; /* capa */
>>>                 size[REQ_REC_OFF + 2] = 0; //sizeof (struct mdt_epoch);
>>>                 size[REQ_REC_OFF + 3] = ealen;
>>>                 size[REQ_REC_OFF + 4] = ea2len;
>>>                 size[REQ_REC_OFF + 5] = sizeof(struct ldlm_request);
>>>                 offset = REQ_REC_OFF + 5;
>>>                 bufcount = 6;
>>>                 replybufcount = 6;
>>>         } else {
>>>                 bufcount = 4;
>>>         }
>>>
>>> An example of checking a client-side feature is the version-based
>>> recovery support check for the client:
>>>
>>> mds_version_get_check
>>> ...
>>>         if (inode == NULL || !exp_connect_vbr(req->rq_export))
>>>
>>> I hope that helps you.
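To make that idea concrete for the extra-buffer experiment, here is a hedged
sketch of gating the new buffer on a connect flag. OBD_CONNECT_HELLO and
exp_connect_hello() are invented names for illustration only; a real patch
would have to claim an unused OBD_CONNECT_* bit and negotiate it during the
connect handshake on both peers:

    /* hypothetical connect bit, for illustration only */
    #define OBD_CONNECT_HELLO       0x10000000000ULL
    #define exp_connect_hello(exp) \
            (!!((exp)->exp_connect_flags & OBD_CONNECT_HELLO))

    /* returns the reply buffer count this peer can handle */
    static int hello_reply_bufcount(struct obd_export *exp, __u32 *size)
    {
            if (!exp_connect_hello(exp))
                    return 3;                 /* old-format peer: no extra buffer */
            size[DLM_REPLY_REC_OFF + 1] = 16; /* new-format peer */
            return 4;
    }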
>>>
>>> On Oct 14, 2010, at 18:29, Vilobh Meshram wrote:
>>>
>>> Hi Alexey,
>>>
>>> Thanks again for the reply.
>>>
>>> Can you briefly give me some pointers about this interop issue, and
>>> with which kind of RPC does it arise? What kind of flag needs to be set
>>> to resolve it?
>>>
>>> I went through the bugzilla entry you mentioned; it seems that RPCs
>>> dealing with LDLM may cause this issue. Please correct me if I am wrong.
>>>
>>> Thanks,
>>> Vilobh
>>> *Graduate Research Associate
>>> Department of Computer Science
>>> The Ohio State University Columbus Ohio*
>>>
>>> On Thu, Oct 14, 2010 at 11:10 AM, Alexey Lyashkov <alexey.lyashkov at clusterstor.com> wrote:
>>>> Hi Vilobh,
>>>>
>>>> As I see it, you touched code related to locking. struct ldlm_request
>>>> is used in the lock enqueue process -- that is why I mentioned the
>>>> interop issue in the ELC code, which is solved with an export flag.
>>>> For common mdc requests you can resolve the interop issue with flags
>>>> in mdc_body (mdt_body), but that is not possible for ldlm requests.
>>>>
>>>> On Oct 14, 2010, at 18:04, Vilobh Meshram wrote:
>>>>
>>>> Hi Alexey,
>>>>
>>>> Thanks again for your reply.
>>>>
>>>> I am trying to embed a buffer in the RPC which will get filled in with
>>>> some values which the MDS knows and the client calling the RPC does
>>>> not. It has nothing to do with locking. I just want to fill the buffer
>>>> I embed in the RPC with suitable data at the MDS end and then operate
>>>> on that data at the client side. So I think the approach suggested by
>>>> you and Nicholas -- just including sizeof(str) [the size of the
>>>> expected information from the MDS] in the size[] array -- should be
>>>> fine, as done below:
>>>>
>>>>         __u32 size[2] = { [MSG_PTLRPC_BODY_OFF] = sizeof(struct ptlrpc_body),
>>>>                           [DLM_LOCKREQ_OFF] = sizeof(struct ldlm_request) };
>>>>
>>>> becomes
>>>>
>>>>         __u32 size[3] = { [MSG_PTLRPC_BODY_OFF] = sizeof(struct ptlrpc_body),
>>>>                           [DLM_LOCKREQ_OFF] = sizeof(struct ldlm_request),
>>>>                           /* how to add "char *str = Hello"? we have
>>>>                            * sizeof(str), but how to choose the macro
>>>>                            * (like DLM_LOCKREQ_OFF)? for a specific
>>>>                            * kind of RPC there are a limited number of
>>>>                            * such macros */ };
>>>>
>>>> Please correct me if I am wrong, and please point out any corner cases
>>>> I need to consider for this use case.
>>>>
>>>> Thanks again.
>>>>
>>>> Thanks,
>>>> Vilobh
>>>> *Graduate Research Associate
>>>> Department of Computer Science
>>>> The Ohio State University Columbus Ohio*
>>>>
>>>> On Thu, Oct 14, 2010 at 10:40 AM, Alexey Lyashkov <alexey.lyashkov at clusterstor.com> wrote:
>>>>> Andreas,
>>>>>
>>>>> On Oct 14, 2010, at 17:31, Andreas Dilger wrote:
>>>>> > On 2010-10-13, at 23:18, Nicolas Williams wrote:
>>>>> >> On Thu, Oct 14, 2010 at 06:38:16AM +0300, Alexey Lyashkov wrote:
>>>>> >>> On Oct 14, 2010, at 03:28, Nicolas Williams wrote:
>>>>> >>>> Yes, it's possible to add buffers to requests. It's not possible
>>>>> >>>> to add buffers to _replies_ to existing RPCs unless you know the
>>>>> >>>> client expects those additional buffers -- existing clients
>>>>> >>>> expect a given maxsize for each reply, and if your reply is
>>>>> >>>> bigger then it will get dropped.
>>>>> >>> That has been wrong for the last ~1 year. ~1 year ago I added
>>>>> >>> code to the ptlrpc layer which adjusts the buffer for the reply
>>>>> >>> and resends the request.
>>>>> >>
>>>>> >> Ah, I didn't know that was in 1.8. Are there interop issues (with
>>>>> >> older clients) though with sending larger replies than expected?
>>>>> >
>>>>> > Nico, it has always been possible in the past to increase the size
>>>>> > of any buffer in a request, or in a reply (if the total reply size
>>>>> > will fit into the pre-allocated reply buffer). An older peer would
>>>>> > just ignore the bytes beyond the known part of the buffer.
>>>>> >
>>>>> I think that question isn't about rebalancing buffer sizes within a
>>>>> message; I think it is about sending a large reply into a smaller
>>>>> reply buffer.
>>>>> LNet is not able to put a large reply into a small buffer (without
>>>>> the truncate flag, which does not exist in older ptlrpc versions).
>>>>> Without that flag you will see messages
>>>>>
>>>>>         CERROR("Matching packet from %s, match "LPU64
>>>>>                " length %d too big: %d left, %d allowed\n",
>>>>>                libcfs_id2str(src), match_bits, rlength,
>>>>>                md->md_length - offset, mlength);
>>>>>
>>>>> and LNet will drop the message without notifying PtlRPC.
>>>>>
>>>>> > Is that not true with the 2.x RPC handling?
>>>>> >
>>>>> 2.x is able to rebalance space between buffers (but it looks to be
>>>>> done by hand), and is able to adjust the reply buffer after a
>>>>> truncated reply.
>>>>>
>>>>> --------------------------------------
>>>>> Alexey Lyashkov
>>>>> alexey.lyashkov at clusterstor.com
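For completeness, a hedged sketch of the client end of that exchange: once
ptlrpc_queue_wait() returns success, the extra reply buffer can be read
back. This assumes the server packed buffer DLM_REPLY_REC_OFF + 1 as a
NUL-terminated string of at most 16 bytes:

    static int hello_read_reply(struct ptlrpc_request *req)
    {
            char *hello;
            int rc;

            rc = ptlrpc_queue_wait(req);
            if (rc != 0)
                    return rc;

            /* lustre_msg_string() checks that the buffer exists, fits the
             * length bound, and is NUL-terminated */
            hello = lustre_msg_string(req->rq_repmsg, DLM_REPLY_REC_OFF + 1, 16);
            if (hello == NULL)
                    return -EPROTO;
            CDEBUG(D_INFO, "server filled in: %s\n", hello);
            return 0;
    }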
-------------- next part --------------
Index: lustre.spec
===================================================================
--- lustre.spec (revision 8279)
+++ lustre.spec (working copy)
@@ -1,7 +1,7 @@
 # lustre.spec
 %{!?version: %define version 1.8.1.1}
-%{!?kversion: %define kversion }
-%{!?release: %define release }
+%{!?kversion: %define kversion 2.6.18-128.7.1.el5-lustre.1.8.1.1smp-cust}
+%{!?release: %define release 2.6.18_128.7.1.el5_lustre.1.8.1.1smp_cust_201010141227}
 %{!?lustre_name: %define lustre_name lustre}

 %define is_client %(bash -c "if [[ %{lustre_name} = *-client ]]; then echo -n '1'; else echo -n '0'; fi")
@@ -104,7 +104,7 @@
 # Set an explicit path to our Linux tree, if we can.
 cd $RPM_BUILD_DIR/lustre-%{version}
-./configure '--disable-modules' '--disable-utils' '--disable-liblustre' '--disable-tests' '--disable-doc' --with-lustre-hack --with-sockets %{?configure_flags:configure_flags} \
+./configure '--with-o2ib=/usr/local/ofed/src/ofa_kernel' '--with-linux=/lib/modules/2.6.18-128.7.1.el5-lustre.1.8.1.1smp-cust/build' --with-lustre-hack --with-sockets %{?configure_flags:configure_flags} \
 	--sysconfdir=%{_sysconfdir} \
 	--mandir=%{_mandir} \
 	--libdir=%{_libdir}
Index: lustre/mds/handler.c
===================================================================
--- lustre/mds/handler.c (revision 8279)
+++ lustre/mds/handler.c (working copy)
@@ -1687,7 +1687,7 @@
                           mds->mds_max_mdsize, mds->mds_max_cookiesize };
         int bufcount;
-
+        printk("Inside function %s a hit for case MDS_REINT",__func__);
         /* NB only peek inside req now; mds_reint() will swab it */
         if (opcp == NULL) {
                 CERROR ("Can't inspect opcode\n");
@@ -1704,15 +1704,18 @@
         switch (opc) {
         case REINT_CREATE:
+                printk("Inside function %s a hit for case REINT_CREATE",__func__);
                 op = PTLRPC_LAST_CNTR + MDS_REINT_CREATE;
                 break;
         case REINT_LINK:
                 op = PTLRPC_LAST_CNTR + MDS_REINT_LINK;
                 break;
         case REINT_OPEN:
+                printk("Inside function %s a hit for case REINT_OPEN",__func__);
                 op = PTLRPC_LAST_CNTR + MDS_REINT_OPEN;
                 break;
         case REINT_SETATTR:
+                printk("Inside function %s a hit for case REINT_SETATTR",__func__);
                 op = PTLRPC_LAST_CNTR + MDS_REINT_SETATTR;
                 break;
         case REINT_RENAME:
@@ -1745,8 +1748,9 @@
                 if (opc == REINT_UNLINK || opc == REINT_RENAME)
                         size[DLM_REPLY_REC_OFF + 1] = 0;
         }
-
+        printk("Inside function %s in case MDS_REINT before calling lustre_pack_reply",__func__);
         rc = lustre_pack_reply(req, bufcount, size, NULL);
+        printk("Inside function %s in case MDS_REINT after calling lustre_pack_reply",__func__);
         if (rc)
                 break;
@@ -1756,6 +1760,7 @@
         }

         case MDS_CLOSE:
+                printk("Inside function %s in case MDS_CLOSE",__func__);
                 DEBUG_REQ(D_INODE, req, "close");
                 OBD_FAIL_RETURN(OBD_FAIL_MDS_CLOSE_NET, 0);
                 rc = mds_close(req, REQ_REC_OFF);
@@ -1798,6 +1803,7 @@
                 break;
 #endif
         case OBD_PING:
+                printk("Inside function %s got a hit at case OBD_PING",__func__);
                 DEBUG_REQ(D_INODE, req, "ping");
                 rc = target_handle_ping(req);
                 if (req->rq_export->exp_delayed)
@@ -1811,6 +1817,7 @@
                 break;

         case LDLM_ENQUEUE:
+                printk("\n Inside function %s got a hit at case LDLM_ENQUEUE",__func__);
                 DEBUG_REQ(D_INODE, req, "enqueue");
                 OBD_FAIL_RETURN(OBD_FAIL_LDLM_ENQUEUE, 0);
                 rc = ldlm_handle_enqueue(req, ldlm_server_completion_ast,
Index: lustre/ldlm/ldlm_request.c
===================================================================
--- lustre/ldlm/ldlm_request.c (revision 8279)
+++ lustre/ldlm/ldlm_request.c (working copy)
@@ -581,6 +581,8 @@
         int flags, avail, to_free, pack = 0;
         struct ldlm_request *dlm = NULL;
         struct ptlrpc_request *req;
+        void *str=NULL;
+        char *bufs[4] = {NULL,NULL,NULL,str};
         CFS_LIST_HEAD(head);
         ENTRY;
@@ -609,8 +611,10 @@
                 size[bufoff] = ldlm_request_bufsize(pack, opc);
         }
+        printk("\n Inside function %s before calling ptlrpc_prep_req",__func__);
+        printk("\n OPC for LDLM_ENQUEUE is %d",opc);
         req = ptlrpc_prep_req(class_exp2cliimp(exp), version,
-                              opc, bufcount, size, NULL);
+                              opc, bufcount, size, bufs);
         req->rq_export = class_export_get(exp);
         if (exp_connect_cancelset(exp) && req) {
                 if (canceloff) {
@@ -658,10 +662,11 @@
         struct ldlm_lock *lock;
         struct ldlm_request *body;
         struct ldlm_reply *reply;
-        __u32 size[3] = { [MSG_PTLRPC_BODY_OFF] = sizeof(struct ptlrpc_body),
+        __u32 size[] = { [MSG_PTLRPC_BODY_OFF] = sizeof(struct ptlrpc_body),
                           [DLM_LOCKREQ_OFF] = sizeof(*body),
                           [DLM_REPLY_REC_OFF] = lvb_len ? lvb_len :
-                                                sizeof(struct ost_lvb) };
+                                                sizeof(struct ost_lvb),
+                          16};
         int is_replay = *flags & LDLM_FL_REPLAY;
         int req_passed_in = 1, rc, err;
         struct ptlrpc_request *req;
@@ -710,7 +715,7 @@
         /* lock not sent to server yet */
         if (reqp == NULL || *reqp == NULL) {
-                req = ldlm_prep_enqueue_req(exp, 2, size, NULL, 0);
+                req = ldlm_prep_enqueue_req(exp, 4, size, NULL, 0);
                 if (req == NULL) {
                         failed_lock_cleanup(ns, lock, lockh, einfo->ei_mode);
                         LDLM_LOCK_PUT(lock);
Index: lustre/ldlm/ldlm_lockd.c
===================================================================
--- lustre/ldlm/ldlm_lockd.c (revision 8279)
+++ lustre/ldlm/ldlm_lockd.c (working copy)
@@ -997,13 +997,17 @@
         struct obd_device *obddev = req->rq_export->exp_obd;
         struct ldlm_reply *dlm_rep;
         struct ldlm_request *dlm_req;
-        __u32 size[3] = { [MSG_PTLRPC_BODY_OFF] = sizeof(struct ptlrpc_body),
-                          [DLM_LOCKREPLY_OFF] = sizeof(*dlm_rep) };
+        void *str;
+        __u32 size[4] = { [MSG_PTLRPC_BODY_OFF] = sizeof(struct ptlrpc_body),
+                          [DLM_LOCKREPLY_OFF] = sizeof(*dlm_rep)
+                        };
         int rc = 0;
         __u32 flags;
         ldlm_error_t err = ELDLM_OK;
         struct ldlm_lock *lock = NULL;
         void *cookie = NULL;
+        char *org = "hello";
+
         ENTRY;

         LDLM_DEBUG_NOLOCK("server-side enqueue handler START");
@@ -1119,19 +1123,24 @@
          * local_lock_enqueue by the policy function. */
         cookie = req;
         } else {
-                int buffers = 2;
+                int buffers = 4;
                 lock_res_and_lock(lock);
                 if (lock->l_resource->lr_lvb_len) {
                         size[DLM_REPLY_REC_OFF] = lock->l_resource->lr_lvb_len;
-                        buffers = 3;
+                        buffers = 4;
                 }
                 unlock_res_and_lock(lock);
                 if (OBD_FAIL_CHECK_ONCE(OBD_FAIL_LDLM_ENQUEUE_EXTENT_ERR))
                         GOTO(out, rc = -ENOMEM);
+                str = lustre_msg_buf(req->rq_reqmsg, DLM_REPLY_REC_OFF+1, 1);
+                memcpy ( str , org , 7);
+                size[DLM_REPLY_REC_OFF + 1] = 16;
+                printk("\n Inside function %s before calling 1.LUSTRE_PACK_REPLY",__func__);
                 rc = lustre_pack_reply(req, buffers, size, NULL);
+                printk("\n Inside function %s after calling 1.LUSTRE_PACK_REPLY",__func__);
                 if (rc)
                         GOTO(out, rc);
         }
@@ -1215,7 +1224,9 @@
 out:
         req->rq_status = rc ?: err; /* return either error - bug 11190 */
         if (!req->rq_packed_final) {
+                printk("\n Inside function %s before calling 2.LUSTRE_PACK_REPLY",__func__);
                 err = lustre_pack_reply(req, 1, NULL, NULL);
+                printk("\n Inside function %s after calling 2.LUSTRE_PACK_REPLY",__func__);
                 if (rc == 0)
                         rc = err;
         }
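One more hedged observation on the ldlm_request.c hunk above: str is still
NULL at the point bufs[] is built, so ptlrpc_prep_req() has nothing to copy
and the fourth request buffer goes out zero-filled (request packing only
copies from non-NULL bufs entries). If the client is meant to send real data
in that buffer, it could be filled after the request is prepped; the payload
below is purely illustrative:

    static int hello_fill_req_buf(struct ptlrpc_request *req)
    {
            static const char payload[] = "ping"; /* illustrative payload */
            void *buf;

            /* buffer 3 was sized to 16 bytes by the caller's size[] array */
            buf = lustre_msg_buf(req->rq_reqmsg, DLM_REPLY_REC_OFF + 1,
                                 sizeof(payload));
            if (buf == NULL)
                    return -EPROTO;
            memcpy(buf, payload, sizeof(payload));
            return 0;
    }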
Alexey Lyashkov
2010-Oct-15 17:22 UTC
[Lustre-devel] Query to understand the Lustre request/reply message
First comment: please use diff -p, so it is visible which function has changed.
Second: please use CDEBUG() where needed :)  You can set the debug level via

    sysctl -w lnet.debug=-1
    sysctl -w lnet.debug_subsystem=-1

and after that you can get a very detailed log via lctl dk (dump_kernel) > $log-file.

I will send the other comments tomorrow.

On Oct 15, 2010, at 19:25, Vilobh Meshram wrote:

> Hi Alexey,
>
> I have attached the diff file. Please have a look at it and let me know
> your comments/suggestions.
>
> Thanks again.
>
> Thanks,
> Vilobh
> Graduate Research Associate
> Department of Computer Science
> The Ohio State University Columbus Ohio
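As a hedged illustration of that advice, here is one of the printk() calls
from the attached diff rewritten as a mask-filtered debug message
(D_RPCTRACE is one plausible mask choice for RPC dispatch tracing):

             case LDLM_ENQUEUE:
    -                printk("\n Inside function %s got a hit at case LDLM_ENQUEUE",__func__);
    +                CDEBUG(D_RPCTRACE, "%s: hit case LDLM_ENQUEUE\n", __func__);
                     DEBUG_REQ(D_INODE, req, "enqueue");

Unlike printk(), such lines reach the console only when the corresponding
debug bits are enabled, and they appear in the lctl dk dump together with
the rest of the RPC trace.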