Hello Michael,

There is a logfile parser script in the attachment, which was written by Eric.

The script is very simple, but it should help you understand how we retrieve timestamp information from the Lustre debug log. On the server side, if you enable the rpc_trace log, a record is written to the debug log whenever a request arrives, starts being processed, or finishes processing. You can get all the timestamp information from these records (actually, only two of these record types are enough).

a. 00000100:00100000:0.0:1272313858.472660:0:31581:0:(service.c:1625:ptlrpc_server_handle_request()) Handling RPC pname:cluuid+ref:pid:xid:nid:opc ll_mgs_00:7d4fb15c-1b1c-295f-e466-ea7d77089b52+10:4055:x1334115493386242:12345-0 at lo:400

This record means the request is about to be handled, so you can get the start timestamp (1272313858.472660), the operation type (opc 400, ping), the xid (1334115493386242), the client nid (12345-0 at lo) and so on.

b. 00000100:00100000:0.0:1272313858.472687:0:31581:0:(service.c:1672:ptlrpc_server_handle_request()) Handled RPC pname:cluuid+ref:pid:xid:nid:opc ll_mgs_00:7d4fb15c-1b1c-295f-e466-ea7d77089b52+9:4055:x1334115493386242:12345-0 at lo:400 Request procesed in 45us (77us total) trans 0 rc 0/0

This record means the request has been handled, so you can get the end timestamp (1272313858.472687), the operation type (opc 400, ping), the xid (1334115493386242), the client nid (12345-0 at lo) and so on.

Note: "(77us total)" is the time from the request arriving to the end of processing, so you can also recover the request arrival timestamp here as the end timestamp minus the total time: 1272313858.472687 - 77us = 1272313858.472610.

With this information you can draw the graph Eric mentioned in his email. If you have any questions, please let me know.

Thanks
WangDi

Eric Barton wrote:
> Subject: RE: Visualising Lustre RPCs
> To: "Michael Kluge" <Michael.Kluge at tu-dresden.de>, "wangdi" <Tom.Wang at Sun.COM>
> CC: <lustre-devel at lists.lustre.org>, "Galen M. Shipman" <gshipman at ornl.gov>
>
> Michael,
>
> The previous Lustre RPC visualisation effort I mentioned at the LUG used the
> Lustre debug log entries of type D_RPCTRACE. We disabled all but these
> log messages and then used the Lustre debug daemon to collect them while
> we ran I/O tests. We then ran a simple logfile parser which used just the
> log entries for request arrival, start of processing and end of processing
> to graph request queue depth (arrival->end) and the number of requests
> being serviced by type over time, e.g.
>
> [inline image: read3d]
>
> ...which shows request queue depth (vertical) over time (axis labelled 20-25)
> by server (axis labelled 0-80).
>
> *From:* Michael Kluge [mailto:Michael.Kluge at tu-dresden.de]
> *Sent:* 17 April 2010 6:26 AM
> *To:* Galen M. Shipman; Eric Barton
> *Subject:* Visualising Lustre RPCs
>
> Hi Galen, Eric,
>
> in order to get this little project started, I think what I need first in
> order to write a prototype for a converter is the following:
>
> A set of test traces collected on maybe a handful of clients and some servers
> is probably a good point to start with. It would be even better if we know
> what is in these traces, so that we have an expectation of what kind of things
> we want to see on the Vampir displays. The little program that Eric mentioned
> that can read the trace files would be very helpful as well. And the last
> thing I can think of right now is a technical contact. I might come up with a
> couple of questions after I have taken a first look at the original trace data
> and before I start writing code.
>
> Regards, Michael
>
> --
> Michael Kluge, M.Sc.
>
> Technische Universität Dresden
> Center for Information Services and High Performance Computing (ZIH)
> D-01062 Dresden
> Germany
>
> Contact:
> Willersbau, Room WIL A 208
> Phone: (+49) 351 463-34217
> Fax: (+49) 351 463-37773
> e-mail: michael.kluge at tu-dresden.de
> WWW: http://www.tu-dresden.de/zih

[Attachment: rpc_parse.tgz, 30287 bytes]
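For illustration, here is a minimal sketch in Python of how the two record types above could be parsed. This is not the attached rpc_parse script; the field positions and regular expressions are assumptions based only on the sample lines quoted in this thread, and a real parser would need to handle nid and opcode extraction more carefully.

    import re

    # Matches the tail of a "Handled RPC" record; the spelling "procesed"
    # is copied verbatim from the log lines quoted above.
    HANDLED_RE = re.compile(r"Request procesed in (\d+)us \((\d+)us total\)")

    def parse_rpctrace(line):
        """Parse a 'Handling RPC' / 'Handled RPC' line into a dict, else None."""
        if "Handling RPC" not in line and "Handled RPC" not in line:
            return None
        header = line.split(":", 8)      # 00000100:00100000:0.0:<timestamp>:...
        ts = float(header[3])            # e.g. 1272313858.472660
        xid = re.search(r"x(\d+)", line).group(1)
        rec = {"timestamp": ts, "xid": xid}
        if "Handled RPC" in line:
            rec["event"] = "handled"
            m = HANDLED_RE.search(line)
            if m:
                rec["processing_us"] = int(m.group(1))
                rec["total_us"] = int(m.group(2))
                # arrival time = end-of-processing timestamp minus the total time
                rec["arrival"] = ts - rec["total_us"] / 1e6
        else:
            rec["event"] = "handling"
        return rec

Feeding record b from WangDi's message through this function would yield an arrival timestamp of 1272313858.472610, matching the note above.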
AFAIK the "Handling" time is when request processing starts, not when the request first arrived at the server. We would need to check if there is an RPCTRACE message from ptlrpc_handle_req_in() that we could use to determine the initial request arrival time.

Cheers, Andreas

On 2010-04-28, at 19:25, "di.wang" <di.wang at oracle.com> wrote:
> [...]
Andreas Dilger wrote:
> AFAIK the "Handling" time is when request processing starts, not when the
> request first arrived at the server. We would need to check if there is an
> RPCTRACE message from ptlrpc_handle_req_in() that we could use to determine
> the initial request arrival time.

Oh, you can also get the arrival time from the "Handled RPC" msg, as the "Handled RPC" timestamp minus the total time.

Thanks
WangDi
Hi WangDi,

OK, thanks for the input. I'll go ahead and try to write a converter. Could you please collect a set of test traces that belong together, from a couple of servers and clients, and put them somewhere so that I can download them?

Thanks, Michael

Am 29.04.2010 03:25, schrieb di.wang:
> [...]
Hi WangDi,

I have a small problem understanding the logical order of the events. Here is one example from the logs where the client reports an RPC as completed before the server says that it has finished the RPC handling. The difference is about 1.5 ms.

Is that due to the fact that the server waits until the client has acked the result of the RPC? Or are the clocks between the servers not well synchronized? Or are the timestamps in the logfile sometimes not correct (log message could not be flushed, or whatever)? In the first case I don't have a chance to figure out when the server has finished working on the RPC and is going to send out the result, correct?

../debug_log/rpc_client.debug:00000100:00100000:1.0:1272565278.263430:0:3479:0:(client.c:1532:ptlrpc_check_set()) Completed RPC pname:cluuid:pid:xid:nid:opc ptlrpcd-brw:3f7723dc-06a3-0b5e-9e96-f5b86e8d1b1c:3479:1334380768266050:10.8.0.126 at tcp:4

../debug_log/rpc_oss.debug:00000100:00100000:1.0:1272565278.265039:0:7385:0:(service.c:1672:ptlrpc_server_handle_request()) Handled RPC pname:cluuid+ref:pid:xid:nid:opc ll_ost_io_24:3f7723dc-06a3-0b5e-9e96-f5b86e8d1b1c+27:3479:x1334380768266050:12345-10.8.0.104 at tcp:4 Request procesed in 238391us (239698us total) trans 266 rc 0/0

For this RPC I see in the OSS log:

1272565278.025337: incoming x1334380768266050
1272565278.026649: Handling RPC x1334380768266050
1272565278.265039: Handled RPC x1334380768266050: Request procesed in 238391us (239698us total)

So, by doing the math I get: 265039 - 026649 = 238390 (just 1 off target, might be whatever) and 265039 - 025337 = 239702 (4 off target). So I guess the times reported by the last message are calculated from a different data source than the log timestamps?

Regards, Michael

Am 29.04.2010 20:27, schrieb di.wang:
> Hello, Michael
>
> Here is a small debug log example you can use for your development.
>
> Thanks
> WangDi
> Michael Kluge wrote:
>> [...]
On 2010-05-03, at 04:41, Michael Kluge wrote:
> I have a small problem understanding the logical order of the events.
> Here is one example from the logs where the client reports an RPC as
> completed before the server says that it has finished the RPC handling.
> The difference is about 1.5 ms.
>
> Is that due to the fact that the server waits until the client has
> acked the result of the RPC? Or are the clocks between the servers not
> well synchronized? Or are the timestamps in the logfile sometimes not
> correct (log message could not be flushed, or whatever)?

The timestamps on the log messages are only as accurate as the clocks on the nodes. Lustre can run without synchronized clocks, using the client clock as the basis for the timestamps, but Lustre makes no effort to synchronize the client clocks. For that you need to use NTP (easiest) or adjust the timestamps in the logs based on messages like this.

> So, by doing the math I get: 265039 - 026649 = 238390 (just 1 off target,
> might be whatever) and 265039 - 025337 = 239702 (4 off target). So I guess
> the times reported by the last message are calculated from a different
> data source than the log timestamps?

The log timestamps are generated at the time the message is processed by the log system. The processing times are themselves generated by accessing the clock and computing the processing time before the message is printed:

        do_gettimeofday(&work_start);
        :
        {handle RPC}
        :
        do_gettimeofday(&work_end);
        timediff = cfs_timeval_sub(&work_end, &work_start, NULL);
        CDEBUG(D_RPCTRACE, "request x"LPU64" opc %u from %s processed in "
               "%ldus (%ldus total) trans "LPU64" rc %d/%d\n",

Cheers, Andreas
--
Andreas Dilger
Lustre Technical Lead
Oracle Corporation Canada Inc.
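As an illustration of the "adjust the timestamps in the logs" option Andreas mentions, one possible approach (not part of any Lustre tool) is an NTP-style offset estimate per client: pair the client's send and completion timestamps with the server's handling and handled timestamps for the same xid. The input dictionaries below are hypothetical structures produced by a parser such as the one sketched earlier.

    from statistics import median

    def estimate_clock_offset(client_events, server_events):
        """client_events / server_events: dicts mapping xid -> (start_ts, end_ts),
        i.e. ("Sending RPC", "Completed RPC") on the client and
        ("Handling RPC", "Handled RPC") on the server, in seconds.
        Returns the estimated server-clock-minus-client-clock offset."""
        samples = []
        for xid, (t0, t3) in client_events.items():
            if xid not in server_events:
                continue                      # incomplete record, skip it
            t1, t2 = server_events[xid]
            # standard NTP offset estimate for one request/reply pair
            samples.append(((t1 - t0) + (t2 - t3)) / 2.0)
        return median(samples) if samples else 0.0

Subtracting this offset from the server timestamps (or adding it to the client timestamps) before merging the logs should restore the expected ordering when the clocks are not NTP-synchronized.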
Hi Andreas,

Am 03.05.2010 15:20, schrieb Andreas Dilger:
> The timestamps on the log messages are only as accurate as the clocks on
> the nodes. Lustre can run without synchronized clocks, using the client
> clock as the basis for the timestamps, but Lustre makes no effort to
> synchronize the client clocks. For that you need to use NTP (easiest)
> or adjust the timestamps in the logs based on messages like this.

OK, so logically the "Completed RPC" on the client side is supposed to show up after the server has written its "Handled RPC" to its log (if the clocks are synced perfectly).

One more question: RPC 1334380768266400 (in the log WangDi sent me) has only a "Sending RPC" message on the client side, so the "Completed RPC" is missing. The server has all three records (received, start work, done work). Has this RPC vanished on the way back to the client? There is no further indication of what happened. The last timestamp in the client log is 1272565368.228628, and the server says it finished processing the request at 1272565281.379471. So the client log was recorded long enough to contain the "Completed RPC" message for this RPC, if it ever arrived...

Regards, Michael
Hi Michael,

On May 3, 2010, at 11:10, Michael Kluge wrote:
> One more question: RPC 1334380768266400 (in the log WangDi sent me) has
> only a "Sending RPC" message on the client side, so the "Completed RPC"
> is missing. The server has all three records (received, start work, done
> work). Has this RPC vanished on the way back to the client? There is no
> further indication of what happened. The last timestamp in the client
> log is 1272565368.228628, and the server says it finished processing the
> request at 1272565281.379471. So the client log was recorded long enough
> to contain the "Completed RPC" message for this RPC, if it ever arrived...

If the reply was never received, then you should expect to see a timeout for that RPC in the log. It's also possible, though I'm not sure how likely in this case, that the trace buffer on the CPU that processed the completion has overflowed. You can check whether this is possible by finding the earliest message for each CPU and seeing if any of them is later than the expected time for this message.

cheers,
robert
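A small sketch of the check Robert describes, assuming the per-CPU identifier is the third colon-separated header field of each debug-log line (e.g. "1.0" in the sample lines above):

    def earliest_timestamp_per_cpu(lines):
        """Map each CPU field to the earliest timestamp seen for it.
        If a CPU's earliest surviving timestamp is later than the time the
        missing reply was expected, its trace buffer may have wrapped."""
        earliest = {}
        for line in lines:
            fields = line.split(":", 4)
            if len(fields) < 5:
                continue                  # not a debug-log record
            cpu = fields[2]
            try:
                ts = float(fields[3])
            except ValueError:
                continue
            if cpu not in earliest or ts < earliest[cpu]:
                earliest[cpu] = ts
        return earliest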
Michael Kluge wrote:
> OK, so logically the "Completed RPC" on the client side is supposed to
> show up after the server has written its "Handled RPC" to its log (if
> the clocks are synced perfectly).

Yes.

> One more question: RPC 1334380768266400 (in the log WangDi sent me) has
> only a "Sending RPC" message on the client side, so the "Completed RPC"
> is missing. The server has all three records (received, start work, done
> work). Has this RPC vanished on the way back to the client? [...]
> So the client log was recorded long enough to contain the "Completed RPC"
> message for this RPC, if it ever arrived...

Logically, yes. But in some cases debug log records are dropped for various reasons (actually, this is not rare), so you probably need to maintain an average time from the server's "Handled RPC" to the client's "Completed RPC" and then just estimate the client's "Completed RPC" time in such cases.

Thanks
WangDi
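A sketch of WangDi's suggestion, with hypothetical inputs: keep the average delay from the server's "Handled RPC" to the client's "Completed RPC" for RPCs where both records exist, and use it to estimate the completion time when the client-side record was lost.

    def estimate_missing_completions(handled_ts, completed_ts):
        """handled_ts / completed_ts: dicts mapping xid -> timestamp (server
        "Handled RPC" / client "Completed RPC", in seconds).  Returns estimated
        completion times for xids whose client-side record is missing."""
        deltas = [completed_ts[x] - handled_ts[x]
                  for x in handled_ts if x in completed_ts]
        if not deltas:
            return {}
        avg = sum(deltas) / len(deltas)       # mean handled-to-completed delay
        return {x: ts + avg
                for x, ts in handled_ts.items() if x not in completed_ts}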
Hi WangDi,

>> OK, so logically the "Completed RPC" on the client side is supposed to
>> show up after the server has written its "Handled RPC" to its log (if
>> the clocks are synced perfectly).
> Yes.

Thanks.

>> One more question: RPC 1334380768266400 (in the log WangDi sent me) has
>> only a "Sending RPC" message on the client side, so the "Completed RPC"
>> is missing. [...]
> Logically, yes. But in some cases debug log records are dropped for
> various reasons (actually, this is not rare), so you probably need to
> maintain an average time from the server's "Handled RPC" to the client's
> "Completed RPC" and then just estimate the client's "Completed RPC" time
> in such cases.

Oh my gosh ;) I don't want to start speculations about the helpfulness of incomplete debug logs. Anyway, what can get lost? Any kind of message, on both the servers and the clients? I think I'd like to know which cases have to be handled while I try to track individual RPCs on their way.

Regards, Michael
Michael Kluge wrote:
>>> One more question: RPC 1334380768266400 (in the log WangDi sent me)
>>> has on the client side only a "Sending RPC" message, thus missing the
>>> "Completed RPC". The server has all three (received, start work, done
>>> work). Has this RPC vanished on the way back to the client? There is
>>> no further indication what happened. The last timestamp in the client
>>> log is:
>>> 1272565368.228628
>>> and the server says it finished the processing of the request at:
>>> 1272565281.379471
>>> So the client log has been recorded long enough to contain the
>>> "Completed RPC" message for this RPC if it ever arrived ...
>>
>> Logically, yes. But in some cases debug log records are dropped for various
>> reasons (actually, this happens not rarely), so you probably need to maintain
>> an average time from the server "Handled RPC" to the client "Completed RPC"
>> and use it to estimate the client "Completed RPC" time in such cases.
>
> Oh my gosh ;) I don't want to start speculating about the helpfulness
> of incomplete debug logs. Anyway, what can get lost? Any kind of message
> on the servers and clients? I think I'd like to know what cases have to
> be handled while I try to track individual RPCs on their way.

Any records can get lost here. Unfortunately, there is no message that
indicates a loss has happened. :( (Usually I would check the timestamps in
the log, i.e. look for stretches with no records for a "long" time, for
example several seconds, but this is not an accurate method.)

I guess you can just ignore these incomplete records in your first step?
Let's see how the incomplete logs impact the profiling result, and then we
will decide how to deal with this.

Thanks
Wangdi

> Regards, Michael
> _______________________________________________
> Lustre-devel mailing list
> Lustre-devel at lists.lustre.org
> http://lists.lustre.org/mailman/listinfo/lustre-devel
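WangDi's heuristic (a long stretch without any records hints at dropped log pages) is easy to automate when scanning a dumped debug log. A small sketch, assuming the timestamp sits in the fourth colon-separated field of each record, as in the rpctrace lines quoted in this thread; the five-second threshold is only an example value:

    # Sketch: flag suspicious gaps in a Lustre debug log, where "no records
    # for several seconds" may mean trace records were dropped.
    # Assumes the timestamp is the 4th ':'-separated field of each record.

    import sys

    GAP_THRESHOLD = 5.0   # seconds without any record; example value only

    def find_gaps(path, threshold=GAP_THRESHOLD):
        gaps = []
        last = None
        with open(path) as log:
            for line in log:
                fields = line.split(':')
                if len(fields) < 7:
                    continue            # not a normal debug-log record
                try:
                    ts = float(fields[3])
                except ValueError:
                    continue
                if last is not None and ts - last > threshold:
                    gaps.append((last, ts))
                last = ts
        return gaps

    if __name__ == '__main__':
        for start, end in find_gaps(sys.argv[1]):
            print("possible lost records between %.6f and %.6f (%.1fs gap)"
                  % (start, end, end - start))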
OK, sounds reasonable. I can put the number of ignored messages into an
additional counter. That should make the generated trace file more
descriptive.

Regards, Michael

Am 03.05.2010 21:52, schrieb di.wang:
>> Anyway, what can get lost? Any kind of message on the servers and
>> clients? I think I'd like to know what cases have to be handled while
>> I try to track individual RPCs on their way.
> Any records can get lost here. Unfortunately, there is no message that
> indicates a loss has happened. :( (Usually I would check the timestamps
> in the log, i.e. look for stretches with no records for a "long" time,
> for example several seconds, but this is not an accurate method.)
>
> I guess you can just ignore these incomplete records in your first step?
> Let's see how the incomplete logs impact the profiling result, and then
> we will decide how to deal with this.
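The counter of ignored messages can fall out of the same pass that groups records by xid: emit an RPC only when all expected client and server events are present, and count everything else. A rough sketch, assuming the log lines have already been parsed into (xid, event, timestamp) tuples; the event names are placeholders that mirror the record types discussed above:

    # Sketch: group pre-parsed events by xid and count incomplete RPCs
    # instead of silently dropping them.  'events' is assumed to be an
    # iterable of (xid, event_name, timestamp) tuples from the log parser.

    from collections import defaultdict

    # One client-side pair plus the three server-side records per RPC.
    EXPECTED = {"client_send", "server_recv", "server_start",
                "server_done", "client_completed"}

    def split_complete(events):
        by_xid = defaultdict(dict)
        for xid, name, ts in events:
            by_xid[xid][name] = ts

        complete, ignored = {}, 0
        for xid, recs in by_xid.items():
            if EXPECTED.issubset(recs):
                complete[xid] = recs
            else:
                ignored += 1          # feeds the "ignored messages" counter
        return complete, ignored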
Hi WangDi, the first version works. Screenshot is attached. I have a couple of counter realized: RPC''s in flight and RPC''s completed in total on the client, RPC''s enqueued, RPC''s in processing and RPC''c completed in total on the server. All these counter can be broken down by the type of RPC (op code). The picture has not yet the lines that show each single RPC, I still have to do counter like "avg. time to complete an RPC over the last second" and there are some more TODO''s. Like the timer synchronization. (In the screenshot the first and the last counter show total values while the one in the middle shows a rate.) What I like to have is a complete set of traces from a small cluster (<100 nodes) including the servers. Would that be possible? Is one of you in Hamburg May, 31-June, 3 for ISC''2010? I''ll be there and like to talk about what would be useful for the next steps. Regards, Michael Am 03.05.2010 21:52, schrieb di.wang:> Michael Kluge wrote: >>>> One more question: RPC 1334380768266400 (in the log WangDi sent me) >>>> has on the client side only a "Sending RPC" message, thus missing the >>>> "Completed RPC". The server has all three (received,start work, done >>>> work). Has this RPC vanished on the way back to the client? There is >>>> no further indication what happend. The last timestamp in the client >>>> log is: >>>> 1272565368.228628 >>>> and the server says it finished the processing of the request at: >>>> 1272565281.379471 >>>> So the client log has been recorded long enough to contain the >>>> "Completed RPC" message for this RPC if it arrived ever ... >>> Logically, yes. But in some cases, some debug logs might be abandoned >>> for some reasons(actually, it happens not rarely), and probably you need >>> maintain an average time from server "Handled RPC" to client "Completed >>> RPC", then you just guess the client "Completed RPC" time in this case. >> >> Oh my gosh ;) I don''t want to start speculations about the helpfulness >> of incomplete debug logs. Anyway, what can get lost? Any kind of >> message on the servers and clients? I think I''d like to know what >> cases have to be handled while I try to track individual RPC''s on >> their way. > Any records can get lost here. Unfortunately, there are not any messages > indicate the missing happened. :( > (Usually, I would check the time stamp in the log, i.e. no records for a > "long" time, for example several seconds, but this is not the accurate > way). > > I guess you can just ignore these uncompleted records in your first > step? Let''s see how these incomplete log will > impact the profiling result, then we will decide how to deal with this? > > Thanks > Wangdi >> >> Regards, Michael >> _______________________________________________ >> Lustre-devel mailing list >> Lustre-devel at lists.lustre.org >> http://lists.lustre.org/mailman/listinfo/lustre-devel > >-- Michael Kluge, M.Sc. Technische Universit?t Dresden Center for Information Services and High Performance Computing (ZIH) D-01062 Dresden Germany Contact: Willersbau, Room WIL A 208 Phone: (+49) 351 463-34217 Fax: (+49) 351 463-37773 e-mail: michael.kluge at tu-dresden.de WWW: http://www.tu-dresden.de/zih -------------- next part -------------- A non-text attachment was scrubbed... Name: Vampir - [Trace View - lustre_rpcs.otf].png Type: image/png Size: 97181 bytes Desc: not available Url : http://lists.lustre.org/pipermail/lustre-devel/attachments/20100516/f4cd3a83/attachment-0001.png
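All of the counters described above (RPCs in flight and completed on the client, enqueued, in processing and completed on the server, broken down by op code) can be derived by replaying the matched timestamps in time order. A condensed sketch, assuming one dict per matched RPC with the five timestamps and the opcode; the names and structure are illustrative, not the actual converter:

    # Sketch: turn matched RPC records into counter samples over time,
    # split by opcode.  Each record is assumed to be a dict with the five
    # timestamps and the opcode, e.g. produced by an xid-matching step.

    from collections import defaultdict

    def counter_samples(rpcs):
        # Each event contributes +1/-1 to one counter at one point in time.
        deltas = defaultdict(list)   # (counter_name, opcode) -> [(t, +/-1)]
        for r in rpcs:
            opc = r["opc"]
            deltas[("client_rpcs_in_flight", opc)] += [
                (r["client_send"], +1), (r["client_completed"], -1)]
            deltas[("server_rpcs_enqueued", opc)] += [
                (r["server_recv"], +1), (r["server_start"], -1)]
            deltas[("server_rpcs_in_processing", opc)] += [
                (r["server_start"], +1), (r["server_done"], -1)]
            deltas[("client_rpcs_completed_total", opc)] += [
                (r["client_completed"], +1)]

        samples = {}
        for key, changes in deltas.items():
            changes.sort()
            value, series = 0, []
            for t, d in changes:
                value += d
                series.append((t, value))   # counter value right after t
            samples[key] = series
        return samples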
Excellent :) How do you think measurements taken from 1000 servers with 100,000 clients can be visualised? We''ve used heat maps to visualise 10s-100s of concurrent measurements (y) over time (x) but I wonder if that will scale. Does vampire support heat maps? Cheers, Eric> -----Original Message----- > From: Michael Kluge [mailto:Michael.Kluge at tu-dresden.de] > Sent: 16 May 2010 10:30 AM > To: di.wang > Cc: Eric Barton; Andreas Dilger; Robert Read; Galen M. Shipman; lustre-devel > Subject: Re: [Lustre-devel] Lustre RPC visualization > > Hi WangDi, > > the first version works. Screenshot is attached. I have a couple of > counter realized: RPC''s in flight and RPC''s completed in total on the > client, RPC''s enqueued, RPC''s in processing and RPC''c completed in total > on the server. All these counter can be broken down by the type of RPC > (op code). The picture has not yet the lines that show each single RPC, > I still have to do counter like "avg. time to complete an RPC over the > last second" and there are some more TODO''s. Like the timer > synchronization. (In the screenshot the first and the last counter show > total values while the one in the middle shows a rate.) > > What I like to have is a complete set of traces from a small cluster > (<100 nodes) including the servers. Would that be possible? > > Is one of you in Hamburg May, 31-June, 3 for ISC''2010? I''ll be there and > like to talk about what would be useful for the next steps. > > > Regards, Michael > > Am 03.05.2010 21:52, schrieb di.wang: > > Michael Kluge wrote: > >>>> One more question: RPC 1334380768266400 (in the log WangDi sent me) > >>>> has on the client side only a "Sending RPC" message, thus missing the > >>>> "Completed RPC". The server has all three (received,start work, done > >>>> work). Has this RPC vanished on the way back to the client? There is > >>>> no further indication what happend. The last timestamp in the client > >>>> log is: > >>>> 1272565368.228628 > >>>> and the server says it finished the processing of the request at: > >>>> 1272565281.379471 > >>>> So the client log has been recorded long enough to contain the > >>>> "Completed RPC" message for this RPC if it arrived ever ... > >>> Logically, yes. But in some cases, some debug logs might be abandoned > >>> for some reasons(actually, it happens not rarely), and probably you need > >>> maintain an average time from server "Handled RPC" to client "Completed > >>> RPC", then you just guess the client "Completed RPC" time in this case. > >> > >> Oh my gosh ;) I don''t want to start speculations about the helpfulness > >> of incomplete debug logs. Anyway, what can get lost? Any kind of > >> message on the servers and clients? I think I''d like to know what > >> cases have to be handled while I try to track individual RPC''s on > >> their way. > > Any records can get lost here. Unfortunately, there are not any messages > > indicate the missing happened. :( > > (Usually, I would check the time stamp in the log, i.e. no records for a > > "long" time, for example several seconds, but this is not the accurate > > way). > > > > I guess you can just ignore these uncompleted records in your first > > step? Let''s see how these incomplete log will > > impact the profiling result, then we will decide how to deal with this? 
> > > > Thanks > > Wangdi > >> > >> Regards, Michael > >> _______________________________________________ > >> Lustre-devel mailing list > >> Lustre-devel at lists.lustre.org > >> http://lists.lustre.org/mailman/listinfo/lustre-devel > > > > > > > -- > Michael Kluge, M.Sc. > > Technische Universit?t Dresden > Center for Information Services and > High Performance Computing (ZIH) > D-01062 Dresden > Germany > > Contact: > Willersbau, Room WIL A 208 > Phone: (+49) 351 463-34217 > Fax: (+49) 351 463-37773 > e-mail: michael.kluge at tu-dresden.de > WWW: http://www.tu-dresden.de/zih
I think this work is very interesting. Will anyone be at CUG 2010 next week to discuss? Cheers, Andrew 2010/5/16 Michael Kluge <Michael.Kluge at tu-dresden.de>> Hi WangDi, > > the first version works. Screenshot is attached. I have a couple of counter > realized: RPC''s in flight and RPC''s completed in total on the client, RPC''s > enqueued, RPC''s in processing and RPC''c completed in total on the server. > All these counter can be broken down by the type of RPC (op code). The > picture has not yet the lines that show each single RPC, I still have to do > counter like "avg. time to complete an RPC over the last second" and there > are some more TODO''s. Like the timer synchronization. (In the screenshot the > first and the last counter show total values while the one in the middle > shows a rate.) > > What I like to have is a complete set of traces from a small cluster (<100 > nodes) including the servers. Would that be possible? > > Is one of you in Hamburg May, 31-June, 3 for ISC''2010? I''ll be there and > like to talk about what would be useful for the next steps. > > > > Regards, Michael > > Am 03.05.2010 21:52, schrieb di.wang: > >> Michael Kluge wrote: >> >> One more question: RPC 1334380768266400 (in the log WangDi sent me) >>>>> has on the client side only a "Sending RPC" message, thus missing the >>>>> "Completed RPC". The server has all three (received,start work, done >>>>> work). Has this RPC vanished on the way back to the client? There is >>>>> no further indication what happend. The last timestamp in the client >>>>> log is: >>>>> 1272565368.228628 >>>>> and the server says it finished the processing of the request at: >>>>> 1272565281.379471 >>>>> So the client log has been recorded long enough to contain the >>>>> "Completed RPC" message for this RPC if it arrived ever ... >>>>> >>>> Logically, yes. But in some cases, some debug logs might be abandoned >>>> for some reasons(actually, it happens not rarely), and probably you need >>>> maintain an average time from server "Handled RPC" to client "Completed >>>> RPC", then you just guess the client "Completed RPC" time in this case. >>>> >>> >>> Oh my gosh ;) I don''t want to start speculations about the helpfulness >>> of incomplete debug logs. Anyway, what can get lost? Any kind of >>> message on the servers and clients? I think I''d like to know what >>> cases have to be handled while I try to track individual RPC''s on >>> their way. >>> >> Any records can get lost here. Unfortunately, there are not any messages >> indicate the missing happened. :( >> (Usually, I would check the time stamp in the log, i.e. no records for a >> "long" time, for example several seconds, but this is not the accurate >> way). >> >> I guess you can just ignore these uncompleted records in your first >> step? Let''s see how these incomplete log will >> impact the profiling result, then we will decide how to deal with this? >> >> Thanks >> Wangdi >> >>> >>> Regards, Michael >>> _______________________________________________ >>> Lustre-devel mailing list >>> Lustre-devel at lists.lustre.org >>> http://lists.lustre.org/mailman/listinfo/lustre-devel >>> >> >> >> > > -- > Michael Kluge, M.Sc. 
> > Technische Universit?t Dresden > Center for Information Services and > High Performance Computing (ZIH) > D-01062 Dresden > Germany > > Contact: > Willersbau, Room WIL A 208 > Phone: (+49) 351 463-34217 > Fax: (+49) 351 463-37773 > e-mail: michael.kluge at tu-dresden.de > WWW: http://www.tu-dresden.de/zih > > _______________________________________________ > Lustre-devel mailing list > Lustre-devel at lists.lustre.org > http://lists.lustre.org/mailman/listinfo/lustre-devel > >-------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.lustre.org/pipermail/lustre-devel/attachments/20100516/b1d7074f/attachment.html
I would be very interested in this, particularly at scale. We should look into collecting some large scale traces on Jaguar for experimentation of various visualization techniques. Thanks, Galen On May 16, 2010, at 9:12 AM, Eric Barton wrote: Excellent :) How do you think measurements taken from 1000 servers with 100,000 clients can be visualised? We''ve used heat maps to visualise 10s-100s of concurrent measurements (y) over time (x) but I wonder if that will scale. Does vampire support heat maps? Cheers, Eric -----Original Message----- From: Michael Kluge [mailto:Michael.Kluge at tu-dresden.de] Sent: 16 May 2010 10:30 AM To: di.wang Cc: Eric Barton; Andreas Dilger; Robert Read; Galen M. Shipman; lustre-devel Subject: Re: [Lustre-devel] Lustre RPC visualization Hi WangDi, the first version works. Screenshot is attached. I have a couple of counter realized: RPC''s in flight and RPC''s completed in total on the client, RPC''s enqueued, RPC''s in processing and RPC''c completed in total on the server. All these counter can be broken down by the type of RPC (op code). The picture has not yet the lines that show each single RPC, I still have to do counter like "avg. time to complete an RPC over the last second" and there are some more TODO''s. Like the timer synchronization. (In the screenshot the first and the last counter show total values while the one in the middle shows a rate.) What I like to have is a complete set of traces from a small cluster (<100 nodes) including the servers. Would that be possible? Is one of you in Hamburg May, 31-June, 3 for ISC''2010? I''ll be there and like to talk about what would be useful for the next steps. Regards, Michael Am 03.05.2010 21:52, schrieb di.wang: Michael Kluge wrote: One more question: RPC 1334380768266400 (in the log WangDi sent me) has on the client side only a "Sending RPC" message, thus missing the "Completed RPC". The server has all three (received,start work, done work). Has this RPC vanished on the way back to the client? There is no further indication what happend. The last timestamp in the client log is: 1272565368.228628 and the server says it finished the processing of the request at: 1272565281.379471 So the client log has been recorded long enough to contain the "Completed RPC" message for this RPC if it arrived ever ... Logically, yes. But in some cases, some debug logs might be abandoned for some reasons(actually, it happens not rarely), and probably you need maintain an average time from server "Handled RPC" to client "Completed RPC", then you just guess the client "Completed RPC" time in this case. Oh my gosh ;) I don''t want to start speculations about the helpfulness of incomplete debug logs. Anyway, what can get lost? Any kind of message on the servers and clients? I think I''d like to know what cases have to be handled while I try to track individual RPC''s on their way. Any records can get lost here. Unfortunately, there are not any messages indicate the missing happened. :( (Usually, I would check the time stamp in the log, i.e. no records for a "long" time, for example several seconds, but this is not the accurate way). I guess you can just ignore these uncompleted records in your first step? Let''s see how these incomplete log will impact the profiling result, then we will decide how to deal with this? 
Thanks Wangdi Regards, Michael _______________________________________________ Lustre-devel mailing list Lustre-devel at lists.lustre.org<mailto:Lustre-devel at lists.lustre.org> http://lists.lustre.org/mailman/listinfo/lustre-devel -- Michael Kluge, M.Sc. Technische Universit?t Dresden Center for Information Services and High Performance Computing (ZIH) D-01062 Dresden Germany Contact: Willersbau, Room WIL A 208 Phone: (+49) 351 463-34217 Fax: (+49) 351 463-37773 e-mail: michael.kluge at tu-dresden.de<mailto:michael.kluge at tu-dresden.de> WWW: http://www.tu-dresden.de/zih Galen M. Shipman Group Leader - Technology Integration National Center for Computational Sciences Oak Ridge National Laboratory Office: 865.576.2672 Cell: 865.307.1209 Email: gshipman at ornl.gov<mailto:gshipman at ornl.gov>
Eric, Vampir does scale up to this point. Almost, if I remember well, but thing might have changed recently). If it does not, well'' make it ;) We have visualized real trace files with up to 20.000 processes and artificial trace files with many more. Heat maps can be emulated. Once we have had something like a color-coded counter timeline that procduces a heat map. Right now it can be done by on the timeline by placing artificial enter/leave events for function called "5%", "10%", and so on. I have done this already in the past, so it should not be too difficult. Michael Am 16.05.2010 15:12, schrieb Eric Barton:> Excellent :) > > How do you think measurements taken from 1000 servers with 100,000 > clients can be visualised? We''ve used heat maps to visualise > 10s-100s of concurrent measurements (y) over time (x) but I wonder > if that will scale. Does vampire support heat maps? > > Cheers, > Eric > >> -----Original Message----- >> From: Michael Kluge [mailto:Michael.Kluge at tu-dresden.de] >> Sent: 16 May 2010 10:30 AM >> To: di.wang >> Cc: Eric Barton; Andreas Dilger; Robert Read; Galen M. Shipman; lustre-devel >> Subject: Re: [Lustre-devel] Lustre RPC visualization >> >> Hi WangDi, >> >> the first version works. Screenshot is attached. I have a couple of >> counter realized: RPC''s in flight and RPC''s completed in total on the >> client, RPC''s enqueued, RPC''s in processing and RPC''c completed in total >> on the server. All these counter can be broken down by the type of RPC >> (op code). The picture has not yet the lines that show each single RPC, >> I still have to do counter like "avg. time to complete an RPC over the >> last second" and there are some more TODO''s. Like the timer >> synchronization. (In the screenshot the first and the last counter show >> total values while the one in the middle shows a rate.) >> >> What I like to have is a complete set of traces from a small cluster >> (<100 nodes) including the servers. Would that be possible? >> >> Is one of you in Hamburg May, 31-June, 3 for ISC''2010? I''ll be there and >> like to talk about what would be useful for the next steps. >> >> >> Regards, Michael >> >> Am 03.05.2010 21:52, schrieb di.wang: >>> Michael Kluge wrote: >>>>>> One more question: RPC 1334380768266400 (in the log WangDi sent me) >>>>>> has on the client side only a "Sending RPC" message, thus missing the >>>>>> "Completed RPC". The server has all three (received,start work, done >>>>>> work). Has this RPC vanished on the way back to the client? There is >>>>>> no further indication what happend. The last timestamp in the client >>>>>> log is: >>>>>> 1272565368.228628 >>>>>> and the server says it finished the processing of the request at: >>>>>> 1272565281.379471 >>>>>> So the client log has been recorded long enough to contain the >>>>>> "Completed RPC" message for this RPC if it arrived ever ... >>>>> Logically, yes. But in some cases, some debug logs might be abandoned >>>>> for some reasons(actually, it happens not rarely), and probably you need >>>>> maintain an average time from server "Handled RPC" to client "Completed >>>>> RPC", then you just guess the client "Completed RPC" time in this case. >>>> >>>> Oh my gosh ;) I don''t want to start speculations about the helpfulness >>>> of incomplete debug logs. Anyway, what can get lost? Any kind of >>>> message on the servers and clients? I think I''d like to know what >>>> cases have to be handled while I try to track individual RPC''s on >>>> their way. >>> Any records can get lost here. 
Unfortunately, there are not any messages >>> indicate the missing happened. :( >>> (Usually, I would check the time stamp in the log, i.e. no records for a >>> "long" time, for example several seconds, but this is not the accurate >>> way). >>> >>> I guess you can just ignore these uncompleted records in your first >>> step? Let''s see how these incomplete log will >>> impact the profiling result, then we will decide how to deal with this? >>> >>> Thanks >>> Wangdi >>>> >>>> Regards, Michael >>>> _______________________________________________ >>>> Lustre-devel mailing list >>>> Lustre-devel at lists.lustre.org >>>> http://lists.lustre.org/mailman/listinfo/lustre-devel >>> >>> >> >> >> -- >> Michael Kluge, M.Sc. >> >> Technische Universit?t Dresden >> Center for Information Services and >> High Performance Computing (ZIH) >> D-01062 Dresden >> Germany >> >> Contact: >> Willersbau, Room WIL A 208 >> Phone: (+49) 351 463-34217 >> Fax: (+49) 351 463-37773 >> e-mail: michael.kluge at tu-dresden.de >> WWW: http://www.tu-dresden.de/zih > >
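Michael's heat-map emulation, placing artificial enter/leave events for pseudo-functions named "5%", "10%" and so on, amounts to quantising a counter series into bands. An illustrative sketch; the band width and the event tuple format are invented for the example and are not the OTF converter's real output:

    # Sketch: quantise a per-server counter series into percentage bands so
    # a timeline display can colour them like a heat map.  'series' is a
    # list of (timestamp, value) samples; 'peak' is the value taken as 100%.

    def band_events(series, peak, step=5):
        # Yields (timestamp, "enter"/"leave", "<n>%") pseudo-events whenever
        # the value crosses into a different band.
        current = None
        for t, value in series:
            pct = 0 if peak <= 0 else min(100, int(100.0 * value / peak))
            band = "%d%%" % (((pct + step - 1) // step) * step)
            if band != current:
                if current is not None:
                    yield (t, "leave", current)
                yield (t, "enter", band)
                current = band
        # A real converter would also close the last open band at the end
        # of the trace.

    # Example: list(band_events([(0.0, 3), (1.0, 12), (2.0, 7)], peak=20))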
Hi Andrew, unfortunately no. We don''t own a Cray :( Regards, Michael Am Sonntag, den 16.05.2010, 20:24 -0700 schrieb Andrew Uselton:> I think this work is very interesting. Will anyone be at CUG 2010 > next week to discuss? > Cheers, > Andrew > > > 2010/5/16 Michael Kluge <Michael.Kluge at tu-dresden.de> > Hi WangDi, > > the first version works. Screenshot is attached. I have a > couple of counter realized: RPC''s in flight and RPC''s > completed in total on the client, RPC''s enqueued, RPC''s in > processing and RPC''c completed in total on the server. All > these counter can be broken down by the type of RPC (op code). > The picture has not yet the lines that show each single RPC, I > still have to do counter like "avg. time to complete an RPC > over the last second" and there are some more TODO''s. Like the > timer synchronization. (In the screenshot the first and the > last counter show total values while the one in the middle > shows a rate.) > > What I like to have is a complete set of traces from a small > cluster (<100 nodes) including the servers. Would that be > possible? > > Is one of you in Hamburg May, 31-June, 3 for ISC''2010? I''ll be > there and like to talk about what would be useful for the next > steps. > > > > Regards, Michael > > Am 03.05.2010 21:52, schrieb di.wang: > > Michael Kluge wrote: > > > One more question: RPC > 1334380768266400 (in the log > WangDi sent me) > has on the client side only a > "Sending RPC" message, thus > missing the > "Completed RPC". The server > has all three (received,start > work, done > work). Has this RPC vanished > on the way back to the client? > There is > no further indication what > happend. The last timestamp in > the client > log is: > 1272565368.228628 > and the server says it > finished the processing of the > request at: > 1272565281.379471 > So the client log has been > recorded long enough to > contain the > "Completed RPC" message for > this RPC if it arrived > ever ... > Logically, yes. But in some cases, > some debug logs might be abandoned > for some reasons(actually, it happens > not rarely), and probably you need > maintain an average time from server > "Handled RPC" to client "Completed > RPC", then you just guess the client > "Completed RPC" time in this case. > > Oh my gosh ;) I don''t want to start > speculations about the helpfulness > of incomplete debug logs. Anyway, what can get > lost? Any kind of > message on the servers and clients? I think > I''d like to know what > cases have to be handled while I try to track > individual RPC''s on > their way. > Any records can get lost here. Unfortunately, there > are not any messages > indicate the missing happened. :( > (Usually, I would check the time stamp in the log, > i.e. no records for a > "long" time, for example several seconds, but this is > not the accurate > way). > > I guess you can just ignore these uncompleted records > in your first > step? Let''s see how these incomplete log will > impact the profiling result, then we will decide how > to deal with this? > > Thanks > Wangdi > > Regards, Michael > _______________________________________________ > Lustre-devel mailing list > Lustre-devel at lists.lustre.org > http://lists.lustre.org/mailman/listinfo/lustre-devel > > > > > > -- > Michael Kluge, M.Sc. 
> > Technische Universit?t Dresden > Center for Information Services and > High Performance Computing (ZIH) > D-01062 Dresden > Germany > > Contact: > Willersbau, Room WIL A 208 > Phone: (+49) 351 463-34217 > Fax: (+49) 351 463-37773 > e-mail: michael.kluge at tu-dresden.de > > > WWW: http://www.tu-dresden.de/zih > > > _______________________________________________ > Lustre-devel mailing list > Lustre-devel at lists.lustre.org > http://lists.lustre.org/mailman/listinfo/lustre-devel > > > > _______________________________________________ > Lustre-devel mailing list > Lustre-devel at lists.lustre.org > http://lists.lustre.org/mailman/listinfo/lustre-devel-- Michael Kluge, M.Sc. Technische Universit?t Dresden Center for Information Services and High Performance Computing (ZIH) D-01062 Dresden Germany Contact: Willersbau, Room A 208 Phone: (+49) 351 463-34217 Fax: (+49) 351 463-37773 e-mail: michael.kluge at tu-dresden.de WWW: http://www.tu-dresden.de/zih -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/x-pkcs7-signature Size: 5997 bytes Desc: not available Url : http://lists.lustre.org/pipermail/lustre-devel/attachments/20100517/92de1f31/attachment-0001.bin
Hi Galen, that''s why we started this :) I guess I need some more input data to be able to play with the visualization. Vampir also supports markers that can point to specific (process,timestamp) combinations. If we know that some kind of condition happened while we collected the trace, it is probably possible to detect this in the trace converter and to put a mark into the trace file. That would help finding "interesting points" faster. Regards, Michael Am Sonntag, den 16.05.2010, 23:39 -0400 schrieb Shipman, Galen M.:> I would be very interested in this, particularly at scale. We should look into collecting some large scale traces on Jaguar for experimentation of various visualization techniques. > > Thanks, > > Galen > > > On May 16, 2010, at 9:12 AM, Eric Barton wrote: > > Excellent :) > > How do you think measurements taken from 1000 servers with 100,000 > clients can be visualised? We''ve used heat maps to visualise > 10s-100s of concurrent measurements (y) over time (x) but I wonder > if that will scale. Does vampire support heat maps? > > Cheers, > Eric > > -----Original Message----- > From: Michael Kluge [mailto:Michael.Kluge at tu-dresden.de] > Sent: 16 May 2010 10:30 AM > To: di.wang > Cc: Eric Barton; Andreas Dilger; Robert Read; Galen M. Shipman; lustre-devel > Subject: Re: [Lustre-devel] Lustre RPC visualization > > Hi WangDi, > > the first version works. Screenshot is attached. I have a couple of > counter realized: RPC''s in flight and RPC''s completed in total on the > client, RPC''s enqueued, RPC''s in processing and RPC''c completed in total > on the server. All these counter can be broken down by the type of RPC > (op code). The picture has not yet the lines that show each single RPC, > I still have to do counter like "avg. time to complete an RPC over the > last second" and there are some more TODO''s. Like the timer > synchronization. (In the screenshot the first and the last counter show > total values while the one in the middle shows a rate.) > > What I like to have is a complete set of traces from a small cluster > (<100 nodes) including the servers. Would that be possible? > > Is one of you in Hamburg May, 31-June, 3 for ISC''2010? I''ll be there and > like to talk about what would be useful for the next steps. > > > Regards, Michael > > Am 03.05.2010 21:52, schrieb di.wang: > Michael Kluge wrote: > One more question: RPC 1334380768266400 (in the log WangDi sent me) > has on the client side only a "Sending RPC" message, thus missing the > "Completed RPC". The server has all three (received,start work, done > work). Has this RPC vanished on the way back to the client? There is > no further indication what happend. The last timestamp in the client > log is: > 1272565368.228628 > and the server says it finished the processing of the request at: > 1272565281.379471 > So the client log has been recorded long enough to contain the > "Completed RPC" message for this RPC if it arrived ever ... > Logically, yes. But in some cases, some debug logs might be abandoned > for some reasons(actually, it happens not rarely), and probably you need > maintain an average time from server "Handled RPC" to client "Completed > RPC", then you just guess the client "Completed RPC" time in this case. > > Oh my gosh ;) I don''t want to start speculations about the helpfulness > of incomplete debug logs. Anyway, what can get lost? Any kind of > message on the servers and clients? I think I''d like to know what > cases have to be handled while I try to track individual RPC''s on > their way. 
> Any records can get lost here. Unfortunately, there are not any messages > indicate the missing happened. :( > (Usually, I would check the time stamp in the log, i.e. no records for a > "long" time, for example several seconds, but this is not the accurate > way). > > I guess you can just ignore these uncompleted records in your first > step? Let''s see how these incomplete log will > impact the profiling result, then we will decide how to deal with this? > > Thanks > Wangdi > > Regards, Michael > _______________________________________________ > Lustre-devel mailing list > Lustre-devel at lists.lustre.org<mailto:Lustre-devel at lists.lustre.org> > http://lists.lustre.org/mailman/listinfo/lustre-devel > > > > > -- > Michael Kluge, M.Sc. > > Technische Universit?t Dresden > Center for Information Services and > High Performance Computing (ZIH) > D-01062 Dresden > Germany > > Contact: > Willersbau, Room WIL A 208 > Phone: (+49) 351 463-34217 > Fax: (+49) 351 463-37773 > e-mail: michael.kluge at tu-dresden.de<mailto:michael.kluge at tu-dresden.de> > WWW: http://www.tu-dresden.de/zih > > > Galen M. Shipman > Group Leader - Technology Integration > National Center for Computational Sciences > Oak Ridge National Laboratory > Office: 865.576.2672 > Cell: 865.307.1209 > Email: gshipman at ornl.gov<mailto:gshipman at ornl.gov> > > > >-- Michael Kluge, M.Sc. Technische Universit?t Dresden Center for Information Services and High Performance Computing (ZIH) D-01062 Dresden Germany Contact: Willersbau, Room A 208 Phone: (+49) 351 463-34217 Fax: (+49) 351 463-37773 e-mail: michael.kluge at tu-dresden.de WWW: http://www.tu-dresden.de/zih -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/x-pkcs7-signature Size: 5997 bytes Desc: not available Url : http://lists.lustre.org/pipermail/lustre-devel/attachments/20100517/c15ea9a9/attachment.bin
Hi WangDi, so, for the moment I am done with what I promised. The work to be done is mainly debugging with more input data sets. Screenshot of Vampir showing the derived counter values for the RPC processing/queue times on the server and the client is attached. Units for the values are either microseconds or just a number. Regards, Michael Am Sonntag, den 16.05.2010, 11:29 +0200 schrieb Michael Kluge:> Hi WangDi, > > the first version works. Screenshot is attached. I have a couple of > counter realized: RPC''s in flight and RPC''s completed in total on the > client, RPC''s enqueued, RPC''s in processing and RPC''c completed in total > on the server. All these counter can be broken down by the type of RPC > (op code). The picture has not yet the lines that show each single RPC, > I still have to do counter like "avg. time to complete an RPC over the > last second" and there are some more TODO''s. Like the timer > synchronization. (In the screenshot the first and the last counter show > total values while the one in the middle shows a rate.) > > What I like to have is a complete set of traces from a small cluster > (<100 nodes) including the servers. Would that be possible? > > Is one of you in Hamburg May, 31-June, 3 for ISC''2010? I''ll be there and > like to talk about what would be useful for the next steps. > > > Regards, Michael > > Am 03.05.2010 21:52, schrieb di.wang: > > Michael Kluge wrote: > >>>> One more question: RPC 1334380768266400 (in the log WangDi sent me) > >>>> has on the client side only a "Sending RPC" message, thus missing the > >>>> "Completed RPC". The server has all three (received,start work, done > >>>> work). Has this RPC vanished on the way back to the client? There is > >>>> no further indication what happend. The last timestamp in the client > >>>> log is: > >>>> 1272565368.228628 > >>>> and the server says it finished the processing of the request at: > >>>> 1272565281.379471 > >>>> So the client log has been recorded long enough to contain the > >>>> "Completed RPC" message for this RPC if it arrived ever ... > >>> Logically, yes. But in some cases, some debug logs might be abandoned > >>> for some reasons(actually, it happens not rarely), and probably you need > >>> maintain an average time from server "Handled RPC" to client "Completed > >>> RPC", then you just guess the client "Completed RPC" time in this case. > >> > >> Oh my gosh ;) I don''t want to start speculations about the helpfulness > >> of incomplete debug logs. Anyway, what can get lost? Any kind of > >> message on the servers and clients? I think I''d like to know what > >> cases have to be handled while I try to track individual RPC''s on > >> their way. > > Any records can get lost here. Unfortunately, there are not any messages > > indicate the missing happened. :( > > (Usually, I would check the time stamp in the log, i.e. no records for a > > "long" time, for example several seconds, but this is not the accurate > > way). > > > > I guess you can just ignore these uncompleted records in your first > > step? Let''s see how these incomplete log will > > impact the profiling result, then we will decide how to deal with this? 
> > > > Thanks > > Wangdi > >> > >> Regards, Michael > >> _______________________________________________ > >> Lustre-devel mailing list > >> Lustre-devel at lists.lustre.org > >> http://lists.lustre.org/mailman/listinfo/lustre-devel > > > > > > > _______________________________________________ > Lustre-devel mailing list > Lustre-devel at lists.lustre.org > http://lists.lustre.org/mailman/listinfo/lustre-devel-- Michael Kluge, M.Sc. Technische Universit?t Dresden Center for Information Services and High Performance Computing (ZIH) D-01062 Dresden Germany Contact: Willersbau, Room A 208 Phone: (+49) 351 463-34217 Fax: (+49) 351 463-37773 e-mail: michael.kluge at tu-dresden.de WWW: http://www.tu-dresden.de/zih -------------- next part -------------- A non-text attachment was scrubbed... Name: lustre_rpc_trace.png Type: image/png Size: 146793 bytes Desc: not available Url : http://lists.lustre.org/pipermail/lustre-devel/attachments/20100525/66fe7ab6/attachment-0001.png -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/x-pkcs7-signature Size: 5997 bytes Desc: not available Url : http://lists.lustre.org/pipermail/lustre-devel/attachments/20100525/66fe7ab6/attachment-0001.bin
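The processing and queue times behind that screenshot are plain differences of the matched timestamps: on the server, queue time runs from arrival to the start of handling and processing time from the start to "Handled RPC"; on the client, the round trip runs from "Sending RPC" to "Completed RPC". A minimal sketch under the same assumed record layout as above, with results in microseconds to match the units used in the log:

    # Sketch: per-RPC latency figures derived from one matched record
    # (a dict holding the five timestamps in seconds).

    def latencies_us(r):
        def us(dt):
            return dt * 1e6
        return {
            "server_queue_time": us(r["server_start"] - r["server_recv"]),
            "server_processing": us(r["server_done"] - r["server_start"]),
            "client_round_trip": us(r["client_completed"] - r["client_send"]),
            # Time spent outside the server, i.e. network plus client overhead.
            "wire_and_client":   us((r["client_completed"] - r["client_send"]) -
                                    (r["server_done"] - r["server_recv"])),
        }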
Hi WangDi,> Looks great! Just query, as you said, "All these counter can be broken > down by the type of RPC (op code)" , you actually implemented that, but > not shown in the attached picture?Yes.> And could you please also add "Server queued RPCs" over time ?Already done. One good news: The Feature that Vampir can show something like a heat map (Eric asked about this) comes back with the release at ISC. It is now called "performance radar". It can produce a heat map for a counter and does some other things as well. I could send a picture around, but need at first an bigger trace (more hosts generating traces in parallel). Regards, Michael> Thanks > WangDi > > Michael Kluge wrote: > > Hi WangDi, > > > > so, for the moment I am done with what I promised. The work to be done > > is mainly debugging with more input data sets. Screenshot of Vampir > > showing the derived counter values for the RPC processing/queue times on > > the server and the client is attached. Units for the values are either > > microseconds or just a number. > > > > > > Regards, Michael > > > > Am Sonntag, den 16.05.2010, 11:29 +0200 schrieb Michael Kluge: > > > >> Hi WangDi, > >> > >> the first version works. Screenshot is attached. I have a couple of > >> counter realized: RPC''s in flight and RPC''s completed in total on the > >> client, RPC''s enqueued, RPC''s in processing and RPC''c completed in total > >> on the server. All these counter can be broken down by the type of RPC > >> (op code). The picture has not yet the lines that show each single RPC, > >> I still have to do counter like "avg. time to complete an RPC over the > >> last second" and there are some more TODO''s. Like the timer > >> synchronization. (In the screenshot the first and the last counter show > >> total values while the one in the middle shows a rate.) > >> > >> What I like to have is a complete set of traces from a small cluster > >> (<100 nodes) including the servers. Would that be possible? > >> > >> Is one of you in Hamburg May, 31-June, 3 for ISC''2010? I''ll be there and > >> like to talk about what would be useful for the next steps. > >> > >> > >> Regards, Michael > >> > >> Am 03.05.2010 21:52, schrieb di.wang: > >> > >>> Michael Kluge wrote: > >>> > >>>>>> One more question: RPC 1334380768266400 (in the log WangDi sent me) > >>>>>> has on the client side only a "Sending RPC" message, thus missing the > >>>>>> "Completed RPC". The server has all three (received,start work, done > >>>>>> work). Has this RPC vanished on the way back to the client? There is > >>>>>> no further indication what happend. The last timestamp in the client > >>>>>> log is: > >>>>>> 1272565368.228628 > >>>>>> and the server says it finished the processing of the request at: > >>>>>> 1272565281.379471 > >>>>>> So the client log has been recorded long enough to contain the > >>>>>> "Completed RPC" message for this RPC if it arrived ever ... > >>>>>> > >>>>> Logically, yes. But in some cases, some debug logs might be abandoned > >>>>> for some reasons(actually, it happens not rarely), and probably you need > >>>>> maintain an average time from server "Handled RPC" to client "Completed > >>>>> RPC", then you just guess the client "Completed RPC" time in this case. > >>>>> > >>>> Oh my gosh ;) I don''t want to start speculations about the helpfulness > >>>> of incomplete debug logs. Anyway, what can get lost? Any kind of > >>>> message on the servers and clients? 
I think I''d like to know what > >>>> cases have to be handled while I try to track individual RPC''s on > >>>> their way. > >>>> > >>> Any records can get lost here. Unfortunately, there are not any messages > >>> indicate the missing happened. :( > >>> (Usually, I would check the time stamp in the log, i.e. no records for a > >>> "long" time, for example several seconds, but this is not the accurate > >>> way). > >>> > >>> I guess you can just ignore these uncompleted records in your first > >>> step? Let''s see how these incomplete log will > >>> impact the profiling result, then we will decide how to deal with this? > >>> > >>> Thanks > >>> Wangdi > >>> > >>>> Regards, Michael > >>>> _______________________________________________ > >>>> Lustre-devel mailing list > >>>> Lustre-devel at lists.lustre.org > >>>> http://lists.lustre.org/mailman/listinfo/lustre-devel > >>>> > >>> > >> _______________________________________________ > >> Lustre-devel mailing list > >> Lustre-devel at lists.lustre.org > >> http://lists.lustre.org/mailman/listinfo/lustre-devel > >> > > > > > > > > ------------------------------------------------------------------------ > > > >-- Michael Kluge, M.Sc. Technische Universit?t Dresden Center for Information Services and High Performance Computing (ZIH) D-01062 Dresden Germany Contact: Willersbau, Room A 208 Phone: (+49) 351 463-34217 Fax: (+49) 351 463-37773 e-mail: michael.kluge at tu-dresden.de WWW: http://www.tu-dresden.de/zih -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/x-pkcs7-signature Size: 5997 bytes Desc: not available Url : http://lists.lustre.org/pipermail/lustre-devel/attachments/20100528/3897d501/attachment.bin
Eric Barton wrote:> I''d really like to see how vampire handles _all_ the trace data > we can throw at it. If 600 clients is a pain, how bad will it > be at 60,000? > > What in particular makes collecting traces from all the clients > + servers hard? How can/should we automate it or otherwise make > it easier? >Aha, I do not mean it is difficult to get all trace logs. I had thought the current target is to make sure vampire can handle the trace from "big" enough clusters. Actually, I am not sure whether vampire(with this trace analyse) should work in this way as we do now? i.e. running job exclusively on the cluster, get rpctrace log, then get graph by vampire? Or I miss sth here? Michale, could you please explain a bit? What is your idea how vampire could help the end users? or the target here is just helping the developers and sysadmin to understand the system? Thanks WangDi> Cheers, > Eric > > >> -----Original Message----- >> From: di.wang [mailto:di.wang at oracle.com] >> Sent: 01 June 2010 12:50 PM >> To: Robert Read >> Cc: Michael Kluge; Eric Barton; Galen M. Shipman >> Subject: Re: [Lustre-devel] Lustre RPC visualization >> >> Hello, >> >> IMHO, just run IOR with whatever parameters, and get rpctrace >> log(probably only enable rpctrace) from 1 OST and some of clients >> (probably 2 is enough). >> Note: please make sure these 2 clients did communicate the OST during >> the IOR. >> >> Michael, I do not think you need all the trace logs from the clients. right? >> >> Robert >> >> If there are available time slots for this test on Hyperion, who can >> help to get these logs? >> >> Thanks >> Wangdi >> >> Robert Read wrote: >> >>> What should I run then? Do have scripts to capture this? >>> >>> robert >>> >>> On May 31, 2010, at 2:39 , Michael Kluge wrote: >>> >>> >>>> Hi Robert, >>>> >>>> 600 is a nice number. Plus the traces from the server an I am happy. >>>> >>>> >>>> Michael >>>> >>>> Am 28.05.2010 um 17:53 schrieb Robert Read: >>>> >>>> >>>>> On May 28, 2010, at 4:09 , di.wang wrote: >>>>> >>>>> >>>>>> Hello, Michael >>>>>> >>>>>> >>>>>>> One good news: The Feature that Vampir can show something like a heat >>>>>>> map (Eric asked about this) comes back with the release at ISC. It is >>>>>>> now called "performance radar". It can produce a heat map for a >>>>>>> counter >>>>>>> and does some other things as well. I could send a picture around, but >>>>>>> need at first an bigger trace (more hosts generating traces in >>>>>>> parallel). >>>>>>> >>>>>>> >>>>>> Right now I do not have big clusters available to generate the trace. >>>>>> I will see what I can do here. >>>>>> >>>>> If ~600 clients is big enough we could generate that on Hyperion. >>>>> >>>>> robert >>>>> >>>>> >>>>>> Thanks >>>>>> WangDi >>>>>> >>>>>>> >>>>>>> >>>>>>>>> >>>>>>> ------------------------------------------------------------------------ >>>>>>> >>>>>>> _______________________________________________ >>>>>>> Lustre-devel mailing list >>>>>>> Lustre-devel at lists.lustre.org <mailto:Lustre-devel at lists.lustre.org> >>>>>>> http://lists.lustre.org/mailman/listinfo/lustre-devel >>>>>>> >>>>>>> >>>>> >>>> -- >>>> >>>> Michael Kluge, M.Sc. 
>>>> >>>> Technische Universit?t Dresden >>>> Center for Information Services and >>>> High Performance Computing (ZIH) >>>> D-01062 Dresden >>>> Germany >>>> >>>> Contact: >>>> Willersbau, Room WIL A 208 >>>> Phone: (+49) 351 463-34217 >>>> Fax: (+49) 351 463-37773 >>>> e-mail: michael.kluge at tu-dresden.de <mailto:michael.kluge at tu-dresden.de> >>>> WWW: http://www.tu-dresden.de/zih >>>> >>>> > > > _______________________________________________ > Lustre-devel mailing list > Lustre-devel at lists.lustre.org > http://lists.lustre.org/mailman/listinfo/lustre-devel >
I''d really like to see how vampire handles _all_ the trace data we can throw at it. If 600 clients is a pain, how bad will it be at 60,000? What in particular makes collecting traces from all the clients + servers hard? How can/should we automate it or otherwise make it easier? Cheers, Eric> -----Original Message----- > From: di.wang [mailto:di.wang at oracle.com] > Sent: 01 June 2010 12:50 PM > To: Robert Read > Cc: Michael Kluge; Eric Barton; Galen M. Shipman > Subject: Re: [Lustre-devel] Lustre RPC visualization > > Hello, > > IMHO, just run IOR with whatever parameters, and get rpctrace > log(probably only enable rpctrace) from 1 OST and some of clients > (probably 2 is enough). > Note: please make sure these 2 clients did communicate the OST during > the IOR. > > Michael, I do not think you need all the trace logs from the clients. right? > > Robert > > If there are available time slots for this test on Hyperion, who can > help to get these logs? > > Thanks > Wangdi > > Robert Read wrote: > > What should I run then? Do have scripts to capture this? > > > > robert > > > > On May 31, 2010, at 2:39 , Michael Kluge wrote: > > > >> Hi Robert, > >> > >> 600 is a nice number. Plus the traces from the server an I am happy. > >> > >> > >> Michael > >> > >> Am 28.05.2010 um 17:53 schrieb Robert Read: > >> > >>> > >>> On May 28, 2010, at 4:09 , di.wang wrote: > >>> > >>>> Hello, Michael > >>>> > >>>>> One good news: The Feature that Vampir can show something like a heat > >>>>> map (Eric asked about this) comes back with the release at ISC. It is > >>>>> now called "performance radar". It can produce a heat map for a > >>>>> counter > >>>>> and does some other things as well. I could send a picture around, but > >>>>> need at first an bigger trace (more hosts generating traces in > >>>>> parallel). > >>>>> > >>>> Right now I do not have big clusters available to generate the trace. > >>>> I will see what I can do here. > >>> > >>> If ~600 clients is big enough we could generate that on Hyperion. > >>> > >>> robert > >>> > >>>> > >>>> Thanks > >>>> WangDi > >>>>> > >>>>> > >>>>> > >>>>>>> > >>>>>>> > >>>>>> > >>>>> > >>>>> ------------------------------------------------------------------------ > >>>>> > >>>>> _______________________________________________ > >>>>> Lustre-devel mailing list > >>>>> Lustre-devel at lists.lustre.org <mailto:Lustre-devel at lists.lustre.org> > >>>>> http://lists.lustre.org/mailman/listinfo/lustre-devel > >>>>> > >>>> > >>> > >>> > >> > >> > >> -- > >> > >> Michael Kluge, M.Sc. > >> > >> Technische Universit?t Dresden > >> Center for Information Services and > >> High Performance Computing (ZIH) > >> D-01062 Dresden > >> Germany > >> > >> Contact: > >> Willersbau, Room WIL A 208 > >> Phone: (+49) 351 463-34217 > >> Fax: (+49) 351 463-37773 > >> e-mail: michael.kluge at tu-dresden.de <mailto:michael.kluge at tu-dresden.de> > >> WWW: http://www.tu-dresden.de/zih > >> > >
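One way to take the manual work out of collection is a small driver that fans the same two steps out to every node: switch on rpctrace before the run and dump the kernel debug buffer into a per-node file afterwards. A rough sketch using pdsh; the host ranges, paths and the exact lctl parameters are placeholders that would need to be checked against the installed Lustre version:

    # Sketch of a collection helper: enable rpctrace everywhere, run the
    # job, then dump each node's debug buffer to a per-node file for the
    # converter.  Hostnames, paths and lctl arguments are illustrative.

    import subprocess

    CLIENTS = "client[001-600]"        # pdsh-style host ranges (example only)
    SERVERS = "oss[01-04],mds01"
    DUMP_DIR = "/tmp/rpctrace"         # must exist on every node

    def pdsh(hosts, command):
        subprocess.check_call(["pdsh", "-w", hosts, command])

    def enable_rpctrace(hosts):
        # Exact syntax varies between Lustre releases; verify before use.
        pdsh(hosts, "lctl set_param debug=+rpctrace")

    def dump_logs(hosts):
        # 'lctl dk' dumps the kernel debug buffer on each node to a file.
        pdsh(hosts, "lctl dk %s/$(hostname).rpctrace" % DUMP_DIR)

    if __name__ == "__main__":
        for hosts in (CLIENTS, SERVERS):
            enable_rpctrace(hosts)
        # ... run the I/O test (e.g. IOR) here ...
        for hosts in (CLIENTS, SERVERS):
            dump_logs(hosts)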
On 2010-06-01, at 06:12, di.wang wrote:> Michael, I do not think you need all the trace logs from the clients. right?Actually, I think he does want trace logs from all of the clients.> I had thought the current target is to make sure vampire can handle > the trace from "big" enough clusters. > > Actually, I am not sure whether vampire(with this trace analyse) should > work in this way as we do now? i.e. running job exclusively on the cluster, get rpctrace log, then get graph by vampire? > > Or I miss sth here? Michale, could you please explain a bit? What is > your idea how vampire could help the end users? or the target here is > just helping the developers and sysadmin to understand the system?My thoughts on this in the past were that it should be possible to use ONLY the client logs to plot syscall latencies (using client VFSTRACE lines) and the RPC latencies (using client RPCTRACE lines) and locking (using DLMTRACE lines). Some code rework would be needed to allow regular users to run "lctl dk", and it would filter out the lines specific to their own processes. This would allow regular users to collect and analyze their application without assistance or coordination from the sysadmin. In order to allow regular users to trace such information, the VFSTRACE calls should report the UID of the process doing the system call, which would immediately map to a PID. The PID can be used to track the majority of the debug entries, but not always those done in another thread (e.g. ptlrpcd). The RPCTRACE messages also contain the PID, so that would be helpful, but it would mean that there has to be a parser/filter in the kernel to ensure users cannot access trace information that is not their own. That would be a significant undertaking, I think. In the meantime, for testing purposes and initial usage (with sysadmin assistance) the full debug log can be extracted from the kernel and filtered in userspace as needed.>>> -----Original Message----- >>> From: di.wang [mailto:di.wang at oracle.com] >>> Sent: 01 June 2010 12:50 PM >>> To: Robert Read >>> Cc: Michael Kluge; Eric Barton; Galen M. Shipman >>> Subject: Re: [Lustre-devel] Lustre RPC visualization >>> >>> Hello, >>> >>> IMHO, just run IOR with whatever parameters, and get rpctrace >>> log(probably only enable rpctrace) from 1 OST and some of clients >>> (probably 2 is enough). >>> Note: please make sure these 2 clients did communicate the OST during >>> the IOR. >>> >>> Michael, I do not think you need all the trace logs from the clients. right? >>> >>> Robert >>> >>> If there are available time slots for this test on Hyperion, who can >>> help to get these logs? >>> >>> Thanks >>> Wangdi >>> >>> Robert Read wrote: >>> >>>> What should I run then? Do have scripts to capture this? >>>> >>>> robert >>>> >>>> On May 31, 2010, at 2:39 , Michael Kluge wrote: >>>> >>>> >>>>> Hi Robert, >>>>> >>>>> 600 is a nice number. Plus the traces from the server an I am happy. >>>>> >>>>> >>>>> Michael >>>>> >>>>> Am 28.05.2010 um 17:53 schrieb Robert Read: >>>>> >>>>> >>>>>> On May 28, 2010, at 4:09 , di.wang wrote: >>>>>> >>>>>> >>>>>>> Hello, Michael >>>>>>> >>>>>>> >>>>>>>> One good news: The Feature that Vampir can show something like a heat >>>>>>>> map (Eric asked about this) comes back with the release at ISC. It is >>>>>>>> now called "performance radar". It can produce a heat map for a >>>>>>>> counter >>>>>>>> and does some other things as well. 
I could send a picture around, but >>>>>>>> need at first an bigger trace (more hosts generating traces in >>>>>>>> parallel). >>>>>>>> >>>>>>>> >>>>>>> Right now I do not have big clusters available to generate the trace. >>>>>>> I will see what I can do here. >>>>>>> >>>>>> If ~600 clients is big enough we could generate that on Hyperion. >>>>>> >>>>>> robert >>>>>> >>>>>> >>>>>>> Thanks >>>>>>> WangDi >>>>>>> >>>>>>>> >>>>>>>> >>>>>>>>>> >>>>>>>> ------------------------------------------------------------------------ >>>>>>>> >>>>>>>> _______________________________________________ >>>>>>>> Lustre-devel mailing list >>>>>>>> Lustre-devel at lists.lustre.org <mailto:Lustre-devel at lists.lustre.org> >>>>>>>> http://lists.lustre.org/mailman/listinfo/lustre-devel >>>>>>>> >>>>>>>> >>>>>> >>>>> -- >>>>> >>>>> Michael Kluge, M.Sc. >>>>> >>>>> Technische Universit?t Dresden >>>>> Center for Information Services and >>>>> High Performance Computing (ZIH) >>>>> D-01062 Dresden >>>>> Germany >>>>> >>>>> Contact: >>>>> Willersbau, Room WIL A 208 >>>>> Phone: (+49) 351 463-34217 >>>>> Fax: (+49) 351 463-37773 >>>>> e-mail: michael.kluge at tu-dresden.de <mailto:michael.kluge at tu-dresden.de> >>>>> WWW: http://www.tu-dresden.de/zih >>>>> >>>>> >> >> >> _______________________________________________ >> Lustre-devel mailing list >> Lustre-devel at lists.lustre.org >> http://lists.lustre.org/mailman/listinfo/lustre-devel >> > > _______________________________________________ > Lustre-devel mailing list > Lustre-devel at lists.lustre.org > http://lists.lustre.org/mailman/listinfo/lustre-develCheers, Andreas -- Andreas Dilger Lustre Technical Lead Oracle Corporation Canada Inc.
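Andreas' filtering idea can at least be prototyped in user space on an already-extracted log, since the PID appears as the sixth colon-separated field of every record in the examples quoted in this thread. A hedged sketch for offline filtering, not the in-kernel filter he describes; the file names and PID in the usage note are hypothetical:

    # Sketch: keep only the debug-log records belonging to a given set of
    # PIDs.  Assumes the PID is the 6th ':'-separated field of a record; an
    # in-kernel filter would still be needed for the unprivileged-user case.

    def filter_by_pid(lines, wanted_pids):
        wanted = {str(p) for p in wanted_pids}
        for line in lines:
            fields = line.split(':', 7)
            if len(fields) >= 7 and fields[5] in wanted:
                yield line

    # Example (hypothetical file names and PID):
    #   with open("client.rpctrace") as src, open("user.rpctrace", "w") as dst:
    #       dst.writelines(filter_by_pid(src, {7924}))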
> On 2010-06-01, at 06:12, di.wang wrote:
>> Michael, I do not think you need all the trace logs from the clients, right?
>
> Actually, I think he does want trace logs from all of the clients.

Yes, 600 is a good number. Vampir can easily handle this. If possible, I'd like to have all server traces as well to include this information. Right now I am only putting those RPCs in the trace where all 5 events (client send, server recv/start/done, client done) are present. I see the "client send/done" events in the OSS log as well; probably it is talking to the MDS or MGS. Events that are incomplete are ignored, but counted in the "dumped events" counter. That could be changed, though.

Michael

>> I had thought the current target is to make sure Vampir can handle the trace from "big" enough clusters.
>>
>> Actually, I am not sure whether Vampir (with this trace analysis) should work the way we do it now, i.e. running a job exclusively on the cluster, getting the rpctrace log, then getting the graph from Vampir. Or do I miss something here? Michael, could you please explain a bit? What is your idea of how Vampir could help end users? Or is the target here just to help the developers and sysadmins understand the system?
>
> My thoughts on this in the past were that it should be possible to use ONLY the client logs to plot syscall latencies (using client VFSTRACE lines), RPC latencies (using client RPCTRACE lines) and locking (using DLMTRACE lines). Some code rework would be needed to allow regular users to run "lctl dk", and it would filter out the lines specific to their own processes. This would allow regular users to collect and analyze their application without assistance or coordination from the sysadmin.
>
> In order to allow regular users to trace such information, the VFSTRACE calls should report the UID of the process doing the system call, which would immediately map to a PID. The PID can be used to track the majority of the debug entries, but not always those done in another thread (e.g. ptlrpcd). The RPCTRACE messages also contain the PID, so that would be helpful, but it would mean that there has to be a parser/filter in the kernel to ensure users cannot access trace information that is not their own. That would be a significant undertaking, I think.
>
> In the meantime, for testing purposes and initial usage (with sysadmin assistance) the full debug log can be extracted from the kernel and filtered in userspace as needed.
>
>>>> -----Original Message-----
>>>> From: di.wang [mailto:di.wang at oracle.com]
>>>> Sent: 01 June 2010 12:50 PM
>>>> To: Robert Read
>>>> Cc: Michael Kluge; Eric Barton; Galen M. Shipman
>>>> Subject: Re: [Lustre-devel] Lustre RPC visualization
>>>>
>>>> Hello,
>>>>
>>>> IMHO, just run IOR with whatever parameters, and get the rpctrace log (probably only enable rpctrace) from 1 OST and some of the clients (probably 2 is enough). Note: please make sure these 2 clients did communicate with the OST during the IOR.
>>>>
>>>> Michael, I do not think you need all the trace logs from the clients, right?
>>>>
>>>> Robert, if there are available time slots for this test on Hyperion, who can help to get these logs?
>>>>
>>>> Thanks
>>>> Wangdi
>>>>
>>>> Robert Read wrote:
>>>>> What should I run then? Do you have scripts to capture this?
>>>>>
>>>>> robert
>>>>>
>>>>> On May 31, 2010, at 2:39 , Michael Kluge wrote:
>>>>>> Hi Robert,
>>>>>>
>>>>>> 600 is a nice number. Plus the traces from the servers and I am happy.
>>>>>>
>>>>>> Michael
>>>>>>
>>>>>> Am 28.05.2010 um 17:53 schrieb Robert Read:
>>>>>>> On May 28, 2010, at 4:09 , di.wang wrote:
>>>>>>>> Hello, Michael
>>>>>>>>
>>>>>>>>> One good news: the feature that Vampir can show something like a heat map (Eric asked about this) comes back with the release at ISC. It is now called "performance radar". It can produce a heat map for a counter and does some other things as well. I could send a picture around, but need a bigger trace first (more hosts generating traces in parallel).
>>>>>>>>
>>>>>>>> Right now I do not have big clusters available to generate the trace. I will see what I can do here.
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>> WangDi
>>>>>>>
>>>>>>> If ~600 clients is big enough we could generate that on Hyperion.
>>>>>>>
>>>>>>> robert
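As a rough illustration of the matching described above (not the converter's actual code), the sketch below groups parsed events by xid and keeps only those RPCs for which all five events are present; the event names and data shapes are invented for the example.

```python
from collections import defaultdict

# The five per-RPC events mentioned above; an RPC only goes into the trace
# when all of them were found for the same xid.
REQUIRED = {"client_send", "server_recv", "server_start", "server_done", "client_done"}

def match_rpcs(events):
    """Group parsed log events by xid and keep only complete RPCs.

    `events` is an iterable of dicts like {"xid": ..., "event": ..., "ts": ...}.
    Incomplete RPCs are not emitted; their events are only counted, similar
    to the "dumped events" counter mentioned in the mail above.
    """
    by_xid = defaultdict(dict)
    for ev in events:
        by_xid[ev["xid"]][ev["event"]] = ev["ts"]

    complete, dumped = [], 0
    for xid, ts_by_event in by_xid.items():
        if REQUIRED.issubset(ts_by_event):
            complete.append({"xid": xid, **ts_by_event})
        else:
            dumped += len(ts_by_event)
    return complete, dumped
```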
Hi Robert,

thanks for the trace files. Every single file contains log entries like:

line: 00000100:00100000:2.0F:1275433209.964416:0:7924:0:(events.c:285:request_in_callback()) incoming req at ffff8101b32f5400 x1337373369608379 msgsize 296

Are there 684 server logs in the tarball? Or am I wrong with the assumption that lines with an "incoming req@" can only show up on the servers?

Michael
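For reference, a small parser for the "incoming req" lines quoted above might look like the sketch below (field names chosen for the example). Note that the list archive rewrites '@' as ' at ', so the real log lines read "req@ffff...".

```python
import re

# Matches lines such as:
#   ...:(events.c:285:request_in_callback()) incoming req@ffff8101b32f5400 x1337373369608379 msgsize 296
# The leading colon-separated fields are the generic debug-log prefix; only
# the timestamp and (assumed) pid positions are captured here, and they match
# the example lines quoted in this thread.
INCOMING_RE = re.compile(
    r"^[0-9a-f]+:[0-9a-f]+:[\d.F]+:(?P<ts>\d+\.\d+):\d+:(?P<pid>\d+):\d+:"
    r"\(events\.c:\d+:request_in_callback\(\)\)\s+incoming req(?:@| at )(?P<addr>\w+)"
    r"\s+x(?P<xid>\d+)\s+msgsize\s+(?P<size>\d+)"
)

def parse_incoming(line):
    """Return arrival timestamp, logging pid, xid and message size, or None."""
    m = INCOMING_RE.match(line)
    if not m:
        return None
    return {
        "ts": float(m.group("ts")),    # request arrival time on the receiving node
        "pid": int(m.group("pid")),    # pid of the thread that logged the message (assumed)
        "xid": int(m.group("xid")),    # same xid as in the RPCTRACE handling lines
        "msgsize": int(m.group("size")),
    }
```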
On 2010-06-16, at 02:46, Michael Kluge wrote:
> Are there 684 server logs in the tarball? Or am I wrong with the assumption that lines with an "incoming req@" can only show up on the servers?

Clients can receive requests from the server for things like lock cancellations, so this line will appear in both client and server logs. We don't have a test system with 684 servers...

Cheers, Andreas -- Andreas Dilger, Lustre Technical Lead, Oracle Corporation Canada Inc.
Hmm. Ok. Is there a way to figure out what type of file I have if I have nothing but the (most probably incomplete) file itself? Whether it is a client or a server log, and which NID the node had? Right now the code executes a not-so-easy mapping algorithm over all files to figure this out. The way to distinguish between client and server was to look at those lines which I expected to see only on the server (the "incoming req at .....").

Michael

Am Mittwoch, den 16.06.2010, 08:50 -0600 schrieb Andreas Dilger:
> Clients can receive requests from the server for things like lock cancellations, so this line will appear in both client and server logs. We don't have a test system with 684 servers...
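Since the thread does not settle on a definitive way to classify a log, here is one possible heuristic; it is an assumption on my part, not the mapping algorithm discussed above. It looks at which service threads handle RPCs in the file; the thread-name prefixes are assumed to belong only to OSS/MDS/MGS service threads (e.g. "ll_ost_26", as decoded later in this thread) and should be adjusted to whatever names actually appear in your logs.

```python
# Prefixes of server-side ptlrpc service threads (assumed naming; adjust to
# the thread names seen in your own logs).
SERVER_THREAD_PREFIXES = ("ll_ost", "ll_mdt", "ll_mgs", "mdt")

def looks_like_server_log(path, min_hits=10):
    """Heuristically decide whether a debug log was taken on a server.

    Counts "Handling/Handled RPC" lines whose handler thread name carries a
    server service prefix; a handful of hits is taken as 'server'. This is a
    guess, not a definitive classification.
    """
    header = "pname:cluuid+ref:pid:xid:nid:opc"
    hits = 0
    with open(path, errors="replace") as f:
        for line in f:
            if header not in line:
                continue
            rest = line.split(header, 1)[1].split()
            if not rest:
                continue
            pname = rest[0].split(":", 1)[0]
            if pname.startswith(SERVER_THREAD_PREFIXES):
                hits += 1
                if hits >= min_hits:
                    return True
    return False
```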
OK, I found time to do programming again. There are still some issues, but I got a first screenshot with the data from Robert's log.

Michael

Am 17.06.2010 16:35, schrieb Robert Read:
> I don't have the nids handy (I'll try to get that later today), but these are the server logs:
>
> ior-634/hyperion-sun-ost6.ior.log
> ior-634/hyperion-sun-ost7.ior.log
> ior-634/hyperion-sun-ost5.ior.log
> ior-634/hyperion-mds7.ior.log
>
> robert
The remaining problems of the counter calculations have been fixed. There is a screenshot attached showing some values. The code is on a gforge server that we operate here in Dresden (gforge.zih.tu-dresden.de). The converter runs on Linux and on Mac OS X, and you need Vampir to take a look at the OTF trace file.

Eric, Galen, for the moment I think I am done with the stuff I promised you at LUG this year. Are there any more ideas?

Michael
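The "requests in flight" counters discussed in this thread can be derived from the matched RPCs. A minimal sketch, assuming each matched RPC carries a client identifier plus send and completion timestamps (the tuple layout is an assumption for the example):

```python
from collections import defaultdict

def rpcs_in_flight(rpcs):
    """Turn matched RPCs into per-client 'requests in flight' step counters.

    `rpcs` is an iterable of (client, send_ts, done_ts) tuples; the result maps
    each client to a time-sorted list of (timestamp, value) samples, which is
    the shape a trace counter record needs.
    """
    deltas = defaultdict(list)
    for client, send_ts, done_ts in rpcs:
        deltas[client].append((send_ts, +1))   # request leaves the client
        deltas[client].append((done_ts, -1))   # reply completes the RPC

    series = {}
    for client, changes in deltas.items():
        changes.sort()
        value, samples = 0, []
        for ts, delta in changes:
            value += delta
            samples.append((ts, value))
        series[client] = samples
    return series
```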
I think it is better to use the client UUID instead of the NID as the client identification, because in your case you cannot separate info from two clients which run on the same node.

On Jun 22, 2010, at 19:12, Michael Kluge wrote:
> Eric, Galen, for the moment I think I am done with the stuff I promised you at LUG this year. Are there any more ideas?

--------------------------------------
Alexey Lyashkov
alexey.lyashkov at clusterstor.com
Hi Alexey,

thanks for the note. How do I do this if I only have a debug log? I see a field "cluuid+ref" in those lines of the log file that we are interested in. cluuid is probably "client uuid". Is this UUID definitely associated with the node that also owns the NID that appears in the same log line? And what does "+ref" stand for?

Michael

Am 23.06.2010 um 12:29 schrieb Alexey Lyashkov:
> I think it is better to use the client UUID instead of the NID as the client identification, because in your case you cannot separate info from two clients which run on the same node.
Hi Michael,

that is easy to get from the logs:

00000100:00100000:0:1197446488.301126:0:28197:0:(service.c:721:ptlrpc_server_handle_request()) Handled RPC pname:cluuid+ref:pid:xid:nid:opc ll_ost_26:lustre-MDT0000-mdtlov_UUID+5:27309:266:12345-0 at lo:400

so:
ll_ost_26 - thread name which handled the request,
lustre-MDT0000-mdtlov_UUID - client which sent the request,
+5 - number of references to that request (not needed for you),
27309 - pid which handled the request (or which sent it, on the client),
266 - xid, the internal request id,
12345-0 - sender NID,
@lo - LND used to deliver the request: @lo is loopback, @tcp is TCP, @o2ib is IB hardware, @ptl is Cray portals,
400 - command; you can look at the "*_cmd_t" enums to decode that command identifier (in this example, 400 is OBD_PING).

As another suggestion, fix the scaling: in your example only a few clients have 100 requests in flight, but the legend at the bottom shows up to 510.

On Jun 23, 2010, at 15:50, Michael Kluge wrote:
> thanks for the note. How do I do this if I only have a debug log? I see a field "cluuid+ref" in those lines of the log file that we are interested in.
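Following Alexey's field breakdown, a parser for the Handling/Handled RPC lines could look roughly like the sketch below. The regex assumes the "@" form of the NID found in real logs, whereas the archive above shows it as " at "; the function and group names are chosen for the example.

```python
import re

HANDLED_RE = re.compile(
    r"^[0-9a-f]+:[0-9a-f]+:[\d.F]+:(?P<ts>\d+\.\d+):\d+:\d+:\d+:"
    r"\(service\.c:\d+:ptlrpc_server_handle_request\(\)\)\s+"
    r"(?P<state>Handling|Handled) RPC pname:cluuid\+ref:pid:xid:nid:opc\s+"
    r"(?P<pname>[^:]+):(?P<cluuid>.+?)\+(?P<ref>\d+):(?P<pid>\d+):"
    r"x?(?P<xid>\d+):(?P<nid>\S+?)@(?P<lnd>\w+):(?P<opc>\d+)"
    r"(?:\s+Request proces+ed in (?P<svc_us>\d+)us \((?P<total_us>\d+)us total\))?"
)

def parse_handled(line):
    """Extract the fields Alexey describes: thread name, client UUID, ref
    count, pid, xid, NID, LND and opcode, plus the service times that appear
    on the 'Handled' line."""
    m = HANDLED_RE.match(line)
    if not m:
        return None
    d = m.groupdict()
    for key in ("ref", "pid", "xid", "opc"):
        d[key] = int(d[key])
    d["ts"] = float(d["ts"])
    if d["total_us"] is not None:            # only present on the "Handled" line
        d["svc_us"], d["total_us"] = int(d["svc_us"]), int(d["total_us"])
    return d
```

Applied to the example line above (with the "@" restored), this would yield pname "ll_ost_26", xid 266, nid "12345-0", lnd "lo" and opc 400, which the *_cmd_t enums decode to OBD_PING.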
> 00000100:00100000:0:1197446488.301126:0:28197:0:(service.c:721:ptlrpc_server_handle_request()) Handled RPC pname:cluuid+ref:pi > d:xid:nid:opc ll_ost_26:lustre-MDT0000-mdtlov_UUID+5:27309:266:12345-0 at lo:400 > >> > so.. > ll_ost_26 - thread name, which handle request, > lustre-MDT0000-mdtlov_UUID - client which send request. > +5 - number references to that request (unneed for you). > 27309 - pid which handle request (or which send on client) > 266 - xid - internal request id. > 12345-0 - sender NID > @lo - LND used to deliver request = @lo -> loopback, tcp -> tcp, @o2ib, - IB hardware, @ptl - cray portals. > 400 - command, you can look to enums "*_cmd_t " to decode that command identifier (in that example - 400 OBD_PING)OK, thanks. I''ll need to think about this as it might make sense to have one line per thread and not per UUID? Or is there already a 1:1 mapping?> as other suggestions fix scaling, at you example only few clients has 100 requests in flight, but legend on bottom show up to 510.Yes, did that while you were writing the mail ;) Definitely looks much better. Michael> > > On Jun 23, 2010, at 15:50, Michael Kluge wrote: > >> Hi Alexey, >> >> thanks for the note. How do I do this if I only have a debug log? I see a field "cluuid+ref" in those lines of the log file that we are interested in. cluuid is probably "client uuid". This UUID is definitely associated to the node, that also owns the NID that appears in the same log line? What does "+ref" stands for? >> >> >> Michael >> >> Am 23.06.2010 um 12:29 schrieb Alexey Lyashkov: >> >>> I think better to use client UUID instead of NID as client identification. Because in your''s case - you can''t separate info from two clients which run on same node. >>> >>> >>> On Jun 22, 2010, at 19:12, Michael Kluge wrote: >>> >>>> The remaining problems of the counter calculations have been fixed. There is a screenshot attached showing some values. The code is in a gforge server that we operate here in Dresden (gforge.zih.tu-dresden.de). The converter runs on Linux and on MAC OS X and you need Vampir to take a look at the OTF trace file. >>>> >>>> Eric, Galen, for the moment I think I am done with the stuff I promised you at LUG this year? Are there any more ideas? >>>> >>>> >>>> Michael >>>> >>>> >>>> >>>> <lustre3.png> >>>> >>>> >>>> >>>> >>>> http://lists.lustre.org/mailman/listinfo/lustre-devel >>> >>> >>> >>> -------------------------------------- >>> Alexey Lyashkov >>> alexey.lyashkov at clusterstor.com >>> >>> >>> >>> >> >> >> -- >> >> Michael Kluge, M.Sc. >> >> Technische Universit?t Dresden >> Center for Information Services and >> High Performance Computing (ZIH) >> D-01062 Dresden >> Germany >> >> Contact: >> Willersbau, Room WIL A 208 >> Phone: (+49) 351 463-34217 >> Fax: (+49) 351 463-37773 >> e-mail: michael.kluge at tu-dresden.de >> WWW: http://www.tu-dresden.de/zih >> >-- Michael Kluge, M.Sc. Technische Universit?t Dresden Center for Information Services and High Performance Computing (ZIH) D-01062 Dresden Germany Contact: Willersbau, Room WIL A 208 Phone: (+49) 351 463-34217 Fax: (+49) 351 463-37773 e-mail: michael.kluge at tu-dresden.de WWW: http://www.tu-dresden.de/zih -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.lustre.org/pipermail/lustre-devel/attachments/20100623/c0c14636/attachment-0001.html -------------- next part -------------- A non-text attachment was scrubbed... 
On 2010-06-23, at 05:50, Michael Kluge wrote:
> thanks for the note. How do I do this if I only have a debug log? I see a field "cluuid+ref" in those lines of the log file that we are interested in. cluuid is probably the "client uuid". Is this UUID definitely associated with the node that also owns the NID that appears in the same log line? What does "+ref" stand for?

While Alexey's comment is correct, in that there can be multiple Lustre client mounts on a single node, this is generally only used for testing. For client connections the UUID is just a random value like "f4726b1e-f7eb-479f-b163-784ea45adf32", so using the NID is much more useful to the viewer in the vast majority of cases.

If you wanted to distinguish the multiple mounts from a single client, it would be best to do this internally by tracking both the NID and the UUID, but only printing the NID in the output. For the rare case where requests have the same NID but a different UUID, you could show this as "NID:2", "NID:3" or similar. That preserves the distinction between client connections, while not making the output completely useless.

Even better than plain numeric NIDs would be to do IP->hostname conversion for the case of the TCP and IB LNDs, if this works.

Cheers, Andreas
--
Andreas Dilger
Lustre Technical Lead
Oracle Corporation Canada Inc.
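A small sketch of the naming scheme suggested here, in Python; the helper name and data structures are illustrative, not part of rpc_parse or the converter.

# Track (NID, client UUID) pairs internally, but label clients by NID alone,
# appending ":2", ":3", ... only when a second mount with a different UUID
# turns up on the same NID.
_labels = {}          # (nid, cluuid) -> display label
_mounts_per_nid = {}  # nid -> number of distinct UUIDs seen so far

def client_label(nid, cluuid):
    key = (nid, cluuid)
    if key not in _labels:
        count = _mounts_per_nid.get(nid, 0) + 1
        _mounts_per_nid[nid] = count
        _labels[key] = nid if count == 1 else "%s:%d" % (nid, count)
    return _labels[key]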
> While Alexey's comment is correct, in that there can be multiple Lustre client mounts on a single node, this is generally only used for testing. For client connections the UUID is just a random value like "f4726b1e-f7eb-479f-b163-784ea45adf32", so using the NID is much more useful to the viewer in the vast majority of cases.

Well, what I can do with OTF and Vampir is to use a process hierarchy: for each NID I can have a couple of UUIDs as 'child processes', the way one looks at hybrid MPI & OpenMP programs, where we have MPI processes and treat OpenMP threads as child processes.

> If you wanted to distinguish the multiple mounts from a single client, it would be best to do this internally by tracking both the NID and the UUID, but only printing the NID in the output. For the rare case where requests have the same NID but a different UUID, you could show this as "NID:2", "NID:3" or similar. That preserves the distinction between client connections, while not making the output completely useless.

Yes, that fits into the statement above.

> Even better than plain numeric NIDs would be to do IP->hostname conversion for the case of the TCP and IB LNDs, if this works.

Well, typically I only have the debug log, which might be incomplete. I don't think I have anything that can do this conversion reliably.

Michael
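A similarly small sketch of the two-level grouping described here (NID as parent, client UUIDs as children). The actual OTF/Vampir process definitions are left out, and the function names are only illustrative.

from collections import OrderedDict

# nid -> list of client UUIDs, in order of first appearance
_tree = OrderedDict()

def note_client(nid, cluuid):
    kids = _tree.setdefault(nid, [])
    if cluuid not in kids:
        kids.append(cluuid)

def define_processes():
    # Return the (parent, child) pairs a converter would register:
    # one top-level process per NID, one child per mount/UUID.
    pairs = []
    for nid, kids in _tree.items():
        pairs.append((None, nid))
        pairs.extend((nid, uuid) for uuid in kids)
    return pairs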
Hi Robert,

could you please send me the command for generating such a log?

lctl debug_daemon start /tmp/$HOSTNAME
sysctl -w lnet.debug=+rpctrace

still produces some kind of binary format (at least with my 1.8.3) which looks different from what you sent me.

Regards, Michael

>> Right now I do not have big clusters available to generate the trace.
>> I will see what I can do here.
>
> If ~600 clients is big enough, we could generate that on Hyperion.
>
> robert
>
>> Thanks
>> WangDi
Rahul Deshmukh
2010-Sep-22 13:57 UTC
[Lustre-discuss] [Lustre-devel] Lustre RPC visualization
Michael Kluge wrote:
> Hi Robert,
>
> could you please send me the command for generating such a log?
>
> lctl debug_daemon start /tmp/$HOSTNAME
> sysctl -w lnet.debug=+rpctrace

I think this command will help you:

lctl debug_file /tmp/$HOSTNAME > /tmp/${HOSTNAME}_readable

> Still produces some kind of binary format (at least with my 1.8.3) which
> looks different from what you sent me.
>
> Regards, Michael

Regards,
Rahul.
On 2010-09-22, at 06:46, Michael Kluge wrote:
> could you please send me the command for generating such a log?
>
> lctl debug_daemon start /tmp/$HOSTNAME
> sysctl -w lnet.debug=+rpctrace
>
> Still produces some kind of binary format (at least with my 1.8.3) which looks different from what you sent me.

There are a few different ways to generate a Lustre debug log:
- "lctl dk [filename]" (short for "lctl debug_kernel [filename]") dumps the current debug buffer to [filename], or to stdout if unspecified. It decodes the binary debug log into ASCII at that time.
- "lctl debug_daemon start {filename}" starts writing the binary kernel debug log to {filename} as the debug pages are filled.
- In case of an LBUG/LASSERT, the binary debug log is also dumped to the file /tmp/lustre.log.{timestamp} (unless specified otherwise).

In the last two cases it is necessary to run "lctl debug_file {infile} [outfile]", which decodes the binary input file into the ASCII output file. This should preferably be run on the system where the binary file was generated, though generally the binary format only differs between systems of different endianness.

Cheers, Andreas
--
Andreas Dilger
Lustre Technical Lead
Oracle Corporation Canada Inc.
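A rough sketch of that collect-and-decode cycle, wrapped in Python only for scripting convenience; the file paths and the "debug_daemon stop" line are assumptions based on this thread rather than verified syntax.

import socket
import subprocess

host = socket.gethostname()
raw = "/tmp/%s.dbg" % host    # binary debug log written by the debug daemon (assumed path)
txt = "/tmp/%s.txt" % host    # decoded ASCII log for the parser/converter (assumed path)

def run(cmd):
    subprocess.check_call(cmd, shell=True)

run("sysctl -w lnet.debug=+rpctrace")       # add rpctrace records to the debug mask
run("lctl debug_daemon start %s" % raw)     # stream debug pages to the binary file
# ... run the I/O test to be visualised here ...
run("lctl debug_daemon stop")               # assumed syntax for stopping the daemon
run("lctl debug_file %s %s" % (raw, txt))   # decode the binary log into ASCII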