Satoshi Isono
2011-Feb-11 03:16 UTC
[Lustre-discuss] How to detect process owner on client
Dear members,

I am looking for a way to detect the userid or jobid of processes on a Lustre client. Assume the following conditions:

1) Users run jobs through a scheduler such as PBS Pro, LSF or SGE.
2) One user's processes dominate Lustre I/O.
3) Some Lustre servers (MDS?/OSS?) can detect the high I/O stress on each server.
4) But the Lustre servers cannot map a jobid/userid to the Lustre I/O processes causing the heavy load, because there is no userid on the Lustre servers.
5) I would like Lustre to monitor this and make that mapping.
6) If (5) is possible, we can write a script which launches a scheduler command such as qdel.
7) The heavy user's job is then killed by the job scheduler.

I want (5) as a Lustre capability, but I guess the current Lustre 1.8 cannot do (5). On the other hand, in order to map a Lustre process to a userid/jobid, are there any ways using something like rpctrace or nid stats? Could you please give your advice or comments?

Regards,
Satoshi Isono
Michael Kluge
2011-Feb-11 06:18 UTC
[Lustre-discuss] How to detect process owner on client
Hi Satoshi,

I am not aware of any possibility to map the current statistics in /proc to UIDs, but I might be wrong. A while ago we had a script which did not kill the I/O-intensive processes but told us their PIDs. What we did was collect, for ~30 seconds, the number of I/O operations per node via /proc on all nodes. Then we attached an strace process to each process on the nodes with heavy I/O load. This strace intercepted only the I/O calls and wrote one log file per process. If strace runs for the same amount of time for each process on a host, you just need to sort the log files by size.

Regards, Michael

--
Michael Kluge, M.Sc.
Technische Universität Dresden
Center for Information Services and High Performance Computing (ZIH)
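A rough shell sketch of this two-stage approach (sample the client-side /proc counters, then strace the processes on the busy node) might look like the following. The stats path under /proc/fs/lustre/llite/*/stats, the 30-second windows, and the use of coreutils "timeout" are assumptions for illustration, not the original script:

#!/bin/sh
# Sketch: find heavy I/O processes on one client node.
# Assumptions: Lustre 1.8 client counters under /proc/fs/lustre/llite/*/stats,
# coreutils "timeout" available, run as root.

# Stage 1: sample the client-side read/write call counters for ~30 seconds.
count_io() {
    awk '$1 == "read_bytes" || $1 == "write_bytes" {n += $2} END {print n+0}' \
        /proc/fs/lustre/llite/*/stats
}
before=$(count_io); sleep 30; after=$(count_io)
echo "I/O calls in the last 30s: $((after - before))"
# If the delta is above a site-defined threshold, continue with stage 2.

# Stage 2: attach strace to every process for the same interval, intercepting
# only I/O syscalls, one log per PID; the largest logs mark the heaviest
# I/O processes.
for pid in $(ls /proc | grep '^[0-9][0-9]*$'); do
    timeout 30 strace -e trace=read,write,readv,writev \
        -o /tmp/iotrace.$pid -p "$pid" 2>/dev/null &
done
wait
ls -lS /tmp/iotrace.* | head        # biggest log files = heaviest I/O processes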
Andreas Dilger
2011-Feb-11 16:34 UTC
[Lustre-discuss] How to detect process owner on client
On 2011-02-10, at 23:18, Michael Kluge wrote:
> I am not aware of any possibility to map the current statistics in /proc to UIDs. But I might be wrong.

On the OSS and MDS nodes there are per-client statistics that allow this kind of tracking. They can be seen in /proc/fs/lustre/obdfilter/*/exports/*/stats for detailed information (e.g. broken down by RPC type, bytes read/written), or in /proc/fs/lustre/ost/OSS/*/req_history to get a dump of the recent RPCs sent by each client.

A little script was discussed in the thread "How to determine which lustre clients are loading filesystem" (2010-07-08):

> Another way that I heard some sites were doing this is to use the "rpc history". They may already have a script to do this, but the basics are below:
>
> oss# lctl set_param ost.OSS.*.req_buffer_history_max=10240
> {wait a few seconds to collect some history}
> oss# lctl get_param ost.OSS.*.req_history
>
> This will give you a list of the past (up to) 10240 RPCs for the "ost_io" RPC service, which is what you are observing the high load on:
>
> 3436037:192.168.20.1@tcp:12345-192.168.20.159@tcp:x1340648957534353:448:Complete:1278612656:0s(-6s) opc 3
> 3436038:192.168.20.1@tcp:12345-192.168.20.159@tcp:x1340648957536190:448:Complete:1278615489:1s(-41s) opc 3
> 3436039:192.168.20.1@tcp:12345-192.168.20.159@tcp:x1340648957536193:448:Complete:1278615490:0s(-6s) opc 3
>
> This output is in the format:
>
> identifier:target_nid:source_nid:rpc_xid:rpc_size:rpc_status:arrival_time:service_time(deadline) opcode
>
> Using some shell scripting, one can find the clients sending the most RPC requests:
>
> oss# lctl get_param ost.OSS.*.req_history | tr ":" " " | cut -d" " -f3,9,10 | sort | uniq -c | sort -nr | head -20
>
> 3443 12345-192.168.20.159@tcp opc 3
> 1215 12345-192.168.20.157@tcp opc 3
>  121 12345-192.168.20.157@tcp opc 4
>
> This will give you a sorted list of the top 20 clients that are sending the most RPCs to the ost and ost_io services, along with the operation being done (3 = OST_READ, 4 = OST_WRITE, etc.; see lustre/include/lustre/lustre_idl.h).

Cheers, Andreas
--
Andreas Dilger
Principal Engineer
Whamcloud, Inc.
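The req_history sorting identifies the busiest client NIDs; mapping a NID back to a hostname, and from there to a job, still needs the scheduler's node-to-job information. A minimal sketch under stated assumptions (TCP NIDs whose IP part has a reverse DNS entry, and a PBS/TORQUE-style scheduler where "pbsnodes <hostname>" lists the jobs running on a node; adapt for LSF/SGE):

# Sketch: resolve the busiest client NID to a hostname and its scheduler jobs.
top_nid=$(lctl get_param ost.OSS.*.req_history | tr ":" " " | cut -d" " -f3 |
          sort | uniq -c | sort -nr | head -1 | awk '{print $2}')
ip=${top_nid#*-}; ip=${ip%@*}                   # "12345-192.168.20.159@tcp" -> "192.168.20.159"
host=$(getent hosts "$ip" | awk '{print $2}')   # reverse-map the IP to a hostname
echo "busiest client: $host ($top_nid)"
pbsnodes "$host" | grep -i jobs                 # then qstat/qdel the offending jobid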
Michael Kluge
2011-Feb-11 18:09 UTC
[Lustre-discuss] How to detect process owner on client
But it does not give you PIDs or user names, does it? Or is there a way to find these with standard Lustre tools?

Michael
Andreas Dilger
2011-Feb-11 20:59 UTC
[Lustre-discuss] How to detect process owner on client
On 2011-02-11, at 11:09, Michael Kluge wrote:
> But it does not give you PIDs or user names? Or is there a way to find these with standard lustre tools?

I think for most purposes the req_history should be enough to identify the problem node, and then a simple "ps" or a look at the job scheduler for that node would identify the problem.

However, if process-level tracking is needed, it is also possible to track this on either the client or the server, using the RPCTRACE functionality in the Lustre kernel debug logs:

client# lctl set_param debug=+rpctrace
{wait to collect some logs}
client# lctl dk /tmp/debug.cli
client# less /tmp/debug.cli
:
00000100:00100000:1:1297449409.192077:0:32392:0:(client.c:2095:ptlrpc_queue_wait()) Sending RPC pname:cluuid:pid:xid:nid:opc ls:028fd87f-1865-3915-a864-fc829a4d7a4c:32392:x1359928498575499:0@lo:37
:

The "pname:cluuid:pid:xid:nid:opc" string lists the names of the fields being printed in the RPCTRACE message. We are particularly interested in the "pname" and "pid" fields, and maybe "opc" (opcode). This shows that "ls", pid 32392, is sending an opcode 37 request (MDS_READPAGE, per lustre/include/lustre/lustre_idl.h). This RPC is identified by xid "x1359928498575499" on client UUID "028fd87f-1865-3915-a864-fc829a4d7a4c". The xid is not guaranteed to be unique between clients, but is relatively unique in most debug logs.

On the server we can see the same RPC in the debug logs:

server# lctl dk /tmp/debug.mds
server# grep ":x1359928498575499:" /tmp/debug.mds
00000100:00100000:1:1297449409.192178:0:5174:0:(service.c:1276:ptlrpc_server_log_handling_request()) Handling RPC pname:cluuid+ref:pid:xid:nid:opc ll_mdt_rdpg_00:028fd87f-1865-3915-a864-fc829a4d7a4c+6:32392:x1359928498575499:12345-0@lo:37

Here we see that the RPC for this xid and client pid was processed by a service thread; the server-side debug log does not carry the client process name, but rather the process name of the thread handling the RPC. In any case, it is definitely possible to track down this information just from the server in a variety of ways.

Adding a job identifier, and possibly a rank number, to the Lustre RPC messages is definitely something that we've thought about, but it would need help from userspace (MPI, job scheduler, etc.) in order to be useful, so it hasn't been done yet.

Cheers, Andreas
--
Andreas Dilger
Principal Engineer
Whamcloud, Inc.
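To summarize a client-side rpctrace log per process, a short pipeline along the following lines may be enough. It is a sketch that assumes the "Sending RPC ... pname:cluuid:pid:xid:nid:opc" line layout shown above, which can differ between Lustre versions:

# Sketch: count client RPCs per (process name, pid, opcode) from an rpctrace log.
lctl dk /tmp/debug.cli |
  awk '/Sending RPC/ {print $NF}' |                  # last field, e.g. ls:<cluuid>:32392:x...:0@lo:37
  awk -F: '{printf "%s pid %s opc %s\n", $1, $3, $6}' |
  sort | uniq -c | sort -nr | head -20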
Satoshi Isono
2011-Feb-14 10:04 UTC
[Lustre-discuss] How to detect process owner on client
Dear Andreas, Michael,

Thanks for your messages. They are very useful to me. I will try this out.

Regards,
Satoshi Isono
John Hammond
2011-Feb-15 22:17 UTC
[Lustre-discuss] How to detect process owner on client
I've written a utility called lltop which gathers I/O statistics from Lustre servers, along with job assignment data from cluster batch schedulers, to give a job-by-job accounting of filesystem load. Here is its output, with names changed to protect the innocent:

$ sudo tacc_lltop work
  JOBID  WR_MB  RD_MB  REQS  OWNER  WORKDIR
1823815   2101      0  4176  al     /work/000/al/job1
1823060    774      0  1570  bob    /work/000/bob/fftw
1823634    323      3  3244  chas   /work/000/chas/boltzeq
1823768    289      0  5108  deb    /work/000/deb/mesh-08
1823085     55      0   110  ed     /work/000/ed/jumble
 login3     18      3  2961

We use it on several systems, only with SGE so far, but it is hookable to other schedulers. See https://github.com/jhammond/lltop for source and documentation.

Best,
John
--
John L. Hammond, Ph.D.
TACC, The University of Texas at Austin
Ashley Pittman
2011-Feb-16 08:56 UTC
[Lustre-discuss] How to detect process owner on client
On 15 Feb 2011, at 22:17, John Hammond wrote:
> I've written a utility called lltop which gathers I/O statistics from Lustre servers, along with job assignment data from cluster batch schedulers, to give a job-by-job accounting of filesystem load.
> See https://github.com/jhammond/lltop for source and documentation.

That looks very useful! We won't be able to use this directly at DDN because we don't integrate with the right bits of the stack, but I'll make sure our HPC customers hear about it if they are looking for this kind of data. I also have some code which would work with other schedulers, if people are interested.

Ashley
Sebastien Piechurski
2011-Mar-01 11:40 UTC
[Lustre-discuss] How to detect process owner on client
Hi Satoshi,

I don't have a complete solution to your problem, but I have written a script which lets me find at least the Lustre client responsible for the bad I/Os. We are using PBS Pro with nodes set as job-exclusive, so determining the job and user is then much easier.

The script does the following:

1) Dump the attributes in /proc/fs/lustre/obdfilter/*/exports/*/stats.
2) Sleep a few seconds (tunable).
3) Dump all the attributes again, and use diff to see which clients changed their I/O count.

These changes are then sorted numerically. The final result is a list of IP addresses with the number of I/Os done during the sleep period for each. The last one in the list (because it is sorted) points to the responsible client(s).

Hope the method helps.
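A minimal sketch of that sample/sleep/compare approach, run on an OSS, might look like this. The handling of the stats format is an assumption (it sums the per-RPC request counts in each per-export stats file and keys them by the export's NID directory name), so adjust it to the output of your Lustre version:

#!/bin/sh
# Sketch: rank client NIDs by how much their per-export request counts grew.
# Assumption: one directory per client NID under obdfilter/*/exports, each
# containing a "stats" file whose second column is a request/sample count.
INTERVAL=${1:-10}

snapshot() {
    for f in /proc/fs/lustre/obdfilter/*/exports/*/stats; do
        nid=$(basename $(dirname "$f"))
        # sum all per-RPC counters for this export into one number
        awk -v nid="$nid" '$1 != "snapshot_time" {n += $2} END {print nid, n+0}' "$f"
    done
}

snapshot > /tmp/lustre_io.before
sleep "$INTERVAL"
snapshot > /tmp/lustre_io.after

# print "delta nid", sorted numerically so the busiest client comes last
awk 'NR == FNR {before[$1] = $2; next} {print $2 - before[$1], $1}' \
    /tmp/lustre_io.before /tmp/lustre_io.after | sort -n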