Peter J. Braam
2006-Jul-13 17:37 UTC
[Lustre-discuss] RE: Client based IO extent size patch
Several people have made enquiries about progress with the statistics. Here is a snippet of the most recent discussion.

- Peter -

Hi Kalpak,

Very happy to see this so quickly. This looks good, I think. I have a few questions:

1. Could this easily be tracked per process? See in the attached email what I mentioned to Cray the other day. Perhaps we could have two files of the kind you have built: one as you have it now, and one which tracks the last 10 processes individually (so it would have up to 10x more lines, perhaps).

2. Make sure the statistics can be cleared easily by truncating or echoing something to the proc file.

Also, did you make sure to use the available macros to keep the implementation simple?

Please discuss this with beaver@clusterfs.com (not with me really) - they are the "team" in CFS that this effort resides under.

- Peter -

> -----Original Message-----
> From: Kalpak Shah
> Sent: Thursday, July 13, 2006 8:48 AM
> To: Andreas Dilger; Peter J. Braam
> Cc: bojenic; cfsproj
> Subject: Client based IO extent size patch
>
> Hi Peter, Andreas,
>
> I am implementing part 1 of the IO statistics in SOW3. I have
> implemented the client-based IO extent patch. A new file,
> "rw_extents_stats", has been created in the llite folder in
> /proc. It sorts the calls into buckets according to their sizes.
>
> The following is a sample output:
>
> [root@phobes lustre]# cat /proc/fs/lustre/llite/fs0/rw_extent_stats
> snapshot_time:          1152814463.787173 (secs.usecs)
>
>                             read             |            write
> extents              calls    %   cum %      | calls    %   cum %
> 0K - 4K:                 3   50      50      |     0    0       0
> 4K - 8K:                 0    0      50      |     0    0       0
> 8K - 16K:                0    0      50      |     1   25      25
> 16K - 32K:               0    0      50      |     0    0      25
> 32K - 64K:               0    0      50      |     0    0      25
> 64K - 128K:              0    0      50      |     0    0      25
> 128K - 256K:             0    0      50      |     0    0      25
> 256K - 512K:             0    0      50      |     0    0      25
> 512K - 1024K:            0    0      50      |     0    0      25
> 1024K - 2048K:           0    0      50      |     0    0      25
> 2048K - 4096K:           0    0      50      |     0    0      25
> 4096K - 8192K:           0    0      50      |     0    0      25
> 8192K - 16384K:          0    0      50      |     0    0      25
> 16384K - 32768K:         3   50     100      |     3   75     100
>
> The patch for the same is attached (client-io-stats.patch).
> Kindly let me know if this is sufficient or whether further changes
> are required.
>
> Regards,
> Kalpak.

=======================================

We have asked one of our teams to work on building more IO statistics into Lustre. I have just asked, and perhaps we can get a plan for the read-write statistics in the coming week. IIRC I suggested something along the following lines to them:

NID-PID  #count  R/W/NCR/NCW  <4K  4-16K  16-64K  64-256K  256-1024K  >1024K

Where:

1. There is a line for each process on a Lustre client node (keep, say, 1000 nid-pids by default).

2. Each bucket (the columns with "Ks") would contain a % of the total number of R or W requests made.

3. For each nid-pid there would be 4 lines, documenting:
   - Writes
   - Reads
   - NCR/NCW: logically (i.e. within an object) non-contiguous subsequent reads/writes, where the first of a pair lies in the bucket.

This would pretty much capture the IO behavior of an application. It may turn out to be hard to record this on the OST because the OST is not aware of the PID, but then normally there are only a few jobs per client running. It could be attractive to split this out further by object, but a bit of experience will show what we can use most easily.

- Peter -
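[Editor's note] For illustration, below is a minimal C sketch of the power-of-two extent bucketing that Kalpak's patch describes (sorting read and write calls into size buckets, as in the rw_extent_stats output above). The names extent_stats, size_to_bucket and account_io are hypothetical and are not taken from client-io-stats.patch; this is a sketch of the technique, not the actual implementation.

    #include <stdio.h>

    #define EXTENT_BUCKETS 14        /* "0K - 4K" up through "16384K - 32768K" */

    struct extent_stats {
            unsigned long read_calls[EXTENT_BUCKETS];
            unsigned long write_calls[EXTENT_BUCKETS];
    };

    /* Map an I/O size in bytes to a bucket index: bucket 0 covers sizes up
     * to 4K, and each later bucket doubles the upper bound. */
    static int size_to_bucket(unsigned long bytes)
    {
            unsigned long limit = 4096;
            int i;

            for (i = 0; i < EXTENT_BUCKETS - 1; i++) {
                    if (bytes <= limit)
                            return i;
                    limit <<= 1;
            }
            return EXTENT_BUCKETS - 1;   /* anything larger lands in the last bucket */
    }

    static void account_io(struct extent_stats *s, unsigned long bytes, int is_write)
    {
            int b = size_to_bucket(bytes);

            if (is_write)
                    s->write_calls[b]++;
            else
                    s->read_calls[b]++;
    }

    int main(void)
    {
            struct extent_stats s = { { 0 }, { 0 } };

            account_io(&s, 2048, 0);                /* small read  -> "0K - 4K"        */
            account_io(&s, 20UL * 1024 * 1024, 1);  /* 20 MB write -> "16384K - 32768K" */

            printf("2048 bytes -> bucket %d\n", size_to_bucket(2048));
            printf("20 MB      -> bucket %d\n", size_to_bucket(20UL * 1024 * 1024));
            return 0;
    }

The percentage and cumulative-percentage columns in the proc file would then be derived from these counters at read time, relative to the total number of read or write calls recorded.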
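[Editor's note] Similarly, a sketch of how the per-NID-PID table Peter proposes could be accounted, including the NCR/NCW rule (a non-contiguous pair is credited to the bucket of the first I/O of the pair). All names here (pid_io_stats, pid_account, pid_bucket) are made up for illustration; a real implementation would keep these structures in a table keyed by nid-pid, capped at roughly 1000 entries as noted above.

    #include <stdio.h>

    #define PID_BUCKETS 6   /* <4K, 4-16K, 16-64K, 64-256K, 256-1024K, >1024K */

    struct pid_io_stats {
            unsigned long nid;
            unsigned long pid;
            unsigned long reads[PID_BUCKETS];
            unsigned long writes[PID_BUCKETS];
            unsigned long ncr[PID_BUCKETS];      /* non-contiguous read pairs  */
            unsigned long ncw[PID_BUCKETS];      /* non-contiguous write pairs */
            unsigned long long last_end;         /* end offset of previous I/O */
            int last_bucket;                     /* bucket of previous I/O     */
            int last_was_write;
            int have_last;
    };

    static int pid_bucket(unsigned long bytes)
    {
            if (bytes < 4 * 1024)    return 0;
            if (bytes < 16 * 1024)   return 1;
            if (bytes < 64 * 1024)   return 2;
            if (bytes < 256 * 1024)  return 3;
            if (bytes < 1024 * 1024) return 4;
            return 5;
    }

    /* Record one read or write on an object.  If the previous I/O of the
     * same kind does not end where this one starts, count a non-contiguous
     * pair (NCR or NCW) against the bucket of the first I/O of the pair. */
    static void pid_account(struct pid_io_stats *s, unsigned long long offset,
                            unsigned long bytes, int is_write)
    {
            int b = pid_bucket(bytes);

            if (is_write)
                    s->writes[b]++;
            else
                    s->reads[b]++;

            if (s->have_last && s->last_was_write == is_write &&
                offset != s->last_end) {
                    if (is_write)
                            s->ncw[s->last_bucket]++;
                    else
                            s->ncr[s->last_bucket]++;
            }

            s->last_end = offset + bytes;
            s->last_bucket = b;
            s->last_was_write = is_write;
            s->have_last = 1;
    }

    int main(void)
    {
            struct pid_io_stats s = { 0 };

            pid_account(&s, 0, 8192, 0);        /* read [0, 8K)                     */
            pid_account(&s, 65536, 8192, 0);    /* read [64K, 72K): not contiguous, */
                                                /* so ncr[1] (4-16K bucket) bumps   */

            printf("reads in 4-16K: %lu, NCR in 4-16K: %lu\n", s.reads[1], s.ncr[1]);
            return 0;
    }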