We have a use case for walking the file system where path information alone is not sufficient. We use an application, agedu ( http://www.chiark.greenend.org.uk/~sgtatham/agedu/ ), which, apart from walking the file system itself, can take a text file as its data source. With this we can see where old data is and when it was last accessed, in an easy-to-understand graphical form. An additional use case would be a file system where we quota disc space via groups: finding directories without the setgid bit set, so that permissions can be enforced, would be useful (although implementing -perm would be preferable and could be investigated independently).

I have been thinking about how hard it would be to extend lfs find to provide additional information. I had a look at the code a while ago and have thrown something together.

I would like to know:

a) Is this a reasonable approach?
b) If the stat structure in param passed into cb_find_init is known always to be valid, is there a better way of getting the size back from the MDS than doing a stat?
It would be used thus:

./lfs find /lustre/scratch101/blastdb/Supported/ --ls | head
147456 1328639981 1327687808 1327687808 356 0 40755 /lustre/scratch101/blastdb/Supported/
222153834 1327502260 1324495116 1324495116 356 102 100644 /lustre/scratch101/blastdb/Supported/embl_est_inv-3.nhr
248 1328573050 1326893658 1326893658 356 102 100644 /lustre/scratch101/blastdb/Supported/nr.pal
1000000120 1323250179 1324499009 1324499009 356 102 100644 /lustre/scratch101/blastdb/Supported/embl_gss_mus-2
11936568 1323250179 1324499103 1324499103 356 102 100644 /lustre/scratch101/blastdb/Supported/embl_gss_pln-1.xnt
4096 1328692476 1328544166 1328544394 356 102 40755 /lustre/scratch101/blastdb/Supported/uniprot_index

ls -la /lustre/scratch101/blastdb/Supported/embl_est_inv-3.nhr
-rw-r--r-- 1 pubseq crontab 222153834 2011-12-21 19:18 /lustre/scratch101/blastdb/Supported/embl_est_inv-3.nhr

I believe this patch set is based off lustre-1.8.7wc1

2043,2058c2043,2047
<         if (param->ls) {
<                 if (param->have_fileinfo == 0) {
<                         /*if (st->st_size == 0 ) {*/
<                         /* Here I would like to pull
<                          * back the size guess back from
<                          * the MDS but I need a hint :) */
<                         stat(path,st);
<                 }
<                 llapi_printf(LLAPI_MSG_NORMAL, "%d %d %d %d %d %d %03o %s",st->st_size,st->st_atime,st->st_mtime,st->st_ctime,st->st_uid,st->st_gid,st->st_mode, path);
<         }
<         else
<                 llapi_printf(LLAPI_MSG_NORMAL, "%s", path);
<         if (param->zeroend)
<                 llapi_printf(LLAPI_MSG_NORMAL, "%c", '\0');
<         else
<                 llapi_printf(LLAPI_MSG_NORMAL, "\n");
---
>         llapi_printf(LLAPI_MSG_NORMAL, "%s", path);
>         if (param->zeroend)
>                 llapi_printf(LLAPI_MSG_NORMAL, "%c", '\0');
>         else
>                 llapi_printf(LLAPI_MSG_NORMAL, "\n");

diff ./lustre/utils/lfs.c ../../lustre_source1/lustre-1.8.7wc1/lustre/utils/lfs.c
133c133
<         " [--maxdepth|-D N] [[!] --name|-n <pattern>] [--print0|-P] [--ls]\n"
---
>         " [--maxdepth|-D N] [[!] --name|-n <pattern>] [--print0|-P]\n"
452d451
< #define FIND_LS 4
477d475
<         {"ls", no_argument, 0, FIND_LS},
595,598d592
<         case FIND_LS:
<                 new_fashion = 1;
<                 param.ls = 1;
<                 break;

diff ./lustre/include/lustre/liblustreapi.h ../../lustre_source1/lustre-1.8.7wc1/lustre/include/lustre/liblustreapi.h
121d120
<         ls:1,

-- 
The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE.
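A note on the --ls hunk in the patch: its format string passes st_size, the three timestamps and the ids to %d. Where off_t or time_t is wider than int, that mismatch is undefined behaviour in printf-style functions. Below is a minimal sketch of the same record with explicit casts. It assumes a plain POSIX struct stat and uses snprintf rather than llapi_printf; format_ls_line is a hypothetical helper name for illustration, not part of the patch or of liblustreapi.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>

/*
 * Hypothetical helper (not part of lfs or liblustreapi): write one
 * "--ls"-style record into buf in the field order used by the patch:
 * size atime mtime ctime uid gid mode(octal) path
 *
 * Every field is cast to a type that matches its conversion specifier,
 * so the call is well defined whatever the widths of off_t, time_t,
 * uid_t and gid_t happen to be.
 * Returns the snprintf result (length that was or would be written).
 */
static int format_ls_line(char *buf, size_t len,
                          const struct stat *st, const char *path)
{
    return snprintf(buf, len, "%lld %lld %lld %lld %lu %lu %o %s",
                    (long long)st->st_size,
                    (long long)st->st_atime,
                    (long long)st->st_mtime,
                    (long long)st->st_ctime,
                    (unsigned long)st->st_uid,
                    (unsigned long)st->st_gid,
                    (unsigned int)st->st_mode,
                    path);
}
```

A caller would fill a struct stat (from stat() or from whatever the MDS returned), call format_ls_line with a buffer of at least PATH_MAX plus roughly 100 bytes, and print the result followed by either a newline or a NUL, as the existing print0 handling does.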
James,

Depending on how often you need the information updated, and how perfectly accurate it has to be, you may find that just using normal "find" on a snapshot of the MDT is more efficient. (IIRC size-on-mds is relatively close these days.) If you have your MDT on LVM for making backups (for example), you could also periodically run your find on the same snapshot.

With your "lfs find" approach, I think you will have to decide whether you want to trust the size-on-mds anyway or query the OSTs, which will slow things down a lot. There is more than one way to do it.

-Nathan

PS: I love the color-coded bar graph approach... very informative in a compact display!

On 02/17/2012 02:00 AM, James Beal wrote:
> I have been thinking about how hard it would be to extend lfs find to
> provide additional information. I thought I would have a look at the
> code a while ago and I have thrown something together.
>
> I would like to know if
>
> a) This is a reasonable approach.
> b) if the stat structure in param passed into cb_find_init is known
> always to be valid, is there a better way of getting the size back from
> the MDS than doing a stat ?

-- 
Nathan Dauchy
NOAA R&D HPCS, Senior Systems Engineer
325 Broadway, MS R-GSD2, Boulder, CO 80305
303-497-4675 office, 303-482-7377 mobile
nathan.dauchy at noaa.gov
On 17 Feb 2012, at 17:45, Nathan Dauchy wrote:
> James,
>
> Depending on how often you need the information updated, and how
> perfectly accurate it has to be, you may find that just using normal
> "find" on a snapshot of the MDT is more efficient. (IIRC that
> size-on-mds is relatively close these days.) If you have your MDT on
> LVM for making backups (for example), you could also periodically run
> your find on the same snapshot.

We generally do have the MDT on LVM; however, I am averse to taking snapshots of it, as the performance hit while a snapshot exists is pretty severe. And we don't back any of our Lustre systems up.

> With your "lfs find" approach, I think you will have to decide if you
> want to trust the size-on-mds anyway or query the OSTs which will slow
> things down a lot. There is more than one way to do it.

Given that we run the scan about once a month, if the size-on-mds were relatively up to date then I would love to use it. On our general-purpose filesystems, running a scan using real stats is not feasible. However, I don't have a clue how to get at the size on the MDT. If anyone could point me at some documentation, that would be grand.

> -Nathan
>
> PS: I love the color-coded bar graph approach... very informative in a
> compact display!

Just in case it wasn't clear, agedu is written by Simon Tatham, who is also known for PuTTY. I have sent him a few emails; he responds very quickly and seems a very bright fellow.
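On the setgid use case from the first message: the mode field that --ls already prints (40755 for the directories in the sample output) is enough to decide whether a directory lacks the setgid bit, so no extra stat is needed once the mode is in hand. A minimal sketch follows, assuming only POSIX <sys/stat.h>; mode_lacks_setgid and path_lacks_setgid are hypothetical names for illustration and are not part of lfs.

```c
#include <sys/types.h>
#include <sys/stat.h>

/*
 * Hypothetical predicate: returns 1 when mode describes a directory
 * that does NOT have the setgid bit set, 0 otherwise.
 */
static int mode_lacks_setgid(mode_t mode)
{
    return S_ISDIR(mode) && !(mode & S_ISGID);
}

/*
 * Hypothetical convenience wrapper for a single path, using an
 * ordinary stat(). Returns -1 if the path cannot be examined,
 * otherwise the result of mode_lacks_setgid().
 */
static int path_lacks_setgid(const char *path)
{
    struct stat st;

    if (stat(path, &st) != 0)
        return -1;
    return mode_lacks_setgid(st.st_mode);
}
```

With the sample output above, a mode of 040755 would be reported (directory, setgid clear), while 042755 (directory, setgid set) and 0100644 (regular file) would not.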