Hugo Connery
2007-Sep-14 20:50 UTC
file statistics collection using stat(2) data obtained by rsync
Hi,

I'm using rsync + hard-link copy as a backup mechanism (happily :-) and now wish to collect file usage statistics per user per rsync operation.

rsync must perform a stat(2) or equivalent on each file to determine whether it needs to access the file content for synchronisation. I would like to 'piggy back' my statistics on the stat(2) details that rsync has already collected, and thus avoid generating all those stat(2) calls again with another directory hierarchy traversal.

Has this been done? Is there an 'API' for plugging in a file statistics collector?

Assuming that this is not a poor design consideration and that no existing mechanism exists, I will happily write this. In that case, I would like to ask whether others would be interested in this and, if so, what sort of statistics collection they would be interested in.

[My concept is to collect file totals into access-time slots (e.g. 0-3, 3-6, 6-12, 12-36 and 36+ months) per user, and dump this data to a file for collection by a tool that then delivers it to RRDTool for pretty graphs.]

Thanks in advance, and regards,

--
Hugo Connery
IT Administrator, Institute of Environment & Resources, DTU
http://www.er.dtu.dk
Matt McCutchen
2007-Sep-15 03:10 UTC
file statistics collection using stat(2) data obtained by rsync
On 9/14/07, Hugo Connery <hmc@er.dtu.dk> wrote:
> I [...] wish now to collect file usage statistics per user per rsync operation.

There is already a --stats option that provides some statistics. Does that give you everything you want? If not, what other statistics do you want?

Matt
Matt McCutchen
2007-Sep-15 15:32 UTC
file statistics collection using stat(2) data obtained by rsync
Note: In the future, please Cc the rsync list in your responses so that others can help you if I become unavailable and so that future users can refer to your message.

On 9/15/07, Hugo Connery <hmc@er.dtu.dk> wrote:
> I want to obtain summary statistics grouped by file owner and access times, i.e. at the
> end of the operation, report for each user the number of bytes that the user has stored
> that have been accessed within a group of time periods (last 3 months, 3-6 months, 6-12 months etc.)
> This basically forms a table of data sizes.

The calculation of these statistics appears to be completely orthogonal to what rsync is doing (copying files). Unless keeping the number of stat(2) calls low is critical in your scenario, I think it would be much easier and more appropriate to write a separate script to calculate the statistics.

Matt
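
For illustration, a separate script of the kind suggested here might look roughly like the following Python sketch. It is not part of rsync and nothing in the thread specifies it: the function names (`slot_for_age`, `collect`, `format_report`), the 30-day month, and the choice to count each hard-linked file only once are all assumptions made for the example. The slot boundaries follow the ones proposed in the original post.

```python
import os
import stat
import time
from collections import defaultdict

# Slot upper bounds in months, following the slots proposed in the thread:
# 0-3, 3-6, 6-12, 12-36, 36+.
SLOT_BOUNDS = [3, 6, 12, 36]
SLOT_LABELS = ["0-3", "3-6", "6-12", "12-36", "36+"]
SECONDS_PER_MONTH = 30 * 24 * 3600  # assumption: a month is taken as 30 days


def slot_for_age(age_seconds):
    """Return the access-time slot label for a file last accessed age_seconds ago."""
    months = max(age_seconds, 0) / SECONDS_PER_MONTH
    for bound, label in zip(SLOT_BOUNDS, SLOT_LABELS):
        if months < bound:
            return label
    return SLOT_LABELS[-1]


def collect(root, now=None):
    """Walk root and return {uid: {slot_label: total_bytes}} for regular files."""
    now = time.time() if now is None else now
    totals = defaultdict(lambda: defaultdict(int))
    seen = set()  # (device, inode) pairs, so hard-linked files are counted once
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)  # one stat call per file, symlinks not followed
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if not stat.S_ISREG(st.st_mode):
                continue
            key = (st.st_dev, st.st_ino)
            if key in seen:
                continue
            seen.add(key)
            totals[st.st_uid][slot_for_age(now - st.st_atime)] += st.st_size
    return totals


def format_report(totals):
    """Return one text line per user: the uid followed by bytes in each slot."""
    lines = []
    for uid in sorted(totals):
        cells = " ".join(f"{label}={totals[uid][label]}" for label in SLOT_LABELS)
        lines.append(f"uid={uid} {cells}")
    return lines
```

Calling `print("\n".join(format_report(collect("/path/to/tree"))))` would print one line per owner, which a further tool could feed to RRDTool. The result is only as good as the recorded access times: on filesystems mounted with options such as `noatime` or `relatime`, atime may not reflect the true last access.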