I want to get ACLs (output similar to that of smbcacls) for a *lot* of files (potentially millions). I can only process about 10 files per second when running the command (`smbcacls -U ...` via a Python wrapper), so I'm looking for a faster way.

Does anyone know of any libraries or other commands that could help me?

Failing that, I assume that much of the time is spent authenticating the user/pw for each request. Would it be possible to write something that keeps the connection open so that multiple requests can be made without reauthenticating? (I'm not familiar with how LDAP/AD/Samba works.) I have looked at the source of smbcacls but nothing jumped out at me.

Many thanks
Hi Peter,

On Thu, 28 Nov 2013 15:00:18 +0000 Peter Flood <info at whywouldwe.com> wrote:

> Does anyone know any libraries or other commands that could help me?

The cifs.ko kernel client utilities package ships a getcifsacl binary which may be worth a try.

> Failing that, I assume that much of the time taken is spent on
> authenticating the user/pw for each request. Would it be possible to
> write something that keeps the connection so that multiple requests can
> be made without reauthenticating (I'm not familiar with how
> LDAP/AD/Samba works)? I have looked at the source of smbcacls but
> nothing jumped out at me.

Noel (cc'ed) just finished a bunch of changes adding inheritance propagation to smbcacls:
http://cgit.freedesktop.org/~noelp/noelp-samba/log/?h=smbcalcs-inherit-v2

It doesn't do recursion on --get, but the new code could certainly be leveraged to add this feature.

Cheers, David
On 13-11-28 10:00 AM, Peter Flood wrote:

> I want to get ACLs (output similar to that of smbcacls) for a *lot* of
> files (potentially millions). I can only process about 10 files per
> second when running the command (`smbcacls -U ...` via a Python
> wrapper), I'm looking for a faster way.

It's kind of ugly, but a quick workaround may be doing the calls in parallel using a worker queue in Python:
http://docs.python.org/2/library/queue.html

You'd be able to have an arbitrary number of outstanding requests at the same time. It might do the trick for you.

M.

--
Michael Brown               | `One of the main causes of the fall of
Systems Consultant          | the Roman Empire was that, lacking zero,
Net Direct Inc.             | they had no way to indicate successful
+1 519 883 1172 x5106       | termination of their C programs.' - Firth
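
The worker-queue idea above could look roughly like the sketch below. It is only an illustration under a few assumptions: it uses the Python 3 `queue` module (the linked page documents the Python 2 equivalent), the names `fetch_acl` and `run_parallel` are made up for this example, and whatever authentication options you already pass to smbcacls (such as `-U ...`) are supplied by the caller through `auth_args`. Each file still costs one smbcacls process and one connection; the threads only let several of them be outstanding at once.

```python
import queue
import subprocess
import threading


def fetch_acl(service, path, auth_args):
    """Run one smbcacls call; returns (exit status, stdout text).

    service is the //server/share string, path the file on that share,
    auth_args the authentication options you already use (e.g. -U ...).
    """
    cmd = ["smbcacls", service, path] + list(auth_args)
    proc = subprocess.run(cmd, stdout=subprocess.PIPE,
                          stderr=subprocess.PIPE, universal_newlines=True)
    return proc.returncode, proc.stdout


def run_parallel(paths, worker_fn, num_workers=8):
    """Apply worker_fn to every path using a pool of threads.

    Returns a dict mapping each path to worker_fn's result, or to the
    exception it raised, so one bad file does not stop the whole run.
    """
    tasks = queue.Queue()
    for path in paths:
        tasks.put(path)

    results = {}
    lock = threading.Lock()

    def worker():
        while True:
            try:
                path = tasks.get_nowait()
            except queue.Empty:
                return  # nothing left to do
            try:
                outcome = worker_fn(path)
            except Exception as exc:  # keep going on per-file errors
                outcome = exc
            with lock:
                results[path] = outcome

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

To use it with smbcacls, bind the share and credentials first, for example `run_parallel(paths, lambda p: fetch_acl("//server/share", p, ["-U", "..."]))`, where the share name and the elided `-U` argument are placeholders for your own values. Threads are enough here because the time is spent waiting on the child processes and the network, not in Python itself.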
On Thu, 2013-11-28 at 15:00 +0000, Peter Flood wrote:

> I want to get ACLs (output similar to that of smbcacls) for a *lot* of
> files (potentially millions). I can only process about 10 files per
> second when running the command (`smbcacls -U ...` via a Python
> wrapper), I'm looking for a faster way.
>
> Does anyone know any libraries or other commands that could help me?
>
> Failing that, I assume that much of the time taken is spent on
> authenticating the user/pw for each request. Would it be possible to
> write something that keeps the connection so that multiple requests can
> be made without reauthenticating (I'm not familiar with how
> LDAP/AD/Samba works)? I have looked at the source of smbcacls but
> nothing jumped out at me.

See the code in python/samba/netcmd/gpo.py that sets a remote ACL for GPOs. This could be called from your own script, avoiding the connection cost.

Otherwise, make sure you authenticate with Kerberos, as this will be much faster, even if you connect per file with smbcacls.

Andrew Bartlett

--
Andrew Bartlett                       http://samba.org/~abartlet/
Authentication Developer, Samba Team  http://samba.org
Samba Developer, Catalyst IT          http://catalyst.net.nz/services/samba
Hi Peter,

On 11/12/13 15:38, Peter Flood wrote:

> Hi Noel

[...]

> We just tried this branch. In our setup we get ~250 files per second
> with smbcacls -r. A few weeks ago we got about 10 files per second by
> making repeated calls to smbcacls via subprocess in python. In a test
> just now we got ~225 files per second with our scan then
> pysmbc/smbcacls approach (described earlier), however we also stat
> every file and write some data to db (maybe 3 bulk writes per 1,000
> files) so it's not a direct comparison.
>
> One interesting thing we did notice was that we got ~250 additional
> 'files' with the recursive smbcacls, I assume those are the
> directories, can you confirm that directories are also output with -r?

Yes, directories are also output.

> Is there a way to filter those out?

Not directly. Presumably you pipe the output of smbcacls to a file; with a script you should be able to postprocess the complete output of 'smbcacls -r --get' and scrape/filter whatever information you need out of that.

thanks, Noel