I am creating a web-based GUI to provide searching of files shared via SMB on a local network. It goes through and finds each computer, its shares, and its files using smbclient. Currently, a nasty (and incredibly slow) set of loops in PHP recurses the directories on each share, making a separate smbclient call for each directory up to a set depth. I know for a fact that several computers on my LAN have links/shortcuts that would lead to infinite recursion if no depth limit or other measure were used to prevent it.

I've tried smbclient's recurse option, which is of course *much* faster, but it unfortunately follows the links for as long as I've allowed it to run. Is there any current way to set a depth limit, or even better, to simply prevent visiting a directory twice? (I've heard that wget hashes the directory contents and then compares them upon each new directory to prevent recursion entirely.)

Any help would be *greatly* appreciated. I am not opposed to coding something into smbclient myself; in fact I'd prefer that over working around it outside of smbclient. If nothing is currently available, pointers/suggestions would be great and I will try to put something together for all to benefit...

Thanks,
Justin
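[Editorial note: a minimal sketch of the kind of loop avoidance discussed in this thread, combining the depth limit from the original script with the wget-style hashing of directory listings. The share name (//somehost/someshare), the guest access via -N, and the output parsing are illustrative assumptions, not taken from the original script.]

<?php
// Sketch: depth-limited recursion over an SMB share, one smbclient call
// per directory, skipping any directory whose listing has already been
// seen (hashing the listing to break symlink loops, as wget does).

$seen = array();     // md5 of a directory listing => true
$maxDepth = 5;       // recursion depth limit, as in the original approach

// Run one smbclient call for one directory and return its output lines.
function list_dir($unc, $path) {
    $smbcmd = ($path === '') ? 'ls' : 'cd "' . $path . '"; ls';
    $cmd = sprintf('smbclient %s -N -c %s',
                   escapeshellarg($unc), escapeshellarg($smbcmd));
    exec($cmd, $out);
    return $out;
}

function crawl($unc, $path, $depth) {
    global $seen, $maxDepth;
    if ($depth > $maxDepth) {
        return;
    }
    $lines = list_dir($unc, $path);

    // Hash the raw listing; an identical listing almost certainly means
    // we are looping through a link, so stop descending here.
    $hash = md5(implode("\n", $lines));
    if (isset($seen[$hash])) {
        return;
    }
    $seen[$hash] = true;

    foreach ($lines as $line) {
        // Rough parse of "  name   D   0  <date>" style lines; real
        // smbclient output (empty attribute fields, odd names) would
        // need more careful handling.
        if (!preg_match('/^\s+(\S.*?)\s+([A-Z]+)\s+\d+\s+/', $line, $m)) {
            continue;
        }
        $name = $m[1];
        $attr = $m[2];
        if ($name == '.' || $name == '..') {
            continue;
        }
        if (strpos($attr, 'D') !== false) {
            crawl($unc, $path . '\\' . $name, $depth + 1);
        } else {
            echo $path . '\\' . $name . "\n";  // or insert into the index
        }
    }
}

crawl('//somehost/someshare', '', 0);
?>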
I'm not sure how helpful this will be, but here goes: AFAIK SMB (and thus anything using it) is totally unaware of the idea of circularly linked filesystems, because they don't happen in Windows (unless your FS is screwed up). Incidentally, this is also the reason I blithely assume that you're talking about Samba servers rather than Windows boxes; complain if I'm wrong. I don't know of any way you could deal with this on the client side, short of checksumming the directory contents, and even that would be potentially harmful (there's no way to tell for sure that it is a repeat directory and not just a very similar one).

What I know of that can be done is rather less elegant than a clean client-side solution would be, but if you don't really depend on following any symbolic links in the search process, you can work around the problem by duplicating every share in your smb.conf file (changing the name in a consistent way, like adding "search_" at the beginning) and then setting the following in the new share definitions:

    browsable = no
    writable = no
    guest only = yes
    follow symlinks = no

Respectively, these will prevent the share from showing up in browse lists, force it to be read-only, not require authentication for read access (meaning you don't need to have your script spit real passwords around over the network), and prevent the Samba server from following symbolic links for the client. Unless you've been hard-linking directories, this should solve the recursion problem. Of course, if you've got large numbers of servers/shares, the cure may be worse than the disease.

I think it should be possible, using the fancy machine-name-specific include features in smb.conf, to have the appropriate options come on just for specific clients (i.e. the machine that runs the search program), but I've never tried this. I'm sure some of the gurus can say if and how it could be done.
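[Editorial note: a minimal sketch of the duplicated "search_" share described above. The share name and path are placeholders; the extra "guest ok" line is an assumption added because "guest only" only takes effect when "guest ok" is also set.]

# Read-only, guest, no-symlink duplicate of an existing share,
# intended only for the search indexer.
[search_documents]
    # same path as the real share being duplicated
    path = /srv/samba/documents
    # keep it out of browse lists
    browsable = no
    # force read-only access
    writable = no
    # allow anonymous read access so the indexer needs no real password
    guest ok = yes
    guest only = yes
    # make the server refuse to follow symbolic links on this share
    follow symlinks = no

# The per-client include mentioned above would go in [global], e.g.:
#   include = /etc/samba/smb.conf.%m
# where %m expands to the NetBIOS name of the connecting machine.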
Chris Watt wrote:
> At 09:09 AM 11/6/01 -0500, Justin Yackoski wrote:
>> I may be wrong, but it seems to me that the best solution is to
>> checksum/hash the directory contents and name, and I find it hard to
>> believe that two directories would be exactly identical very often...
>
> Do you get the same effect if you actually smbmount the filesystem and
> do a ls -R on the mountpoint? I guess you probably would actually...

Yes, actually I've determined that whether I use smbmount or smbclient, I can recurse until I get 40 directories deep in the share. This is the case no matter how I do the recursion, so I assume 40 is hardcoded someplace as the maximum depth.

I noticed in the second paragraph under Description on page 1 of the wget manpage: "Infinite recursion loops are always avoided by hashing the retrieved data." Is this for some reason not applicable to smbclient, would it be useful to add it, or should I be posting this to the samba developer list?

> Ok, the best solution I can suggest is that you use a contents checksum
> or depth limit

Do you mean in smbclient? So that's not currently possible, and you're suggesting it would be useful to add such a feature/option?

> It may take you a fair chunk of time to update the index, but as long as
> you don't write to the same file your PHP script is trying to read from,
> you can do this in parallel with processing queries (if you need to
> update the index during "office hours" when the Windows boxes happen to
> be turned on). If you want to do complex queries (using info other than
> the filename) it might be worth sticking your index in mySQL (or
> something similar), which happens to play nicely with PHP.

I actually am using MySQL at the moment. Indexing 100,000+ files with a text file (and doing searches!), well, yes, that would be bad.

Thanks again for your help,
Justin
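[Editorial note: a minimal sketch of the kind of MySQL file index mentioned above; the table name, columns, and search term are assumptions, not from the thread.]

-- One row per file found by the crawler.
CREATE TABLE smb_files (
    id       INT          NOT NULL AUTO_INCREMENT,
    host     VARCHAR(64)  NOT NULL,   -- NetBIOS name of the server
    share    VARCHAR(64)  NOT NULL,
    path     VARCHAR(255) NOT NULL,   -- directory within the share
    filename VARCHAR(255) NOT NULL,
    size     INT          NOT NULL,
    PRIMARY KEY (id),
    INDEX (filename)
);

-- A simple filename search the PHP front end could run:
SELECT host, share, path, filename
FROM smb_files
WHERE filename LIKE '%budget%';

Note that a LIKE pattern with a leading wildcard cannot use the index on filename, so for a large index a FULLTEXT index on that column (available in MySQL 3.23.23 and later) may be worth considering instead.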