Kees van Vloten
2023-Aug-10 13:01 UTC
[Samba] Spotlight indexing with fscrawler for multiple shares
Hi Matthias,

On 10-08-2023 at 14:46, Matthias Kühne | Ellerhold Aktiengesellschaft via samba wrote:
> Hey Kees,
>
> disclaimer: shameless self-plug!!
>
> If you don't need content indexing you can use my indexer:
> https://github.com/Ellerhold/fs2es-indexer

I have looked at it because of troubles with FSCrawler, and I love your solution because it does not need heavyweight Java.

But there is one thing FSCrawler is good at: it indexes all kinds of file metadata (such as EXIF data in photos), and it can even do OCR. As far as I understand, fs2es-indexer does not do this.

That is the reason why I am stuck with FSCrawler for now. Hopefully I am wrong and you are going to tell me that fs2es-indexer has all the functionality of FSCrawler but not the issues :-)

The other thing is that I am pushing data to OpenSearch, which requires me to patch and compile FSCrawler, another complexity I don't like very much.

- Kees

> I've created it because I couldn't get FSCrawler to work correctly.
>
> You can add as many directories as you like in the config, it'll crawl
> them through one daemon service.
>
> I'm planning on adding smb.conf parsing, so you don't even have to add
> these directories into the YAML file and can just use Samba as you would.
>
> Let me know if you need some help setting it up or otherwise.
>
> Have a nice day,
>
> Matthias.
>
> On 04.08.23 at 19:56, Kees van Vloten via samba wrote:
>> Hi Team,
>>
>> Did anybody solve the issue of FSCrawler crawling over multiple
>> shares, preferably from a single job or from a single service?
>>
>> Setting up a service for FSCrawler per share does not scale very nicely...
>>
>> - Kees.
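The smb.conf parsing mentioned in the quoted message could look roughly like the sketch below. This is not fs2es-indexer's code: the function name `share_paths` and the set of skipped sections are assumptions made for illustration. It relies on Python's standard `configparser`, which copes with simple, consistently indented smb.conf files but does not follow `include` directives or handle every smb.conf quirk.

```python
import configparser

# Sections that are not ordinary file shares (assumption: skip them).
SPECIAL_SECTIONS = {"global", "homes", "printers"}


def share_paths(smb_conf_text: str) -> dict[str, str]:
    """Map share name -> path for every share section that sets `path`."""
    parser = configparser.ConfigParser(
        interpolation=None,           # smb.conf uses % macros such as %U; leave them alone
        strict=False,                 # tolerate repeated sections or options
        comment_prefixes=("#", ";"),
        delimiters=("=",),            # only '=' separates a parameter from its value
    )
    parser.read_string(smb_conf_text)

    shares = {}
    for section in parser.sections():
        if section.lower() in SPECIAL_SECTIONS:
            continue
        path = parser.get(section, "path", fallback=None)
        if path:
            shares[section] = path
    return shares
```

The returned paths could then be fed to whatever crawls the directories, so the list of shares only has to be maintained in one place.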
Matthias Kühne | Ellerhold Aktiengesellschaft
2023-Aug-10 13:38 UTC
[Samba] Spotlight indexing with fscrawler for multiple shares
Hey Kees,

fs2es-indexer is designed to be a lightweight alternative to FSCrawler. So no... it doesn't do any content indexing, nor does it save much of the metadata.

As far as I understand it, the OCR and other stuff is what makes FSCrawler that big. And we don't need any of that: we just want to search for file names. BUT I'm open to merge requests ;-)

I'm currently getting away with a lot less complexity because I don't need to watch for changes in files, since their content is not something I'm indexing. If I added more metadata (even only the size!), I'd have to verify that it stays correct and somehow start listening for "file X has changed" events... fanotify seems like a sweet framework for that, but sadly ZFS is incompatible with it... Samba does not let me get this data efficiently either, so I'm forced to do regular scans of the whole fs... which might take a while depending on the number of files.

Adding support for OpenSearch shouldn't be that hard though, right? I've already got a version switch for ES v7 and v8, so adding OS to it should be easy enough!

Have a nice day,

Matthias.
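To illustrate why filename-only indexing can get by with regular full scans instead of change notifications, here is a minimal sketch. It is not taken from fs2es-indexer: the function names and the `path`/`filename` document fields are hypothetical. Each run walks all configured directories, and the difference between two runs tells you which entries to add to or delete from the index.

```python
import os
from typing import Iterable, Iterator


def scan(directories: Iterable[str]) -> Iterator[dict]:
    """Yield one filename-only document per file or directory below each root."""
    for root in directories:
        for dirpath, dirnames, filenames in os.walk(root):
            for name in dirnames + filenames:
                yield {"path": os.path.join(dirpath, name), "filename": name}


def diff_scans(previous: set[str], current: set[str]) -> tuple[set[str], set[str]]:
    """Return (paths to add, paths to delete) between two full scans."""
    return current - previous, previous - current
```

Because only names are stored, a file whose content or size changes needs no update at all; only created, deleted, or renamed entries show up in the diff. The cost is that every run touches the whole tree, which matches the "might take a while" caveat above.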