Hello,

I'm indexing a filesystem using omindex, and users can query the database via omega. Everything works fine.

Now I'd like to add an option like "search files by name", and I'm wondering how to do this.

Can omega search files by name directly? How should I build the query?

Could I use scriptindex to index, e.g., the locate database? How?

Should I index file names directly? Can omindex do this, or should I use scriptindex? How?

Thanks in advance,
tindal
Olly Betts
2008-May-06 09:53 UTC
[Xapian-discuss] locate and omega: how to index file names?
On Mon, May 05, 2008 at 12:44:18PM +0200, tindal wrote:
> I'm indexing a filesystem using omindex, and users can query the
> database via omega: everything works fine
>
> now I'd like to add an option like "search files by name" and I'm
> wondering how to do this
>
> can omega search files by name directly? how should I build the query?

Not if you index with omindex, since it doesn't index the full path of
files in any way.

> could I use scriptindex to index, eg., the locate database? how?

If you're able to dump the locate database's contents, just write a
script in your favourite scripting language to convert that to
scriptindex's input format.

> should I index file names directly?
> can omindex do this or should I use scriptindex? how?

Directly? You can certainly index the filenames as you index the files
but you'd have to modify omindex to do this, or recurse the directory
tree dumping it into scriptindex's input format. Or write your own
indexer from scratch.

Cheers,
    Olly
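As a rough illustration of the "recurse the directory tree dumping it into scriptindex's input format" option, here is a minimal Python sketch. It assumes the usual scriptindex input layout of one `field=value` line per field with a blank line between records. The function names (`record_for`, `dump_tree`) and the field names are illustrative choices, not anything omindex or scriptindex defines, and the field names would have to match whatever the index script expects.

```python
import os
import sys
import time


def record_for(path):
    """Build one record (a list of field=value lines) for a single file.

    The field names used here are assumptions for illustration only.
    """
    st = os.stat(path)
    return [
        "url=" + path,
        "name=" + os.path.basename(path),
        "size=" + str(st.st_size),
        "modtime=" + time.strftime("%Y-%m-%d", time.localtime(st.st_mtime)),
    ]


def dump_tree(root, out=sys.stdout):
    """Walk root and write one record per regular file.

    Records are separated by a blank line. Symlinks are skipped, and file
    names containing newlines are not handled by this sketch.
    """
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in sorted(filenames):
            full = os.path.join(dirpath, filename)
            if os.path.islink(full) or not os.path.isfile(full):
                continue
            out.write("\n".join(record_for(full)) + "\n\n")
```

Calling `dump_tree("/some/dir")` writes the records to standard output, so the result could be piped into scriptindex together with a matching index script.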
ok, that's my (still partial) solution:

    find /dir/* -type f -printf 'url=%p\npath=%h\nname=%f\nsize=%s\nmodtime=%AY-%Am-%Ad\n\n' |
      awk '{if ($1 ~ /^path=/) gsub(/\//, "\n="); if ($1 ~ /^name=/) sub(/\./,"\nformat="); print}' |
      scriptindex /database/dir/filelist filelist2omega.script

in which filelist2omega.script contains:

    url : index field=id field=url
    name : weight=3 indexnopos hash field=name
    path : indexnopos field=path
    format : index field=format
    size : index field=size
    modtime : index field=modtime

and that's the record format for scriptindex:

    url=/full/url/of/the/file.txt
    path=full
    =url
    =of
    =the
    name=file
    format=txt
    size=436110
    modtime=2008-05-06

now 3 more questions:

the date format is not correct, as omega doesn't show it: which is the correct one?

omega reads the size wrong, as it says, in this example, "436 bytes": why?

for the search I'd like to be able to choose between the "default" database (made with omindex) and my "filelist" database, but I end up searching both databases

the relevant code in the template is:

    <INPUT TYPE=radio NAME="DB" VALUE="default" $if{$eq{$dbname,default},CHECKED}>Search the contents<br>
    <INPUT TYPE=radio NAME="DB" VALUE="filelist" $if{$eq{$dbname,filelist},CHECKED}>Search the names<br>

I find that after the first change of database $dbname contains "default/filelist" or "filelist/default": how can I reset it?

thanks
tindal
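For readers who find the awk step hard to follow, the following Python sketch builds the url, path, name and format part of the record layout shown in the message above. It is only an approximation under stated assumptions: the helper names (`split_path`, `split_name`, `make_record`) are hypothetical, size and modtime are left out, and the output follows the example record in the post rather than the byte-for-byte output of the awk command.

```python
def split_path(directory):
    """Return the non-empty components of a directory path."""
    return [part for part in directory.split("/") if part]


def split_name(filename):
    """Split a file name at its first dot, as a single awk sub() would.

    Returns (base, extension); extension is None when there is no dot.
    """
    base, dot, ext = filename.partition(".")
    return base, (ext if dot else None)


def make_record(url):
    """Build the url/path/name/format lines for one file.

    Path components after the first go on continuation lines that start
    with "=", matching the example record in the post.
    """
    directory, _, filename = url.rpartition("/")
    base, ext = split_name(filename)
    lines = ["url=" + url]
    parts = split_path(directory)
    if parts:
        lines.append("path=" + parts[0])
        lines.extend("=" + part for part in parts[1:])
    lines.append("name=" + base)
    if ext is not None:
        lines.append("format=" + ext)
    return "\n".join(lines) + "\n"
```

One consequence of splitting at the first dot is worth noting: `split_name("archive.tar.gz")` gives `("archive", "tar.gz")`, and a name without any dot produces no format line at all.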