On Wed, Apr 03, 2013 at 03:16:50PM +0200, Julien Pfefferkorn wrote:
> I noticed that Omega indexes file names. The file name seems to be indexed as
> several words if the name contains space characters.
>
> In my share I often separate words in the file name using "-" or "_" or even
> using a capital letter at the beginning of each word (I guess this is also
> the case for many other users):
>
> Examples:
>
> this-is-a-file.txt
>
> this_is_a_file.txt
>
> thisIsAFile.txt
>
> In those cases, I noticed that omega does not index the individual words,
> but only the full basename as one single word.
The last two are true, but you're incorrect about hyphens:
$ mkdir test
$ echo hello > test/this-is-a-test.txt
$ omindex --verbose --db tmp.db test
omindex: --url not specified, assuming '/'.
[Entering directory ""]
Indexing "this-is-a-test.txt" as text/plain ... added
$ delve -r1 tmp.db
Term List for record #1: D20130415 Etxt I* M201304 Oolly P/ Ttext/plain
U/this-is-a-test.txt Y2013 Za Zhello Zis Ztest Zthis a hello is test this
> It would be helpful, if omega would index each respective word, to ease the
> search.
Currently the leafname is just handled the same way as text inside the document.
We need to handle it the same way or else typing the leafname in as a search
wouldn't match the file in such cases, which would be confusing. But we could
additionally index it split at punctuation and/or case transitions. I'm not
sure exactly what the best algorithm would be though.
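To make the idea concrete, here is one possible splitting rule as a rough
Python sketch. It is not what omega does, and split_leafname is a
hypothetical helper name invented for this illustration. It assumes ASCII
names, drops the extension, splits at any non-alphanumeric character, and
then splits each piece at lower-to-upper case transitions, keeping runs of
capitals such as "HTML" together:

```python
import os
import re

# One piece of a camelCase word: a run of capitals not followed by a
# lowercase letter ("HTML", "A"), an optionally capitalised lowercase
# run ("this", "File"), or a run of digits.
_PIECE = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|[0-9]+")

def split_leafname(leafname):
    """Return lowercased word candidates found inside a leafname.

    Hypothetical helper for illustration only; ASCII names assumed.
    """
    stem, _ext = os.path.splitext(leafname)
    words = []
    # First split at punctuation ("-", "_", spaces, dots, ...).
    for chunk in re.split(r"[^A-Za-z0-9]+", stem):
        # Then split each chunk at case transitions.
        words.extend(piece.lower() for piece in _PIECE.findall(chunk))
    return words

if __name__ == "__main__":
    for name in ("this-is-a-file.txt", "this_is_a_file.txt", "thisIsAFile.txt"):
        print(name, split_leafname(name))
```

With this rule all three example names yield the same four words. The
results would be indexed in addition to the unsplit leafname, so typing the
full name as a search still matches.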
> Is it planned to add that feature in omega? Should I write a feature request
> in trac?
Yes, that's the best way to make sure a suggestion doesn't get lost.
> It seems that omega does not index the file name if the MIME type cannot be
> indexed.
>
> In order to be able to search all files by their name, it would be helpful,
> if omega would index the file name in that case.
Yes, we don't index files unless we know how to.
You can make this happen for particular mimetypes with a dummy filter:
--filter=application/octet-stream:/bin/true
But there's no way to tell it to do that for all unknown types currently.
> Is it planned to add this feature in omega? Should I write a feature request
> in trac?
Yes.
> It seems that omega does not currently index folder names
>
> In order to be able to search for a folder by its name, it would be helpful,
> if omega would index it.
>
> Is it planned to add this feature in omega? Should I write a feature request
> in trac?
Only indexing the leafname was a deliberate choice - the thinking was that
indexing the folder name for every file would make searches including a
word from the folder name very noisy, since every file in such a folder
would match.
It could probably be an optional feature, or perhaps it wouldn't actually
be problematic in practice.
Cheers,
Olly