thr3ads.net - Xapian discuss - [Xapian-discuss] omindex one file at a time? [Dec 2012]

Will Partain

2012-Dec-13 13:05 UTC

[Xapian-discuss] omindex one file at a time?

Hi, all -- I want to do Plain Old Omindex'ing *but* the mapping
between my documents' filenames and the URLs where I hope search
users to find them is, uh..., strange.  The simplest thing (to
me) would be to run omindex for each document, e.g.

  omindex --no-delete -U /cool-url-1 /funky/doc/file-blah.pdf
  omindex --no-delete -U /cool-url-7 /doc/funky/ohmy/blah-file.txt
  ... and so on...

Of course, this doesn't work because the pathnames don't signify
directories.  I'm guessing the same thing can be done with
'scriptindex' -- but I really want what just plain old omindex
does.

A horrible? way might be to copy each document into a temp
directory and run omindex -- but I'm guessing the URLs would come
out wrong (it would append the filename onto the end).

All good ideas welcome.  Thanks,

Will

Olly Betts

2012-Dec-14 00:35 UTC

[Xapian-discuss] omindex one file at a time?

On Thu, Dec 13, 2012 at 08:05:38AM -0500, Will Partain wrote:
> Hi, all -- I want to do Plain Old Omindex'ing *but* the mapping
> between my documents' filenames and the URLs where I hope search
> users to find them is, uh..., strange.  The simplest thing (to
> me) would be to run omindex for each document, e.g.
> 
>   omindex --no-delete -U /cool-url-1 /funky/doc/file-blah.pdf
>   omindex --no-delete -U /cool-url-7 /doc/funky/ohmy/blah-file.txt
>   ... and so on...
> 
> Of course, this doesn't work because the pathnames don't signify
> directories.  I'm guessing the same thing can be done with
> 'scriptindex' -- but I really want what just plain old omindex
> does.

Running omindex once for each document will be slow.  If you have a lot
of documents, you really want to batch updates for good indexing
performance.

> A horrible? way might be to copy each document into a temp
> directory and run omindex -- but I'm guessing the URLs would come
> out wrong (it would append the filename onto the end).

I'd just symlink them all into a temporary directory structure and use
-f so omindex will follow the symlinks - e.g.:

$ mkdir tmp
$ ln -s /home/olly/git/survex/doc/manual.pdf tmp/cool-url-1
$ ln -s /home/olly/tmp.txt tmp/cool-url-7
$ ./omindex --db cool-url.db -f tmp
$ delve cool-url.db -1a|grep U
U/cool-url-1
U/cool-url-7

This will work so long as your omindex was built with libmagic (which is
optional in 1.2.x, but a hard requirement on trunk) and libmagic can
detect the filetype from the contents of the file.

Cheers,
    Olly
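
For a larger set of documents, the symlink approach above could be
scripted so that a single omindex run covers everything.  What follows
is only a rough sketch under some assumptions that are not from the
thread: the mapping lives in a tab-separated file (called urls.tsv
here, a made-up name) with the desired URL path in the first column and
the real document path in the second, and the helper name stage_links
is likewise hypothetical.  Nested URL paths are assumed to map onto
nested directories in the staging tree.

```shell
#!/bin/sh
# Sketch: build a staging tree of symlinks from a mapping file, so one
# omindex run can index every document under the URL path you want.
# ASSUMPTION: each line of the mapping file is
#   <url-path><TAB><path-to-document>
# The file format and the function name are illustrative only.

stage_links() {
    map_file=$1
    stage_dir=$2
    tab=$(printf '\t')
    # The "|| [ -n ... ]" part handles a final line with no newline.
    while IFS=$tab read -r url_path src || [ -n "$url_path" ]; do
        # Skip blank lines and comment lines.
        case $url_path in ''|'#'*) continue ;; esac
        # Strip the leading slash so the link lands inside stage_dir.
        dest=$stage_dir/${url_path#/}
        mkdir -p "$(dirname "$dest")"
        # Replace any stale link left over from an earlier run.
        rm -f "$dest"
        ln -s "$src" "$dest"
    done < "$map_file"
}

# Example use (hypothetical file names), followed by the same omindex
# invocation shown earlier in the thread:
#   stage_links urls.tsv tmp
#   omindex --db cool-url.db -f tmp
```

Because the links usually carry no file extension, the caveat above
still applies: omindex has to be built with libmagic, and libmagic has
to recognise each file type from its contents.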
