YuLun Cai
2017-Apr-23 16:22 UTC
Question about the ticket #743 omindex: delay libmagic checks
> I'd suggest you start by just looking at moving the libmagic check after
> the filesize checks, so you don't need to get into whether libmagic or
> the database check is cheaper on average.

Hi Olly,

I have moved the libmagic check to directly after the filesize check:
https://github.com/caiyulun/xapian/commit/3a97d9ee5397fa900a473aa9b3d8eeb720177a4e

Could you review it and give me some advice about the next steps? I think it is hard to say which is cheaper, the libmagic check or the database check.

Thanks

2017-04-21 13:37 GMT+08:00 Olly Betts <olly at survex.com>:
> On Fri, Apr 21, 2017 at 01:52:38AM +0800, YuLun Cai wrote:
> > I'm working on ticket #743 "omindex: delay libmagic checks"
> > <https://trac.xapian.org/ticket/743>. As the ticket's description
> > mentions, calling libmagic is more expensive than calling stat, so we
> > can call stat to get the file size and check it before calling libmagic
> > to get a MIME type.
>
> Yes.
>
> > But what about the timestamp check? It needs to look in the DB to see
> > whether the file has already been indexed and hasn't changed (in the
> > `index_check_existing` function in omega\index_file.cc), so it is
> > expensive too. Should we call libmagic before or after the timestamp
> > check, or is there another way to check the timestamps?
>
> We also have an upper bound on the newest timestamp in the database at the
> start of the run, so we can often avoid this check for new files (at least
> if they were created since the end of the previous index run).
>
> But that just quickly tells us "yes" for such files (at least on the basis
> of timestamp), so we'd need to check them with libmagic anyway. To get a
> "no" based on timestamp we need to check against the database.
>
> I'd suggest you start by just looking at moving the libmagic check after
> the filesize checks, so you don't need to get into whether libmagic or
> the database check is cheaper on average.
>
> > Also, how should we write tests to show that omindex works correctly?
> > Should we generate some realistic test directories, index them with
> > omindex, and then check what ends up in the DB?
>
> We don't (sadly) have any tests of omindex behaviour currently, but having
> some would be great.
>
> You'd need to work out what cases you're aiming to test and then script up
> suitable changes to the directory between the omindex runs.
>
> Cheers,
> Olly
James Aylett
2017-Apr-23 16:26 UTC
Question about the ticket #743 omindex: delay libmagic checks
On 23 Apr 2017, at 17:22, YuLun Cai <buptcyl at gmail.com> wrote:
> > I'd suggest you start by just looking at moving the libmagic check after
> > the filesize checks, so you don't need to get into whether libmagic or
> > the database check is cheaper on average.
>
> Hi Olly,
>
> I have moved the libmagic check to directly after the filesize check:
>
> https://github.com/caiyulun/xapian/commit/3a97d9ee5397fa900a473aa9b3d8eeb720177a4e
>
> Could you review it and give me some advice about the next steps?

Can you create a pull request from this? It provides easier tools for feeding back on proposed changes.

J

-- 
James Aylett
devfort.com — spacelog.org — tartarus.org/james/
YuLun Cai
2017-Apr-24 01:19 UTC
Question about the ticket #743 omindex: delay libmagic checks
Hi James,

I have created a PR: https://github.com/xapian/xapian/pull/153

Looking forward to your response.

Thanks