Olly Betts writes:
> On Mon, Jan 21, 2019 at 03:25:01PM +0100, Jean-Francois Dockes wrote:
> > I have had a problem report from a Recoll user about the amount of writes
> > during index creation.
> >
> >   https://opensourceprojects.eu/p/recoll1/tickets/67/
> >
> > The issue is that the index is on SSD and that the amount of writes is
> > significant compared to the SSD life expectancy (index size > 250 GB).
> >
> > From the numbers he supplied, it seems to me that the total amount of
> > block writes is roughly quadratic with the index size.
> >
> > First question: is this expected, or is Recoll doing something wrong?
>
> It isn't expected.
>
> I think this is probably due to a bug which coincidentally was
> discovered earlier this week by Germán M. Bravo. I've now fixed it
> and backported ready for 1.4.10. If you're able to test to confirm
> if this solves your problem that would be very useful - see
> f19bcb96857419469f74f748e7fe8eaccaedc0fd on the RELEASE/1.4 branch:
>
> https://git.xapian.org/?p=xapian;a=commitdiff;h=f19bcb96857419469f74f748e7fe8eaccaedc0fd
>
> Anything which uses a term for a unique document identifier is likely to
> be affected.
>
> Cheers,
>     Olly

I have run a number of tests, with data mostly from a Project Gutenberg DVD
and other books, with relatively modest index sizes, from 1 to 24 GB.

Quite curiously, in this range, with all the Xapian versions I tried, the
total amount of writes is roughly proportional to the index size raised to
the power 1.5:

    TotalWrites / (IndexSize**1.5) ~= K

So, not quadratic, which is good news. For big indexes, 1.5 is not so good,
but probably somewhat expected.

The other good news is that the patch above decreases the amount of writing
by a significant factor, around 4.5 for the biggest index I tried.

The amount of writes is estimated with iostat before/after. The disk has
nothing else to do.

idxflushmb is the number of megabytes of input text between Xapian commits.

    xapiandb,kb    writes,kb   K*1000   writes/size

xapian 1.4.5, idxflushmb 200

     1544724      6941286      3.62     4.49
     3080540     16312960      3.02     5.30
     4606060     21054756      2.13     4.57
     6123140     33914344      2.24     5.54
     7631788     50452348      2.39     6.61

xapian git master latest, idxflushmb 200

     1402524      1597352      0.96     1.14
     2223076      3291588      0.99     1.48
     2678404      4121024      0.94     1.54
     3842372      7219404      0.96     1.88
     4964132     10850844      0.98     2.19
     6062204     14751196      0.99     2.43
    19677680    125418760      1.44     6.37

xapian git master before patch, idxflushmb 200

    24707840    750228444      6.11    30.36

So that was 750 GB of writes for the big index before the patch...

As you can see, my beautiful law does not hold so well for the biggest
index :) (K = 1.44). It's not quite the same data though, so I would need
more tests, but I think I'll stop here...

The improvement brought by the patch is nice. It remains that for people
using big indexes on SSD, the amount of writes is still something to
consider, and splitting the index probably makes sense? What do you think?

I'll run another test tonight with a smaller flush interval to see if it
changes things.

Cheers,

jf
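A minimal Python sketch of how the K*1000 and writes/size columns can be
recomputed from the raw iostat figures; sizes and write totals are in KB as
printed, and the rows are just a sample of the table above:

    # Derive K = writes / size**1.5 and the plain writes/size ratio for a
    # few of the (index size, total writes) measurements above, both in KB.
    measurements = [
        (1544724, 6941286),      # xapian 1.4.5, idxflushmb 200
        (1402524, 1597352),      # "git master latest" (patched), smallest index
        (19677680, 125418760),   # "git master latest" (patched), biggest index
        (24707840, 750228444),   # "git master before patch"
    ]

    for size_kb, writes_kb in measurements:
        k = writes_kb / size_kb ** 1.5   # roughly constant if writes ~ size^1.5
        print("size %10d KB  writes %11d KB  K*1000 %5.2f  writes/size %5.2f"
              % (size_kb, writes_kb, k * 1000, writes_kb / size_kb))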
This is quite possibly part of the underlying write explosion that we ran
into when we wrote:

https://fastmail.blog/2014/12/01/email-search-system/

Which, now almost 5 years on, has been running like a champion! We're really
pleased with how well it works. Xapian reads from multiple databases are
really easy, and the immediate writes onto tmpfs and daily compacts work
really well. We also have a cron job which runs hourly and will do immediate
compacts to disk from memory if the tmpfs hits more than 50% of its nominal
size, and it keeps us from almost ever needing to do any manual management
as this thing indexes millions of new emails per day across our cluster.

And then when we do the compact down to disk, it's a single thread compacting
indexes while new emails still index to tmpfs, so there's always tons of IO
available for searches.

I think even with more efficient IO patterns, I'd still stick with the design
we have. It's really nice :)

Bron.

On Fri, Feb 1, 2019, at 06:47, Jean-Francois Dockes wrote:
> [...]
--
Bron Gondwana
brong at fastmail.fm
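A minimal sketch of the kind of hourly tmpfs check Bron describes, assuming
a glass database on a tmpfs mount and the stock xapian-compact tool; the
paths, the 50% threshold check, and the tier naming are illustrative
assumptions, not Fastmail's actual implementation:

    # Hourly check: if the tmpfs holding the live database is more than half
    # full, compact it down to a new on-disk database. Error handling and
    # the swap-in of a fresh live database are omitted.
    import shutil
    import subprocess
    import time

    TMPFS_MOUNT = "/srv/xapian/tmpfs"             # hypothetical tmpfs mount
    LIVE_DB = TMPFS_MOUNT + "/live.db"            # database taking live writes
    DEST_DB = "/srv/xapian/disk/tier-%d.db" % int(time.time())

    usage = shutil.disk_usage(TMPFS_MOUNT)
    if usage.used / usage.total > 0.5:
        # xapian-compact takes one or more source databases followed by a
        # destination; searches can keep reading the sources meanwhile.
        subprocess.run(["xapian-compact", LIVE_DB, DEST_DB], check=True)
        # ...then start a fresh live.db on tmpfs and point readers at both.

Searches would then open the on-disk tiers plus the tmpfs database together,
e.g. by adding them all to a single xapian.Database.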
Bron Gondwana writes:
> This is quite possibly part of the underlying write explosion that we ran
> into when we wrote:
>
> https://fastmail.blog/2014/12/01/email-search-system/
> [...]
>
> Bron.

Thank you for this information.

I re-ran the 20 GB index creation with the latest Xapian git code but a much
smaller commit threshold (20 MB instead of 200). There were more than 800 GB
of data written (instead of 125 GB).

So it would seem that the right approach for creating big indexes is to:

 - Always set the commit interval as high as the available RAM allows.
 - Use the future Xapian 1.4.10; the patch brings a significant improvement.
 - Segment the index, then use xapian-compact to merge if needed.

It would be interesting to see how the Fastmail approach works for an initial
bulk index creation, compared to just segmenting, that is: what is the
optimal number of merges?

JF
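A minimal sketch of the "segment the index, then merge" approach using the
Xapian Python bindings and xapian-compact; iter_documents() is a hypothetical
generator, the segment size is arbitrary, and the "Q" unique-id prefix is
only an example:

    # Build several smaller databases, then merge them with xapian-compact.
    # iter_documents() is assumed to yield (unique_id, text) pairs.
    import subprocess
    import xapian

    DOCS_PER_SEGMENT = 100000                     # arbitrary figure

    def index_doc(db, termgen, uid, text):
        doc = xapian.Document()
        termgen.set_document(doc)
        termgen.index_text(text)
        idterm = "Q" + uid                        # unique-id term (example prefix)
        doc.add_boolean_term(idterm)
        db.replace_document(idterm, doc)

    segments = []
    db = None
    for n, (uid, text) in enumerate(iter_documents()):
        if n % DOCS_PER_SEGMENT == 0:
            if db is not None:
                db.commit()
                db.close()
            path = "seg%03d.db" % len(segments)
            segments.append(path)
            db = xapian.WritableDatabase(path, xapian.DB_CREATE_OR_OVERWRITE)
            termgen = xapian.TermGenerator()
            termgen.set_stemmer(xapian.Stem("en"))
        index_doc(db, termgen, uid, text)
    if db is not None:
        db.commit()
        db.close()

    # Merge: sources first, destination last.
    subprocess.run(["xapian-compact"] + segments + ["final.db"], check=True)

Within each segment the commit interval is what matters for write volume:
Recoll exposes it as idxflushmb, and indexers that rely on Xapian's automatic
flushing can instead raise the XAPIAN_FLUSH_THRESHOLD environment variable
(a document count).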
On Thu, Jan 31, 2019 at 08:44:44PM +0100, Jean-Francois Dockes wrote:
> I have run a number of tests, with data mostly from a Project Gutenberg DVD
> and other books, with relatively modest index sizes, from 1 to 24 GB.
>
> Quite curiously, in this range, with all the Xapian versions I tried, the
> total amount of writes is roughly proportional to the index size raised to
> the power 1.5:
>
>     TotalWrites / (IndexSize**1.5) ~= K

I could perhaps believe it would tend to O(n*log(n)) eventually due to
the number of levels in the B-tree being log(n) (though the number of
levels is bounded above by a fairly small constant so one could argue
that's O(n)).

But probably the merging on commit will actually determine the O()
behaviour, and that's harder to determine theoretically.

> The amount of writes is estimated with iostat before/after. The disk has
> nothing else to do.

There's a script in git which allows more precise I/O analysis by
logging relevant I/O using strace:

  xapian-maintainer-tools/profiling/strace-analyse

Using strace means other processes are definitely excluded and you get
to see which tables (and even which blocks) the I/O is, e.g. a small
update to a small database gives:

  read 0 from tmp.db/record.DB
  read 0 from tmp.db/termlist.DB
  read 0 from tmp.db/position.DB
  read 0 from tmp.db/postlist.DB
  write 1 to tmp.db/postlist.DB
  write 1 to tmp.db/position.DB
  write 1 to tmp.db/termlist.DB
  write 1 to tmp.db/record.DB
  sync tmp.db/postlist.tmp
  sync tmp.db/postlist.DB
  read 1 from tmp.db/postlist.DB
  sync tmp.db/position.tmp
  sync tmp.db/position.DB
  read 1 from tmp.db/position.DB
  sync tmp.db/termlist.tmp
  sync tmp.db/termlist.DB
  read 1 from tmp.db/termlist.DB
  sync tmp.db/record.tmp
  sync tmp.db/record.DB
  read 1 from tmp.db/record.DB

> idxflushmb is the number of megabytes of input text between Xapian commits.
>
>     xapiandb,kb    writes,kb   K*1000   writes/size
>
> xapian 1.4.5, idxflushmb 200

If you're going to the trouble of profiling, probably best to use the
latest release (1.4.5 was released in 2017).

>      1544724      6941286      3.62     4.49
>      3080540     16312960      3.02     5.30
>      4606060     21054756      2.13     4.57
>      6123140     33914344      2.24     5.54
>      7631788     50452348      2.39     6.61
>
> xapian git master latest, idxflushmb 200
>
>      1402524      1597352      0.96     1.14
>      2223076      3291588      0.99     1.48
>      2678404      4121024      0.94     1.54
>      3842372      7219404      0.96     1.88
>      4964132     10850844      0.98     2.19
>      6062204     14751196      0.99     2.43
>     19677680    125418760      1.44     6.37
>
> xapian git master before patch, idxflushmb 200
>
>     24707840    750228444      6.11    30.36

OK, so the patch makes a very significant difference here.

There are other changes between RELEASE/1.4 and master which will likely
improve indexing speed and memory use, but I'm not sure there's anything
which would affect disk writes (unless we end up swapping to disk with 1.4
but master avoids doing so due to lower memory usage).

> The improvement brought by the patch is nice. It remains that for people
> using big indexes on SSD, the amount of writes is still something to
> consider, and splitting the index probably makes sense? What do you think?

If you want to build a very large DB it's almost certain to be faster to
build it as a series of smaller DBs and merge them.

At least with the current backends (glass and older) - the plan for the
next backend (honey) is that it'll actually behave like that behind the
scenes, but that part isn't fully written yet.

Cheers,
    Olly
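Output in the format shown above is easy to aggregate per table; a small
sketch, assuming the lines have been captured to a file and that each
read/write line records one block operation on the named table:

    # Tally per-table block reads and writes from lines like
    # "write 1 to tmp.db/postlist.DB" / "read 0 from tmp.db/record.DB".
    # strace-analyse.out is a hypothetical capture of the script's output.
    from collections import Counter

    reads, writes = Counter(), Counter()
    with open("strace-analyse.out") as f:
        for line in f:
            parts = line.split()
            if len(parts) == 4 and parts[0] == "write":
                writes[parts[3]] += 1        # one line per block written
            elif len(parts) == 4 and parts[0] == "read":
                reads[parts[3]] += 1

    for table, nwrites in writes.most_common():
        print("%-25s %6d block writes  %6d block reads"
              % (table, nwrites, reads[table]))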
Olly Betts writes:
> On Thu, Jan 31, 2019 at 08:44:44PM +0100, Jean-Francois Dockes wrote:
> > I have run a number of tests, with data mostly from a Project Gutenberg
> > DVD and other books, with relatively modest index sizes, from 1 to 24 GB.
> >
> > Quite curiously, in this range, with all the Xapian versions I tried,
> > the total amount of writes is roughly proportional to the index size
> > raised to the power 1.5:
> >
> >     TotalWrites / (IndexSize**1.5) ~= K
>
> I could perhaps believe it would tend to O(n*log(n)) eventually due to
> the number of levels in the B-tree being log(n) (though the number of
> levels is bounded above by a fairly small constant so one could
> argue that's O(n)).
>
> But probably the merging on commit will actually determine the O()
> behaviour, and that's harder to determine theoretically.

The 1.5 exponent is indeed frankly bizarre, but it holds rather well for
index sizes from 1.5 to 24 GB in this configuration... Just a curiosity.

> >         size      writes    K*1000   writes/size
> >
> >      1402524      1597352     0.96     1.14
> >      2223076      3291588     0.99     1.48
> >      2678404      4121024     0.94     1.54
> >      3842372      7219404     0.96     1.88
> >      4964132     10850844     0.98     2.19
> >      6062204     14751196     0.99     2.43
> >     19677680    125418760     1.44     6.37
> >     24349248    166162068     1.38     6.82
> >
> > The amount of writes is estimated with iostat before/after. The disk has
> > nothing else to do.
>
> There's a script in git which allows more precise I/O analysis by
> logging relevant I/O using strace:
>
>   xapian-maintainer-tools/profiling/strace-analyse
>
> Using strace means other processes are definitely excluded and you get
> to see which tables (and even which blocks) the I/O is, e.g. a small
> update to a small database gives:
>
> [...]

I tried to use strace -c, but for some reason the pwrite counts in the
results were erratic (sometimes showing something like 11 writes after
indexing), probably some issue with my script, so I did not use them.

The output was to a backup disk, with no other activity during the tests.

> If you're going to the trouble of profiling, probably best to use the
> latest release (1.4.5 was released in 2017).

I was trying an older release to see if something had changed for the worse
recently.

> > xapian git master latest, idxflushmb 200
> > xapian git master before patch, idxflushmb 200
>
> There are other changes between RELEASE/1.4 and master which will likely
> improve indexing speed and memory use, but I'm not sure there's anything
> which would affect disk writes (unless we end up swapping to disk with 1.4
> but master avoids doing so due to lower memory usage).

Oops, sorry, the lines above should have read RELEASE/1.4, not master. Only
the later test with a small flush interval was done with master (by
mistake).

Definitely no swapping to this disk.

> > The improvement brought by the patch is nice. It remains that for people
> > using big indexes on SSD, the amount of writes is still something to
> > consider, and splitting the index probably makes sense? What do you
> > think?
>
> If you want to build a very large DB it's almost certain to be faster to
> build it as a series of smaller DBs and merge them.

Thanks for the confirmation; this is what the reporting user has concluded,
and I'll confirm to them that it is the right approach.

> At least with the current backends (glass and older) - the plan for the
> next backend (honey) is that it'll actually behave like that behind the
> scenes, but that part isn't fully written yet.

I am sure that people with big indexes will appreciate it!
Cheers, jf