Olly Betts writes:
> On Mon, Jan 21, 2019 at 03:25:01PM +0100, Jean-Francois Dockes wrote:
> > I have had a problem report from a Recoll user about the amount of writes
> > during index creation.
> >
> >   https://opensourceprojects.eu/p/recoll1/tickets/67/
> >
> > The issue is that the index is on SSD and that the amount of writes is
> > significant compared to the SSD life expectancy (index size > 250 GB).
> >
> > From the numbers he supplied, it seems to me that the total amount of
> > block writes is roughly quadratic with the index size.
> >
> > First question: is this expected, or is Recoll doing something wrong?
>
> It isn't expected.
>
> I think this is probably due to a bug which coincidentally was
> discovered earlier this week by Germán M. Bravo. I've now fixed it
> and backported ready for 1.4.10. If you're able to test to confirm
> if this solves your problem that would be very useful - see
> f19bcb96857419469f74f748e7fe8eaccaedc0fd on the RELEASE/1.4 branch:
>
> https://git.xapian.org/?p=xapian;a=commitdiff;h=f19bcb96857419469f74f748e7fe8eaccaedc0fd
>
> Anything which uses a term for a unique document identifier is likely to
> be affected.
>
> Cheers,
>     Olly

I have run a number of tests, with data mostly from a Project Gutenberg DVD
and other books, with relatively modest index sizes, from 1 to 24 GB.

Quite curiously, in this range, with all the Xapian versions I tried, the
total amount of writes is roughly proportional to the index size raised to
the power 1.5:

    TotalWrites / (IndexSize**1.5) ~= K

So, not quadratic, which is good news. For big indexes, 1.5 is not so good,
but probably somewhat expected.

The other good news is that the patch above decreases the amount of writing
by a significant factor, around 4.5 for the biggest index I tried.

The amount of writes is estimated with iostat before/after. The disk has
nothing else to do.

idxflushmb is the number of megabytes of input text between Xapian commits.

    xapiandb,kb    writes,kb   K*1000   writes/size

xapian 1.4.5, idxflushmb 200

     1544724      6941286      3.62     4.49
     3080540     16312960      3.02     5.30
     4606060     21054756      2.13     4.57
     6123140     33914344      2.24     5.54
     7631788     50452348      2.39     6.61

xapian git master latest, idxflushmb 200

     1402524      1597352      0.96     1.14
     2223076      3291588      0.99     1.48
     2678404      4121024      0.94     1.54
     3842372      7219404      0.96     1.88
     4964132     10850844      0.98     2.19
     6062204     14751196      0.99     2.43
    19677680    125418760      1.44     6.37

xapian git master before patch, idxflushmb 200

    24707840    750228444      6.11    30.36

So that was 750 GB of writes for the big index before the patch...

As you can see, my beautiful law does not hold so well for the biggest
index :) (K = 1.44). It's not quite the same data though, so I would need
more tests, but I think I'll stop here...

The improvement brought by the patch is nice. It remains that for people
using big indexes on SSD, the amount of writes is still something to
consider, and splitting the index probably makes sense? What do you think?

I'll run another test tonight with a smaller flush interval to see if it
changes things.

Cheers,

jf
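A minimal Python sketch of how the K*1000 and writes/size columns can be
recomputed from the raw iostat figures; sizes and write totals are in KB as
printed, and the rows are just a sample of the table above:

    # Derive K = writes / size**1.5 and the plain writes/size ratio for a
    # few of the (index size, total writes) measurements above, both in KB.
    measurements = [
        (1544724, 6941286),      # xapian 1.4.5, idxflushmb 200
        (1402524, 1597352),      # "git master latest" (patched), smallest index
        (19677680, 125418760),   # "git master latest" (patched), biggest index
        (24707840, 750228444),   # "git master before patch"
    ]

    for size_kb, writes_kb in measurements:
        k = writes_kb / size_kb ** 1.5   # roughly constant if writes ~ size^1.5
        print("size %10d KB  writes %11d KB  K*1000 %5.2f  writes/size %5.2f"
              % (size_kb, writes_kb, k * 1000, writes_kb / size_kb))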
This is quite possibly part of the underlying write explosion that we ran
into when we wrote:

https://fastmail.blog/2014/12/01/email-search-system/

Which, now almost 5 years on, has been running like a champion! We're really
pleased with how well it works. Xapian reads from multiple databases are
really easy, and the immediate writes onto tmpfs and daily compacts work
really well. We also have a cron job which runs hourly and will do immediate
compacts to disk from memory if the tmpfs hits more than 50% of its nominal
size, and it keeps us from almost ever needing to do any manual management
as this thing indexes millions of new emails per day across our cluster.

And then when we do the compact down to disk, it's a single thread compacting
indexes while new emails still index to tmpfs, so there's always tons of IO
available for searches.

I think even with more efficient IO patterns, I'd still stick with the design
we have. It's really nice :)

Bron.

On Fri, Feb 1, 2019, at 06:47, Jean-Francois Dockes wrote:
> [...]
--
Bron Gondwana
brong at fastmail.fm
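A minimal sketch of the kind of hourly tmpfs check Bron describes, assuming
a glass database on a tmpfs mount and the stock xapian-compact tool; the
paths, the 50% threshold check, and the tier naming are illustrative
assumptions, not Fastmail's actual implementation:

    # Hourly check: if the tmpfs holding the live database is more than half
    # full, compact it down to a new on-disk database. Error handling and
    # the swap-in of a fresh live database are omitted.
    import shutil
    import subprocess
    import time

    TMPFS_MOUNT = "/srv/xapian/tmpfs"             # hypothetical tmpfs mount
    LIVE_DB = TMPFS_MOUNT + "/live.db"            # database taking live writes
    DEST_DB = "/srv/xapian/disk/tier-%d.db" % int(time.time())

    usage = shutil.disk_usage(TMPFS_MOUNT)
    if usage.used / usage.total > 0.5:
        # xapian-compact takes one or more source databases followed by a
        # destination; searches can keep reading the sources meanwhile.
        subprocess.run(["xapian-compact", LIVE_DB, DEST_DB], check=True)
        # ...then start a fresh live.db on tmpfs and point readers at both.

Searches would then open the on-disk tiers plus the tmpfs database together,
e.g. by adding them all to a single xapian.Database.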
Bron Gondwana writes:
> This is quite possibly part of the underlying write explosion that we ran
> into when we wrote:
>
> https://fastmail.blog/2014/12/01/email-search-system/
> [...]
>
> Bron.

Thank you for this information.

I re-ran the 20 GB index creation with the latest Xapian git code but a much
smaller commit threshold (20 MB instead of 200). There were more than 800 GB
of data written (instead of 125 GB).

So it would seem that the right approach for creating big indexes is to:

 - Always set the commit interval as high as the available RAM allows.
 - Use the future Xapian 1.4.10; the patch brings a significant improvement.
 - Segment the index, then use xapian-compact to merge if needed.

It would be interesting to see how the Fastmail approach works for an initial
bulk index creation, compared to just segmenting, that is: what is the
optimal number of merges?

JF
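A minimal sketch of the "segment the index, then merge" approach using the
Xapian Python bindings and xapian-compact; iter_documents() is a hypothetical
generator, the segment size is arbitrary, and the "Q" unique-id prefix is
only an example:

    # Build several smaller databases, then merge them with xapian-compact.
    # iter_documents() is assumed to yield (unique_id, text) pairs.
    import subprocess
    import xapian

    DOCS_PER_SEGMENT = 100000                     # arbitrary figure

    def index_doc(db, termgen, uid, text):
        doc = xapian.Document()
        termgen.set_document(doc)
        termgen.index_text(text)
        idterm = "Q" + uid                        # unique-id term (example prefix)
        doc.add_boolean_term(idterm)
        db.replace_document(idterm, doc)

    segments = []
    db = None
    for n, (uid, text) in enumerate(iter_documents()):
        if n % DOCS_PER_SEGMENT == 0:
            if db is not None:
                db.commit()
                db.close()
            path = "seg%03d.db" % len(segments)
            segments.append(path)
            db = xapian.WritableDatabase(path, xapian.DB_CREATE_OR_OVERWRITE)
            termgen = xapian.TermGenerator()
            termgen.set_stemmer(xapian.Stem("en"))
        index_doc(db, termgen, uid, text)
    if db is not None:
        db.commit()
        db.close()

    # Merge: sources first, destination last.
    subprocess.run(["xapian-compact"] + segments + ["final.db"], check=True)

Within each segment the commit interval is what matters for write volume:
Recoll exposes it as idxflushmb, and indexers that rely on Xapian's automatic
flushing can instead raise the XAPIAN_FLUSH_THRESHOLD environment variable
(a document count).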
On Thu, Jan 31, 2019 at 08:44:44PM +0100, Jean-Francois Dockes wrote:
> I have run a number of tests, with data mostly from a Project Gutenberg DVD
> and other books, with relatively modest index sizes, from 1 to 24 GB.
>
> Quite curiously, in this range, with all the Xapian versions I tried, the
> total amount of writes is roughly proportional to the index size raised to
> the power 1.5:
>
>     TotalWrites / (IndexSize**1.5) ~= K

I could perhaps believe it would tend to O(n*log(n)) eventually due to
the number of levels in the B-tree being log(n) (though the number of
levels is bounded above by a fairly small constant so one could argue
that's O(n)).

But probably the merging on commit will actually determine the O()
behaviour, and that's harder to determine theoretically.

> The amount of writes is estimated with iostat before/after. The disk has
> nothing else to do.

There's a script in git which allows more precise I/O analysis by
logging relevant I/O using strace:

  xapian-maintainer-tools/profiling/strace-analyse

Using strace means other processes are definitely excluded and you get
to see which tables (and even which blocks) the I/O is, e.g. a small
update to a small database gives:

  read 0 from tmp.db/record.DB
  read 0 from tmp.db/termlist.DB
  read 0 from tmp.db/position.DB
  read 0 from tmp.db/postlist.DB
  write 1 to tmp.db/postlist.DB
  write 1 to tmp.db/position.DB
  write 1 to tmp.db/termlist.DB
  write 1 to tmp.db/record.DB
  sync tmp.db/postlist.tmp
  sync tmp.db/postlist.DB
  read 1 from tmp.db/postlist.DB
  sync tmp.db/position.tmp
  sync tmp.db/position.DB
  read 1 from tmp.db/position.DB
  sync tmp.db/termlist.tmp
  sync tmp.db/termlist.DB
  read 1 from tmp.db/termlist.DB
  sync tmp.db/record.tmp
  sync tmp.db/record.DB
  read 1 from tmp.db/record.DB

> idxflushmb is the number of megabytes of input text between Xapian commits.
>
>     xapiandb,kb    writes,kb   K*1000   writes/size
>
> xapian 1.4.5, idxflushmb 200

If you're going to the trouble of profiling, probably best to use the
latest release (1.4.5 was released in 2017).

>      1544724      6941286      3.62     4.49
>      3080540     16312960      3.02     5.30
>      4606060     21054756      2.13     4.57
>      6123140     33914344      2.24     5.54
>      7631788     50452348      2.39     6.61
>
> xapian git master latest, idxflushmb 200
>
>      1402524      1597352      0.96     1.14
>      2223076      3291588      0.99     1.48
>      2678404      4121024      0.94     1.54
>      3842372      7219404      0.96     1.88
>      4964132     10850844      0.98     2.19
>      6062204     14751196      0.99     2.43
>     19677680    125418760      1.44     6.37
>
> xapian git master before patch, idxflushmb 200
>
>     24707840    750228444      6.11    30.36

OK, so the patch makes a very significant difference here.

There are other changes between RELEASE/1.4 and master which will likely
improve indexing speed and memory use, but I'm not sure there's anything
which would affect disk writes (unless we end up swapping to disk with 1.4
but master avoids doing so due to lower memory usage).

> The improvement brought by the patch is nice. It remains that for people
> using big indexes on SSD, the amount of writes is still something to
> consider, and splitting the index probably makes sense? What do you think?

If you want to build a very large DB it's almost certain to be faster to
build it as a series of smaller DBs and merge them.

At least with the current backends (glass and older) - the plan for the
next backend (honey) is that it'll actually behave like that behind the
scenes, but that part isn't fully written yet.

Cheers,
    Olly
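Output in the format shown above is easy to aggregate per table; a small
sketch, assuming the lines have been captured to a file and that each
read/write line records one block operation on the named table:

    # Tally per-table block reads and writes from lines like
    # "write 1 to tmp.db/postlist.DB" / "read 0 from tmp.db/record.DB".
    # strace-analyse.out is a hypothetical capture of the script's output.
    from collections import Counter

    reads, writes = Counter(), Counter()
    with open("strace-analyse.out") as f:
        for line in f:
            parts = line.split()
            if len(parts) == 4 and parts[0] == "write":
                writes[parts[3]] += 1        # one line per block written
            elif len(parts) == 4 and parts[0] == "read":
                reads[parts[3]] += 1

    for table, nwrites in writes.most_common():
        print("%-25s %6d block writes  %6d block reads"
              % (table, nwrites, reads[table]))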
Olly Betts writes:
> On Thu, Jan 31, 2019 at 08:44:44PM +0100, Jean-Francois Dockes wrote:
> > I have run a number of tests, with data mostly from a Project Gutenberg
> > DVD and other books, with relatively modest index sizes, from 1 to 24 GB.
> >
> > Quite curiously, in this range, with all the Xapian versions I tried,
> > the total amount of writes is roughly proportional to the index size
> > raised to the power 1.5:
> >
> >     TotalWrites / (IndexSize**1.5) ~= K
>
> I could perhaps believe it would tend to O(n*log(n)) eventually due to
> the number of levels in the B-tree being log(n) (though the number of
> levels is bounded above by a fairly small constant so one could
> argue that's O(n)).
>
> But probably the merging on commit will actually determine the O()
> behaviour, and that's harder to determine theoretically.

The 1.5 exponent is indeed frankly bizarre, but it holds rather well for
index sizes from 1.5 to 24 GB in this configuration... Just a curiosity.

> >         size      writes    K*1000   writes/size
> >
> >      1402524      1597352     0.96     1.14
> >      2223076      3291588     0.99     1.48
> >      2678404      4121024     0.94     1.54
> >      3842372      7219404     0.96     1.88
> >      4964132     10850844     0.98     2.19
> >      6062204     14751196     0.99     2.43
> >     19677680    125418760     1.44     6.37
> >     24349248    166162068     1.38     6.82
> >
> > The amount of writes is estimated with iostat before/after. The disk has
> > nothing else to do.
>
> There's a script in git which allows more precise I/O analysis by
> logging relevant I/O using strace:
>
>   xapian-maintainer-tools/profiling/strace-analyse
>
> Using strace means other processes are definitely excluded and you get
> to see which tables (and even which blocks) the I/O is, e.g. a small
> update to a small database gives:
>
> [...]

I tried to use strace -c, but for some reason the pwrite counts in the
results were erratic (sometimes showing something like 11 writes after
indexing), probably some issue with my script, so I did not use them.

The output was to a backup disk, with no other activity during the tests.

> If you're going to the trouble of profiling, probably best to use the
> latest release (1.4.5 was released in 2017).

I was trying an older release to see if something had changed for the worse
recently.

> > xapian git master latest, idxflushmb 200
> > xapian git master before patch, idxflushmb 200
>
> There are other changes between RELEASE/1.4 and master which will likely
> improve indexing speed and memory use, but I'm not sure there's anything
> which would affect disk writes (unless we end up swapping to disk with 1.4
> but master avoids doing so due to lower memory usage).

Oops, sorry, the lines above should have read RELEASE/1.4, not master. Only
the later test with a small flush interval was done with master (by
mistake).

Definitely no swapping to this disk.

> > The improvement brought by the patch is nice. It remains that for people
> > using big indexes on SSD, the amount of writes is still something to
> > consider, and splitting the index probably makes sense? What do you
> > think?
>
> If you want to build a very large DB it's almost certain to be faster to
> build it as a series of smaller DBs and merge them.

Thanks for the confirmation; this is what the reporting user has concluded,
and I'll confirm to them that it is the right approach.

> At least with the current backends (glass and older) - the plan for the
> next backend (honey) is that it'll actually behave like that behind the
> scenes, but that part isn't fully written yet.

I am sure that people with big indexes will appreciate it!
Cheers, jf