I have only one box[1] running 3 sub-systems[2] on my system. Are these numbers reasonable[3]?

[1] From dmesg (FreeBSD 6.1-RELEASE):
AMD Sempron(tm) Processor 3000+ (1808.33-MHz K8-class CPU)
real memory = 2080309248 (1983 MB)
avail memory = 1997869056 (1905 MB)
ad0: 76350MB <SAMSUNG SP0802N TK200-04> at ata0-master UDMA33

[2] The sub-systems are:
1 - A server handing out the addresses of documents to be indexed
2 - A server receiving these documents and replacing them in the database (I never add, I only replace)
3 - The indexer/parser, which sends the documents to "2" every X documents

[3] With 500K documents the (automatic) flush is very slow: the last one started 2 hours before I wrote this mail and still hasn't finished. The database (flint) is 19GB; delve reports:
number of documents = 560857
average document length = 1348.51

My worry is the 2 hours a flush takes. If you tell me that's normal, I'll implement a system that creates another database after 500K documents and runs xapian-compact on the old one.

--
SDM
Underlinux
http://stiod.wordpress.com
Member of the UnderLinux team
--
PEP-8
There is only 2 kinds of peoples in the world, who know English, and me. oO
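A minimal sketch of the rotation idea above (start a fresh database every N documents and queue the full one for xapian-compact), in Python. Only the bookkeeping is shown: `ShardRotator`, the `index-NNN` naming scheme and `take_shards_to_compact` are hypothetical names, and actually opening the databases and running xapian-compact are deliberately left out.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ShardRotator:
    """Hypothetical helper that decides which database a document goes to."""

    max_docs: int = 500_000   # threshold mentioned in the message above
    prefix: str = "index"     # hypothetical naming scheme for the databases
    current: int = 0          # number of the database currently being written
    count: int = 0            # documents written to the current database
    finished: List[str] = field(default_factory=list)  # full databases awaiting compaction

    def shard_name(self, n: int) -> str:
        return f"{self.prefix}-{n:03d}"

    def record_document(self) -> str:
        """Count one document and return the name of the database it belongs in."""
        if self.count >= self.max_docs:
            # Current database is full: queue it for compaction and move on.
            self.finished.append(self.shard_name(self.current))
            self.current += 1
            self.count = 0
        self.count += 1
        return self.shard_name(self.current)

    def take_shards_to_compact(self) -> List[str]:
        """Return (and forget) the databases that are full and ready to compact."""
        done, self.finished = self.finished, []
        return done
```

For example, with `max_docs=2`, five calls to `record_document()` return `index-000`, `index-000`, `index-001`, `index-001`, `index-002`, and `take_shards_to_compact()` then returns `['index-000', 'index-001']`. Searching across all the resulting databases would rely on Xapian's support for opening several databases together (`Database::add_database` in the C++ API).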
Felix Antonius Wilhelm Ostmann
2007-Jan-19 14:39 UTC
[Xapian-discuss] Are these numbers resonsable?
Rafael "SDM" Sierra wrote:
> [...]
> My worry is the 2 hours a flush takes. If you tell me that's normal, I'll
> implement a system that creates another database after 500K documents and
> runs xapian-compact on the old one.

I use exactly this system, but I create a new database after every 200K documents :)

--
Best regards
Felix Antonius Wilhelm Ostmann
--------------------------------------------------
Websuche Search Technology GmbH & Co. KG
Martinistraße 3 - D-49080 Osnabrück - Germany
Tel.: +49 541 40666-0 - Fax: +49 541 40666-22
Email: info@websuche.de - Website: www.websuche.de
--------------------------------------------------
AG Osnabrück - HRA 200252 - VAT ID: DE814737310
General partner: Websuche Search Technology Verwaltungs GmbH - AG Osnabrück - HRB 200359
Managing director: Diplom-Kaufmann Martin Steinkamp
--------------------------------------------------
On 1/19/07, Felix Antonius Wilhelm Ostmann <ostmann@websuche.de> wrote:
> Rafael "SDM" Sierra wrote:
[cut]
> > My worry is the 2 hours a flush takes. If you tell me that's normal, I'll
> > implement a system that creates another database after 500K documents and
> > runs xapian-compact on the old one.
>
> I use exactly this system, but I create a new database after every 200K
> documents :)

Do your updates/inserts become slow after 200K documents? I could do that too, but I'd need to come up with a way to update documents that live in the older databases.
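One possible way to route updates to the older databases, sketched under an assumption the thread does not state: the application assigns its own global document ids sequentially from 1, and each database holds exactly `shard_size` consecutive ids. The target database and the id inside it then follow from integer division. `locate` and `group_updates` are hypothetical helpers; these are application-level ids, not the ids Xapian reports when several databases are searched together (Xapian interleaves those).

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def locate(docid: int, shard_size: int) -> Tuple[int, int]:
    """Map a global id to (database index, id within that database).

    Assumes (hypothetically) ids start at 1 and each database holds
    exactly `shard_size` consecutive ids.
    """
    if docid < 1:
        raise ValueError("document ids start at 1")
    shard, offset = divmod(docid - 1, shard_size)
    return shard, offset + 1


def group_updates(docids: Iterable[int], shard_size: int) -> Dict[int, List[int]]:
    """Group pending updates by database, each group sorted by local id."""
    groups: Dict[int, List[int]] = defaultdict(list)
    for docid in docids:
        shard, local = locate(docid, shard_size)
        groups[shard].append(local)
    # Sorting lets each database receive its replacements in ascending id order.
    return {shard: sorted(ids) for shard, ids in sorted(groups.items())}
```

With `shard_size=500000`, id 500000 maps to `(0, 500000)` and id 500001 to `(1, 1)`, so `group_updates([3, 500001, 2], 500000)` yields `{0: [2, 3], 1: [1]}`.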
On Fri, Jan 19, 2007 at 03:48:06AM -0800, Rafael SDM Sierra wrote:
> I have only one box[1] running 3 sub-systems[2] on my system. Are these
> numbers reasonable[3]?
>
> [1] From dmesg (FreeBSD 6.1-RELEASE):
> AMD Sempron(tm) Processor 3000+ (1808.33-MHz K8-class CPU)
> real memory = 2080309248 (1983 MB)
> avail memory = 1997869056 (1905 MB)
> ad0: 76350MB <SAMSUNG SP0802N TK200-04> at ata0-master UDMA33

Probably not a particularly fast disk: it sits on a UDMA33 bus, whereas current drives use UDMA133 or SATA, though I suspect that at those speeds the bus bandwidth to the drive has stopped being the bottleneck.

> [2] The sub-systems are:
> 1 - A server handing out the addresses of documents to be indexed
> 2 - A server receiving these documents and replacing them (I never add,
> I only replace)

This could be an issue. Sequential insertions (or appends) into the underlying B-trees are specially optimised. For flint, this means that adding or replacing an ascending sequence of adjacent document ids (or, more precisely, an ascending sequence in which any doc ids skipped over don't already exist) is faster and produces a smaller database. So depending on the pattern of the document ids you specify, calling replace could be significantly slower than calling add.

There's probably scope for improving this case (I know there's scope for further optimising appending a sequence), but it would be interesting to know why you want to set the document ids yourself and what pattern (if any) they follow.

Cheers,
    Olly
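To make the condition above concrete, here is a small Python check of whether a sequence of document ids to be written matches the pattern described: strictly ascending, with no already-existing id skipped over between consecutive entries. It only encodes the rule as stated in the message; it is not flint's actual internal logic, and `follows_fast_pattern` is a hypothetical name.

```python
from bisect import bisect_right
from typing import Iterable, Sequence


def follows_fast_pattern(ids: Sequence[int], existing: Iterable[int]) -> bool:
    """True if `ids` is strictly ascending and every id skipped over
    between consecutive entries is absent from `existing`.

    Mirrors the rule as described in the message, not flint's internals.
    """
    present = sorted(set(existing))
    for prev, cur in zip(ids, ids[1:]):
        if cur <= prev:
            return False  # not ascending
        # First existing id greater than prev; if it is below cur, it was skipped over.
        lo = bisect_right(present, prev)
        if lo < len(present) and present[lo] < cur:
            return False
    return True
```

`follows_fast_pattern([1, 2, 3], [])` and `follows_fast_pattern([1, 3], {1, 3})` are `True`, while `follows_fast_pattern([1, 3], {2})` and `follows_fast_pattern([3, 1], [])` are `False`. Sorting a batch of pending replacements by id makes it ascending, but whether it qualifies still depends on which ids in between already exist.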