Eric Parusel
2004-Dec-21 21:33 UTC
[Xapian-discuss] Search::Xapian add_database'd search results are odd?
Sorry if this is the wrong forum to discuss Search::Xapian issues -- this just seems like the best place.

Anyways, I've been testing out using $db->add_database() when searching, and it seems like the docids I'm getting out of it are incorrect, almost as though they're "double" what they should be (numerically)...

The docids that exist should be between about 950,000 and 1,000,000, not around 1,900,000, etc...

$xapiandirbase . '-11' and $xapiandirbase . '-10' both exist.

Quick example:
==================================================================
my $db = Search::Xapian::Database->new($xapiandirbase . '-11') || die("Error.\n");
$db->add_database(Search::Xapian::Database->new($ARGV[0] . '-10') || die("Error.\n"));
my $query = Search::Xapian::Query->new(OP_AND, 'word', 'word2');
print "Query: " . $query . "\n";
my $enq = $db->enquire($query);
my @matches = $enq->matches(0,1000000);
foreach my $match ( @matches ) {
    print $match->get_docid . ',';
}
==================================================================

If I don't use add_database (or add_database is wrapped in eval and fails because I point it at a xapian db that doesn't exist) and query either xapian db "-10" or "-11" on its own, I get docids returned in approximately the 900000-1000000 range.

If I use add_database() in either order (-10 then -11, or vice versa) then I get ids that are seemingly doubled.

-10 and -11 have unique markers in them (M10 and M11 respectively), so if I use one of the markers as a keyword, I get the same number of results whether I add on the 2nd database or not. The docids I get back are just approximately doubled?
I'm running Search::Xapian 0.8.4.0 and xapian 0.8.4: # rpm -qa | grep xapian xapian-core-0.8.4-1 xapian-core-libs-0.8.4-1 xapian-core-debuginfo-0.8.4-1 xapian-core-devel-0.8.4-1 Thanks for any help you can offer, Eric -------------- next part -------------- xapian-read-test.pl in original form, reading domain.com-2004-11, no add_database: # ./xapian-read-test.pl /data1/xapian/domain.com-2004 M11 warren shannon Query: Xapian::Query((M11 AND warren AND shannon)) Parsing query 'Xapian::Query((M11 AND warren AND shannon))' 611 results found {975786,975788,976495,976496,976603,976604,976605,976653,976654,976655,976656,977247,978602,978603,978604,982193,982194,982195,982196,982237,982883,983505,983521,983522,983538,983539,983540,984358,984366,984367,984768,984769,984782,984924,987743,988632,988633,988634,988637,988638,988639,990152,990864,990865,990866,993950,993959,996317,996318,996319,996429,996430,996431,996503,996504,996505,996594,996595,996596,996981,1000204,1000205,1000215,1000286,1000287,1000288,1000340,1000382,1000433,1000434,1000435,1000867,1000868,1000869,977244,977245,977246,990160,975793,982927,982928,982929,994283,994284,994285,983865,983866,983867,988662,988663,988664,983762,983763,983764,977138,994079,977164,977165,977166,977167,984994,984995,984996,1001216,1001217,1001218,996426,984305,984306,984307,984352,984353,984354,996397,1000197,982933,982934,982935,1000211,984361,984362,984363,978573,989961,989962,989963,989322,994093,997031,984341,984342,983550,983551,983552,983553,989326,989327,989328,978580,983565,984826,984827,984828,994825,994826,994827,984990,984991,984992,1001031,983883,983884,983885,997075,997076,997077,984372,984373,984374,978618,996955,996956,996957,975801,975932,978553,978554,978555,982512,984902,984903,1000267,1000268,1000269,976616,976617,983819,983820,983821,989511,989512,989513,996964,996965,996966,984315,984843,984844,984845,987788,987789,987790,989466,989467,989468,977965,1000377,978000,978001,978002,978715,978716,978717,978718,9787
19,978720,978721,978722,978723,984864,984865,984866,977925,982923,982924,982925,995436,995437,995438,988400,988401,988402,990791,990792,976607,976608,976609,983004,995442,996973,996974,996975,984927,984928,984931,991434,991435,991436,984926,989351,989352,989353,983104,984929,984919,977517,978703,978704,978705,988537,988538,988539,993738,993739,984870,984871,984872,988783,988784,988785,989340,989341,989342,982930,982931,982932,984950,984951,984471,996214,996215,996216,996223,996224,996225,996219,996220,996221,996279,996280,996281,993758,993759,993760,993955,993956,993957,977858,977859,977860,994206,990079,990080,990081,990082,1000912,984887,984888,984889,1000929,983298,984487,984488,984489,989313,989314,989315,977864,977865,977866,1001133,975809,975810,975811,984777,984778,984779,978812,996978,975830,977936,995465,975828,978003,978004,978005,978817,988677,978825,973971,988680,975779,993839,993840,993841,983027,983028,983029,983030,977872,993860,993862,978828,990616,975822,978031,978874,978875,978876,978837,978838,978839,987814,987815,987816,988682,977206,983609,988689,988690,988691,977526,1000880,1000881,1000882,978801,983617,983618,983619,983620,990166,990167,990168,978808,978809,978810,987874,987875,987876,990170,990171,990172,978822,978823,978824,983640,990656,975974,983592,983593,983594,983595,983683,983684,983685,1000233,1000234,1000235,995500,995501,995502,995503,988613,988614,988615,996038,978831,978832,978833,987725,987726,990173,990174,990175,984878,984879,984880,984881,999531,988081,988082,988083,975829,989589,989590,989591,975676,975677,975678,975790,975791,975792,990176,996688,976048,976049,976050,989443,989444,989445,978853,987745,987746,987747,1000221,1000222,1000223,982164,982165,982166,983622,983623,983624,984144,1000650,985013,985014,985015,990408,995177,998011,984632,985018,985019,985020,982167,975883,991023,979094,985021,985022,985023,982187,982188,982189,978305,988266,982213,982214,982215,978007,978008,978009,1001188,1001189,1001190,978527,978528,
978529,994592,994593,994594,990562,985184,986102,978532,978533,978534,988549,978624,978625,978626,988563,982299,982300,982301,987888,987889,989400,989401,989402,994539,994540,994541,1000225,1000226,1000227,988672,988673,988674,991809,991810,991811,976096,976097,976098,976103,976104,976105,976107,976108,976109,984382,984383,984384,975910,975911,975912,1000900,975918,975919,975920,975938,975939,975940,977326,990031,990032,990033,1001446,987948,990083,990084,990085,990182,990183,990184,990189,990190,990191,990199,990200,990201,1001255,990207,990208,990209,990090,990091,990092,990194,990195,990196,990692,990216,983571,983572,983573,1000361,1000362,976759,976760,976761,993181,995821,995822,995824,975453,987530,981892,995888,995889,995890,996415,996416,996417,978841,978842,978843,996442,996443,996444,978863,978864,978865,995893,995894,995895,995903,995904,995905} Elapsed time: 0.0667 seconds. # vi xapian-read-test.pl xapian-read-test.pl modified to add-database domain.com-2004-10: # ./xapian-read-test.pl /data1/xapian/domain.com-2004 M11 warren shannon Query: Xapian::Query((M11 AND warren AND shannon)) Parsing query 'Xapian::Query((M11 AND warren AND shannon))' 611 results found {} Elapsed time: 0.0847 seconds.
Olly Betts
2004-Dec-21 21:56 UTC
[Xapian-discuss] Search::Xapian add_database'd search results are odd?
On Tue, Dec 21, 2004 at 01:32:13PM -0800, Eric Parusel wrote:
> Anyways, I've been testing out using $db->add_database() when searching,
> and it seems like the docids I'm getting out of it are incorrect, almost
> as though they're "double" what they should be (numerically)...
>
> the docids that exist should be around 950,000 and 1000000 not around
> 1900000, etc...

If you search over more than one database, the docids in the underlying databases are mapped to avoid collisions. The mapping is (at least currently):

    did_merged = (did_raw - 1) * number_of_databases + offset

where offset ranges from 1 to number_of_databases.

You can generally just treat did_merged as an opaque value and use it with the combined database to retrieve the appropriate document.

Cheers,
Olly
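
To make the mapping concrete, here is a minimal sketch in plain Perl of the formula quoted above, applied to the first docid from the single-database run (975786). The helper names merged_docid and split_docid are hypothetical and not part of Search::Xapian; the inverse is simply derived from the formula, and which database receives which offset is an assumption here. With two databases the merged docid comes out at roughly twice the raw one, which would explain the "doubled" values.

```perl
use strict;
use warnings;

# Hypothetical helper (not a Search::Xapian call): apply the mapping
#   did_merged = (did_raw - 1) * number_of_databases + offset
# where $offset is 1-based and identifies the sub-database.
sub merged_docid {
    my ($did_raw, $num_dbs, $offset) = @_;
    return ($did_raw - 1) * $num_dbs + $offset;
}

# Hypothetical inverse, derived from the formula above: recover the
# raw docid and the sub-database offset from a merged docid.
sub split_docid {
    my ($did_merged, $num_dbs) = @_;
    my $offset  = ($did_merged - 1) % $num_dbs + 1;
    my $did_raw = int(($did_merged - 1) / $num_dbs) + 1;
    return ($did_raw, $offset);
}

# Raw docid 975786 searched across two combined databases.
my $as_first  = merged_docid(975786, 2, 1);   # 1951571
my $as_second = merged_docid(975786, 2, 2);   # 1951572
print "offset 1: $as_first, offset 2: $as_second\n";

my ($raw, $offset) = split_docid($as_first, 2);
print "merged $as_first -> raw $raw, offset $offset\n";
```

As noted above, the mapping is described as "at least currently", so code should not rely on it; passing the merged docid back to the combined database is the safer route.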