Hello all,
Is it possible to use parallel indexing and still ensure unique documents in
the merged index? Using the canned example, I'm ending up with duplicate
entries. It's just adding them all together even though I've defined a
unique :key.
How can I tell the IndexWriter to keep my uniqueness constraints?
For example, imagine that I have two indexes of a phone book:
"index_one" contains a unique set of names A-through-P (let's say the key is
their phone number).
"index_two" contains a unique set of names K-through-Z.
When I merge them, I would hope to get a unique index of A-through-Z, but
I'm getting double entries where they overlap, K-through-P.
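The expected result can be sketched in plain Ruby. This does not use Ferret at all: hashes keyed by phone number stand in for the two indexes, and the names, numbers, and the merge_unique helper are all made up for illustration.

```ruby
# Hypothetical stand-ins for the two indexes: phone number (the key) => name.
index_one = {
  "555-0101" => "Adams",
  "555-0111" => "Klein",
  "555-0116" => "Peters",
}
index_two = {
  "555-0111" => "Klein",
  "555-0116" => "Peters",
  "555-0126" => "Zimmer",
}

# Hypothetical helper: merge any number of keyed collections, keeping one
# entry per key (a later entry replaces an earlier one with the same key).
def merge_unique(*indexes)
  indexes.reduce({}) { |merged, index| merged.merge(index) }
end

merged = merge_unique(index_one, index_two)

# Simply adding the two sets together counts the overlap twice.
naive_count = index_one.size + index_two.size
puts "naive concatenation: #{naive_count} entries"
puts "merge by key:        #{merged.size} entries"
```

With this sample data the naive count is 6, while the keyed merge holds 4 entries, one per phone number. The second number is the behaviour hoped for from the merged index.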
Here's some code to demonstrate. My :id field is a long-ish unique
alphanumeric string. In the example below, "one" and "two" are actually
identical copies, each containing about 60,000 docs. I was hoping to get a
combined index containing the same 60,000 docs, but ended up with 120,000.
Any help will be greatly appreciated. Thanks!
require 'ferret'
include Ferret::Index
include Ferret::Analysis

one    = "Documents/bucket/index_1"
two    = "Documents/bucket/index_2"
merged = "Documents/bucket/merged_index"

# Letters-only analysis by default, whitespace analysis for the :id key field
pfa = PerFieldAnalyzer.new(LetterAnalyzer.new)
pfa[:id] = WhiteSpaceAnalyzer.new

field_infos = FieldInfos.new(:term_vector => :no)
field_infos.add_field(:id, :index => :untokenized)

# Both source indexes declare :id as their unique key
index_one = Ferret::I.new(
  :key => :id,
  :max_buffer_memory => 0x8000000,
  :merge_factor => 5,
  :path => one,
  :analyzer => pfa,
  :field_infos => field_infos)
index_two = Ferret::I.new(
  :key => :id,
  :max_buffer_memory => 0x8000000,
  :merge_factor => 5,
  :path => two,
  :analyzer => pfa,
  :field_infos => field_infos)

readers = []
readers << IndexReader.new(one)
readers << IndexReader.new(two)

puts "size of index_one = " + index_one.size.to_s
puts "size of index_two = " + index_two.size.to_s

# Merge the two source indexes into the new index
index_writer = IndexWriter.new(:path => merged)
index_writer.add_readers(readers)
index_writer.close()
readers.each { |reader| reader.close() }

i = Ferret::I.new(:path => merged)
puts "size before optimize = " + i.size.to_s
i.optimize
puts "size after optimize = " + i.size.to_s