Hi,

I followed the howto to use keys for documents:

http://ferret.davebalmain.com/trac/wiki/HowTos#Howtousekeysfordocument

If I add two documents with the same id, only one gets added to the index, as expected. However, I have found that the key and the id do not match, so attempting to access the index with the id does not work.

For instance, when I run this search:

    INDEX.search_each(query) do |doc, score|
      logger.debug("Found doc: #{doc}, id: #{INDEX[doc]['id']}")
    end

The following is output:

    Found doc: 3, id: 69
    Found doc: 17, id: 88

Is this as designed, or am I missing something?

Thanks,
Tom
On Jan 27, 2006, at 8:10 AM, Tom Davies wrote:
> Is this as designed or am I missing something?

The doc variable in your code is what Lucene calls the document "id". It is an internal number used by the index, and it has no relation to the primary key feature that Ferret adds. You've called your field "id", which confuses things a bit.

The document id is subject to change if documents in the middle of the index are deleted and the index is then optimized. So don't rely on the internal number for anything long-term.

Erik
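To make Erik's point concrete, here is a toy, in-memory model in Ruby. It is not Ferret's API or implementation: the class `ToyIndex` and its methods `add`, `delete`, `optimize` and `doc_num_for` are hypothetical. It assumes a document's internal number is simply its position in the index, so deleting a document in the middle and then compacting renumbers every later document while its own :id field stays the same.

```ruby
# Hypothetical toy model, NOT Ferret's API: a document's internal number
# is simply its position in @docs.
class ToyIndex
  def initialize
    @docs = []      # slot i holds the document with internal number i
    @deleted = []   # parallel array of deletion flags
  end

  # Append a document; it takes the next free internal number.
  def add(doc)
    @docs << doc
    @deleted << false
    @docs.size - 1
  end

  # Mark a document as deleted. Its slot is kept, so no numbers move yet.
  def delete(doc_num)
    @deleted[doc_num] = true
  end

  # Drop deleted slots. Every document after a deleted one shifts down,
  # i.e. it gets a new internal number.
  def optimize
    @docs = @docs.each_index.reject { |i| @deleted[i] }.map { |i| @docs[i] }
    @deleted = Array.new(@docs.size, false)
  end

  # Fetch a live document by its current internal number.
  def [](doc_num)
    @deleted[doc_num] ? nil : @docs[doc_num]
  end

  # Current internal number of the live document whose :id equals key.
  def doc_num_for(key)
    @docs.each_index.find { |i| !@deleted[i] && @docs[i][:id] == key }
  end
end

index = ToyIndex.new
index.add(id: "69", data: "first")
index.add(id: "70", data: "second")
index.add(id: "88", data: "third")

puts index.doc_num_for("88")            # prints 2
index.delete(index.doc_num_for("70"))   # delete a document in the middle
index.optimize
puts index.doc_num_for("88")            # prints 1: same key, new number
```

In this model the only stable handle on a document is its stored :id field, which is why reading the field (as in `INDEX[doc]['id']`) is safe right after a search, while keeping the number yielded by `search_each` around for later is not.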
Hi Erik,

Thanks for your response. Perhaps I am misunderstanding the howto, but it implies that when you create an index and map the key to the id as follows:

    index = Index::Index.new(:key => :id)
    index << {:id => 23, :data => "This is the data..."}
    index << {:id => 23, :data => "This is the new data..."}

then you can access this document using either of the following:

    index["23"]  # Get document with key 23
    index[23]    # Get document with internal number 23. It is NOT the key
                 # field. It is just the internal Ferret id.

This implies that the id and the key are the same, but according to the example in my first email, they are not. Is this howto just misleading? Based on what you said, the internal number will not necessarily match the key.

Tom

On 1/27/06, Erik Hatcher <erik at ehatchersolutions.com> wrote:
> The document id is subject to change, if documents are deleted in the
> middle and the index is optimized. So don't rely on the internal
> number for anything long-term.
> _______________________________________________
> Ferret-talk mailing list
> Ferret-talk at rubyforge.org
> http://rubyforge.org/mailman/listinfo/ferret-talk
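The behaviour the howto describes can be modelled the same way. The sketch below is a toy, not Ferret: `KeyedToyIndex` is hypothetical, and it assumes that adding a document whose key already exists deletes the old copy and appends the new one, and that `[]` treats a String argument as a key and an Integer argument as an internal number, as the howto's comments say. It shows why `index["23"]` and `index[23]` are different lookups.

```ruby
# Hypothetical KeyedToyIndex: a toy model of the behaviour the howto
# describes, NOT Ferret's implementation.
class KeyedToyIndex
  def initialize(key:)
    @key = key
    @docs = []   # internal number => document (nil once deleted)
  end

  # Adding a document whose key already exists deletes the old copy first,
  # so only one document per key stays live. The new copy gets a fresh
  # internal number at the end.
  def <<(doc)
    old = doc_num_for(doc[@key].to_s)
    @docs[old] = nil if old
    @docs << doc
    self
  end

  # String argument: look up by key field. Integer argument: look up by
  # internal document number, which is unrelated to the key.
  def [](arg)
    case arg
    when String  then (n = doc_num_for(arg)) && @docs[n]
    when Integer then @docs[arg]
    else raise ArgumentError, "expected String key or Integer doc number"
    end
  end

  # Number of live (non-deleted) documents.
  def size
    @docs.compact.size
  end

  private

  # Internal number of the live document whose key field matches.
  def doc_num_for(key_value)
    @docs.index { |d| d && d[@key].to_s == key_value }
  end
end

index = KeyedToyIndex.new(key: :id)
index << { id: 23, data: "This is the data..." }
index << { id: 23, data: "This is the new data..." }

puts index.size           # prints 1: the second add replaced the first
puts index["23"][:data]   # prints: This is the new data...
p index[23]               # prints nil: there is no internal number 23
p index[1][:id]           # prints 23: the live copy has internal number 1
```

Note that in this model the replacement copy ends up at internal number 1, not 23, even though its key is 23; the two lookups only agree by coincidence.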
Hi Tom,

I can see how this would be confusing. The internal id and the id you give a document are unrelated; they'll only be the same, as in the howto, when you add documents in order starting with id 0. I'll change the howto to remove the confusion.

Cheers,
Dave

On 1/28/06, Tom Davies <atomgiant at gmail.com> wrote:
> This implies that the id and key are the same, but according to my
> first email example, they are not. Is this howto just misleading?
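Dave's point about when the two numbers line up can be checked with an even smaller toy. The helper names `assign_doc_nums` and `numbers_match?` are hypothetical; the sketch only assumes that each added document takes the next internal number starting at 0, whatever its own id is, and it ignores deletions.

```ruby
# Hypothetical helpers, not Ferret: pair each id with the internal number
# it would get if documents were added in the given order (0, 1, 2, ...).
def assign_doc_nums(ids)
  ids.each_with_index.map { |id, doc_num| [doc_num, id] }
end

# True only when every document's internal number equals its own id.
def numbers_match?(ids)
  assign_doc_nums(ids).all? { |doc_num, id| doc_num == id }
end

p numbers_match?([0, 1, 2, 3])   # prints true: added in order from 0
p numbers_match?([23, 69, 88])   # prints false
p assign_doc_nums([23, 69, 88])  # prints [[0, 23], [1, 69], [2, 88]]
```

Once documents are deleted and the index is optimized, even an in-order sequence can stop lining up, which is why the stored key field, not the internal number, is the thing to rely on.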