Gupteshwar Joshi
2007-Feb-21 10:36 UTC
[Xapian-discuss] Indexing specific data.(Help required)
Hello,

Thank you for your last replies regarding the postgres connection. I will try that sometime, but I am approaching it in a slightly different way. I have created some csv out of the postgres database and am applying indexing on that csv.

But the problem is that my data is in Devnagari script and I use UTF-8 encoding for its support. By applying some scripting called souindics which extracts sound code out of the word and stores it in English letters. So, I have to process first with the above step and then with the xapian php-binding to index and search.

But in this process my original document gets set aside and the result appears as my sound code instead.

So, is there anything by which I can maintain the reference to the original document?

Can it be possible to index only a specific column of the csv?

Thank you
Regards

--
Gupteshwar D Joshi
On Wed, Feb 21, 2007 at 04:06:30PM +0530, Gupteshwar Joshi wrote:
> But the problem is that my data is in Devnagari script and I use UTF-8
> encoding for its support.
> By applying some scripting called souindics which extracts sound code
> out of the word and stores it in English letters.
> So, I have to process first with the above step and then with the xapian
> php-binding to index and search.
> But in this process my original document gets set aside and the result
> appears as my sound code instead.

Note that there's no reason why you can't store the UTF-8 Devnagari script version in the document data, but generate terms from the anglicised version.

It occurs to me that this "sounindics" is a term normalisation procedure, so it's a lot like a stemming algorithm in many ways. I can't find any information on Google about it though - "sounindics" has no matches and "sounindic" only finds a passing mention in someone's CV (a reference to using Xapian in fact!)

But perhaps this algorithm should be wrapped as a Xapian::Stem class, which would make it very easy to index and query Devnagari script in this way. Do you have a reference for it?

> So, is there anything by which I can maintain the reference to the
> original document?

An alternative approach is to store a unique id for the postgres database record in the Xapian document data.

> Can it be possible to index only a specific column of the csv?

Erm, of course. Just generate terms only from that column!

Cheers,
Olly
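
As a rough illustration of the suggestions in the reply above, here is a minimal Python sketch that uses only the standard library, with an in-memory dictionary standing in for the Xapian database. It generates terms from one chosen csv column only, and keeps the original row (including a record id) so that a hit can be mapped back to the Devnagari text. Everything named here is an assumption made for illustration: the column names, the sample rows, and especially sound_code(), which is a hypothetical stand-in for the poster's sound-code script and not the real algorithm. With Xapian itself, the stored row would correspond to the document data (Document.set_data) and the generated terms to Document.add_term.

```python
import csv
import io
from collections import defaultdict


def sound_code(word):
    """Hypothetical stand-in for the sound-code script (NOT the real algorithm)."""
    table = {"नमस्ते": "namaste", "भारत": "bharat", "दुनिया": "duniya"}
    # Words missing from the toy table are passed through unchanged.
    return table.get(word, word)


def build_index(csv_text, index_column, id_column="id"):
    """Generate terms from one column only; keep every original row by id."""
    index = defaultdict(set)  # anglicised term -> set of record ids
    stored = {}               # record id -> original row (the "document data")
    for row in csv.DictReader(io.StringIO(csv_text)):
        rec_id = row[id_column]
        stored[rec_id] = row
        # Only the chosen column contributes terms; other columns are ignored.
        for word in row[index_column].split():
            index[sound_code(word)].add(rec_id)
    return index, stored


def search(index, stored, query_word):
    """Normalise the query the same way, then return the original rows."""
    ids = sorted(index.get(sound_code(query_word), ()))
    return [stored[i] for i in ids]


# Invented sample data: an id column, an indexed column, an unindexed column.
SAMPLE = "id,title,author\n1,नमस्ते दुनिया,राम\n2,नमस्ते भारत,श्याम\n"

index, stored = build_index(SAMPLE, index_column="title")
hits = search(index, stored, "भारत")
for hit in hits:
    # The result shows the original text, not the sound code.
    print(hit["id"], hit["title"])
```

The id kept with each row could equally be used on its own to fetch the full record from postgres, which is the alternative mentioned in the reply.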