Gupteshwar Joshi
2007-Feb-21  10:36 UTC
[Xapian-discuss] Indexing specific data.(Help required)
Hello,
Thank you for your last replies regarding the postgres connection.
I will try that sometime, but I am approaching it a bit differently.
I have exported some CSV files from the postgres database and am indexing
those CSV files.
But the problem is that my data is in Devnagari script, and I use UTF-8
encoding to support it.
I apply a script called souindics, which extracts a sound code from each
word and stores it in English letters.
So I have to process the data with the above step first, and then use the
Xapian PHP bindings to index and search.
But in this process my original document is lost, and the results show
my sound codes instead.
So, is there any way I can maintain a reference to the original
document?
Is it possible to index only a specific column of the CSV?
Thank you
Regards
-- 
Gupteshwar D Joshi
On Wed, Feb 21, 2007 at 04:06:30PM +0530, Gupteshwar Joshi wrote:
> But problem is that my data is in Devnagari script and I use UTF-8 encoding
> for it's support.
> By applying some scripting called souindics which extracts sound code
> out of the word and store it with in English letters.
> So, I have to process first with above step and then with xapian
> php-binding to index and search .
> But in this process my original document gets besides and result appears as
> my sound code instead.

Note that there's no reason why you can't store the UTF-8 Devnagari script
version in the document data, but generate terms from the anglicised version.

It occurs to me that this "sounindics" is a term normalisation procedure, so
it's a lot like a stemming algorithm in many ways. I can't find any
information on Google about it though: "sounindics" has no matches and
"sounindic" only finds a passing mention in someone's CV (a reference to
using Xapian in fact!)

But perhaps this algorithm should be wrapped as a Xapian::Stem class, which
would make it very easy to index and query Devnagari script in this way. Do
you have a reference for it?

> So, is there any thing by which I can maintain the reference to original
> document?.

An alternative approach is to store a unique id for the postgres database
record in the Xapian document data.

> Can it be possible to index only specific column of the csv?

Erm, of course. Just only generate terms from that column!

Cheers,
    Olly
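The three suggestions in the reply can be sketched together in a few lines. This is a minimal, self-contained Python illustration, not Xapian code: `ToyIndex` is a hypothetical in-memory stand-in for a Xapian database, and `sound_code` is a hypothetical placeholder, since the real souindics algorithm is not described in the thread. The sketch stores the original UTF-8 text and the postgres record id as document data, generates terms only from the sound-coded version of one CSV column, and runs queries through the same normalisation, the way a stemmer would be applied.

```python
import csv
import io
import json
from collections import defaultdict

# HYPOTHETICAL stand-in for the "souindics" sound-code step, which is not
# described in this thread. It keeps a few consonants as Latin letters and
# drops everything else (vowel signs included), so spelling variants such
# as कमल and कमला collapse to one code, much as a stemmer would.
CONSONANT_CODES = {"क": "k", "म": "m", "ल": "l", "र": "r"}


def sound_code(word):
    return "".join(CONSONANT_CODES.get(ch, "") for ch in word)


def to_terms(text):
    # The same normalisation is used at index time and at query time;
    # words that produce an empty code are skipped.
    return [code for code in (sound_code(w) for w in text.split()) if code]


class ToyIndex:
    """In-memory stand-in for a Xapian database (not the Xapian API):
    terms point at document ids, and each document carries an opaque
    data string, playing the role of Xapian's document data."""

    def __init__(self):
        self.data = {}                    # docid -> stored data string
        self.postings = defaultdict(set)  # term -> set of docids

    def add_document(self, data, terms):
        docid = len(self.data) + 1
        self.data[docid] = data
        for term in terms:
            self.postings[term].add(docid)
        return docid

    def search(self, query):
        terms = to_terms(query)
        if not terms:
            return []
        # AND semantics: every query term must be present.
        hits = set.intersection(*(self.postings.get(t, set()) for t in terms))
        # Decode the stored data so callers get the original UTF-8 text back.
        return [json.loads(self.data[d]) for d in sorted(hits)]


def index_csv(csv_text, column):
    index = ToyIndex()
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Data: the postgres record id plus the original Devnagari text.
        data = json.dumps({"pg_id": row["id"], "text": row[column]},
                          ensure_ascii=False)
        # Terms: generated from the chosen column only.
        index.add_document(data, to_terms(row[column]))
    return index


SAMPLE_CSV = """id,title,notes
101,कमल,flower
102,राम,कमल
103,कमला राम,two words
"""

if __name__ == "__main__":
    idx = index_csv(SAMPLE_CSV, "title")
    for hit in idx.search("कमला"):
        print(hit["pg_id"], hit["text"])
```

With Xapian itself, the stored text would go in via `Xapian::Document::set_data()` and each sound code via `add_term()`. On the sample data, searching for कमला returns records 101 and 103 with their Devnagari titles. Record 102 is not returned even though its notes field contains कमल, because only the title column was indexed.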