Hi,

My name is Shiv and I am a final-year undergraduate student in Computer Science at IIIT-D (http://www.iiitd.edu.in/).

I am following the Xapian GSoC guide (https://trac.xapian.org/wiki/GSoC%20Guide#Mentoring). I have successfully checked out and built the code, and I am trying out the Python examples given at http://getting-started-with-xapian.readthedocs.org/en/latest/overview.html#datasets-and-example-code.

These are the steps I have performed:

- Checked out the code at https://github.com/xapian/xapian-docsprint
- python index1.py ../../data/states.csv index1_states.py.db
- python search1.py index1_states.py.db/ Montana

However, the only output I get is:

INFO:xapian.search:'Montana'[0:10]

Could you please help me work out what I am missing or doing wrong? I cannot find any matches for Montana, even though it is present in data/states.csv.

Thanks,
Shiv
On 15 Feb 2016, at 16:27, Shiv Kandikuppa <shiv12095 at iiitd.ac.in> wrote:

> I am following the Xapian Gsoc guide (https://trac.xapian.org/wiki/GSoC%20Guide#Mentoring). I have successfully checked out and built the code. I am trying out the python codes given at (http://getting-started-with-xapian.readthedocs.org/en/latest/overview.html#datasets-and-example-code). The following are the steps that I have performed:
> - Checked out the code at https://github.com/xapian/xapian-docsprint
> - python index1.py ../../data/states.csv index1_states.py.db
> - python search1.py index1_states.py.db/ Montana
> However I am getting the following output:
> INFO:xapian.search:'Montana'[0:10]

Can you point me to where in the getting started guide it suggests you should use the states dataset with index1.py? Because they aren't compatible. (In theory we could make the different data sets compatible with index1.py, but we can't, for instance, make the museum catalogue data compatible with index_ranges2.py, which uses the year of admission to the union and population to provide sortable values.)

You can see the states dataset in action (with index_ranges2.py) in the howto on range queries, where it discusses handling dates (https://getting-started-with-xapian.readthedocs.org/en/latest/howtos/range_queries.html#handling-dates).

J

-- 
James Aylett, occasional trouble-maker
xapian.org
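To illustrate the kind of incompatibility described above: an indexer written for one dataset looks up column names that another dataset's CSV may simply not have, which is a plausible reason for a search coming back empty. The sketch below uses only the Python standard library. The column names (`id_NUMBER`, `TITLE`, `DESCRIPTION` for a museum-style file; `name`, `admitted`, `population` for a states-style file) and the sample rows are assumptions for illustration, so check them against the actual files in the repository.

```python
import csv
import io

# Hypothetical column names an indexer might look up (an assumption,
# not taken from the thread).
EXPECTED_COLUMNS = ("id_NUMBER", "TITLE", "DESCRIPTION")


def missing_columns(csv_file, expected=EXPECTED_COLUMNS):
    """Return the expected column names that are absent from the CSV header."""
    reader = csv.DictReader(csv_file)
    header = reader.fieldnames or []
    return [name for name in expected if name not in header]


# Tiny in-memory stand-ins for the two datasets (made-up rows).
museum_like = io.StringIO(
    "id_NUMBER,TITLE,DESCRIPTION\n1,Example title,Example description\n"
)
states_like = io.StringIO(
    "name,admitted,population\nExample State,18890101,1000\n"
)

print(missing_columns(museum_like))  # nothing is missing
print(missing_columns(states_like))  # every expected column is missing
```

Running a check like this against a data file before indexing would show up front whether the indexer and the dataset agree on field names.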
[Please keep replies on list, so that everyone can help and benefit.]

On 15 Feb 2016, at 20:39, Shiv Kandikuppa <shiv12095 at iiitd.ac.in> wrote:

> I initially misunderstood the guide and wrongly tried the above steps. Now I have gone over the documentation for the parts pertaining to indexing, searching and range queries.

Great.

> I could not understand the part [The xapian.DateValueRangeProcessor is working on value slot 2, with an "epoch" of 1860 (so two digit years will be considered as starting at 1860 and going forward as far 1959).]. I have tried the command:
>
> python code/python/search_ranges2.py statesdb 61..64
>
> hoping to get all states that were admitted between 1861 and 1864 but instead got the output as INFO:xapian.search:'61..64'[0:100]
>
> Could you help me as to how do we use the 2 digit years in our range queries.

Okay, this isn't quite as clear as it should be in the documentation. The _DateValueRangeProcessor_, operating on slot 2, deals with dates in the form "mm/dd/yy" and "mm/dd/yyyy", and that's the one that copes with two-digit years. The _NumberValueRangeProcessor_, operating on slot 1, doesn't have this feature (because it doesn't know anything about dates or years).

If you needed this functionality, you could write a custom _ValueRangeProcessor_ to do the processing, like _PopulationValueRangeProcessor_ at the end of the section. (You'd want to make the adjustments in `__call__()` before calling the wrapped _NumberValueRangeProcessor_.)

I'll see if I can improve the way that part's written so it's less confusing.

> python code/python/search_ranges2.py statesdb 10..
>
> causes an Exception
>   File "code/python/search_ranges2.py", line 97, in <module>
>     search(dbpath = sys.argv[1], querystring = " ".join(sys.argv[2:]))
>   File "code/python/search_ranges2.py", line 72, in search
>     population = support.format_numeral(fields.get('population', 0))
>   File "/home/shiv/Xapian-doc/xapian-docsprint/code/python/support.py", line 140, in format_numeral
>     raise ValueError("Numeral must be an int type to format")
> ValueError: Numeral must be an int type to format

That's a problem with the states dataset, which I've now fixed; if you update from github it should work properly. Sorry about that!

J

-- 
James Aylett, occasional trouble-maker
xapian.org
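The two-digit-year handling discussed in this message can be sketched in plain Python. This is not Xapian's implementation and it calls no Xapian API. It only mirrors the behaviour quoted from the guide (with an epoch of 1860, two-digit years cover 1860 through 1959) and shows the kind of adjustment a custom range processor could make to a range like `61..64` before handing it on to a number-based processor. The function names `expand_two_digit_year` and `expand_year_range` are hypothetical.

```python
def expand_two_digit_year(year, epoch=1860):
    """Map a two-digit year into the 100-year window starting at `epoch`.

    Years of 100 or more are assumed to be complete and are returned unchanged.
    """
    if year >= 100:
        return year
    # Start in the epoch's own century, then move forward a century if the
    # result would fall before the epoch.
    candidate = (epoch // 100) * 100 + year
    if candidate < epoch:
        candidate += 100
    return candidate


def expand_year_range(begin, end, epoch=1860):
    """Rewrite both ends of a range such as ('61', '64').

    An empty string marks an open end (as in '10..') and is left alone.
    """
    def fix(text):
        return str(expand_two_digit_year(int(text), epoch)) if text else text

    return fix(begin), fix(end)


print(expand_year_range("61", "64"))  # ('1861', '1864')
print(expand_year_range("59", ""))    # ('1959', '')
```

In a real custom processor, this rewriting would sit in `__call__()` ahead of the call to the wrapped _NumberValueRangeProcessor_, as suggested above.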
I checked out the latest code and it works fine now. Could you please guide me on what else I can do to contribute to the community? I am comfortable working with Python and Java, and have a decent understanding of C++.

Thanks,
Shiv

On Tue, Feb 16, 2016 at 5:43 PM, James Aylett <james-xapian at tartarus.org> wrote:

> That's a problem with the states dataset, which I've now fixed; if you update from github it should work properly. Sorry about that!