Thanks, that helped :).

I am still trying to get to grips with add_value, though, since I don't seem to understand it fully. I guess it is because I am used to Lucene, Sphinx and Solr, and Xapian seems to tie the type of the stored value more closely to add_value. For example, I am still a bit confused about how slotno actually works and what it actually is.

I think the main thing would be showing, in PHP (or C++ in the official docs), how to index a complex document of, say:

{
  "_id": string
  "text": string
  "tags": multivalue
  "date_created": timestamp
}

and then sorting it in the different ways, showing exactly how to define that date_created is a timestamp and how to sort on that timestamp, and also showing how to query the tags field. This would really break the ice for anyone wanting to learn Xapian in PHP, especially if they are used to other search technologies and are finding it hard to see another way of doing things.

Using the API docs directly did help quite a bit, but at times I found myself trying to follow, say, add_value to a more in-depth topic about it, and failing. The faceting section is great: it shows examples and everything, and quite a few other sections are good too, but I am kind of stuck when it comes to actually indexing and defining types.

It would be awesome if the indexing and adding-documents sections of the official documentation gave examples of indexing in native C++ and explained how the different indexing methods work, what add_value is, how it ties in with slotno, and in general how Xapian indexing, adding documents, adding values and the actual methods all work. I see a lot of documentation based around the methods but none on the methods themselves.
If I am being blind with the docs do correct me :)

Thanks,

On 21 September 2011 12:00, <xapian-discuss-request at lists.xapian.org> wrote:

> Today's Topics:
>
>    1. Understanding API Documentation for PHP (Sam Millman)
>    2. Re: Understanding API Documentation for PHP (Peter Van Dijk)
>    3. Re: Understanding API Documentation for PHP (James Aylett)
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Tue, 20 Sep 2011 15:53:20 +0100
> From: Sam Millman <sam.millman at gmail.com>
> Subject: [Xapian-discuss] Understanding API Documentation for PHP
> To: xapian-discuss at lists.xapian.org
> Message-ID:
>     <CALKyTE4sGEQOHK+b52rHEGiEewO6_WfNR_r-H5XnO4a+uEXRuA at mail.gmail.com>
> Content-Type: text/plain; charset=ISO-8859-1
>
> Hey everyone,
>
> I am brand new to Xapian so forgive me if I am just being noob.
>
> I looked over the sparse documentation for the Xapian library and its PHP
> hooks and I am really confused how to complete my index.
>
> I understand how to add documents etc etc etc and how to build queries but
> how I do specify in add_value what field type xapian should take (i.e.
> tokenized, unindexed, indexed)?
>
> Is there a list of slotno's anywhere that I can reference to?
>
> In a general sense is there any more programmer orientated
> documentation/tutorials rather than researcher orientated document than
> http://xapian.org/docs/ that better describes the steps of indexing and
> searching etc?
>
> Thanks in advance,
>
> ------------------------------
>
> Message: 2
> Date: Wed, 21 Sep 2011 09:23:36 +1000
> From: Peter Van Dijk <pvandijk at vision6.com.au>
> Subject: Re: [Xapian-discuss] Understanding API Documentation for PHP
> To: xapian-discuss <Xapian-discuss at lists.xapian.org>
> Message-ID:
>     <CALyzhQHfX=x8nFL=QhM01S0V4XD56VmKh1mOBRespyn87yObDg at mail.gmail.com>
> Content-Type: text/plain; charset=ISO-8859-1
>
> On 21 September 2011 00:53, Sam Millman <sam.millman at gmail.com> wrote:
>
> > I understand how to add documents etc etc etc and how to build queries but
> > how I do specify in add_value what field type xapian should take (i.e.
> > tokenized, unindexed, indexed)?
>
> I'm not sure if i'm interpreting what you're saying correctly, but if you
> want to tokenize or index things, you want to look towards
> XapianTermGenerator::index_text instead.
>
> Values are stored against the documents and aren't directly part of the
> indexed text, so just set your class up with some basic constants,
>     const VALUE_X = 0;
>     const VALUE_Y = 1;
> and then pass them to add_value and get_value as required, ie.
>     $document->add_value(self::VALUE_X, $anything);
> or
>     $document->get_value(self::VALUE_X);
>
> > In a general sense is there any more programmer orientated
> > documentation/tutorials rather than researcher orientated document than
> > http://xapian.org/docs/ that better describes the steps of indexing and
> > searching etc?
>
> The docs aren't really researcher oriented; the overview page is a good place
> to start, as it describes how the api is used:
> http://xapian.org/docs/overview.html
>
> That being said, having recently worked on a PHP based deployment of Xapian
> i can tell you i struggled somewhat with the same thing.
> The PHP wrapper really doesn't clue you in as to what types you should be
> passing to any given method, so you'll find yourself having to wrap a lot of
> stuff with intval( ) or strval( ) for the methods to work correctly,
> which is a side effect of trying to use a wrapper designed for a strongly
> typed language with a loosely typed one.
>
> That being said, the api docs are spot on for the most part, i'd recommend
> cracking open xapian.php and reading it in line with
> http://xapian.org/docs/apidoc/html/annotated.html,
> given any object in the api docs just look up its declaration in xapian.php
> and you should be able to figure out how to use it correctly, as all the
> types are correctly listed in the documentation.
>
> I'm not going to have a chance for a while, but if i get the opportunity i'd
> like to write some docs for using Xapian in PHP for the wiki.
>
> ------------------------------
>
> Message: 3
> Date: Wed, 21 Sep 2011 08:51:28 +0100
> From: James Aylett <james-xapian at tartarus.org>
> Subject: Re: [Xapian-discuss] Understanding API Documentation for PHP
> To: Peter Van Dijk <pvandijk at vision6.com.au>
> Cc: xapian-discuss <Xapian-discuss at lists.xapian.org>
> Message-ID: <7F621A53-78C6-4F93-A3F9-3A9F5C5A9408 at tartarus.org>
> Content-Type: text/plain; charset=us-ascii
>
> On 21 Sep 2011, at 00:23, Peter Van Dijk wrote:
>
> > That being said, having recently worked on a PHP based deployment of Xapian
> > i can tell you i struggled somewhat with the same thing.
> > The PHP wrapper really doesn't clue you in as to what types you should be
> > passing to any given method, so you'll find yourself having to wrap a lot of
> > stuff with intval( ) or strval( ) for the methods to work correctly,
> > which is a side effect of trying to use a wrapper designed for a strongly
> > typed language with a loosely typed one.
>
> I think this is because of a lack of attention to the PHP bindings, sadly;
> the aim is for idiomatic code in the bindings language to work, and we do
> well in say Python but less well in others.
>
> If there are specific things that would help here, it'd be great to get
> them either onto a page on the wiki talking about how the PHP bindings could
> be improved, or into tickets. Beyond <http://trac.xapian.org/ticket/520> I
> couldn't find anything at the moment.
>
> Best,
> James
>
> --
> James Aylett
> talktorex.co.uk - xapian.org - devfort.com - spacelog.org
>
> ------------------------------
>
> End of Xapian-discuss Digest, Vol 88, Issue 9
> *********************************************
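A rough sketch of how the example record from this message might be split up, following the advice quoted above that indexed text and values are separate things. It is plain Python rather than Xapian code: `plan_document`, the `XTAG` and `Q` prefixes, the slot number and the zero-padded timestamp format are all assumptions made for illustration, and the "tokenising" here is only lowercasing and whitespace splitting.

```python
# Toy illustration only: plain Python, no Xapian calls are made.
# A slot number is just an integer the application picks and reuses.
SLOT_DATE_CREATED = 0


def plan_document(doc):
    """Split one record into terms, values and display data."""
    # Free text becomes terms (a real indexer would also stem;
    # this sketch only lowercases and splits on whitespace).
    terms = [word.lower() for word in doc["text"].split()]
    # Each tag becomes one prefixed term, so a multi-valued field
    # does not need a value slot at all (prefix is hypothetical).
    terms += ["XTAG" + tag.lower() for tag in doc["tags"]]
    # The identifier becomes a single prefixed term (prefix is hypothetical).
    terms.append("Q" + doc["_id"])
    # The timestamp goes into a value slot as a fixed-width string.
    values = {SLOT_DATE_CREATED: "%010d" % doc["date_created"]}
    # Document data is whatever should be shown with a result.
    data = doc["text"]
    return terms, values, data


record = {
    "_id": "42",
    "text": "Hello Xapian world",
    "tags": ["php", "search"],
    "date_created": 1316606400,
}
terms, values, data = plan_document(record)
```

The only point being made is that the field "types" live in how the application chooses to turn each field into terms, a value or data; nothing is declared up front.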
James Aylett
2011-Sep-21 14:00 UTC
[Xapian-discuss] Xapian-discuss Digest, Vol 88, Issue 9
On 21 Sep 2011, at 12:24, Sam Millman wrote:

> I am still trying to cover add_value some more though since I seem to not
> understand it totally.

Values are used for a few specific things; it sounds like you need it for sorting and probably range searches, in which case you need to turn your data (timestamp in this case) into an appropriate string before putting it in a value. See <http://xapian.org/docs/valueranges.html#datevaluerangeprocessor>, for instance. For sorting by value, see <http://xapian.org/docs/sorting.html#sorting-by-value>.

> Faceting section is great that shows examples and everything and quite a few
> other sections are good but I am kinda stuck when it comes to actually
> indexing and defining types

Xapian doesn't have types. You turn your document into terms (plus document data for display purposes, and values for sorting and collapsing); terms can be generated by TermGenerator (word splitting and stemming), or by other means. (TermGenerator is designed to work with QueryParser so you don't have to construct Query objects by hand at the other end.) The overview <http://xapian.org/docs/overview.html> covers this, although it's easy to dismiss it from the opening section I suspect :-(

> Like in the official documentation it would be awesome if indexing and
> adding documents sections would give you examples on indexing in the native
> C++ and explaining how the different indexing method works

An example indexer is covered in the quickstart doc: <http://xapian.org/docs/quickstart.html>. Note that values aren't mentioned in quickstart, because you don't need them to get going (indeed, you can often build entire systems without them, although not if you're working with dates that people need to be able to sort or filter on).

> and what
> add_value is and how it all kinda binds together with slotno and that and
> just in general how Xapian indexing, add of documents, add of values, the
> actual methods and all that works.
>
> I see a lot of documentation based around the methods but none on the
> methods themselves.

I'm not sure I understand what you mean; the methods are documented, usually in reasonable detail, in the API docs. Here's the documentation for Xapian::Document::add_value <http://xapian.org/docs/apidoc/html/classXapian_1_1Document.html#12857fccd3448ec1db91311c16d67f6c>:

    void Xapian::Document::add_value ( Xapian::valueno slot, const std::string & value )

    Add a new value.

    The new value will replace any existing value with the same number (or if the new value is empty, it will remove any existing value with the same number).

> If I am being blind with the docs do correct me :)

One of the problems we have is it's not always obvious where to look in the docs to find things out. If there are places that would obviously benefit from cross links, or entire new documents that you feel should exist, you can put a note in <http://trac.xapian.org/wiki/MissingDocumentation>, or mention it on the list.

J

--
James Aylett
talktorex.co.uk - xapian.org - devfort.com
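To make the remark about turning a timestamp into "an appropriate string" concrete: values are compared as strings when sorting, so the encoding has to sort the same way the numbers do. The sketch below is plain Python and makes no Xapian calls. Xapian provides its own helper for numbers (sortable_serialise); the fixed-width big-endian packing used here is a stand-in chosen for illustration, not Xapian's format, and `encode_timestamp` / `decode_timestamp` are hypothetical names.

```python
import struct


def encode_timestamp(ts):
    """Pack a non-negative integer timestamp into 8 big-endian bytes."""
    return struct.pack(">Q", ts)


def decode_timestamp(raw):
    """Reverse encode_timestamp."""
    return struct.unpack(">Q", raw)[0]


timestamps = [1316606400, 999, 1316520000, 5]

# Plain decimal strings compare character by character, so short
# numbers such as "5" end up after long ones such as "1316606400".
naive_order = sorted(str(ts) for ts in timestamps)

# Fixed-width big-endian bytes compare in the same order as the numbers.
encoded_order = [
    decode_timestamp(raw)
    for raw in sorted(encode_timestamp(ts) for ts in timestamps)
]
```

Whatever encoding is chosen at index time has to be the one used when building range filters at search time, since both sides only ever see the stored string.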
James Aylett
2011-Sep-21 14:30 UTC
[Xapian-discuss] Xapian-discuss Digest, Vol 88, Issue 9
Please keep replies on-list.

On 21 Sep 2011, at 15:19, Sam Millman wrote:

>> One of the problems we have is it's not always obvious where to look in the docs to find things out.
>
> Absolutely. Like when I first saw the documentation I thought "I'll click on indexing to see how to index". But instead of talking about how to form a indexer it just talks about certain considerations you gotta take.

Yeah, we should probably push the overview & quickstart before you try to read anything else :)

>> The new value will replace any existing value with the same number (or if the new value is empty, it will remove any existing value with the same number).
>
> That's exactly what is confusing me about that function. It says it requires a Xapian::valueno which I try to find out about but get bombarded by other confusing method descriptions before I have even learnt how to index fully.

Hmm. From our point of view, I'd say you're trying to run before you can walk: you're trying to add values before knowing how you're going to use them :) There should probably be a "what values get used for" document to help here.

> Also what does it mean by number? How does number work? what does it do in Xapian? Do I need to manipulate it in anyway?

It's... a number. Like: an integer. It specifies *which* value in the document you're operating on. I think the problem here goes back to an earlier point: this is more obvious if you're a C++ programmer (because the typedef tells you that Xapian::valueno is unsigned), and we need to cater better for (in this case) PHP programmers.

It's possible that rewording the docstring here might help: the formal parameter name is supposed to be a hint that these are numbered slots to put values in. Maybe something like:

    Set the Document value in the given slot.

    The new value will replace any existing value in the same slot (or if the new value is empty, it will remove any existing value in the same slot).
It doesn't help that value has a dual meaning here because of the name of the formal parameter, but that's difficult to change :/

J

--
James Aylett
talktorex.co.uk - xapian.org - devfort.com
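The "numbered slots" behaviour described in the docstring above can be modelled in a few lines. This is a toy model in plain Python, not the Xapian class: `ToyDocument` and the slot constants are hypothetical, and it assumes that reading an unset slot gives back an empty string.

```python
class ToyDocument:
    """Toy model of a document's value slots (not the Xapian class)."""

    def __init__(self):
        # Maps slot number (a non-negative integer) to a string.
        self._values = {}

    def add_value(self, slot, value):
        """Set the value in the given slot; an empty value clears the slot."""
        if value == "":
            self._values.pop(slot, None)
        else:
            self._values[slot] = value

    def get_value(self, slot):
        """Return the value in the slot, or an empty string if unset."""
        return self._values.get(slot, "")


# Slot numbers are chosen by the application; these are arbitrary.
SLOT_DATE_CREATED = 0
SLOT_TITLE = 1

doc = ToyDocument()
doc.add_value(SLOT_DATE_CREATED, "20110920")
doc.add_value(SLOT_DATE_CREATED, "20110921")  # replaces the first value
doc.add_value(SLOT_TITLE, "hello")
doc.add_value(SLOT_TITLE, "")  # an empty value removes the slot
```

So there is no list of slot numbers to look up: the application decides which integer means what, and has to use the same integer when indexing and when sorting or reading values back.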