Matt Barnicle
2007-Jun-01 00:05 UTC
[Xapian-discuss] Searching date range on a custom field
Hi everyone. I'm getting closer to getting our site indexed properly, but I can't figure out the date searching question. I've read as many things as I can find, but I'm not quite getting the picture of what I have to do in order to be able to search on a custom date field.

I created our event pages with the following meta tags:

<meta name="dateBegin" content="20070210" />
<meta name="dateEnd" content="20070217" />

These correspond to the start and end dates of an event. I also have a tag for when the rendered page is an event (we have many types of pages on the site):

<meta name="pageType" content="event" />

So, I need to search on events given a date. Say the date is 20070213: how do I search for event pages where the supplied date is within the dateBegin .. dateEnd range?

I'm using htdig to crawl the site, and htdig2omega to create the index. The index creation and field mapping work just fine, and so does searching on the boolean page type. Here is my htdig2omega.script file:

url : field=url hash boolean=Q unique=Q
title : weight=3 index truncate=80 field=title
lastMod : field=lastmod
size : field=size
sample : index truncate=300 field=sample
metaDesc : field=metadesc index
pageType : field=pageType boolean=XPT
eventName : field=eventName weight=3 index
dateBegin : field=dateBegin date=yyyymmdd
dateEnd : field=dateEnd date=yyyymmdd
city : field=city weight=3 index
state : field=state weight=3 index
zip : field=zip weight=3 index

I found some posts from the list archives that discuss date ranges, but I can't figure out if they will help me in this situation or not. I think they're talking about searching on date ranges on indexed documents, that is, the date the document was indexed.

http://article.gmane.org/gmane.comp.search.xapian.general/1008
http://article.gmane.org/gmane.comp.search.xapian.general/4268

So, what do I need to do?

Thank you!
- Matt
On Thu, May 31, 2007 at 03:12:18PM -0700, Matt Barnicle wrote:
> <meta name="dateBegin" content="20070210" />
> <meta name="dateEnd" content="20070217" />
>
> These correspond to the start and end dates of an event. I also have a
> tag for when the rendered page is an event (we have many types of pages
> on the site):
>
> <meta name="pageType" content="event" />
>
> So, I need to search on events given a date. Say the date is 20070213,
> how do I search for event pages where the supplied date is within the
> dateBegin .. dateEnd range?

This is the reverse of how Omega's date range feature works: it expects that each document has a single date and that the user wants to restrict their search to documents within a specified date range.

> I'm using htdig to crawl the site, and htdig2omega to create the index.
> The index creation and field mapping works just fine, and so does
> searching on the boolean page type. Here is my htdig2omega.script file:
>
> url : field=url hash boolean=Q unique=Q
> title : weight=3 index truncate=80 field=title
> lastMod : field=lastmod
> size : field=size
> sample : index truncate=300 field=sample
> metaDesc : field=metadesc index
> pageType : field=pageType boolean=XPT
> eventName : field=eventName weight=3 index
> dateBegin : field=dateBegin date=yyyymmdd
> dateEnd : field=dateEnd date=yyyymmdd

The scriptindex "date" action is designed to allow you to do date range filtering when each document has a single date, so this won't really work. You could make it work if you ran the date action on every date in the range, but if your ranges are long, that's going to generate a lot of terms.

> I found some posts from the list archives that discuss date ranges, but
> I can't figure out if they will help me in this situation or not.. I
> think they're talking about searching on date ranges on indexed
> documents, that is, the date the document was indexed.

Yes, they are.
What I'd suggest you do is to put the dateBegin and dateEnd into document values, so you can access them quickly during the match process. For example:

dateBegin : field=dateBegin value=0
dateEnd : field=dateEnd value=1

And then write a little MatchDecider subclass which takes a date and checks if a document's date range includes it. Something like this totally untested code:

    #include <string>
    #include <xapian.h>

    class DateRangeMatchDecider : public Xapian::MatchDecider {
        std::string date;  // yyyymmdd date the user is searching for

      public:
        explicit DateRangeMatchDecider(const std::string & date_) : date(date_) { }

        // Accept the document if dateBegin (value 0) <= date <= dateEnd
        // (value 1). String comparison is chronological for yyyymmdd.
        bool operator()(const Xapian::Document &doc) const {
            return doc.get_value(0) <= date && date <= doc.get_value(1);
        }
    };

(You might want to swap the order of the checks, depending on whether you expect user dates are more likely to fall before or after events in the database.)

Then you can instantiate this class with the date the user wants to search for and pass it to Enquire::get_mset(). You'll also want to OP_FILTER with XPTevent to only consider events.

If you want the user to be able to search for any event happening within a range of dates, you can easily extend the above class to take a pair of dates and check if that range overlaps with the document's range.

Cheers,
Olly