Jim Razmus II
2011-Nov-06 19:48 UTC
[Xapian-discuss] What is the best way to represent a category hierarchy using term prefixes in Xapian?
Assume I have the following example hierarchy:

US
>Michigan
>>Detroit
>>Grand Rapids
>>Lansing
>Minnesota
>>Grand Rapids
>>Minneapolis
>>St Paul
>Ohio
>>Columbus
>>Grand Rapids
>>Sandusky

I see two ways that I could index a "Grand Rapids, Michigan" document with prefixed terms:

XFIRSTLEVELus
XSECONDLEVELmichigan
XTHIRDLEVELgrandrapids

or

XFIRSTLEVELus
XSECONDLEVELus_michigan
XTHIRDLEVELus_michigan_grandrapids

I'm inclined to use the second approach thinking that it will return more intuitive results. That is, a search that includes Grand Rapids, Michigan search criteria is less likely to include documents from Minnesota and Ohio.

However, two aspects of this approach bother me. First, the creation and maintenance of term prefixes for each level of the hierarchy feels wrong. Second, the concatenation of values seems like a surrogate for using weights.

So, what is the best way to represent a hierarchy with term prefixes?

Note, I posted this question to stackoverflow here:

stackoverflow.com/questions/7585948/what-is-the-best-way-to-represent-a-category-hierarchy-using-term-prefixes-in-xa

I didn't get any responses so I thought I'd try here next.

Best regards,
Jim
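To make the difference between the two schemes concrete, here is a small Python sketch. It does not touch Xapian at all; it only builds the term strings from the question. The helper names (`normalise`, `per_level_terms`, `path_terms`) are made up for illustration, and the normalisation rule (lowercase, letters and digits only) is an assumption inferred from the example terms.

```python
import re

LEVEL_PREFIXES = ["XFIRSTLEVEL", "XSECONDLEVEL", "XTHIRDLEVEL"]


def normalise(name):
    """Lowercase a category name and keep only letters and digits (assumed rule)."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


def per_level_terms(path):
    """First scheme: each level is indexed under its own name only."""
    return [prefix + normalise(name) for prefix, name in zip(LEVEL_PREFIXES, path)]


def path_terms(path):
    """Second scheme: each level is indexed under the full path down to it."""
    names = [normalise(name) for name in path]
    return [
        prefix + "_".join(names[: i + 1])
        for i, prefix in enumerate(LEVEL_PREFIXES[: len(names)])
    ]


docs = [
    ["US", "Michigan", "Grand Rapids"],
    ["US", "Minnesota", "Grand Rapids"],
    ["US", "Ohio", "Grand Rapids"],
]

# Distinct third-level terms across the three documents under each scheme.
scheme1_leaves = {per_level_terms(p)[2] for p in docs}
scheme2_leaves = {path_terms(p)[2] for p in docs}

print(sorted(scheme1_leaves))  # one shared term for all three cities
print(sorted(scheme2_leaves))  # one distinct term per state
```

Under the first scheme a search on the third-level term alone cannot tell the three cities apart; the query would also need the second-level term. Under the second scheme the third-level term is already unambiguous.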
Justin Finkelstein
2011-Nov-06 22:04 UTC
[Xapian-discuss] What is the best way to represent a category hierarchy using term prefixes in Xapian?
Oddly enough, I have a solution for that which has been in use for a few years and works quite well. Our setup is similar, although each of our categories has a GUID as its ID, so we don't store the category names.

However, it depends on how you want to get results. In my case, I have categories with products in them and, as I move up the hierarchy, I want to retrieve everything below that point; so using your data, US would get me everything in the US, and Michigan everything in Michigan.

To achieve this, we simply store all of the relevant category names for each document with the prefix CATEGORY:, so in your example the 'Grand Rapids' doc would contain the following terms:

CATEGORY:grand rapids
CATEGORY:michigan
CATEGORY:us

Therefore, searching for any of these terms will find the Grand Rapids data.

What would help you, I think, is to know that you can have multiple terms with the same prefix per document; this is what we do, and it works very well in this case.

Knowing which type of term to use can be helpful too. In our case, our terms are GUIDs and we don't want any partial matching, so we should be using add_boolean_term (boolean terms don't affect the weight of the document; they are just 'yes' or 'no').

How does this sound?

justin
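The "index every ancestor under one prefix" idea can be sketched without Xapian. The toy inverted index below is only a stand-in for a real database, and the category IDs (`cat-us`, `cat-mi`, and so on) are hypothetical placeholders for the GUIDs mentioned above. In a real indexer each term would presumably be added to the document with `Document.add_boolean_term()`.

```python
from collections import defaultdict

PREFIX = "CATEGORY:"

# Hypothetical category IDs standing in for GUIDs, mapped child -> parent.
PARENT = {
    "cat-us": None,
    "cat-mi": "cat-us",
    "cat-mn": "cat-us",
    "cat-mi-grandrapids": "cat-mi",
    "cat-mi-detroit": "cat-mi",
    "cat-mn-grandrapids": "cat-mn",
}


def category_terms(category_id):
    """Return one prefixed term for the category and for each of its ancestors."""
    terms = []
    while category_id is not None:
        terms.append(PREFIX + category_id)
        category_id = PARENT[category_id]
    return terms


# Toy inverted index: term -> set of document ids.
index = defaultdict(set)


def index_document(doc_id, category_id):
    """File a document under its category and every ancestor category."""
    for term in category_terms(category_id):
        index[term].add(doc_id)


def search(category_id):
    """Boolean lookup: every document filed at or below the category."""
    return sorted(index.get(PREFIX + category_id, set()))


index_document(1, "cat-mi-grandrapids")
index_document(2, "cat-mi-detroit")
index_document(3, "cat-mn-grandrapids")

print(search("cat-us"))              # all three documents
print(search("cat-mi"))              # the two Michigan documents
print(search("cat-mi-grandrapids"))  # only the Grand Rapids, Michigan document
```

Because each category has its own ID here, the two Grand Rapids categories stay separate. With bare names instead of IDs, both would share a single "grand rapids" term.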
Olly Betts
2011-Nov-07 00:07 UTC
[Xapian-discuss] What is the best way to represent a category hierarchy using term prefixes in Xapian?
On Sun, Nov 06, 2011 at 07:48:27PM +0000, Jim Razmus II wrote:
> XFIRSTLEVELus
> XSECONDLEVELus_michigan
> XTHIRDLEVELus_michigan_grandrapids
[...]
> However, two aspects of this approach bother me. First, the creation and
> maintenance of term prefixes for each level of the hierarchy feels wrong.

There's not actually any need to use different prefixes here, since the different levels can't collide as the number of '_' will be different. So you could just use the same prefix for any level, e.g.:

XLOCus
XLOCus_michigan
XLOCus_michigan_grandrapids

> Second, the concatenation of values seems like a surrogate for using weights.

I don't really understand what you mean here...

> So, what is the best way to represent a hierarchy with term prefixes?

Probably with a single prefix, as above, but it perhaps depends what you're wanting to do with the categories. For example, if you want to calculate facets based on locations, you really want the location in a value slot.

Cheers,
Olly
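A minimal Python sketch of the single-prefix layout follows. The helper names and the normalisation rule are assumptions for illustration. `FakeDocument` is a hypothetical stand-in so the snippet runs without Xapian installed; it only mimics the `add_boolean_term()` method name of `xapian.Document`. Stripping everything but letters and digits from each name keeps '_' reserved as the level separator, which is what makes the "levels can't collide" argument hold.

```python
import re

PREFIX = "XLOC"


def normalise(name):
    """Lowercase and keep only letters and digits, so '_' is never part of a name."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


def location_terms(path):
    """Build one term per level, each carrying the full path down to that level."""
    names = [normalise(n) for n in path]
    return [PREFIX + "_".join(names[:depth]) for depth in range(1, len(names) + 1)]


def level_of(term):
    """Recover the hierarchy level (1 = top) from the number of '_' separators."""
    return term[len(PREFIX):].count("_") + 1


class FakeDocument:
    """Hypothetical stand-in for xapian.Document, recording added terms."""

    def __init__(self):
        self.terms = []

    def add_boolean_term(self, term):
        self.terms.append(term)


def add_location(doc, path):
    """Add every level of the location path to the document as a boolean term."""
    for term in location_terms(path):
        doc.add_boolean_term(term)


doc = FakeDocument()
add_location(doc, ["US", "Michigan", "Grand Rapids"])
print(doc.terms)
print([level_of(t) for t in doc.terms])
```

For the faceting case, the location could additionally be stored with `Document.add_value()` in a value slot of your choosing; that part is not shown here.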