Agustín
2008-Aug-19 00:15 UTC
[Xapian-discuss] Best way of showing what matched the search
Hi,

We have an application with "main" records (for example, a person), each of which may have several "dependent" records such as an attachment, an address, an email, a phone number, or a note left by a user.

I want the results to look like this:

    john AND smart
    ----------------------
    2 results:

    - [John] Smith
      Attachment: I am very [smart].
      Note: He is not as [smart] as he said.
    - Peter Pérez
      Phone number: 555-[john]-is-[smart]

What I've done is:

People and other main records are each indexed as one document, which contains all of the record's own data plus all the data of its dependent records. These documents also carry the prefixed term Igrouped. In the document data I keep the type and the id of the record.

Notes, attachments, etc. are also indexed separately, one document each. Each of them also has an IdependentOf<type><id> term.

To do a search:

1. I run a query combining the Igrouped prefixed term with whatever the query parser gave me.
2. I collect the ids and types of those results in a list which looks like this: [IdependentOfperson1, IdependentOfcompany5, ...]
3. I remove all operators (AND, OR, parentheses) from the search the user entered and put the search terms in a list: [john, smart]
4. I build a second query in which Igrouped is negated, all the IdependentOf terms are ORed together, and all the search terms are ANDed with that. Example:
   (IdependentOfperson1 OR IdependentOfcompany5) AND john AND smart AND NOT Igrouped
5. I run it and use the dependent records' data to locate their master records and group them together in the results.
6. I use my relational db to get the full text of the results that matched, then apply a simple algorithm that looks for search words written close to each other and shows that excerpt to the user (which looks more or less like Google's results).

Considerations:

1. I index everything twice, which affects the weight all terms get. To compensate for this I index the non-grouped terms with a weight of 0.
2. I guess the index is much bigger than it should be.
3. I could probably have two separate dbs for grouped and ungrouped items.
4. I probably should have used "collapse keys", but I think that they are essentially filters and don't really achieve what I want (which would be to logically consider all the terms or some documents as being part of one virtual bigger document). With collapse keys, searching for john AND smart wouldn't have found "John Smith", since the words are in separate records.

Am I doing something wrong? Is there any better way to do it?

A.
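For readers following steps 3 and 4 above, here is a minimal sketch in plain Python of how the second query could be assembled. It is not taken from the original post: the helper names `extract_terms` and `build_dependent_query` and the operator list are assumptions made for illustration. It only produces the query in the textual form shown in step 4, and the post does not say whether the real query is built as a string or from query objects.

```python
import re

# Assumed set of operator words to strip; the post only names AND, OR and parentheses.
OPERATORS = {"AND", "OR", "NOT"}


def extract_terms(user_query):
    """Step 3: drop operators and parentheses, keep the search terms in order."""
    words = re.findall(r"[^\s()]+", user_query)
    return [w.lower() for w in words if w not in OPERATORS]


def build_dependent_query(master_refs, user_query):
    """Step 4: OR the IdependentOf terms, AND the search terms, negate Igrouped.

    master_refs is a list of (type, id) pairs taken from the first result set.
    """
    parents = " OR ".join(f"IdependentOf{rtype}{rid}" for rtype, rid in master_refs)
    parts = [f"({parents})"] + extract_terms(user_query) + ["NOT Igrouped"]
    return " AND ".join(parts)


if __name__ == "__main__":
    print(build_dependent_query([("person", 1), ("company", 5)], "john AND smart"))
```

Note that stripping the operators turns every user query into a pure AND of its terms, so a search such as "john OR smart" would be narrowed in the second pass.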
Olly Betts
2008-Aug-19 23:18 UTC
[Xapian-discuss] Best way of showing what matched the search
On Mon, Aug 18, 2008 at 09:15:17PM -0300, Agustín wrote:
> 4. I probably should have used "collapse keys" but I think that they are
> essentially filters and don't really achieve what I want (which would be
> to logically consider all the terms or some documents as being part of
> one virtual bigger document).

Then just use that "virtual bigger document" as your "Xapian document".

Cheers,
Olly
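One way to read this suggestion, sketched in plain Python under stated assumptions: if each main record is indexed as a single document, the per-record texts can be kept in that document's data, and the "what matched" lines can be produced after the search by checking each dependent record's text for the query terms. Nothing below comes from the thread itself. The JSON layout and the helper names `make_document_data`, `highlight` and `render_hit` are hypothetical, and the matching is a plain whole-word, case-insensitive comparison that ignores stemming.

```python
import json
import re


def make_document_data(name, dependents):
    """Serialise a main record and its dependents (label, text pairs) as one blob."""
    return json.dumps({"name": name, "dependents": dependents})


def highlight(text, terms):
    """Wrap whole-word, case-insensitive occurrences of the terms in [brackets]."""
    if not terms:
        return text
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(t) for t in terms) + r")\b", re.IGNORECASE
    )
    return pattern.sub(r"[\1]", text)


def render_hit(data, terms):
    """Format one search hit, listing only the dependent records that matched."""
    record = json.loads(data)
    lines = ["- " + highlight(record["name"], terms)]
    for label, text in record["dependents"]:
        marked = highlight(text, terms)
        if marked != text:  # at least one term occurs in this dependent record
            lines.append(f"  {label}: {marked}")
    return "\n".join(lines)


if __name__ == "__main__":
    data = make_document_data(
        "John Smith",
        [["Attachment", "I am very smart."], ["Address", "1 Main Street"]],
    )
    print(render_hit(data, ["john", "smart"]))
```

With this layout a single search is enough, and "john AND smart" finds "John Smith" even though the two words live in different records, at the cost of storing the dependent texts in the document data.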