Philip Rhoades
2012-Dec-23 12:50 UTC
[Xapian-discuss] Fwd: Re: Another ue for Recoll/Xapian? - AI/Eliza
People,

I sent this note to JF at Recoll and he suggested asking here (his response is below). Any suggestions?

Thanks,

Phil.

-------- Original Message --------
Subject: Re: Another ue for Recoll? - AI/Eliza
Date: 2012-12-23 19:22
From: jf at dockes.org
To: <phil at pricom.com.au>

Philip Rhoades writes:
> Jean,
>
> I have been using Recoll happily for some time now but I also have a
> need for an AI/Eliza-like facility and I thought Recoll's fast text
> searching may allow for fast semantic parsing for such a thing - has
> anyone considered this sort of thing before?
>
> Regards,
>
> Phil.
>
> --
> Philip Rhoades

Hi,

I know nothing about this kind of thing. I know that the people from Xapian have been working with different statistical models with students from the Google Summer of Code this last summer. Semantic models etc. are probably more of their domain of knowledge than of mine. Any code dealing with this (apart from text extraction) will be very close to the Xapian layer in any case.

I think that there has been work on improving search engines with semantic models ("concepts") for as long as search has existed (long before the internet existed). As far as I know, nothing really convincing has ever emerged. Maybe things are more mature now, but it seems that the general tendency is more towards using sophisticated statistics than explicit semantic models, of which humans are apparently still the exclusive masters.

So no real idea, but if you have something more precise in mind, I'm ready to assist with whatever Recoll and Xapian knowledge I may have!

Cheers,

jf

--
Philip Rhoades
GPO Box 3411
Sydney NSW 2001
Australia
E-mail: phil at pricom.com.au
Olly Betts
2013-Jan-18 03:22 UTC
[Xapian-discuss] Fwd: Re: Another ue for Recoll/Xapian? - AI/Eliza
On Sun, Dec 23, 2012 at 11:50:23PM +1100, Philip Rhoades wrote:
> > I have been using Recoll happily for some time now but I also have a
> > need for an AI/Eliza-like facility and I thought Recoll's fast text
> > searching may allow for fast semantic parsing for such a thing - has
> > anyone considered this sort of thing before?
>
> I know nothing about this kind of thing. I know that the people from
> Xapian have been working with different statistical models with students
> from the Google Summer of Code this last summer. Semantic models etc.
> are probably more of their domain of knowledge than of mine. Any code
> dealing with this (apart from text extraction) will be very close to
> the Xapian layer in any case.
>
> I think that there has been work on improving search engines with
> semantic models ("concepts") for as long as search has existed (long
> before the internet existed). As far as I know, nothing really convincing
> has ever emerged. Maybe things are more mature now, but it seems that the
> general tendency is more towards using sophisticated statistics than
> explicit semantic models, of which humans are apparently still the
> exclusive masters.

"Semantic" is one of those words that's been rather abused over the years.

I think we've a long way to go before machines can "understand" in the general case, but for some better-defined tasks machines can now do a pretty good job - for example, Named Entity Recognition:

http://en.wikipedia.org/wiki/Named-entity_recognition

The techniques Xapian currently uses are statistical, though if you treat the system as a black box, you might think it "understands" at some level from looking at the output it produces in relation to the input you gave it.

The GSoC project jf refers to was probably the one implementing document weighting based on Language Modelling, which is also used in tasks like speech recognition and machine translation, though it's in essence a statistical technique.
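To give a rough idea of why language-model weighting is "in essence a statistical technique", here is a minimal sketch of unigram query-likelihood scoring with Jelinek-Mercer smoothing. This is an illustration only and is not Xapian's code: the names `tokenize` and `lm_scores`, the whitespace tokenizer, the smoothing weight `lam=0.5`, and the sample documents are all assumptions made for the example.

```python
import math
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase and split on whitespace.
    return text.lower().split()


def lm_scores(query, docs, lam=0.5):
    """Return one log query-likelihood score per document.

    Each query term's probability is a mix of its relative frequency
    in the document and in the whole collection (Jelinek-Mercer
    smoothing); `lam` is the weight given to the collection model.
    """
    doc_tokens = [tokenize(d) for d in docs]
    doc_counts = [Counter(t) for t in doc_tokens]

    # Collection-wide term counts act as the background model.
    coll_counts = Counter()
    for counts in doc_counts:
        coll_counts.update(counts)
    coll_len = sum(coll_counts.values())

    query_terms = tokenize(query)
    scores = []
    for tokens, counts in zip(doc_tokens, doc_counts):
        score = 0.0
        for term in query_terms:
            p_doc = counts[term] / len(tokens) if tokens else 0.0
            p_coll = coll_counts[term] / coll_len if coll_len else 0.0
            p = (1 - lam) * p_doc + lam * p_coll
            if p == 0.0:
                # Term never occurs anywhere in the collection: skip it
                # rather than taking log(0).
                continue
            score += math.log(p)
        scores.append(score)
    return scores


if __name__ == "__main__":
    docs = [
        "the cat sat on the mat",
        "the dog chased the cat",
        "stock markets fell sharply today",
    ]
    for doc, score in zip(docs, lm_scores("cat mat", docs)):
        print(f"{score:8.3f}  {doc}")
```

Nothing here models meaning: the ranking comes purely from term counts, which is the point being made about statistical weighting.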
So I'm not really sure what the most useful answer is. I don't think I'd describe anything we do as "semantic", but you can certainly build systems using Xapian that you could apply that term to. There are also other libraries which do part-of-speech tagging, entity extraction, etc., which might be more useful to you (or useful as a source of terms to index with Xapian).

Cheers,

Olly