thr3ads.net - similar to: "query time stemming and term weights"

Displaying 20 results from an estimated 1000 matches similar to: "query time stemming and term weights"

2016 Mar 10

Introduction and Doubts

Tf-idf is the most widely used weighting scheme; it is easy to understand and has been used in other frameworks like Lucene and many other places. Okapi BM25 (implemented in Xapian) is theoretically a better measure than tf-idf, and I am looking into various other weighting schemes which are in Xapian or could be implemented, like TF-ICF (term frequency inverse corpus frequency), TF-RF (term
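For readers comparing the two schemes named in this post, here is a minimal Python sketch of a tf-idf weight next to one common textbook form of the Okapi BM25 weight. The toy corpus, the function names, and the parameter values k1=1.2 and b=0.75 are illustrative assumptions; this is not Xapian's own implementation, which may differ in its details.

```python
import math
from collections import Counter

# Toy corpus; the documents are illustrative assumptions, not real data.
DOCS = [
    "xapian is a search engine library".split(),
    "lucene is a search library written in java".split(),
    "term weights rank documents in a search engine".split(),
]

def tf_idf(term, doc, docs):
    """Raw term frequency times the log of inverse document frequency."""
    tf = Counter(doc)[term]
    df = sum(1 for d in docs if term in d)
    if tf == 0 or df == 0:
        return 0.0
    return tf * math.log(len(docs) / df)

def bm25(term, doc, docs, k1=1.2, b=0.75):
    """One common textbook form of the Okapi BM25 term weight."""
    tf = Counter(doc)[term]
    if tf == 0:
        return 0.0
    df = sum(1 for d in docs if term in d)
    avg_len = sum(len(d) for d in docs) / len(docs)
    # The "+ 1.0" keeps the idf factor positive even for very common terms.
    idf = math.log((len(docs) - df + 0.5) / (df + 0.5) + 1.0)
    # Length normalisation: longer-than-average documents are penalised.
    norm = tf + k1 * (1.0 - b + b * len(doc) / avg_len)
    return idf * tf * (k1 + 1.0) / norm
```

In this sketch a term that occurs in every document ("search") gets a tf-idf weight of zero, while the BM25 variant still gives it a small positive weight; in both, a rarer term ("xapian") scores higher.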

choosing between probabilistic and boolean prefixes for terms

2018 Jul 19

Hi all, public-inbox allows searching for git blob names (e.g. "badc0ffee") in patches. Initially, I chose to use add_prefix for probabilistic terms, since I assumed it could be a superset of what boolean searching offered. Unfortunately, it doesn't seem to be the case because stemming is interfering. So switching to boolean filtering seems to work; and it is fine for mechanical

Search Algorithm Used for Keyword Search

2017 Apr 08

Dear Sir, I'm doing a literature survey on search engines. As Xapian is open source, I think I can get the information I need. I assume that your system builds a list of keywords and tags every keyword with the documents where it can be found. My questions are as follows: 1. What is the search algorithm used for searching the list of keywords that your search engine has? Is it the

GSOC : Language Modelling for information retrieval with Diversified Search results

2012 Mar 22

Hello, I am an undergraduate student at DA-IICT, India, pursuing a BTech in Information and Communication Technology. My major field of research is Information Retrieval and Natural Language Processing. Xapian, being a powerful information retrieval library, has attracted me towards implementing things learned in class for this project. I have worked on entity search on RDF data, SMS based FAQ

Weighting the author of a doc when that term can also appear as a frequent term in other docs

2017 Sep 28

We have a corpus of academic papers. Sometimes it happens that there is an academic controversy and one paper is a response or rebuttal to another paper. The name of the author of the first paper may appear many times in the second paper. So in light of this, how should we set our weight on the author field? Here is an example: http://www.nber.org/papers/w11215 in which the term

Implementing tf-idf weighting scheme in Xapian

2013 Feb 19

Hello guys. I just read up about tf-idf schemes and want to implement them in Xapian (with some frequently used normalizations), as it will also give me a good grasp of implementing a weighting scheme before I start working on implementing DFR schemes. I read the following as references and I think I've understood them well and can write the hack: 1.)

Participation in GSOC

2011 Mar 29

Hi, I'm Michael. I would like to participate in this year's Google Summer of Code, and I picked Xapian as the project to code for. Before writing a full proposal, I want to get in contact with the community, as well as introduce myself and discuss my ideas for contributing to Xapian. First of all I'd like to talk about my motivation. I'm currently working on a webapp

Complete GSOC idea

2014 Mar 01

Hi everyone, I am thinking of working on the following ideas for my GSOC proposal, based on my discussions with Olly and my own understanding. Rather than focusing on an entire perftest module, I have decided to focus on implementing performance tests for weighting schemes based on a Wikipedia dump and, in addition to that, build a framework to measure the

How to enable stemming with default_op set to OP_NEAR

2011 Dec 14

Hi All, I know that from version 1.2.6, if default_op is OP_NEAR or OP_PHRASE then stemming of the terms is disabled, since positional information isn't indexed for stemmed terms by default. However, I would like to try using OP_NEAR as default_op with stemming, because I think the near operator is somewhat different from an exact phrase. Then I want to see how the search results look with this
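The constraint described in this post can be illustrated with a toy index in Python. This is a simplified sketch under stated assumptions, not Xapian code: `toy_stem`, `index`, and `near` are hypothetical helpers, the stemmer is a crude stand-in, and the proximity test only approximates NEAR semantics. It stores positions for unstemmed terms only and keeps "Z"-prefixed stems without positions, which is the layout the post describes.

```python
from collections import defaultdict

def toy_stem(word):
    """Hypothetical stand-in for a real stemmer: strips a trailing 's'."""
    return word[:-1] if word.endswith("s") and len(word) > 3 else word

def index(text):
    """Index unstemmed terms with positions, 'Z'-prefixed stems without."""
    positions = defaultdict(list)  # unstemmed term -> word positions
    stems = set()                  # stemmed terms; no positions are kept
    for pos, word in enumerate(text.lower().split(), start=1):
        positions[word].append(pos)
        stems.add("Z" + toy_stem(word))
    return positions, stems

def near(positions, a, b, window=10):
    """True if a and b both have positions at most `window` apart."""
    return any(abs(pa - pb) <= window
               for pa in positions.get(a, [])
               for pb in positions.get(b, []))
```

With this layout a proximity test on unstemmed terms can succeed, while the same test on stemmed terms has no positions to compare and always fails, even though the stems themselves are in the index.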

Learning to rank

2012 Mar 24

Dear Sir, I am Pankaj Singhal from Jaipur, India. I am very interested in, and looking forward to, getting involved in the Learning-to-Rank project. I have good previous experience in this field. Last semester I did a similar job of ranking the URLs of a given huge dataset based on their attribute values. The dataset consisted of hundreds of thousands of URLs and each URL

Stemming, stop words, acts_as_ferret

2006 Nov 13

I'd like to get the following behavior: 1. Stemming. The search is on a database of summaries of California legal cases. Things like a search for "thermal image" need to hit "thermal imaging." 2. Stop words. Searches for "failing to instruct the jury" should come up with hits on a search for "fail to instruct." 3. Case-insensitive. What I

Get term from document by position

2015 Jul 26

> Snippet highlighting is something that was worked on for a GSoC project a few years ago, and is mentioned in our FAQ: <http://trac.xapian.org/wiki/FAQ/Snippets>. It's not available in the 1.2 series, but as I understand it should work out of the box in 1.3.3.
I tried it; this approach returns snippets that have nothing to do with the search string. Moreover, it takes too

Stemming problem

2007 Jul 04

Does anyone know if Xapian stemming supports the suffix -er? I tried -s and -ing and both work, but not -er.

Stopword addition and stemming

2010 Nov 15

Hi, Two questions which I'm unsure about: Stemming: I've turned on stemming, etc., but how can I confirm that it's being used in searches? What should I look/search for? Stopwords: I'm trying out Xapian on a regional dataset (e.g. searching data from a *.co.us TLD). I've noticed that searching for [bob co.us] results in *very* slow search times (tens of seconds), since it

Does OP_NEAR works with stemming?

2011 May 27

Hi All, I used the OP_NEAR operator for queryparser, and when I searched for "apple store" in my own collection, the query was parsed as "Zappl:(pos=1) NEAR 11 Zstore:(pos=2)" but retrieved nothing. However, if I type in "Apple Store", the query is parsed as Xapian::Query((apple:(pos=1) NEAR 11 store:(pos=2))) and some results are shown. I'm not sure whether

Blacklist stemming

2009 Jun 05

Hi, I need to modify the stemming for a couple of words (a blacklist) and use the usual Snowball stemmer for all the others. The "natural" way of doing it would be to derive from Stem and override operator()... but I am using the *python-bindings*. Would this be possible? If not, I have two other solutions in mind: - add a custom stemmer to Xapian - write custom index & search

Troubles with stemming (tm + Snowball packages) under MacOS

2012 Jan 13

Dear all, I have some trouble using the stemming algorithm provided by the tm (text mining) + Snowball packages. Here is my config: MacOS 10.5; R 2.12.0 / R 2.13.1 / R 2.14.1 (I have tried several versions). I have installed all the needed packages (tm, rJava, rWeka, Snowball) + dependencies. I have deactivated AWT (as written in

Help: stemming and stem completion with package tm in R

2011 Nov 04

Hi All, I came across the problem below when doing stemming and stem completion with package tm in R. The word "mining" was stemmed to "mine" with stemDocument(), and then completed to "miners" with stemCompletion(). However, I prefer to keep "mining" intact. For stemCompletion(), the default type of completion is "prevalent", which takes the most

Czech stemming

2010 Jul 13

Hello, I just found the Xapian project when looking for an indexing engine in Ruby and was quite impressed. Is there any chance of Czech stemming? I found that it is already written in Java as part of Lucene here: http://svn.apache.org/viewvc/lucene/dev/trunk/modules/analysis/common/src/java/org/apache/lucene/analysis/cz/CzechStemmer.java?view=markup Sadly, I have no experience with C++, but
