Displaying 20 results from an estimated 1000 matches similar to: "query time stemming and term weights"
2016 Mar 10
2
Introduction and Doubts
Tf-idf is the most widely used weighting scheme; it is easy to understand and has
been used in other frameworks such as Lucene and in many other places.
Okapi BM25 (implemented in Xapian) is theoretically a better, improved measure
than tf-idf, and
I am looking into various other weighting schemes which exist in Xapian
or could be implemented, like TF-ICF (term frequency inverse corpus
frequency), TF-RF (term
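For reference, here is a sketch of the two per-term formulas being compared. It uses the textbook forms with the commonly quoted defaults k1 = 1.2 and b = 0.75, plus the "+1" idf variant. These parameter choices are assumptions; Xapian's BM25Weight has its own parameters and defaults. The sketch shows the main practical difference: BM25 saturates the tf contribution and normalises for document length, while plain tf-idf does neither.

```python
import math


def tf_idf(tf, df, n_docs):
    """Raw term frequency times log(N / df)."""
    if tf == 0 or df == 0:
        return 0.0
    return tf * math.log(n_docs / df)


def bm25(tf, df, n_docs, doc_len, avg_doc_len, k1=1.2, b=0.75):
    """Okapi BM25 weight of one term in one document.

    k1 controls how quickly the tf contribution saturates; b controls how
    strongly documents longer than average are penalised. The "+1" inside
    the log keeps the idf positive even for very common terms.
    """
    if tf == 0:
        return 0.0
    idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
    length_norm = k1 * (1 - b + b * doc_len / avg_doc_len)
    return idf * tf * (k1 + 1) / (tf + length_norm)


# tf-idf grows linearly with tf; BM25 levels off towards idf * (k1 + 1).
for tf in (1, 2, 5, 20, 100):
    print(tf, round(tf_idf(tf, 10, 1000), 2), round(bm25(tf, 10, 1000, 100, 100), 2))
```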
2018 Jul 19
1
choosing between probabilistic and boolean prefixes for terms
Hi all,
public-inbox allows searching for git blob names (e.g. "badc0ffee")
in patches. Initially, I chose to use add_prefix for probabilistic
terms, since I assumed it could be a superset of what boolean
searching offered. Unfortunately, it doesn't seem to be the case
because stemming is interfering.
So switching to boolean filtering seems to work; and it is
fine for mechanical
2017 Apr 08
2
Search Algorithm Used for Keyword Search
Dear Sir,
I'm doing a literature survey on search engines. As Xapian is open source, I think I can get the information I need.
I assume that your system builds a list of keywords and tags each keyword with the documents in which it can be found. My questions are as follows:
1. What is the search algorithm used for searching the list of keywords that your search engine has? Is it the
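The structure described here is usually called an inverted index. Each term maps to a posting list of the documents that contain it, and a multi-term query combines those lists. Below is a minimal in-memory sketch. A Python dict stands in for the on-disk structures (such as B-trees) that real engines use, and only a Boolean AND is shown. Xapian additionally ranks the matches with a weighting scheme, BM25 by default.

```python
from collections import defaultdict


def build_index(docs):
    """Map each lowercased term to the sorted list of document ids containing it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


def search_and(index, terms):
    """Boolean AND: intersect posting lists, starting from the shortest."""
    lists = sorted((index.get(t.lower(), []) for t in terms), key=len)
    if not lists:
        return []
    result = set(lists[0])
    for plist in lists[1:]:
        result &= set(plist)
    return sorted(result)


docs = {1: "the cat sat", 2: "the dog sat", 3: "a cat ran"}
index = build_index(docs)
print(index["cat"], search_and(index, ["cat", "sat"]))
```

Starting from the shortest list keeps the intermediate result small. That is the same reason real engines process rare terms first.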
2012 Mar 22
1
GSOC : Language Modelling for information retrieval with Diversified Search results
Hello,
I am an undergraduate student at DA-IICT, India, pursuing a BTech in
Information and Communication Technology. The major field of my research is
Information Retrieval and Natural Language Processing. Xapian, being a
powerful information retrieval library, has attracted me towards
implementing what I learned in class for this project. I have worked on
entity search on RDF data, SMS-based FAQ
2017 Sep 28
1
Weighting the author of a doc when that term can also appear as a frequent term in other docs
We have a corpus of academic papers. Sometimes it happens that there is
an academic controversy and one paper is a response or rebuttal to
another paper. The name of the author of the first paper may appear many
times in the second paper. So in light of this, how should we set our
weight on the author field?
Here is an example:
http://www.nber.org/papers/w11215
in which the term
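One common approach is to index author names as their own prefixed terms, which keeps the author field's statistics separate from the body text. The sketch below assumes this. Xapian's conventional prefix for authors is "A", and in the Python bindings TermGenerator.index_text accepts a prefix argument. With this setup, "Smith" as an author and "smith" mentioned inside a rebuttal get independent document frequencies. The code is pure Python; the paper data is invented for illustration.

```python
import math
from collections import Counter


def paper_terms(authors, body):
    """Terms for one paper: author names get an 'A' prefix, so "Asmith"
    (Smith as an author) and "smith" (Smith mentioned in the text) are
    different terms with their own document frequencies."""
    terms = {"A" + name.lower() for name in authors}
    terms.update(word.lower() for word in body.split())
    return terms


def idf_table(papers):
    """log(N / df) for every term across a list of (authors, body) pairs."""
    df = Counter()
    for authors, body in papers:
        df.update(paper_terms(authors, body))
    n = len(papers)
    return {term: math.log(n / count) for term, count in df.items()}


papers = [
    (["Smith"], "a growth model"),
    (["Jones"], "why smith overstates growth as smith admits"),
    (["Lee"], "growth after smith"),
]
idf = idf_table(papers)
print(idf["Asmith"], idf["smith"])
```

A match on the author term can then be weighted independently of body mentions. In Xapian, for instance, this could be done with a different wdf increment when indexing the author field.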
2013 Feb 19
2
Implementing tf-idf weighting scheme in Xapian
Hello guys. I just read up about tf-idf schemes and want to implement them in
Xapian (with some frequently used normalizations), as it will also give me a
good handle on implementing a weighting scheme before I start working on
implementing DFR schemes.
I read the following as references, and I think I've understood them well and
can write the hack:
1.)
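"Frequently used normalizations" for tf-idf usually means variants of the tf component plus optional cosine length normalisation, in the style of the SMART notation. The sketch below covers a few common variants. The function names are hypothetical and are not tied to any Xapian class.

```python
import math


def tf_natural(tf, max_tf):
    return float(tf)


def tf_log(tf, max_tf):
    return 1.0 + math.log(tf) if tf > 0 else 0.0


def tf_augmented(tf, max_tf):
    # Scales tf into [0.5, 1] relative to the most frequent term in the document.
    return 0.5 + 0.5 * tf / max_tf if tf > 0 else 0.0


def tf_boolean(tf, max_tf):
    return 1.0 if tf > 0 else 0.0


def tfidf_vector(term_freqs, df, n_docs, tf_scheme=tf_log, cosine=True):
    """Weight each term of one document as tf_scheme(tf) * log(N / df),
    optionally dividing by the vector length (cosine normalisation)."""
    if not term_freqs:
        return {}
    max_tf = max(term_freqs.values())
    weights = {
        term: tf_scheme(freq, max_tf) * math.log(n_docs / df[term])
        for term, freq in term_freqs.items()
    }
    if cosine:
        length = math.sqrt(sum(w * w for w in weights.values()))
        if length > 0:
            weights = {term: w / length for term, w in weights.items()}
    return weights


doc = {"xapian": 3, "weighting": 1, "the": 7}
df = {"xapian": 2, "weighting": 5, "the": 100}
print(tfidf_vector(doc, df, n_docs=100, tf_scheme=tf_augmented))
```

A term that occurs in every document ("the" here) gets an idf of zero and so drops out whatever tf variant is chosen.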
2011 Mar 29
2
Participation in GSOC
Hi,
I'm Michael. I would like to participate in this year's Google Summer of
Code, and I picked Xapian as the project to code for.
Before writing a full proposal, I want to get in contact with the
community, introduce myself and discuss my ideas for
contributing to Xapian.
First of all I'd like to talk about my motivation.
I'm currently working on a webapp
2014 Mar 01
2
Complete GSOC idea
Hi everyone,
I am thinking of working on the
following ideas for my GSOC proposal based on my discussions with Olly and
my own understanding. Rather than focusing on an entire perftest module, I
have decided to focus on implementing performance tests for weighting
schemes based on a Wikipedia dump and, in addition to that, build a
framework to measure the
2011 Dec 14
1
How to enable stemming with default_op set to OP_NEAR
Hi All,
I know that from version 1.2.6, if default_op is OP_NEAR or OP_PHRASE then stemming of the terms is disabled, since positional information isn't indexed for stemmed terms by default. However, I would like to try using OP_NEAR as default_op with stemming, because I think the near operator is somewhat different from an exact phrase. Then I want to see how the search results look with this
2012 Mar 24
3
Learning to rank
Dear Sir,
I am Pankaj Singhal from Jaipur, India. I am very
interested in, and strongly looking forward to, getting involved in the
Learning-to-Rank project.
I have good previous experience in this field. Last semester I did a similar
job of ranking the URLs of a huge dataset based on their attribute
values. The dataset consisted of hundreds of thousands of URLs and each URL
2006 Nov 13
1
Stemming, stop words, acts_as_ferret
I'd like to get the following behavior:
1. Stemming. The search is on a database of summaries of California legal
cases. Things like a search for "thermal image" need to hit "thermal
imaging."
2. Stop words. Searches for "failing to instruct the jury" should come up
with hits on a search for "fail to instruct."
3. Case-insensitive.
What I
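The three requirements above amount to one normalisation pipeline, applied identically at index time and at query time: lowercase, drop stopwords, stem. The sketch below is only meant to show the shape of that pipeline. It uses a deliberately crude suffix stripper (not the Porter/Snowball algorithm) and a tiny hypothetical stopword list.

```python
# Hypothetical stopword list; real lists are much longer.
STOPWORDS = {"a", "an", "the", "to", "of", "and", "in"}
# Toy suffix list, tried in order; not a real stemming algorithm.
SUFFIXES = ("ing", "es", "s", "e")


def toy_stem(word):
    """Strip the first matching suffix, keeping at least three letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def normalise(text):
    """Lowercase, drop stopwords, then stem what remains."""
    return [toy_stem(w) for w in text.lower().split() if w not in STOPWORDS]


print(normalise("Thermal Imaging"), normalise("thermal image"))
print(normalise("failing to instruct the jury"), normalise("fail to instruct"))
```

Because both sides go through the same function, "imaging" and "image" meet at the same term. The query "fail to instruct" also becomes a subset of the terms in "failing to instruct the jury".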
2015 Jul 26
1
Get term from document by position
> Snippet highlighting is something that was worked on for a GSoC project a
> few years ago, and is mentioned in our FAQ: <http://trac.xapian.org/wiki/FAQ/Snippets>.
> It's not available in the 1.2 series, but as I understand it should work out of the
> box in 1.3.3.
I tried it; this approach returns snippets that have nothing to do with the search string. Moreover, it takes too
2007 Jul 04
3
Stemming problem
Does anyone know if Xapian's stemming supports the suffix -er? I tried -s and -ing,
and both work, but not -er.
2010 Nov 15
4
Stopword addition and stemming
Hi,
Two questions which I'm unsure about:
Stemming: I've turned on stemming, etc, but how can I confirm that
it's being used in searches? What should I look/search for?
Stopwords: I'm trying out Xapian on a regional dataset (searching
data from a *.co.us TLD, e.g.). I've noticed that searching for [bob
co.us] results in *very* slow search times (tens of seconds), since it
2011 May 27
1
Does OP_NEAR work with stemming?
Hi All,
I used the OP_NEAR operator with the QueryParser, and when I searched for "apple store" in my own collection, the query was parsed as "Zappl:(pos=1) NEAR 11 Zstore:(pos=2)" but retrieved nothing. However, if I type in "Apple Store", the query is parsed as Xapian::Query((apple:(pos=1) NEAR 11 store:(pos=2))) and some results are shown. I'm not sure whether
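The two parsed queries point at the cause. Lowercase words are stemmed to Z-prefixed terms, and, as the 2011 Dec 14 message above notes, positional information isn't indexed for stemmed terms by default. A positional operator like NEAR therefore has no positions to compare. The toy positional index below is hand-written and only mirrors that default. It treats the NEAR window as a span of `window` consecutive positions, which is an assumption about the exact semantics.

```python
# Toy positional index for one document, "visit the apple store".
# Unstemmed terms carry positions; the "Z"-prefixed stemmed terms are
# stored without positions, mirroring the default described above.
DOC = {
    "visit": [1], "the": [2], "apple": [3], "store": [4],
    "Zvisit": [], "Zappl": [], "Zstore": [],
}

# Variant where the stemmed terms were also indexed with positions.
DOC_WITH_STEM_POSITIONS = dict(DOC, Zvisit=[1], Zappl=[3], Zstore=[4])


def near(doc, term_a, term_b, window):
    """True if some occurrence of term_a and some occurrence of term_b
    fall within a span of `window` positions (in either order)."""
    positions_a = doc.get(term_a, [])
    positions_b = doc.get(term_b, [])
    return any(abs(a - b) < window for a in positions_a for b in positions_b)


print(near(DOC, "apple", "store", 11))                       # unstemmed terms
print(near(DOC, "Zappl", "Zstore", 11))                      # stemmed, no positions
print(near(DOC_WITH_STEM_POSITIONS, "Zappl", "Zstore", 11))  # stemmed, with positions
```

Recording positions for stemmed terms as well makes stemmed NEAR queries possible. The cost is a larger positional index.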
2009 Jun 05
2
Blacklist stemming
Hi,
I need to modify the stemming for a couple of words (a blacklist) and
use the usual Snowball stemmer for all the others.
The "natural" way of doing it would be to derive from Stem and override
operator ()... but I am using *python-bindings*. Would this be possible?
If not I have two other solutions in mind:
- add a custom stemmer to Xapian
- write custom index & search
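Whichever route is taken, the logic is the same: consult the blacklist first and fall back to the normal stemmer otherwise. The sketch below is library-agnostic. `toy_plural_stemmer` is a hypothetical stand-in for the Snowball stemmer. How, or whether, such a callable can be plugged into Xapian from the Python bindings is not shown here.

```python
from typing import Callable, Dict


class BlacklistStemmer:
    """Callable stemmer that returns a fixed stem for chosen words and
    defers to a fallback stemmer for everything else."""

    def __init__(self, fallback: Callable[[str], str], overrides: Dict[str, str]):
        self.fallback = fallback
        self.overrides = overrides

    def __call__(self, word: str) -> str:
        if word in self.overrides:
            return self.overrides[word]
        return self.fallback(word)


def toy_plural_stemmer(word: str) -> str:
    # Hypothetical stand-in for the Snowball stemmer: strips a trailing "s".
    return word[:-1] if word.endswith("s") and len(word) > 3 else word


stem = BlacklistStemmer(toy_plural_stemmer, {"news": "news", "physics": "physics"})
print([stem(w) for w in ["cats", "news", "physics", "dogs"]])
```

Mapping a word to itself gives pure blacklist behaviour. Mapping it to another string lets you force a specific stem instead.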
2012 Jan 13
4
Troubles with stemming (tm + Snowball packages) under MacOS
Dear all,
I am having trouble using the stemming algorithm provided by the tm
(text mining) + Snowball packages.
Here is my config:
MacOS 10.5
R 2.12.0 / R 2.13.1 / R 2.14.1 (I have tried several versions)
I have installed all the needed packages (tm, rJava, rWeka, Snowball)
+ dependencies. I have deactivated AWT (as described in
2011 Nov 04
1
Help: stemming and stem completion with package tm in R
Hi All
I came across the problem below when doing stemming and stem completion
with package tm in R. The word "mining" was stemmed to "mine" with
stemDocument(), and then completed to "miners" with stemCompletion().
However, I prefer to keep "mining" intact.
For stemCompletion(), the default type of completion is "prevalent",
which takes the most
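The Python re-creation below illustrates the behaviour described; it is not tm's code. "Prevalent" completion picks the most frequent dictionary word that starts with the stem, which is how "mine" can become "miners". Protecting chosen words before stemming is one way to keep "mining" intact. `toy_stemmer`, `protected` and the word list are all hypothetical.

```python
from collections import Counter


def complete_stem(stem, dictionary):
    """Mimic a "prevalent"-style completion: return the most frequent
    dictionary word that starts with `stem` (or the stem itself if none)."""
    counts = Counter(word for word in dictionary if word.startswith(stem))
    if not counts:
        return stem
    return counts.most_common(1)[0][0]


def stem_and_complete(word, stemmer, dictionary, protected=frozenset()):
    """Stem and re-complete `word`, but leave protected words untouched."""
    if word in protected:
        return word
    return complete_stem(stemmer(word), dictionary)


def toy_stemmer(word):
    # Hypothetical stemmer reproducing the "mining" -> "mine" behaviour above.
    return {"mining": "mine"}.get(word, word)


corpus_words = ["miners", "mine", "miners", "mining", "data"]
print(stem_and_complete("mining", toy_stemmer, corpus_words))
print(stem_and_complete("mining", toy_stemmer, corpus_words, protected={"mining"}))
```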
2010 Jul 13
1
Czech stemming
Hello,
I just found the Xapian project when looking for an indexing engine in Ruby and
was quite impressed. Is there any chance of Czech stemming? I found that it
is already written in Java as part of Lucene here:
http://svn.apache.org/viewvc/lucene/dev/trunk/modules/analysis/common/src/java/org/apache/lucene/analysis/cz/CzechStemmer.java?view=markup
Sadly, I have no experience with C++, but