similar to: Get a list of all terms in an indexed corpus

Displaying 20 results from an estimated 400 matches similar to: "Get a list of all terms in an indexed corpus"

2011 Jun 10
1
Where do I stick the PARTIAL flag in xappy?
I want to be able to do some searches with FLAG_PARTIAL and some without: most searches without it, but FLAG_PARTIAL for an autocomplete widget. I'm using xappy and I can't find where to pass the flag when I build the query. The docs talk about setting up the database with or without FLAG_PARTIAL, but that's probably not what I want. Peter
2007 Oct 10
2
Something to think about
I'm planning to add multiple-database support for searches to my "Xappy" python wrapper (more on this wrapper later, but for now, see http://code.google.com/p/xappy for details). This is reasonably straightforward, because Xapian supports this nicely: except that "Xappy" generates a "fieldname->prefix" mapping automatically. The prefix which corresponds
2011 Jan 17
2
DatabaseCorruptError
Hi there, My web app uses Xapian via the PHP bindings. I'm occasionally getting this error when attempting to instantiate a XapianDatabase object for searching: DatabaseCorruptError: Expected block 107 to be level 1, not 0 Here's the line that triggers it: $database = new XapianDatabase(PROJROOT.'/data/xapian/posts'); My version is xapian-core 1.2.3 with matchspy.
2018 Jul 21
1
Xapian configuration
Hello, I'd like a tutorial on how to configure Xapian for a MoinMoin wiki. I installed my MoinMoin wiki in a virtual env, but I'm having lots of problems configuring Xapian. Thanks
2017 Jun 14
2
KMeans Clusterer - Going forward
Hello, I have finished moving the API to PIMPL classes and will fix issues within the current code over the next week, based on reviews from mentors. The next step is to start forming document vectors that are reduced and more useful. This mainly helps save run time, since the time for a distance calculation depends on the number of terms. Getting the useful terms within a
2008 Mar 12
1
how can i use stopwords?
Hi, I do not understand the stopword function... I've set the termgenerator like this: $self->{'Stemmer'} = new Search::Xapian::Stem(german2); $self->{'Stopper'} = new Search::Xapian::SimpleStopper(); $self->{'TermGenerator'} = new Search::Xapian::TermGenerator; $self->{'TermGenerator'}->set_stemmer( $self->{'Stemmer'} );
2009 Nov 11
2
[python indexer] add meta informations
Hello, I'm trying to index some blog content through the Python bindings. I'd like to know how to add some information (url, title, date, and so on) so that I can reach it through a xapian.Enquire object. I believe it's something to be set in xapian.TermGenerator(), but I can't find which function. I'm expecting something like: xtermgen.add_meta('url',
2015 Mar 14
2
range query for terms
First, thank you, Xapian! I'd like to ask whether it is possible to do a range query on terms (like the range query on values), or whether only a wildcard (right truncation) match is possible. The case is searching for IP addresses between "10.10.0.0" and "10.10.255.255". The user wants: 1. query "10.10.10.10" < ip < "10.10.10.12" gives "10.10.10.11" 2. query
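Whichever Xapian feature ends up being used, a string-ordered range only behaves like a numeric IP range if the addresses are stored in a fixed-width encoding: as raw dotted strings, "10.10.10.9" sorts after "10.10.10.10". A minimal Python sketch, assuming the IP is indexed as an 8-digit hex string (the helper names `ip_sort_key` and `in_range` are hypothetical, for illustration only):

```python
import ipaddress


def ip_sort_key(ip: str) -> str:
    """Encode a dotted IPv4 address as a fixed-width, 8-digit hex string.

    Fixed width makes lexicographic (string) order match numeric order.
    Hypothetical helper, not part of Xapian.
    """
    return format(int(ipaddress.IPv4Address(ip)), "08x")


def in_range(ip: str, low: str, high: str) -> bool:
    """Exclusive range test done purely by comparing encoded strings."""
    return ip_sort_key(low) < ip_sort_key(ip) < ip_sort_key(high)


# Raw dotted strings compare wrongly: '9' > '1' at the first differing char.
print("10.10.10.9" < "10.10.10.10")
# The fixed-width encoding restores numeric order.
print(ip_sort_key("10.10.10.9") < ip_sort_key("10.10.10.10"))
print(in_range("10.10.10.11", "10.10.10.10", "10.10.10.12"))
```

An alternative under the same idea would be to store the integer form with `xapian.sortable_serialise` in a value slot and use a numeric value range; the point either way is that the stored form must sort in the same order as the addresses.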
2013 Oct 30
2
Lucene 3.6.2 backend for xapian (#25)
[Replying to xapian-devel, as I think a wider audience would be useful] On Mon, Oct 21, 2013 at 11:24:51PM +0800, jiangwen jiang wrote: > yes, it's less efficient. Lucene database has multiple segments, each segment can treat as a independent database. The same term may exists in >= 1 segments. Sorry for taking a while to respond - I've been both busy and mulling this
2011 Apr 18
0
Help with cleaning a corpus
Hi! I created a corpus and started to clean it with this code: txt <-tm_map(txt,removeWords, stopwords("spanish")) txt <-tm_map(txt,stripWhitespace) txt <-tm_map(txt,tolower) txt <-tm_map(txt,removeNumbers) txt <-tm_map(txt,removePunctuation) But something happened: some of the documents in the corpus became empty, which is a problem when I try to make a
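Empty documents are the expected outcome when a document consists only of stopwords, numbers and punctuation. Note also that the pipeline above removes stopwords before lower-casing, so capitalised stopwords such as "El" would likely survive. A minimal Python sketch of the same steps (a Python analogue, not the tm package; the stopword set is a small hypothetical stand-in for `stopwords("spanish")`) that lower-cases first and keeps track of which original documents are non-empty, so the empty ones can be dropped before the next step:

```python
import re
import string

# Hypothetical, tiny stopword list standing in for tm's stopwords("spanish").
SPANISH_STOPWORDS = {"el", "la", "de", "que", "y", "en", "los", "las"}

# ASCII punctuation plus a few Spanish marks that string.punctuation lacks.
PUNCTUATION = string.punctuation + "¿¡«»"
STRIP_PUNCT = str.maketrans("", "", PUNCTUATION)


def clean(text: str, stopwords: set[str]) -> str:
    """Lower-case first so capitalised stopwords are removed too."""
    text = text.lower()
    text = re.sub(r"\d+", " ", text)  # remove numbers
    text = text.translate(STRIP_PUNCT)  # remove punctuation
    # split() also collapses runs of whitespace.
    words = [w for w in text.split() if w not in stopwords]
    return " ".join(words)


def clean_corpus(docs: list[str], stopwords: set[str]) -> dict[int, str]:
    """Return {original_index: cleaned_text}, dropping documents left empty."""
    cleaned = {i: clean(d, stopwords) for i, d in enumerate(docs)}
    return {i: t for i, t in cleaned.items() if t}


docs = ["El perro y la casa", "123 !!!", "De la ciudad"]
print(clean_corpus(docs, SPANISH_STOPWORDS))
# {0: 'perro casa', 2: 'ciudad'}
```

Keeping the original indices means any metadata attached to the documents can still be matched up after the empty ones are removed.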
2010 Nov 15
4
Stopword addition and stemming
Hi, Two questions which I'm unsure about: Stemming: I've turned on stemming, etc., but how can I confirm that it's being used in searches? What should I look or search for? Stopwords: I'm trying out Xapian on a regional dataset (searching data from a *.co.us TLD, e.g.). I've noticed that searching for [bob co.us] results in *very* slow search times (tens of seconds), since it
2007 Feb 09
1
Fetching document content by Q term in Python
Hello, I'd like to be able to retrieve the index's stored copy of the document text and tried the following: terms = self.db.allterms() terms.skip_to('Q' + uri.encode('utf-8')) term = terms.next() doc = self.db.get_document(term[1]) print doc.get_data() I just wildly guessed that [1] was the docid, but of course it isn't. So the question is, how do I
2011 Aug 09
3
what is the fastest way to fetch results which are sorted by timestamp ?
What is the fastest way to fetch results sorted by timestamp? I want to use Xapian as my search engine. I use add_boolean_term(something) and add_value(0,sortable_serialise(get_timestamp())) on each doc, then search with enquire.set_weighting_scheme(xapian.BoolWeight()) and enquire.set_sort_by_value(0,True) so that the results are sorted by timestamp. This method is OK, but
2009 Nov 12
1
How can this code be improved?
I am running the following code on a MacBook Pro 17" Unibody early 2009 with 8GB RAM, OS X 10.5.8, R 2.10.0 Patch from Nov. 2, 2009, in 64-bit mode. freq.stopwords <- numeric(0) freq.nonstopwords <- numeric(0) token.tables <- list(0) i.ss <- c(0) cat("Beginning at ", date(), ".\n") for (i.d in 1:length(tokens)) { tt <- list(0) for (i.s in
2009 Jan 02
1
Per-namespace proxying?
Hello, Searching in the archives, I saw the following posting from Timo Sirainen: [Dovecot] Roadmap to future (06 Dec 2007): [...] > Proxying > -------- > > - These could be implemented to v1.2. - Log in normally (no proxying) > if destination IP is the server itself. > > - Support for per-namespace proxying: > > namespace public { > prefix = Public/ >
2012 Dec 20
1
Rsync when using --whole-file
I have a question about what happens at the code level when I use --whole-file. I know that it turns off the rolling checksum. I also understand that it only checks the file's mtime and size to identify whether there should be some transfer. Two questions: 1) Could anyone give me a pointer to the correct file so that I can read what happens when --whole-file is used? 2) When using
2010 Oct 21
0
compile xapian-extras
Hi, xapian-extras supports image similarity (http://xapian.wordpress.com/2009/03/11/xappy-now-supports-image-similarity-searching/). I compiled xapian-extras and xapian-extras-bindings with Python: import xapian import xapian.imgseek doc = xapian.Document() imgsig = xapian.imgseek.ImgSig.register_Image(JPEG_PATH) imgterms = xapian.imgseek.ImgTerms('A', 300)
2011 Jul 20
1
Phrase search problem
Hi, I'm experiencing problems with phrase searches containing adjacent repeated terms. Example: if I search for 'curtain curtain' and there are documents that match the query, they aren't returned. But if I search for 'curtain nice curtain' and there are documents that match this query, it works fine. Attached is a Python program that shows the problem. I tried
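Whatever the cause inside Xapian turns out to be, the expected behaviour is easy to state: each slot of the phrase must be matched at consecutive positions, even when the same term fills two adjacent slots. A small pure-Python reference check (not Xapian's matcher; tokenising with plain `split()` is an assumption) could serve as an oracle when comparing against what the attached program returns:

```python
from collections import defaultdict


def positions_index(tokens: list[str]) -> dict[str, set[int]]:
    """Map each term to the set of positions at which it occurs."""
    index: dict[str, set[int]] = defaultdict(set)
    for pos, tok in enumerate(tokens):
        index[tok].add(pos)
    return index


def phrase_match(tokens: list[str], phrase: list[str]) -> bool:
    """True if `phrase` occurs as consecutive terms in `tokens`.

    Every phrase slot i is checked independently at start + i, so a
    repeated term ('curtain curtain') needs two *adjacent* occurrences.
    """
    if not phrase:
        return False
    index = positions_index(tokens)
    return any(
        all(start + i in index.get(term, set()) for i, term in enumerate(phrase))
        for start in index.get(phrase[0], set())
    )


print(phrase_match("a nice curtain curtain here".split(), ["curtain", "curtain"]))
print(phrase_match("curtain nice curtain".split(), ["curtain", "curtain"]))
print(phrase_match("curtain nice curtain".split(), ["curtain", "nice", "curtain"]))
```

The first and third calls should be `True` and the second `False`; if the search engine disagrees on the first case, that points at how repeated terms are handled in the phrase check rather than at the indexing.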
2017 Jul 18
1
Help-Multi class classification for large datasets
Hi all, We are working on multi-class classification. Currently the Ranger package in R can handle up to 1.1 million records, but training takes 12 days on a machine with 128 GB of RAM, which is not practically feasible. In future we will have a dataset of 10 million records, so we are looking for a package or framework that can handle 10 million records with at least 12000
2007 May 09
3
bug when assigning new analyzer?
require 'rubygems' require 'ferret' include Ferret PATH = '/tmp/ferret_stopwords_test' index = Index::IndexWriter.new(:path => PATH, :create => true) index.analyzer = Analysis::StandardAnalyzer.new([]) index << {:title => 'a few good men', :language => 'en'} index.analyzer =