similar to: [Fwd: Re: [Fwd: failure delivery]]

Displaying 20 results from an estimated 400 matches similar to: "[Fwd: Re: [Fwd: failure delivery]]"

2013 Feb 19
2
Implementing tf-idf weighting scheme in Xapian
Hello guys. I just read up about tf-idf schemes and want to implement them in Xapian (with some frequently used normalizations), as it will also give me a good hang of implementing a weighting scheme before I start working on implementing DFR schemes. I read the following as references and I think I've understood it well and can write the hack :- 1.)
2023 May 03
1
manual flushing thresholds for deletes?
Olly Betts <olly at survex.com> wrote: > On Mon, Mar 27, 2023 at 11:22:09AM +0000, Eric Wong wrote: > > Olly Betts <olly at survex.com> wrote: > > > 10 seems too long. You want the mean word length weighted by frequency > > > of occurrence. For English that's typically around 5 characters, which > > > is 5 bytes. If we go for +1 that's:
2017 May 22
2
Xapian 1.4.3 "Db block overwritten - are there multiple writers?"
Olly Betts writes: > On Wed, May 17, 2017 at 09:08:32PM +0200, Jean-Francois Dockes wrote: > > I have a user reporting the following error during recoll indexing: > > > > flush() failed: Db block overwritten - are there multiple writers? > > > > "flush() failed" is from recoll, the rest is, I think the text of the Xapian > > exception.
2013 Mar 11
1
Implementation of the PL2 weighting scheme of the DFR Framework
Hello guys. I am working on implementing the PL2 weighting scheme of the DFR framework by Gianni Amati. It uses the Poisson approximation of the Binomial as the probabilistic model (P), the Laplace law of succession to calculate the after-effect of sampling or the risk gain (L), and within-document frequency normalization H2(2) (as proposed by Amati in his PhD thesis). The formula for w(t,d) in
2023 Mar 27
1
manual flushing thresholds for deletes?
On Mon, Mar 27, 2023 at 11:22:09AM +0000, Eric Wong wrote: > Olly Betts <olly at survex.com> wrote: > > 10 seems too long. You want the mean word length weighted by frequency > > of occurrence. For English that's typically around 5 characters, which > > is 5 bytes. If we go for +1 that's: > > Actually, 10 may be too short in my case since there's a
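The "mean word length weighted by frequency of occurrence" mentioned in this thread can be illustrated with a minimal Python sketch. The function name and the sample word list are hypothetical, and this is not code from Xapian or from the thread; it only shows how frequent short words pull the weighted mean below the plain average over distinct words.

```python
from collections import Counter


def weighted_mean_word_length(words):
    """Mean word length in bytes, weighted by how often each word occurs.

    Hypothetical helper for illustration; `words` is any iterable of str.
    """
    counts = Counter(words)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no words given")
    # Each distinct word contributes its UTF-8 byte length times its frequency.
    weighted_bytes = sum(len(w.encode("utf-8")) * n for w, n in counts.items())
    return weighted_bytes / total


# Illustrative input only: "the" occurs three times, "search" once.
sample = ["the", "the", "the", "search"]
print(weighted_mean_word_length(sample))
```

For the sample above the weighted mean is 3.75 bytes, whereas averaging the two distinct words would give 4.5.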
2008 May 14
4
GPL PV drivers for Windows - WDM version
I've been busily converting the xenpci and xenvbd drivers from WDF to WDM to resolve a few issues including potential licensing problems with the Microsoft WDF and to (hopefully) allow them to function as boot drivers when doing install and system recovery. It was a fairly major rewrite of xenpci, and xenvbd, which are now working (booting and running without crashes so far). I
2023 May 03
1
manual flushing thresholds for deletes?
On Wed, May 03, 2023 at 12:38:15PM +0000, Eric Wong wrote: > Olly Betts <olly at survex.com> wrote: > > This will also effectively ignore boolean terms, assuming you're giving > > them wdf of 0 (because $3 here is the collection frequency, which is > > sum(wdf(term)) over all documents). > > Should boolean terms be ignored when estimating flushing >
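The remark that collection frequency is sum(wdf(term)) over all documents, and so ignores boolean terms indexed with a wdf of 0, can be sketched in plain Python. This is a toy model under stated assumptions, not the Xapian API: each document is represented as a hypothetical dict mapping term to wdf, and the term names are made up.

```python
def collection_frequency(term, docs):
    """Sum of wdf(term) over all documents.

    `docs` is a hypothetical in-memory stand-in for an index:
    a list of dicts mapping term -> wdf.
    """
    return sum(doc.get(term, 0) for doc in docs)


# Illustrative data only: "Kfoo" plays the role of a boolean term with wdf 0.
docs = [
    {"jelly": 2, "Kfoo": 0},
    {"jelly": 1, "bean": 4, "Kfoo": 0},
]
print(collection_frequency("jelly", docs))
print(collection_frequency("Kfoo", docs))
```

Because the boolean term always carries a wdf of 0 in this model, its collection frequency stays 0 however many documents it appears in.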
2008 May 18
11
Release 0.9.0 of GPL PV Drivers for Windows
I've just put up the latest release of the GPLPV drivers for Windows. This release involved a fairly big rewrite of the stuff that talks to Windows as I changed from WDF to WDM. WDF is a newer framework from Microsoft which makes it easier to write drivers as a lot of the state management stuff is done for you. It also means shipping a great big dll around with the drivers (note the
2014 Mar 11
2
[GSOC 2014] Indexing INEX dataset
On Tue, Mar 11, 2014 at 12:02:15PM +0100, Parth Gupta wrote: > During the indexing with omindex, only you need to make sure is indexing > with prefix 'S' for title as explained here in Letor documentation: > xapian-letor/docs/letor.rst > > Previously when I edited omindex.cc it was modified as can be seen >
2013 Aug 25
2
Backend for Lucene format indexes-How to get doclength
On Tue, Aug 20, 2013 at 07:28:42PM +0800, jiangwen jiang wrote: > I think norm(t, d) in Lucene can be used to calculate a number which is > similar to doc length (see norm(t,d) in > http://lucene.apache.org/core/3_5_0/api/all/org/apache/lucene/search/Similarity.html#formula_norm). It sounds similar (especially if document and field boosts aren't in use), though some places may rely on
2005 Jul 28
1
conversion from SAS
Hi, I wonder if anybody could help me in converting this easy SAS program into R. (I'm still trying to do that!) PROC IMPORT OUT= WORK.CHLA_italian DATAFILE= "C:\Documents and Settings\carleal\My Documents\REBECCA\stat\sas\All&nutrients.xls" DBMS=EXCEL2000 REPLACE; GETNAMES=YES; RUN; data chla_italian; set chla_italian;
2014 Mar 11
2
[GSOC 2014] Indexing INEX dataset
On Tue, Mar 11, 2014 at 03:20:31PM +0100, Parth Gupta wrote: > > > > On current trunk, we index the title with prefix "S" by default in > > omindex, though with a wdf inc of 5 rather than 1: > > > > indexer.index_text(title, 5, "S"); > > > > So I don't think you need that change to omindex now. > > Yes, but please
2011 Sep 21
2
Weighted Average on More than One Variable in Data Frame
Dear R Users, I have looked for a solution to the following problem and I have not been able to find it in the archive, through Google or in the R documentation. I have a data frame, say df, which has 4 variables, one of which I would like to use as a grouping variable (g), and another one that I would like to use for my weights (w). The other two variables are variables (x1 and x2) for which I would
2007 Jan 24
1
how to properly extend s3 data.frames with s4 classes?
Dear R Programmers! After some time of using R I decided to work through John Chambers' book "Programming with Data" to learn what these S4 classes are all about and how they work in R. (I regret not having picked up this rather fine book earlier!) I know from the documentation and the mailing archives that S4 in R is not 100% the book and that there are issues especially with
2012 Jun 11
2
Define a variable on a non-standard year interval (Water Years)
Hello, I am trying to define a different interval for a "year". In hydrology, a "water year" is defined as the period between October 1st and September 30 of the following year. I was wondering how I might do this in R. Say I have a data.frame like the following and I want to extract a variable with the water year specs as defined above:
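The water-year definition in this question (October 1st through September 30 of the following year) can be sketched in Python. The original question asks about R, so this is only an illustration of the date arithmetic. It assumes the common convention of labelling a water year by the calendar year in which it ends, which the post itself does not state; the function name is hypothetical.

```python
from datetime import date


def water_year(d):
    """Return the water year containing date `d`.

    Assumption (not stated in the post): the water year is labelled by the
    calendar year in which it ends, so 1 Oct 2011 to 30 Sep 2012 is 2012.
    """
    # October, November and December belong to the next year's water year.
    return d.year + 1 if d.month >= 10 else d.year


# Illustrative dates only.
for d in (date(2011, 9, 30), date(2011, 10, 1), date(2012, 9, 30)):
    print(d.isoformat(), water_year(d))
```

If the opposite labelling is wanted (by the starting year), the same test on the month applies with `d.year` and `d.year - 1` instead.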
2007 Mar 21
1
scoring question
Hi All, I have just realized that if I set a query like 'green jelly bean', Xapian will turn that query into 'green OR jelly OR bean'. This causes documents containing just one of the words to be considered a 100% hit. The behavior I would like to see is that each word gives a 33.3% hit, so that a document containing all 3 words gets placed above a document with only 1 or 2
2014 Jan 03
1
Tab formatting in dummy.coef.R
Happy New Year. I recognize this is a low priority issue, but... I'll fix it if you let me. There are some TABs where R style calls for 4 spaces. For example R-3.0.2/src/library/stats/R/dummy.coef.R. I never noticed this until today, when I was stranded on a deserted island with only the R source code and a Swiss Army knife (vi). Now I realize my ~/.vimrc has tabstop set at 2, and it makes
2013 Mar 08
2
Gsoc-2013
Hi, I am Chinmay Naik, an undergraduate in Computer Science at Bangalore Institute of Technology, Bangalore. I am an experienced programmer and good with C, C++, Python, Java, OpenGL and would love to participate in GSoC-13. From the ideas listed, I am interested to work on the project "posting list encoding improvements". I am a newbie to Xapian but would like to get involved and get a