thr3ads.net - search: "tokenised"

Displaying 20 results from an estimated 31 matches for "tokenised".

puppet-cleaner: makes puppet DSL code comply with a subset of the style guide

2013 Apr 16

FWIW, I wrote puppet-cleaner to help me make thousands of lines of puppet 2.6 DSL code comply with the puppet 2.7 style guide and expectations. I'm uploading it to github today for anyone to use. https://github.com/santana/puppet-cleaner Externally, you run puppet-clean file.pp and it can transform this: /* multiline comment trailing white space here -> */ class

Positive experiences with Xapian

2011 Aug 02

Hi Guys, I just wanted to take a moment to give some positive feedback regarding my recent experiences with Xapian. I've been doing a fair amount of research into search engines, as we have some fairly specific requirements for what we're attempting to do with them. Long story short, after a few weeks of playing around with just about everything under the sun (or at least,

Query Parser, filenames and compound words

2005 Dec 30

When I submit a filename to the query parser, it breaks it up. Example: /home/user/file_name.ext becomes Xapian::Query((home:(pos=1) PHRASE 5 user:(pos=2) PHRASE 5 file:(pos=3) PHRASE 5 name:(pos=4) PHRASE 5 ext:(pos=5))) which does not find the document. If I do a single-term query without using the query parser, then I find the document. The Query Parser also breaks up hyphenated terms
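The splitting described in this post can be approximated with a short sketch. This is only an illustration that assumes terms are maximal runs of ASCII letters and digits; it is not Xapian's actual tokeniser, and `split_terms` is a hypothetical helper name.

```python
import re
from typing import List, Tuple


def split_terms(text: str) -> List[Tuple[str, int]]:
    """Split text into lowercase terms with 1-based positions.

    Assumption: a term is a maximal run of ASCII letters or digits, so
    '/', '_' and '.' all act as separators. This only mimics the
    behaviour reported in the post; it is not Xapian code.
    """
    words = re.findall(r"[A-Za-z0-9]+", text.lower())
    return [(word, pos) for pos, word in enumerate(words, start=1)]


if __name__ == "__main__":
    for term, pos in split_terms("/home/user/file_name.ext"):
        print(f"{term}:(pos={pos})")
```

Under that assumption the filename yields five positioned terms, which is the shape of the PHRASE query quoted above and suggests why a document indexed under the whole filename as a single term would not match.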

problem with searching plurals (with apostrophe)

2007 Nov 16

Hello guys, I am using the acts_as_ferret plugin (0.4.1, latest) with the ferret gem (0.11.4, latest) on Rails 1.2.5 and Ruby 1.8.6 (Ubuntu Gutsy). I have this Stores model: acts_as_ferret :fields => {:name => { :boost => 2 ,:store => :yes}, :short_desc => { :boost => 1.5,:store => :yes }, :tag_list => {:boost => 1

Possible bug using FLAG_WORD_BREAKS with fullwidth Unicode codepoints

2024 Jan 07

On Thu, Jan 04, 2024 at 05:50:22PM +0100, Robert Stepanek wrote:
> Since I am undecided yet if and how to fix this in Xapian I haven't
> come up with a pull request. Because trac currently is offline, I
> could not file a bug. I hope it's OK to post my analysis here first,
> I'll be happy to follow up reporting that bug proper later (should we
> conclude that it actually

Starting work on Perf Test Module

2014 May 14

Hello, I am beginning work on the perf test module. The initial steps that I aim to accomplish are:
-> Download the Wikipedia dumps for multiple languages.
-> Write Python scripts to tokenize the dump (will probably use something like nltk, which has powerful built-in tokenizers).
-> Discuss and finalize the design of the search and query expansion perf tests as I want to complete them

Error with service: "invalid byte sequence in US-ASCII"

2013 Feb 18

I just built a new puppet master, and whenever I run puppet on it, it throws an error while processing a service resource:

# puppet agent -t
> Info: Retrieving plugin
> Info: Caching catalog for i-45dc2b1d
> Info: Applying configuration version 'g
> 9ea47ad19bc706a754c00f00a024309948d3ea03'
> Error: /Stage[main]/Ipa::Client::Basic/Service[sssd]: Could not

[Bug 3474] New: ssh_config can escape double quotes with a backslash

2022 Sep 22

https://bugzilla.mindrot.org/show_bug.cgi?id=3474

Bug ID: 3474
Summary: ssh_config can escape double quotes with a backslash
Product: Portable OpenSSH
Version: v9.0p1
Hardware: Other
OS: Linux
Status: NEW
Severity: enhancement
Priority: P5
Component: ssh
Assignee:

Stopword addition and stemming

2010 Nov 15

Hi, Two questions which I'm unsure about: Stemming: I've turned on stemming, etc, but how can I confirm that it's being used in searches? What should I look/search for? Stopwords: I'm trying out xapian on a regional dataset (searching data from a *.co.us TLD, e.g.). I've noticed that searching for [bob co.us] results in *very* slow search times (tens of seconds), since it

IdentityFile patch

2002 Jan 27

...s.h"
 #include "canohost.h"
-#include "buffer.h"
 #include "bufaux.h"
 #include "uidswap.h"
 #include "tildexpand.h"
@@ -239,62 +238,6 @@
     return 0;
 }
-
-/*
- * Given a template and a passwd structure, build a filename
- * by substituting % tokenised options. Currently, %% becomes '%',
- * %h becomes the home directory and %u the username.
- *
- * This returns a buffer allocated by xmalloc.
- */
-char *
-expand_filename(const char *filename, struct passwd *pw)
-{
-    Buffer buffer;
-    char *file;
-    const char *cp;
-
-    /*
-     * Build the fil...
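The comment in this patch describes a small substitution scheme: %% becomes '%', %h the home directory, and %u the username. A minimal sketch of that idea in Python follows. It is not the OpenSSH implementation; the name `expand_template` is hypothetical, and rejecting unknown or trailing tokens with `ValueError` is an assumption that the snippet does not confirm.

```python
def expand_template(template: str, home: str, user: str) -> str:
    """Expand %h, %u and %% in a filename template.

    Hypothetical helper mirroring the comment in the patch; the error
    handling for unknown tokens is an assumption, not OpenSSH behaviour.
    """
    substitutions = {"%": "%", "h": home, "u": user}
    out = []
    i = 0
    while i < len(template):
        ch = template[i]
        if ch != "%":
            # Ordinary character: copy it through unchanged.
            out.append(ch)
            i += 1
            continue
        if i + 1 >= len(template):
            raise ValueError("trailing '%' in template")
        token = template[i + 1]
        if token not in substitutions:
            raise ValueError("unknown token %" + token)
        out.append(substitutions[token])
        i += 2  # Skip both the '%' and the token character.
    return "".join(out)


if __name__ == "__main__":
    print(expand_template("%h/.ssh/id_%u", "/home/alice", "alice"))
```

Scanning one character at a time keeps %% unambiguous: the second '%' is consumed as a token and is never re-read as the start of a new one.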

LLVM Backend for a platform with no (normal) stack

2018 Dec 17

Not only do FPGAs not support recursion, we don’t even support calls! All user code must be inlined into one kernel/component, which is then used to create HDL for the FPGA.

Mark

From: Bruce Hoult <brucehoult at sifive.com>
Sent: December 17, 2018 9:28 AM
To: Mendell, Mark P <mark.p.mendell at intel.com>
Cc: jjones at prc-hsv.com; LLVM Developers Mailing List <llvm-dev at

[PATCH] Add user-dependent IdentityFile to OpenSSH-3.0.2p1

2002 Jan 27

Pull requests: CJK words and Snippet generator

2016 Sep 19

Olly, sorry for my delayed reply.

On Mon, 12 Sep 2016, at 05:32, Olly Betts wrote:
> On Wed, Sep 07, 2016 at 02:30:16PM +0200, rsto at paranoia.at wrote:
> > On Tue, Sep 6, 2016, at 09:16, Olly Betts wrote:
> > > I think my main concerns are about efficiency [...]
> > For the proposed term coverage, the implementation looks up and inserts
> > terms into a map. That

Ask for advice on exact requirements to fix #699 mixed CJK numbers

2019 Mar 07

I am working on "#699 Better tokenisation of mixed CJK numbers", and have implemented a partial patch covering Chinese for this ticket. The current code works well with the special test cases, and all tests in xapian-core still pass. But I'm unsure about the exact requirements: how much performance we can afford to pay to enable more cases, and whether there are better methods to

[Patch] User-dependent IdentityFile

2003 Jan 18

...s.h"
 #include "canohost.h"
-#include "buffer.h"
 #include "bufaux.h"
 #include "uidswap.h"
 #include "tildexpand.h"
@@ -214,62 +213,6 @@
     return 0;
 }
-
-/*
- * Given a template and a passwd structure, build a filename
- * by substituting % tokenised options. Currently, %% becomes '%',
- * %h becomes the home directory and %u the username.
- *
- * This returns a buffer allocated by xmalloc.
- */
-char *
-expand_filename(const char *filename, struct passwd *pw)
-{
-    Buffer buffer;
-    char *file;
-    const char *cp;
-
-    /*
-     * Build the fil...

Pull requests: CJK words and Snippet generator

2016 Sep 07

On Tue, Sep 6, 2016, at 09:16, Olly Betts wrote:
> I think my main concerns are about efficiency (since that's a major
> motivation for the current implementation, so slowing it down would be
> annoying), and whether we can just make this the standard behaviour
> rather than adding an option.

The current implementation is O(n) and I took care to keep it at that. For the proposed term

[klibc:update-dash] parser: Fix backquote support in here-document EOF mark

2019 Jan 25

Commit-ID: 5048195d282d48b25a9a0164e60cd0e6708ec8a9
Gitweb: http://git.kernel.org/?p=libs/klibc/klibc.git;a=commit;h=5048195d282d48b25a9a0164e60cd0e6708ec8a9
Author: Herbert Xu <herbert at gondor.apana.org.au>
AuthorDate: Thu, 15 Mar 2018 18:27:30 +0800
Committer: Ben Hutchings <ben at decadent.org.uk>
CommitDate: Fri, 25 Jan 2019 02:57:21 +0000

[klibc] parser: Fix backquote

[klibc:update-dash] dash: parser: Fix backquote support in here-document EOF mark

2020 Mar 28

Commit-ID: e90b159a00304664ddc94fca392146f4bde1bcec
Gitweb: http://git.kernel.org/?p=libs/klibc/klibc.git;a=commit;h=e90b159a00304664ddc94fca392146f4bde1bcec
Author: Herbert Xu <herbert at gondor.apana.org.au>
AuthorDate: Thu, 15 Mar 2018 18:27:30 +0800
Committer: Ben Hutchings <ben at decadent.org.uk>
CommitDate: Sat, 28 Mar 2020 21:42:54 +0000

[klibc] dash: parser: Fix

Query parser and stemming of norwegian letters

2005 Jun 09

Hello, can I get an explanation of the following? Running the following code:

    ....
    pqp=new QueryParser();
    Stem stem("norwegian");
    cout << "DEBUG " << stem.stem_word(_sXapian)<< endl;
    pqp->set_stemmer(stem);
    pqp->set_database(*_pdatabase);
    pqp->set_default_op(Query::OP_AND);
    //Set the

Possible bug using FLAG_WORD_BREAKS with fullwidth Unicode codepoints

2024 Jan 08

On Sun, Jan 7, 2024, at 7:45 PM, Olly Betts wrote:
> I've restarted trac.

I now created a pull request: https://github.com/xapian/xapian/pull/329

Should I create a trac issue, too?

> Assuming the latter is valid, just removing this block (or removing the
> parts of it which are Lu or Ll) should fix the problem as then
> tokenisation will switch mode - I tried this and it fixes
