Displaying 20 results from an estimated 31 matches for "tokenises".

2013 Apr 16
7
puppet-cleaner: makes puppet DSL code comply with a subset of the style guide
FWIW, I wrote puppet-cleaner to help me bring thousands of lines of puppet 2.6 DSL code into line with the puppet 2.7 style guide and expectations. I'm uploading it to github today for anyone to use. https://github.com/santana/puppet-cleaner Externally, you run puppet-clean file.pp and it can transform this: /* multiline comment trailing white space here -> */ class
2011 Aug 02
2
Positive experiences with Xapian
Hi Guys, I just wanted to take a moment to give some positive feedback regarding my experiences with Xapian recently. I've been doing a fair amount of research into search engines recently, as we have some fairly specific requirements with what we're attempting to do with them. Long story short, after a few weeks of playing around with just about everything under the sun (or at least,
2005 Dec 30
1
Query Parser, filenames and compound words
When I submit a filename to the query parser it breaks it up Example: /home/user/file_name.ext becomes Xapian::Query((home:(pos=1) PHRASE 5 user:(pos=2) PHRASE 5 file:(pos=3) PHRASE 5 name:(pos=4) PHRASE 5 ext:(pos=5))) which does not find the document. If I do a single-term query not using the query parser then I find the document. The Query Parser also breaks up hyphenated terms
2007 Nov 16
1
problem with searching plurals (with apostrophe)
hello guys, i am using acts_as_ferret plugin(0.4.1 Latest) with ferret gem(0.11.4 Latest) on rails 1.2.5 and ruby 1.8.6(UBUNTU Gutsy) i have this :Stores Model acts_as_ferret :fields => {:name => { :boost => 2 ,:store => :yes}, :short_desc => { :boost => 1.5,:store => :yes }, :tag_list => {:boost => 1
2024 Jan 07
1
Possible bug using FLAG_WORD_BREAKS with fullwidth Unicode codepoints
On Thu, Jan 04, 2024 at 05:50:22PM +0100, Robert Stepanek wrote: > Since I am undecided yet if and how to fix this in Xapian I haven't > come up with a pull request. Because trac currently is offline, I > could not file a bug. I hope it's OK to post my analysis here first, > I'll be happy to follow up reporting that bug proper later (should we > conclude that it actually
2014 May 14
2
Starting work on Perf Test Module
Hello, I am beginning work on the perf test module. The initial steps that I aim to accomplish are :- -> Download the wikipedia dumps for multiple languages . -> Write python scripts to tokenize the dump (will probably use something like nltk which has powerful inbuilt tokenizers) -> Discuss and finalize the design of the search and query expansion perf tests as I want to complete them
2013 Feb 18
8
Error with service: "invalid byte sequence in US-ASCII"
I just built a new puppet master, and whenever I run puppet on it, it throws an error while processing a service resource: # puppet agent -t > Info: Retrieving plugin > Info: Caching catalog for i-45dc2b1d > Info: Applying configuration version 'g > 9ea47ad19bc706a754c00f00a024309948d3ea03' > Error: /Stage[main]/Ipa::Client::Basic/Service[sssd]: Could not
2022 Sep 22
7
[Bug 3474] New: ssh_config can escape double quotes with a backslash
https://bugzilla.mindrot.org/show_bug.cgi?id=3474 Bug ID: 3474 Summary: ssh_config can escape double quotes with a backslash Product: Portable OpenSSH Version: v9.0p1 Hardware: Other OS: Linux Status: NEW Severity: enhancement Priority: P5 Component: ssh Assignee:
2010 Nov 15
4
Stopword addition and stemming
Hi, Two questions which I'm unsure about: Stemming: I've turned on stemming, etc, but how can I confirm that it's being used in searches? What should I look/search for? Stopwords: I'm trying out xapian on a regional dataset (searching data from a *.co.us TLD, eg) . I've noticed that searching for [bob co.us] results in *very* slow search times (tens of seconds), since it
2002 Jan 27
0
IdentityFile patch
By the way, I noticed in the previous IdentityFile patch I forgot to expand tilde. I fixed this by making the change in ssh.c instead of readconf.c, which is probably where it belongs, as far as the existing code is concerned: diff -ur openssh-3.0.2p1/auth.c openssh-3.0.2p1I/auth.c --- openssh-3.0.2p1/auth.c Sun Nov 11 17:06:07 2001 +++ openssh-3.0.2p1I/auth.c Sun Jan 27 12:05:14 2002 @@ -44,7
2018 Dec 17
2
LLVM Backend for a platform with no (normal) stack
Not only do FPGAs not support recursion, we don’t even support calls! All user code must be inlined into one kernel/component, which is then used to create HDL for the FPGA. Mark From: Bruce Hoult <brucehoult at sifive.com> Sent: December 17, 2018 9:28 AM To: Mendell, Mark P <mark.p.mendell at intel.com> Cc: jjones at prc-hsv.com; LLVM Developers Mailing List <llvm-dev at
2002 Jan 27
1
[PATCH] Add user-dependent IdentityFile to OpenSSH-3.0.2p1
Here is a patch to allow private key files to be placed system wide (for all users) in a secure (non-NFS) mounted location on systems where home directories are NFS mounted. This is especially important for users who use blank passphrases rather than ssh-agent (a good example of where this is necessary is for tunnelling lpd through ssh on systems that run lpd as user lp). IdentityFile now accepts
2016 Sep 19
2
Pull requests: CJK words and Snippet generator
Olly, sorry for my delayed reply. Am Mo, 12. Sep 2016, um 05:32, schrieb Olly Betts: > On Wed, Sep 07, 2016 at 02:30:16PM +0200, rsto at paranoia.at wrote: > > On Tue, Sep 6, 2016, at 09:16, Olly Betts wrote: > > > I think my main concerns are about efficiency [...] > > For the proposed term coverage, the implementation looks up and inserts > > terms into a map. That
2019 Mar 07
3
Ask for advice on exact requirements to fix #699 mixed CJK numbers
I am working on "#699 Better tokenisation of mixed CJK numbers", and have implemented a partial patch covering Chinese for this ticket. The current code works well with the special test cases, and all tests in xapian-core still pass. But I'm unsure about the exact requirements of the ticket: how much performance we can afford to pay for enabling more cases, and whether there are better methods to
2003 Jan 18
0
[Patch] User-dependent IdentityFile
Here is the user-dependent IdentityFile patch for openssh3.5 (BSD version), which allows private key files to be placed system wide (for all users) in a secure (non-NFS) mounted location. This addresses an important security hole on systems where home directories are NFS mounted, particularly if there are users who use blank passphrases (or when lpd is tunneled through ssh on systems running lpd
2016 Sep 07
2
Pull requests: CJK words and Snippet generator
On Tue, Sep 6, 2016, at 09:16, Olly Betts wrote: > I think my main concerns are about efficiency (since that's a major > motivation for the current implementation, so slowing it down would be > annoying), and whether we can just make this the standard behaviour > rather than adding an option. The current implementation is O(n) and I took care to keep it at that. For the proposed term
2019 Jan 25
0
[klibc:update-dash] parser: Fix backquote support in here-document EOF mark
Commit-ID: 5048195d282d48b25a9a0164e60cd0e6708ec8a9 Gitweb: http://git.kernel.org/?p=libs/klibc/klibc.git;a=commit;h=5048195d282d48b25a9a0164e60cd0e6708ec8a9 Author: Herbert Xu <herbert at gondor.apana.org.au> AuthorDate: Thu, 15 Mar 2018 18:27:30 +0800 Committer: Ben Hutchings <ben at decadent.org.uk> CommitDate: Fri, 25 Jan 2019 02:57:21 +0000 [klibc] parser: Fix backquote
2020 Mar 28
0
[klibc:update-dash] dash: parser: Fix backquote support in here-document EOF mark
Commit-ID: e90b159a00304664ddc94fca392146f4bde1bcec Gitweb: http://git.kernel.org/?p=libs/klibc/klibc.git;a=commit;h=e90b159a00304664ddc94fca392146f4bde1bcec Author: Herbert Xu <herbert at gondor.apana.org.au> AuthorDate: Thu, 15 Mar 2018 18:27:30 +0800 Committer: Ben Hutchings <ben at decadent.org.uk> CommitDate: Sat, 28 Mar 2020 21:42:54 +0000 [klibc] dash: parser: Fix
2005 Jun 09
1
Query parser and stemming of norwegian letters
Hello, can I get an explanation of the following. Running the following code: .... pqp=new QueryParser(); Stem stem("norwegian"); cout << "DEBUG " << stem.stem_word(_sXapian)<< endl; pqp->set_stemmer(stem); pqp->set_database(*_pdatabase); pqp->set_default_op(Query::OP_AND); //Set the
2024 Jan 08
1
Possible bug using FLAG_WORD_BREAKS with fullwidth Unicode codepoints
On Sun, Jan 7, 2024, at 7:45 PM, Olly Betts wrote: > I've restarted trac. I now created a pull request: https://github.com/xapian/xapian/pull/329 Should I create a trac issue, too? > Assuming the latter is valid, just removing this block (or removing the > parts of it which are Lu or Ll) should fix the problem as then > tokenisation will switch mode - I tried this and it fixes