thr3ads.net - similar to: "How to make xapian run in hadoop"

Displaying 20 results from an estimated 10000 matches similar to: "How to make xapian run in hadoop"

2019 Nov 22

How to make xapian run in hadoop

On Thu, Nov 21, 2019 at 10:20:19AM +0800, ??? wrote:
> We use Xapian as the backend of our system. The amount of data to index keeps growing and local mode is hard to maintain, so we plan to move the index builder to Hadoop. We are trying to get Xapian to run in Hadoop, and have hit a problem: Xapian performs many seek operations when it writes the index files, but

Release plans

2023 Mar 08

Olly Betts <olly at survex.com> wrote:
> The current plan for the next release series includes relicensing the C++ libxapian library in xapian-core as MPL. The remaining blockers for this are:
>
> * adding update support to the new honey backend (to replace glass)

Just wondering if there's docs on what improvements users can expect from honey. Mainly smaller size?

Release plans

2023 Mar 06

The current plan for the next release series includes relicensing the C++ libxapian library in xapian-core as MPL. The remaining blockers for this are:

* adding update support to the new honey backend (to replace glass)
* adding support for RAM storage to honey (to replace inmemory)
* moving some remote client and server code out of libxapian (or replacing it)

I'm certainly still aiming

sorting large msets

2018 Mar 30

Hello, is there a way to optimize sorting by certain values for queries which return a huge amount of results? For example, I just want a simple query that gives me the 200 most recent emails out of millions. The elapsed time for get_mset increases as the number of documents ($n * 2000) increases. I suppose I could store a pre-sorted set using SQLite or similar. Thanks in advance for any
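The pre-sorted side store mentioned in the message above could be sketched as follows. This is a minimal illustration using Python's standard `sqlite3` module, not something taken from the thread: the table name `emails`, the columns `docid` and `ts`, and the helper names are all hypothetical, and it assumes each document's timestamp is recorded alongside its Xapian document ID at index time.

```python
import sqlite3


def build_index(rows):
    """Create an in-memory side table of (docid, ts) pairs.

    The table and column names are hypothetical; ts is assumed to be an
    integer timestamp such as seconds since the epoch.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE emails (docid INTEGER PRIMARY KEY, ts INTEGER NOT NULL)"
    )
    # The index on ts lets SQLite walk rows in date order instead of sorting.
    conn.execute("CREATE INDEX emails_ts ON emails (ts)")
    conn.executemany("INSERT INTO emails (docid, ts) VALUES (?, ?)", rows)
    conn.commit()
    return conn


def most_recent(conn, limit=200):
    """Return the docids of the `limit` newest rows, newest first."""
    cur = conn.execute(
        "SELECT docid FROM emails ORDER BY ts DESC, docid DESC LIMIT ?",
        (limit,),
    )
    return [row[0] for row in cur]
```

With such a table, `most_recent(conn)` yields the 200 newest document IDs without touching the full result set; how those IDs are then fetched from the Xapian database is left open here.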

how to build 64bit xapian using MSVC2017?

2018 Mar 20

On Tue, Mar 20, 2018 at 06:30:07PM +0000, Olly Betts wrote:
> https://lists.xapian.org/pipermail/xapian-discuss/2018-January/009585.html

Related to this, the appveyor build is currently failing on git master. Unfortunately the change at which it started to fail was the addition of the new "honey" backend, which doesn't narrow things down to a useful degree. I've checked over

Large data sets with R (binding to hadoop available?)

2008 Aug 21

Dear R community, I find R fantastic and use R whenever I can for my data analytic needs. Certain data sets, however, are so large that other tools seem to be needed to pre-process data such that it can be brought into R for further analysis. Questions I have for the many expert contributors on this list are: 1. How do others handle situations of large data sets (gigabytes, terabytes)

Using R with Hadoop/Hive for Big Data

2009 Jul 31

Hive <http://hadoop.apache.org/hive/> is a data warehouse infrastructure built on top of Hadoop that provides tools to enable easy data summarization, ad hoc querying and analysis of large datasets stored in Hadoop files. It provides a mechanism to put structure on this data and it also provides a simple query language called QL which is based on SQL and which enables users familiar with

Glusterfs-Hadoop

2013 May 20

Hi, Where can I find glusterfs-hadoop-0.20.2-0.1.x86_64.rpm? The following link is from the Gluster FS Admin Guide, but it doesn't exist: http://download.gluster.com/pub/gluster/glusterfs/qa-releases/3.3-beta-2/glusterfs-hadoop-0.20.2-0.1.x86_64.rpm Thanks!

R + Hadoop on Amazon

2012 Nov 07

Hello all, Because of some issues with my local machine, I need to move to Amazon and run R and Hadoop on an Amazon instance. After a lot of searching, I still can't decide which image to choose for the instance. Is anyone using R + Hadoop on Amazon? Thanks

Xapian 1.4.0 released

2016 Jul 06

I have installed the new Xapian 1.4.0. During the installation I didn't see any problems; however, when I execute the commands quest and delve I get different versions, and my Perl-based searches return Exception: Couldn't detect type of database ... and what are these glass things in the index directories? There is no new version of Perl Search::Xapian. $ quest -version quest -

SVM hadoop

2015 Dec 09

Good morning, does anyone know if there is a way to implement a support vector machine (SVM) with R-hadoop? I am interested in doing big data processing with SVM. I know that R has the packages {RtextTools} and {e1071}, which support SVM. But I am not sure whether the algorithm is parallelizable, that is, whether it can run in parallel on the R-hadoop platform. Many

Xapian 1.4.5 "Db block overwritten - are there multiple writers?" with Glass

2018 Mar 07

On Mon, Mar 05, 2018 at 09:48:52PM +0000, Olly Betts wrote:
> On Mon, Mar 05, 2018 at 08:52:47PM +0100, Sylvain Taverne wrote:
> > I've noticed the error occurs when I'm trying to get stored values from a database with a lot of stored values. I can reproduce the error with a simple python2 script I've posted on GitHub

Xapian 1.4.5 "Db block overwritten - are there multiple writers?" with Glass

2018 Jul 10

On Mon, Jul 09, 2018 at 10:29:18AM +0100, Olly Betts wrote:
> The attached patch resets this cursor each time commit() is called, and that fixes my C++ reproducer, though I think this ought to work as-is and the real bug is at a lower level.

I've dug deeper and that was indeed the case. Here's a patch which addresses the root cause:

How to enhance the query performance for large boolean attribute

2017 Dec 05

Hi all, I am a new user of Xapian, and we have run into the following problem. In our case, a document has many attributes with boolean values, for example (A, B, C), and our search query combines filter logic (A == true and B == false ...) with other search logic. We use MatchDecider to implement the filter logic, and we are now seeing performance problems, because our self-defined

Xapian 1.4.0 released

2016 Jun 25

I'm delighted to announce the release of 1.4.0. You can download from: http://xapian.org/download This is a major milestone release, but the last development release (1.3.7) was essentially a release candidate so the changes are fairly minor - the only notable change is the update to Unicode 9.0.0. That means a short thank you list for this release - thanks to Andy Chilton! As always, if

SVM hadoop

2015 Dec 10

Hi, You can set up RStudio on Amazon, install "caret" and off you go... I don't know whether what Amazon can offer will be enough for your problem... I think so... ;-) Or do it directly here, where this whole installation is already done: http://www.teraproc.com/front-page-posts/r-on-demand/ Thanks, Carlos. On December 10, 2015, 14:43, MªLuz Morales <mlzmrls

SVM hadoop

2015 Dec 10

Dear all, I once read something at the following link, but I never used it. http://blog.revolutionanalytics.com/2015/06/using-hadoop-with-r-it-depends.html Javier Rubén Marcuzzi From: Carlos J. Gil Bellosta Sent: Wednesday, December 9, 2015 14:33 To: MªLuz Morales CC: r-help-es Subject: Re: [R-es] SVM hadoop No, they will not run in parallel if you use the SVMs from packages such as e1071. No

SVM hadoop

2015 Dec 11

Hi Mª Luz, Here is my view: First of all, you need to be clear about exactly what you want to do in parallel. I can think of 3 scenarios: (1) Applying a model, in this case SVM, to very large data, which is why I need hadoop/spark (2) Building many SVM models on small data (for example one per user), which is why I need hadoop/spark to parallelize these processes

MultiDatabase shard count limitations

2020 Aug 21

Going back to the "prioritizing aggregated DBs" thread from February 2020, I've got 390 Xapian shards for 130 public inboxes I want to search against(*). There's more on the horizon (we're expecting tens of thousands of public inboxes). After bumping RLIMIT_NOFILE and running ->add_database a bunch, the actual queries seem to be taking ~30s (not good :x). Now I'm

Running scripts in hadoop

2010 Dec 24

R-help group, I'm looking for some assistance on using an R-script to read STDIN from hadoop. Example, say I have two tables. One is a student table, the other is a class roster table (tables join on student_id). Student SAT score is in the student table, whether the student passed or not is in the roster table. So to determine if a student passed or failed based on their SAT score, I'd
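The join described in the message above can be sketched as a script that consumes records line by line, as a Hadoop streaming job would feed them on STDIN. The sketch is in Python rather than R, and everything about the record layout is an assumption made for illustration: each line is taken to be tab-separated, with a leading tag (`student` or `roster`), then the student_id, then either the SAT score or a 0/1 pass flag. The function name `join_records` is likewise hypothetical.

```python
from collections import defaultdict


def join_records(lines):
    """Join student and roster records on student_id.

    Assumed (hypothetical) line formats, tab-separated:
        student <TAB> student_id <TAB> sat_score
        roster  <TAB> student_id <TAB> passed (0 or 1)

    Yields (student_id, sat_score, passed) for each roster row whose
    student_id also appears in the student records.
    """
    scores = {}
    roster = defaultdict(list)
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 3:
            continue  # skip malformed lines
        table, student_id, value = fields
        if table == "student":
            scores[student_id] = int(value)
        elif table == "roster":
            roster[student_id].append(value == "1")
    # Emit joined rows in a stable order, keyed by student_id.
    for student_id in sorted(roster):
        if student_id in scores:
            for passed in roster[student_id]:
                yield student_id, scores[student_id], passed
```

In a streaming job the function would be called as `join_records(sys.stdin)` and each tuple printed as a tab-separated line. This version holds all records in memory, which is only reasonable for small inputs; a real reducer would normally rely on Hadoop delivering the records grouped by key.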
